Processing Text with R Essential Training
56m | Intermediate | 2019-09-19
Authors

Kumaran Ponnambalam
Working with data for 20+ years
Course details
Today’s big data and analytics pipelines consume more and more text data generated through websites, social media, and private communications. But deriving insights from text isn't straightforward; it requires a series of techniques for preparing text for analytics and machine learning. In this course, learn the essential techniques for cleansing and processing text in R, and discover how to convert text to a form that's ready for analytics and predictions. Kumaran Ponnambalam begins by reviewing techniques for extracting, cleansing, and processing text. He then shows how to convert text into an analytics-ready form, including how to use n-grams and TF-IDF. Throughout the course, he provides examples for exercising these techniques using R and the tm library.
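To give a rough feel for the steps named above (cleansing, n-grams, and TF-IDF), here is a minimal sketch in base R. It is not taken from the course and deliberately avoids the tm package so that it runs without extra installs. The toy documents, the helper names `tokenize` and `bigrams`, and the plain `log(N / df)` weighting are all assumptions made for illustration; library implementations such as tm may use a different weighting or normalization.

```r
# Toy corpus: three short documents, made up for illustration.
docs <- c(
  d1 = "The cat sat on the mat.",
  d2 = "The dog sat on the log!",
  d3 = "Cats and dogs are pets."
)

# Cleansing (hypothetical helper): lower-case, strip punctuation,
# then split on whitespace.
tokenize <- function(text) {
  cleaned <- gsub("[[:punct:]]", "", tolower(text))
  strsplit(trimws(cleaned), "[[:space:]]+")[[1]]
}
tokens <- lapply(docs, tokenize)

# Bigrams (hypothetical helper): pair each token with its successor.
bigrams <- function(tok) {
  if (length(tok) < 2) return(character(0))
  paste(tok[-length(tok)], tok[-1])
}

# Term-frequency matrix: one row per document, one column per term.
vocab <- sort(unique(unlist(tokens)))
tf <- t(vapply(
  tokens,
  function(tok) tabulate(match(tok, vocab), nbins = length(vocab)),
  integer(length(vocab))
))
colnames(tf) <- vocab

# TF-IDF using idf = log(N / df), one of several common variants.
doc_freq <- colSums(tf > 0)          # number of documents containing each term
idf <- log(nrow(tf) / doc_freq)
tfidf <- sweep(tf, 2, idf, `*`)      # scale each term column by its idf
```

Calling `bigrams(tokens$d1)` lists the adjacent word pairs of the first document, and `round(tfidf, 2)` shows how terms that occur in only one document receive a higher weight than terms shared across documents.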
Learning objectives
Acquiring text from various sources
Cleansing and transforming text data
Preparing TF-IDF matrices for machine learning
Building n-grams databases for text predictions
Best practices for scalability and storing text
Skills covered
R, Statistics, Essential Training, Programming Languages, Data Science, Open Source, Software Development
Concepts
0. Introduction
- 01 - The emergence of text analytics
1. Introduction to Text Mining
- 02 - Purpose
- 03 - Document
- 04 - Corpus
- 05 - R text processing libraries
- 06 - Setting up the environment
2. Corpus in R
- 07 - PCorpus and VCorpus
- 08 - Reading files with CorpusReader
- 09 - Exploring the corpus
- 10 - Persisting the corpus
3. Text Cleansing and Extraction
- 11 - Setup for processing
- 12 - Cleansing text
- 13 - Stop word removal
- 14 - Stemming
- 15 - Managing metadata
4. TF-IDF
- 16 - Introduction to tf-idf
- 17 - Generating term frequency matrix
- 18 - Improving term frequency matrix
- 19 - Plotting term frequency
- 20 - Generating tf-idf
5. N-Grams
- 21 - N-grams concepts
- 22 - Using RWeka NGramTokenizer
- 23 - Creating an n-gram text frequency matrix
- 24 - Extracting n-gram pairs
6. Best Practices
- 25 - Storing text
- 26 - Processing text data
- 27 - Scalability
Conclusion
- 28 - Next steps
Related courses
- Data Science Reporting with Quarto for Python
- Data Visualization in R with ggplot2
- Data Wrangling in R
- Cleaning Bad Data in R
- Designing Big Data Healthcare Studies, Part One
- Designing Big Data Healthcare Studies, Part Two
- Algorithmic Trading and Finance Models with Python, R, and Stata Essential Training
- R Tidyverse Applications