Processing Text with R Essential Training

56m | Intermediate | 2019-09-19

Authors

Kumaran Ponnambalam

Working with data for 20+ years

Course details

Today's big data and analytics pipelines consume more and more text data generated through websites, social media, and private communications. But deriving insights from text isn't straightforward; it requires a series of techniques for preparing text for analytics and machine learning. In this course, learn the essential techniques for cleansing and processing text in R, and discover how to convert text to a form that's ready for analytics and predictions. Kumaran Ponnambalam begins by reviewing techniques for extracting, cleansing, and processing text. He then shows how to convert text into an analytics-ready form, including how to use n-grams and TF-IDF. Throughout the course, he provides examples for exercising these techniques using R and the tm library.

Learning objectives

  • Acquiring text from various sources
  • Cleansing and transforming text data
  • Preparing TF-IDF matrices for machine learning
  • Building n-gram databases for text predictions
  • Best practices for scalability and storing text

Skills covered

R, Statistics, Essential Training, Programming Languages, Data Science, Open Source, Software Development

Concepts

0. Introduction

  • 01 - The emergence of text analytics

1. Introduction to Text Mining

  • 02 - Purpose
  • 03 - Document
  • 04 - Corpus
  • 05 - R text processing libraries
  • 06 - Setting up the environment

2. Corpus in R

  • 07 - PCorpus and VCorpus
  • 08 - Reading files with CorpusReader
  • 09 - Exploring the corpus
  • 10 - Persisting the corpus

3. Text Cleansing and Extraction

  • 11 - Setup for processing
  • 12 - Cleansing text
  • 13 - Stop word removal
  • 14 - Stemming
  • 15 - Managing metadata

4. TF-IDF

  • 16 - Introduction to tf-idf
  • 17 - Generating term frequency matrix
  • 18 - Improving term frequency matrix
  • 19 - Plotting term frequency
  • 20 - Generating tf-idf

5. N-Grams

  • 21 - N-grams concepts
  • 22 - Using RWeka NGramTokenizer
  • 23 - Creating an n-gram text frequency matrix
  • 24 - Extracting n-gram pairs

6. Best Practices

  • 25 - Storing text
  • 26 - Processing text data
  • 27 - Scalability

Conclusion

  • 28 - Next steps
