R Programming in Data Science: High Volume Data
1h 25m
Intermediate
2018-10-26
Authors

Mark Niemann-Ross
Technologist experienced in hardware, software, and science fiction
Course details
Data fills all available space, and now that storage is cheap, the amount of data has exploded. However, all that information is useless without analysis and context. The R programming language is designed to make it easier to analyze and visualize massive amounts of data. For example, R assumes you are working with whole blocks of values, so it can multiply one block of variables by another in a single operation, an assumption that gives it inherent advantages over other languages. This course shows why R is ideal for high volumes of data, introduces more efficient ways to use the language, and explains how to avoid the problems and capitalize on the opportunities of big data. Learn how to determine if you have enough memory and processing power, produce visualizations of big data, optimize your R code, and use advanced techniques such as parallel processing to speed up your computations. Plus, discover how to integrate R with big-data solutions such as SQL databases and Apache Spark.
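As a rough illustration of the block multiplication mentioned above, here is a minimal sketch in base R. The `price` and `quantity` vectors are made-up sample values, not data from the course; the point is only that one expression operates on whole vectors, where many other languages would need an explicit loop.

```r
# Hypothetical sample data: unit prices and quantities for five orders
price    <- c(2.50, 4.00, 1.25, 10.00, 3.75)
quantity <- c(4, 1, 8, 2, 6)

# Vectorized: a single expression multiplies the two blocks element by element
total_vectorized <- price * quantity

# Equivalent explicit loop, written the way many other languages would require
total_loop <- numeric(length(price))
for (i in seq_along(price)) {
  total_loop[i] <- price[i] * quantity[i]
}

# Check that both approaches produce the same totals
print(identical(total_vectorized, total_loop))
```

On large vectors the vectorized form is typically both shorter and faster, because the element-by-element work happens in compiled code rather than in the R interpreter.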
Learning objectives
Accessing memory and processing power
Visualizing high-volume data
Profiling and optimizing R code
Compiling R functions
Parallel processing with R
Using R with other big data solutions
Skills covered
RStudio
R
Statistics
Data Engineering
Data Analysis
Programming Languages
Data Science
Business Analysis and Strategy
Business Software and Tools
Open Source
Software Development
Deep Dive (X:Y)
Concepts
0. Introduction
- 01 - Wrangling high-volume data with R
- 02 - Sample data set
1. Problems and Opportunities with High-Volume Data
- 03 - Perspectives on high-volume data
- 04 - Big data and available memory
- 05 - Code - Finding available memory
- 06 - Big data and CPU cycles
- 07 - Code - How fast is your computer
2. Visualizing High-Volume Data
- 08 - High-volume data and visualizations
- 09 - Code - Graphs for high-volume data
- 10 - Code - rug() and jitter()
- 11 - Code - Applying statistics to plots
- 12 - Code - Subsampled graphs for high-volume data
- 13 - Code - Trellising data across multiple charts
3. Working within the R Programming Language
- 14 - R programming tools for high-volume data
- 15 - Downsampling
- 16 - Profile R code to find inefficiencies
- 17 - Code - Profile R code to find inefficiencies
- 18 - Avoid the copy-on-modify problem with R
- 19 - Code - Avoid copy-on-modify with data.table
- 20 - Optimization versus readability
4. Advanced High-Volume Techniques
- 21 - Compile R functions
- 22 - Parallel processing with R
- 23 - Code - Parallel R functions
- 24 - bigmemory, LaF, and ff packages
5. Use R with External Big Data Solutions
- 25 - Store high-volume data in a database
- 26 - Code - R with databases
- 27 - Cloud computing with R
- 28 - Sparklyr with R
- 29 - Code - R with Sparklyr
Conclusion
- 30 - Summary of high-volume data with R
Related courses
- Data Visualization in R with ggplot2
- Data Wrangling in R
- Cleaning Bad Data in R
- Complete Guide to R: Wrangling, Visualizing, and Modeling Data
- Complete Your First Project in R
- R for Data Science: Lunch Break Lessons
- Cert Prep: Certified Analytics Professional (CAP)
- Machine Learning with Data Reduction in Excel, R, and Power BI