Stream Processing Design Patterns with Spark
1h 9m · Advanced · 2020-10-14
Authors

Kumaran Ponnambalam
Working with data for 20+ years
Course details
Stream processing is becoming more popular as more and more data is generated by websites, devices, and communications. Apache Spark is a leading platform that provides scalable and fast stream processing, but it still requires smart design to achieve maximum efficiency. This course helps developers use best practices and validated design patterns to implement stream processing in Apache Spark. Instructor Kumaran Ponnambalam shows how to set up your environment and then walks through four design patterns and real-world use cases: streaming analytics, alerts and thresholds, leaderboards, and real-time predictions. In chapter six, he introduces a start-to-finish project that shows how to go from design to an executed job using Spark, Apache Kafka, MariaDB, and Redis. By the end of the course, you'll understand the key stream processing capabilities of this platform and be able to incorporate them into your own data engineering solutions.
Learning objectives
Streaming opportunities and challenges
Setting up the environment
Streaming analytics with Spark
Monitoring alerts and thresholds with Spark
Creating leaderboards with Spark
Generating real-time predictions with Spark
Hands-on Spark streaming project
Skills covered
Apache Spark, Apache, Data Engineering, Data Science, Deep Dive (X:Y)
Concepts
0. Introduction
- 01 - Streaming with Spark
- 02 - Prerequisites
1. Stream Processing with Spark
- 03 - What is stream processing
- 04 - Streaming opportunities and challenges
- 05 - Streaming with Apache Spark
- 06 - Spark Structured Streaming APIs and SQL
- 07 - Setting up the exercise files
- 08 - Setting up Kafka
- 09 - Setting up MariaDB and Redis
2. Streaming Analytics
- 10 - Streaming analytics - Pattern
- 11 - Streaming analytics - Use case design
- 12 - Streaming analytics - Helper classes
- 13 - Streaming analytics - Pipeline implementation
- 14 - Streaming analytics - Results review
3. Alerts and Thresholds
- 15 - Alerts and thresholds - Pattern
- 16 - Alerts and thresholds - Use case design
- 17 - Alerts and thresholds - Helper classes
- 18 - Alerts and thresholds - Pipeline implementation
- 19 - Alerts and thresholds - Review
4. Leaderboards
- 20 - Leaderboards - Pattern
- 21 - Leaderboards - Use case design
- 22 - Leaderboards - Helper classes
- 23 - Leaderboards - Pipeline implementation
- 24 - Leaderboards - Review
5. Real-Time Predictions
- 25 - Real-time predictions - Pattern
- 26 - Real-time predictions - Use case design
- 27 - Real-time predictions - Helper classes
- 28 - Real-time predictions - Pipeline implementation
- 29 - Real-time predictions - Review
6. Use Cases
- 30 - Use case definition
- 31 - Design of the project
- 32 - Code walk-through
- 33 - Execute and analyze
Conclusion
- 34 - Next steps
Related courses
- Scala Essential Training for Data Science
- DataOps with Apache Iceberg using Spark, Nessie, and Dremio
- Cloud Hadoop: Scaling Apache Spark
- Azure Spark Databricks Essential Training
- Big Data Analytics with Hadoop and Apache Spark
- Data Platforms: Spark to Snowflake
- Databricks Certified Data Engineer Associate Cert Prep: 2 ELT with Spark SQL and Python
- Apache Spark Essential Training: Big Data Engineering