Data Science Foundations: Data Engineering

Course details

Approach big data with confidence by mastering the core skills needed to put data to work for your business. This course covers the basics of data engineering, system design, analytics, and business intelligence. Data science expert Ben Sullins explains how to collect and organize your data so you can deliver results that your organization can leverage. Ben starts by examining the modern data ecosystem and how it relates to running a smart and efficient data hub. Then, he shows you how to perform the principal tasks involved in managing, loading, extracting, and transforming data. He also takes you through staging, profiling, cleansing, and migrating data. Along the way, he provides actionable recommendations that are applicable to data experts throughout an organization: analysts, engineers, scientists, modelers, and more.

Learning objectives
Working with systems and schemas
Managing a good data pipeline
Setting up an environment
Loading and profiling data
Testing quality
Adding data types
Handling missing values and inferred members
Performing master data lookups
Loading schemas and tables
Creating views

Concepts

0. Introduction

01 - Welcome
02 - What you should know before watching this course
03 - Using the exercise files

1. Ecosystem Overview

04 - Data science system overview
05 - Star schema design overview
06 - Where does data engineering fit
07 - Components of a good data pipeline
08 - Environment setup

2. Staging Data

09 - Loading and profiling data
10 - Data quality testing

3. Cleansing Data

11 - Adding data types
12 - Handling missing values
13 - Verifying addresses

4. Conforming Data

14 - Performing master data lookups
15 - Handling inferred members

5. Delivering Analytical Data Sets

16 - Loading the star schema
17 - Loading dimension tables
18 - Loading fact tables
19 - Creating views

Conclusion

20 - Next steps

Authors

Ben Sullins