DevOps Foundations: Site Reliability Engineering
1h 20m | Advanced | 2022-04-14
Authors

Ernest Mueller
Director of Engineering at Six Nines IT

James Wickett
Security Engineer and supporter of rugged software and DevSecOps
Course details
Site reliability engineering (SRE) is an emerging paradigm in DevOps. The biggest names in tech, including Google, Netflix, Microsoft, and LinkedIn, all use SRE. In fact, industry-wide, "site reliability engineer" is replacing "DevOps engineer" in job posts. Simply put, SRE is software engineering applied to operations for the cloud-native era. This course introduces the basics of site reliability engineering, including how SRE fits into DevOps and how it can be integrated into your unique business environment. Instructors Ernest Mueller and James Wickett cover the major areas of expertise, including release engineering, change management, incident management and retrospectives, self-service automation, troubleshooting, performance, and deliberate adversity. Learn how to define reliability through SLAs and SLOs, handle crises, design distributed systems, and scale your systems and your team. Plus, explore time and project management strategies that bring humanity back to the SRE's job.
Learning objectives
Site reliability engineering basics
Release engineering
Change management
Incident management
Postmortems
Troubleshooting
Distributed design
Organization
Skills covered
DevOps Foundations, Server Administration, DevOps, Foundations, Network and System Administration
Concepts
0. Introduction
- 01 - Reliability engineering basics
- 02 - What you should know
1. SRE Basics
- 03 - Your job as a DevOp
- 04 - You aren't Google or Netflix
2. SRE Practice Areas
- 05 - Release engineering
- 06 - Change management
- 07 - Self-service automation
- 08 - SLAs and SLOs
- 09 - Incident management
- 10 - Introducing postmortems
- 11 - The postmortem process
- 12 - Troubleshooting
- 13 - Performance engineering
- 14 - Capacity and scalability
- 15 - Distributed design
- 16 - Deliberate adversity
3. SRE Organization
- 17 - Organizing SREs
- 18 - The softer side of SRE
Conclusion
- 19 - Next steps