Designing Big Data Healthcare Studies, Part One
2h 16m | Advanced | 2025-02-04
Authors

Monika Wahi
Data Science and Biotech Expert
Course details
Even if you have a strong grasp of statistics and informatics, you also need to understand epidemiology and basic study design to perform accurate, rigorous analysis of healthcare data. This course helps you design research studies around hypotheses and fills the knowledge gap that many of today's analysts face when entering the healthcare field. Instructor Monika Wahi defines basic terms and concepts in epidemiology and reviews the different study design approaches: descriptive, analytic, cross-sectional, and case-control. She goes into detail on cross-sectional and case-control studies and shows how to plan an analytic dataset: identifying the necessary native variables and operationalizing them in a data dictionary. Last, she reviews the lessons learned from the course and prepares you for part two of the training series, which covers the descriptive and regression analysis of the dataset you have designed.
Learning objectives
Define exposure and outcome.
Explore the elements of populations versus samples in a healthcare data study.
Recognize the fundamentals of utilizing the scientific method in epidemiology.
Explore the essential elements of part one of the Bradford Hill criteria.
Recognize the fundamentals of an observational study versus an experiment.
Define a case-control study design.
Identify the important parts of levels of evidence.
Recall the meaning of a prevalence rate.
Plan the best ways to establish a working hypothesis.
Skills covered
R, Data Engineering, Data Analysis, Data Science, Business Analysis and Strategy, Business Software and Tools, Open Source, Deep Dive (X:Y)
Concepts
0. Introduction
- 01 - Welcome
- 02 - What you should know
- 03 - Using the exercise files
1. Epidemiology and Causal Inference
- 04 - Definition of epidemiology
- 05 - Terms about data
- 06 - Definition of exposure and outcome
- 07 - Populations vs. samples
- 08 - Scientific method in epidemiology
- 09 - Bradford Hill criteria - Part one
- 10 - Bradford Hill criteria - Part two
2. Study Designs
- 11 - Overview of human research
- 12 - Observational study vs. experiment
- 13 - Descriptive vs. analytic study designs
- 14 - Cross-sectional study design
- 15 - Case-control study design
- 16 - Levels of evidence
3. Measures of Association
- 17 - Introduction to the 2x2 table
- 18 - Prevalence ratio
- 19 - Odds ratio in a cross-sectional study
- 20 - Odds ratio in a case-control study
- 21 - Conclusion about the 2x2 table
4. Planning a Study
- 22 - Definition of confounders
- 23 - Using a web of causation to identify confounders
- 24 - Tools for reviewing the scientific literature
- 25 - Reviewing existing scientific literature
- 26 - Establishing a working hypothesis
- 27 - Choosing a dataset
- 28 - Final dataset considerations
5. Planning the Analytic Dataset
- 29 - Definition of data curation
- 30 - Requirements for a cross-sectional or case-control analytic dataset
- 31 - Setting up a data dictionary
- 32 - Operationalizing the subpopulation
- 33 - Operationalizing the exposure, outcome, and confounders
- 34 - Documenting transformed variables in the data dictionary
Conclusion
- 35 - Review of the course
- 36 - Preparation for part two
Related courses
- Data Science Reporting with Quarto for Python
- Data Visualization in R with ggplot2
- Data Wrangling in R
- Cleaning Bad Data in R
- Designing Big Data Healthcare Studies, Part Two
- Algorithmic Trading and Finance Models with Python, R, and Stata Essential Training
- R Tidyverse Applications
- SQL Server Machine Learning Services: R