Skip to content

DevOps Philosophy

Emory OMOP uses a hybrid framework designed for a small data engineering team shipping research infrastructure. It combines proven practices from several communities rather than adopting any single methodology wholesale.

Why Not Scrum?

Traditional Scrum assumes cross-functional feature teams, predictable sprint capacity, and well-defined user stories. Data engineering work — ETL pipelines, identity resolution, vocabulary mapping, de-identification — doesn't fit these assumptions cleanly. Work is often exploratory, blocked by upstream data issues, and varies dramatically in scope.

We evaluated seven frameworks and adopted the best-fitting elements from each.

The Hybrid Framework

  • DataOps


    Philosophy layer — everything as code, automated quality gates, statistical process control. Borrowed from the DataOps Manifesto and DataKitchen practices.

  • Shape Up


    Cadence layer — 6-week cycles with 2-week cooldowns. Borrowed from Basecamp's Shape Up. Work is "bet on" at the start of a cycle, not groomed in a backlog indefinitely.

  • Kanban


    Daily mechanics — continuous flow with WIP limits. No sprints, no story points. Work moves through a 7-stage board: Inbox → Todo → Ready → In Progress → In Review → Validating → Done.

  • dbt Labs Practices


    Engineering layer — analytics engineering workflows, modular SQL, test-driven development, documentation as code. Aligned with dbt Labs' analytics engineering guide.

How Work Flows

Stage What Happens
Inbox New requests land here — bug reports, feature ideas, research questions
Todo Accepted work, scoped and ready to be picked up
Ready Dependencies resolved, assignee can start immediately
In Progress Actively being worked on
In Review Code review, peer validation
Validating Running against subsamples or production data to verify correctness
Done Shipped and verified

Work is prioritized using hill charts — each item is tracked as either "Figuring Out" (research/design phase) or "Making It Happen" (execution phase). This gives the team honest visibility into whether work is stuck in exploration or actively converging.

Key Principles

  • Thin vertical slices — deliver small, end-to-end increments rather than large horizontal layers. A single slice might add one new table mapping from source through ETL, tests, and documentation.
  • Bet, don't backlog — at the start of each 6-week cycle, the team bets on a small number of high-value items. Work that isn't bet on stays in the shaping queue — it doesn't accumulate as groomed backlog debt.
  • Cooldown weeks — the 2 weeks between cycles are for bug fixes, tooling improvements, documentation, and exploration. No new feature commitments.
  • Automated quality gates — DBT tests, DQD checks, and subsample validation run automatically. Manual QA is reserved for judgment calls, not rote verification.

Data Quality Infrastructure

Quality is built into the pipeline through a tiered testing approach:

Tier Description
1. Unit Test Seeds 25 deterministic patients with known expected outputs
2. Problem-Case Subsample High-churn patients selected for edge case coverage
3. Disease-Group Subsamples Domain-specific cohorts (oncology, cardiology, etc.)
4. Longitudinal Subsample 1-year archived snapshots for regression testing

This is complemented by OHDSI community tools: Achilles for characterization, DQD for automated quality checks, and ARES for data source profiling.

Detailed Documentation

Core team and contributors only

The full framework source of truth, team workflow guide, and observability roadmap are maintained in the emory_omop_enterprise repository. Access is limited to core team members and approved contributors.

Key documents available there:

  • Framework Source of Truth — authoritative reference for the hybrid framework, ceremony schedule, success metrics
  • Team Workflow Guide — practical how-to for working with the GitHub project board, creating issues, and navigating daily rhythms
  • Observability Current State — testing pyramid details, pipeline orchestration, and known gaps
  • Observability Roadmap — planned SPC monitoring, CI gates, and quality enhancements