Tips for Writing Queries Against OMOP
Emory's OMOP data lake runs on Amazon Redshift. These tips will help you write efficient, correct queries against the OMOP Common Data Model.
Schema Layout
All standard OMOP CDM tables live in the cdm schema. Common tables you'll query:
| Table | Contains |
|---|---|
cdm.person |
Demographics — one row per patient |
cdm.visit_occurrence |
Encounters (inpatient, outpatient, ER, etc.) |
cdm.condition_occurrence |
Diagnoses |
cdm.drug_exposure |
Medications |
cdm.procedure_occurrence |
Procedures |
cdm.measurement |
Labs, vitals, and other measurements |
cdm.observation |
Other clinical observations |
cdm.concept |
Vocabulary lookup — maps concept_id to human-readable names |
Key Patterns
Always Join to concept for Readable Output
Raw OMOP tables store integer concept_id values. Join to cdm.concept to get names:
SELECT
p.person_id,
c.concept_name AS condition_name,
co.condition_start_date
FROM cdm.condition_occurrence co
JOIN cdm.person p ON co.person_id = p.person_id
JOIN cdm.concept c ON co.condition_concept_id = c.concept_id
WHERE c.concept_name ILIKE '%diabetes%'
LIMIT 100;
Use Standard Concepts
Filter on standard_concept = 'S' when searching the vocabulary to get the canonical OMOP concept:
SELECT concept_id, concept_name, vocabulary_id
FROM cdm.concept
WHERE concept_name ILIKE '%hemoglobin a1c%'
AND standard_concept = 'S'
AND domain_id = 'Measurement';
Concept Hierarchies with concept_ancestor
Find all descendants of a high-level concept to capture related conditions or drugs:
SELECT DISTINCT descendant_concept_id
FROM cdm.concept_ancestor
WHERE ancestor_concept_id = 201826 -- Type 2 diabetes
;
Redshift-Specific Tips
- Use
ILIKEfor case-insensitive text searches (Redshift-specific, not standard SQL) - Avoid
SELECT *— Redshift is columnar; selecting only needed columns is significantly faster - Use
LIMITduring exploration to avoid pulling millions of rows - Date functions — use
DATEDIFF(day, start_date, end_date)for date arithmetic