Contributing Vocabularies
Teams with domain expertise can contribute vocabulary mappings directly through the Custom Vocabulary Builder (CVB) — an automated pipeline that turns mapping CSVs into OMOP-compatible vocabulary deltas.
What is CVB?
The Custom Vocabulary Builder is a Docker-based pipeline that:
- Uses the SSSOM (Simple Standard for Sharing Ontological Mappings) format for mapping CSVs
- Validates mappings against the OMOP vocabulary
- Produces concept, concept_relationship, concept_ancestor, and source_to_concept_map deltas
- Packages releases for integration into the Enterprise OMOP build
EU2_Flowsheets is the reference implementation, with 57,000+ mapped source rows across clinical flowsheet data.
Contribution Workflow
flowchart LR
A["Reserve ID range"] --> B["Scaffold project"]
B --> C["Author SQL + CSV"]
C --> D["Run pipeline"]
D --> E["Review + PR"]
- Reserve an ID range in the central registry to avoid collisions with other vocabulary projects
- Scaffold a new vocabulary directory from the CVB template (
new-vocab.sh) - Author mappings — write the source DDL, load script, and mapping CSV
- Run the pipeline — Docker builds concept tables, validates mappings, and produces deltas
- Submit a PR — the team reviews mapping quality and clinical accuracy before merge
Key Mapping Columns
The mapping CSV uses SSSOM-based columns. The most important ones:
| Column | Description |
|---|---|
subject_id |
Source concept ID (from your reserved range) |
subject_label |
Human-readable source term |
predicate_id |
Mapping relationship (e.g., skos:exactMatch) |
object_id |
Target OMOP concept ID |
mapping_justification |
Why this mapping is correct (e.g., semapv:LexicalMatching) |
author_id |
Who created the mapping |
Full column specification
The complete 21-column specification is documented in the CVB repository's CONTRIBUTING.md. Contact the team for repository access.
Quality Validation
All mapping CSVs are validated by validate-mapping-csv.py, which checks:
- Required columns are present and non-empty
- Target concept IDs exist in the OMOP vocabulary
- Subject IDs fall within the project's reserved range
- No duplicate mappings
- Predicate and justification values use controlled vocabularies
CI enforces validation on every pull request — mappings that fail validation cannot be merged.
Getting Access
Repository access required
The CVB repository is hosted on GitHub under the Emory-OMOP organization. Contact the Enterprise OMOP team on the OMOP Enterprise Teams channel to request access and discuss your vocabulary contribution.
Prerequisites
- GitHub access to the Emory-OMOP organization
- Docker installed locally
- Familiarity with CSV editing and basic SQL
- Domain expertise in the clinical area you're mapping