Architecture
DRAFT — Internal Review Only · Not for distribution in wide release
This document is a working draft intended for internal review by the Enterprise OMOP implementation team and our growing network of collaborators. It has not been finalized, peer-reviewed, or approved for external distribution. Schema details, table names, and implementation guidance are subject to change.
See releases and roadmap for details.
Authors: Emory University — Enterprise OMOP Implementation Team
Version: 0.1 | Date: March 17, 2026
New to NLP in OMOP?
Start with Notes and NLP Design for a general overview of the problem this architecture is designed to solve.
A Span-Based NLP Infrastructure for OMOP CDM
The OMOP CDM includes the note_nlp table for storing structured outputs from NLP applied to clinical notes. However, this flat, single-table design presents significant limitations for modern NLP workflows: it lacks pipeline provenance, cannot represent span-level annotations with relationships, and conflates NLP-derived data with discrete clinical observations.
We propose an extended schema architecture that addresses these gaps through four layers:
- NLP Process Metadata for full pipeline provenance
- Span-based annotation model for rich NLP output representation
- Modifier-based intermediate translation layer
_DERIVEDsuffix tables that maintain clear separation between ETL-sourced and NLP-derived clinical data
This design draws on work by the IOMED NLP team, the OHDSI NLP Working Group, and includes Emory-specific extensions for operational deployment.
The Problem
Clinical notes contain diagnoses, medications, temporal relationships, lab values, and social determinants that never reach the structured tables of the OMOP CDM. NLP can extract this information, but the current note_nlp table was designed as a minimal landing zone.
Limitations of note_nlp
- No pipeline provenance
- When multiple NLP systems process the same notes, there is no standard way to record which system produced which result, at which version, on which date.
- Flat annotation model
- Modern NLP produces span-level annotations with probabilities, typed extractions (dates, quantities), and inter-span relationships. The
note_nlptable cannot represent these structures. - Conflation of data provenance
- If NLP-extracted lab values land in the same
measurementtable as discrete EHR data, researchers cannot distinguish high-confidence discrete values from probabilistic NLP extractions without complex filtering logic.
Architecture Overview
The architecture flows through four layers, from the OMOP note table to _DERIVED CDM tables:
OMOP note NLP Model Process Metadata NLP Output + _DERIVED
table (train/infer) Creation Intermediate OMOP CDM Tables
─────────── ─────────────── ─────────────────── ─────────────────── ───────────────────
nlp_system note_span measurement_DERIVED
note ───▶ Model ────────▶ pipeline note_span_concept death_DERIVED
Train Component nlp_date condition_DERIVED
pipeline_component nlp_quantity drug_DERIVED
nlp_execution note_span_assert. procedure_DERIVED
note_span_execution note_nlp_modifier observation_DERIVED
Key principle
Data is not forwarded from the NLP process metadata tables to the OMOP metadata tables. The process metadata layer is an operational concern; the output metadata layer produces the clinical data that lands in _DERIVED tables.
Layer 1: NLP Process Metadata Creation
These tables describe the NLP infrastructure itself. They are not forwarded to OMOP metadata tables — they exist to support pipeline management, reproducibility, and auditability.
nlp_system — NLP system registration
The top-level registration of an NLP system (e.g., "MedSpaCy v1.2", "Emory Clinical NER v3").
| Column | Type | Description |
|---|---|---|
nlp_system_id |
int (PK) | Unique identifier for the NLP system |
name |
varchar | Human-readable name of the system |
version |
int | Version number |
source |
varchar | Source platform or registry (e.g., github, huggingface, pypi, docker, s3) |
source_uri |
varchar | Full addressable location (e.g., github.com/EmoryDataSolutions/emory_omop_internal_nlp) |
source_version |
varchar | Machine-verifiable version (git SHA, HF commit, Docker tag, PyPI version) |
pipeline — Pipeline configuration
A specific pipeline configuration within an NLP system. A single system may expose multiple pipelines (e.g., "medication_extraction", "negation_detection").
| Column | Type | Description |
|---|---|---|
pipeline_id |
int (PK) | Unique identifier for the pipeline |
pipeline_name |
varchar | Descriptive name |
programming_language |
varchar | Implementation language (e.g., Python, Java) |
active |
varchar | Whether the pipeline is currently active |
source |
varchar | Source platform or registry (e.g., github, huggingface, pypi, docker, s3) |
source_uri |
varchar | Full addressable location (e.g., github.com/EmoryDataSolutions/emory_omop_internal_nlp) |
source_version |
varchar | Machine-verifiable version (git SHA, HF commit, Docker tag, PyPI version) |
Component — Pipeline components
An individual processing step within a pipeline (e.g., a tokenizer, a named entity recognizer, a concept linker).
| Column | Type | Description |
|---|---|---|
component_id |
int (PK) | Unique identifier for the component |
component_name |
varchar | Name of the component |
version |
varchar | Component version string |
data |
varchar | Reference to model artifacts or data dependencies |
is_pipe |
varchar | Whether this component is itself a sub-pipeline |
source |
varchar | Source platform or registry (e.g., github, huggingface, pypi, docker, s3) |
source_uri |
varchar | Full addressable location (e.g., github.com/EmoryDataSolutions/emory_omop_internal_nlp) |
source_version |
varchar | Machine-verifiable version (git SHA, HF commit, Docker tag, PyPI version) |
Source provenance
Source provenance applies at all three levels because systems, pipelines, and components may originate from different repositories or registries. For example, an NLP system may live in a GitHub repo, while one of its components is a HuggingFace model and another is a PyPI package. The source / source_uri / source_version triple provides machine-verifiable provenance regardless of where the artifact lives.
| Level | source | source_uri | source_version |
|---|---|---|---|
| nlp_system | github | github.com/EmoryDataSolutions/emory_omop_internal_nlp | 83b3464 |
| component (medspacy) | pypi | medspacy | 1.2.0 |
| component (BERT) | huggingface | emilyalsentzer/Bio_ClinicalBERT | a2ab630 |
pipeline_component — Pipeline-component linkage
Junction table linking pipelines to their constituent components, with per-pipeline configuration.
| Column | Type | Description |
|---|---|---|
pipeline_id |
int (FK) | Pipeline this component belongs to |
component_id |
int (FK) | The component |
config |
varchar | JSON or serialized configuration for this component within this pipeline |
num_instances |
int | Number of parallel instances (for distributed execution) |
nlp_execution — Execution records
A record of a specific pipeline run — when it happened, which system and pipeline were used, and the worker version.
| Column | Type | Description |
|---|---|---|
nlp_execution_id |
int (PK) | Unique identifier for this execution |
nlp_system_id |
int (FK) | The system that ran |
pipeline_id |
int (FK) | The pipeline configuration used |
worker_version |
int | Version of the execution worker/runtime |
nlp_date |
date | Date the execution occurred |
note_span_execution — Span-to-execution linkage
Junction table linking individual note spans (outputs) back to the execution that produced them.
| Column | Type | Description |
|---|---|---|
note_span_id |
int (FK) | The output span |
nlp_execution_id |
int (FK) | The execution that produced it |
Layer 2: Note NLP Minimal Required Metadata
These tables store the actual NLP outputs. Some metadata from this layer is pushed to OMOP metadata tables for discoverability. This layer adopts the OHDSI NLP Working Group's span-based model.
note_span — Core annotation table
Each row represents a contiguous span of text identified by the NLP system, with character-level offsets into the source note.
| Column | Type | Description |
|---|---|---|
note_span_id |
int (PK) | Unique identifier for the span |
source_primary_key_source |
varchar | Fully qualified source path identifying the source system and column (e.g., clarity.dbo.hno_notes.note_csn_id) |
source_primary_key |
varchar | The note identifier in that source system |
span_start |
int | Character offset of span start (0-indexed) |
span_end |
int | Character offset of span end (exclusive) |
span_text |
varchar | The extracted text |
probability |
double | Confidence score from the NLP model |
NLP practitioners submit spans referencing notes by source system identifiers, not OMOP note_id. Resolution to OMOP IDs happens downstream via note_mapping, following the same _source_primary_key_source + _source_primary_key pattern used across the Enterprise OMOP pipeline (e.g., visit_occurrence_mapping, care_site_mapping).
Character offsets enable precise text localization, which is critical for annotation review, model debugging, and adjudication workflows.
note_span_concept — Concept mappings
Maps a span to one or more OMOP concepts. Supports both a standardized concept_id and a source_concept_id for the raw vocabulary mapping before standardization.
| Column | Type | Description |
|---|---|---|
note_span_id |
int (FK) | The annotated span |
concept_id |
int (FK) | Standard OMOP concept |
source_concept_id |
int (FK) | Source vocabulary concept before mapping |
nlp_date — Temporal extractions
Typed extraction for temporal expressions found in text (e.g., "last Tuesday", "March 2024").
| Column | Type | Description |
|---|---|---|
nlp_date_id |
int (PK) | Unique identifier |
note_span_id |
int (FK) | The span containing the date expression |
date_precision_concept_id |
int (FK) | Precision level (day, month, year) |
nlp_date_time |
datetime | Resolved date/time value |
nlp_quantity — Numeric extractions
Typed extraction for numeric values with units (e.g., "BP 120/80 mmHg", "weight 72 kg").
| Column | Type | Description |
|---|---|---|
nlp_quantity_id |
int (PK) | Unique identifier |
note_span_id |
int (FK) | The span containing the quantity |
base_units_concept_id |
int (FK) | OMOP concept for the unit of measurement |
magnitude_base_units |
double | The extracted numeric value |
magnitude_base_units_range_low |
double | Lower bound (for range expressions) |
magnitude_base_units_range_high |
double | Upper bound (for range expressions) |
note_span_assertion — Contextual assertions and inter-span relationships
Captures contextual assertions about spans — negation, experiencer, temporality, certainty — as well as binary inter-span relationships (e.g., a medication span "metformin" related to a dosage span "500mg").
| Column | Type | Description |
|---|---|---|
source_note_span_id |
int (FK) | The source span |
target_note_span_id |
int (FK) | The target span; NULL for unary assertions (e.g., negation of a single span) |
label_concept_id |
int (FK) | OMOP concept describing the assertion type (negation, experiencer, temporality, certainty) or relationship type |
probability |
double | Confidence score for the assertion or relationship |
note_mapping — Source-to-OMOP note resolution
Maps source system note identifiers to OMOP note_id. This is the bridge that allows note_span rows — which reference notes by source system keys — to be resolved to OMOP note table IDs downstream in dbt. Analogous to visit_occurrence_mapping and care_site_mapping in the existing Enterprise OMOP pipeline.
| Column | Type | Description |
|---|---|---|
note_id |
int (FK) | OMOP note table ID |
source_primary_key_source |
varchar | Source system identifier (matches note_span.source_primary_key_source) |
source_primary_key |
varchar | Note ID in source system (matches note_span.source_primary_key) |
This table is populated by the ETL pipeline (not the NLP service) and is maintained as a reference for downstream dbt transformations.
Terminology: Why note_span_assertion
The term assertion is the standard designation in clinical NLP for contextual attributes of extracted entity spans. It originates from the i2b2/VA 2010 shared task on "concepts, assertions, and relations in clinical text" (Uzuner et al., 2011), which established the canonical annotation categories: present/absent (negation), experiencer (patient vs. family), and hypothetical/conditional (certainty). This terminology was continued by the n2c2 Track 1 context classification tasks.
The ConText algorithm (Harkema et al., 2009) is the foundational method for detecting these assertions automatically. Its successors — NegEx, pyConText, and medspaCy's ConTextComponent — remain the dominant approach in production clinical NLP systems.
The standard assertion categories are:
- Negation — whether a finding is present or absent (e.g., "denies chest pain")
- Experiencer — whether the finding pertains to the patient or someone else (e.g., "family history of diabetes")
- Temporality — whether the finding is current, historical, or hypothetical (e.g., "prior MI")
- Certainty — the confidence level: definite, possible, conditional (e.g., "possible pneumonia")
| Term | Used by | Scope |
|---|---|---|
| Assertion | i2b2, n2c2, academic publications | The annotation task and the concept itself |
| Context | ConText algorithm (Harkema et al.) | The algorithm that detects assertions |
| Modifiers | medspaCy, cTAKES, OMOP term_modifiers |
Implementation-level term |
This table was previously named note_span_relationship. The rename to note_span_assertion reflects its primary role: capturing unary assertions about spans (negation, temporality, experiencer, certainty) via label_concept_id, with target_note_span_id = NULL for unary assertions. The table can also capture binary inter-span relationships when target_note_span_id is populated — but the dominant use case is assertion, and the name should reflect that.
Layer 3: Intermediate Translation
note_nlp_modifier
This table serves as the bridge between the span-based NLP output model and the OMOP CDM domain tables. Each row captures a single modifier attribute extracted from or derived for a span, in a format amenable to CDM mapping.
Field reference (click to expand)
| Column | Type | Description |
|---|---|---|
note_nlp_modifier_id |
int (PK) | Unique identifier |
note_span_id |
int (FK) | The source span |
note_nlp_modifier_field_concept_id |
int (FK) | The target CDM field this modifier populates (e.g., condition_start_date, condition_concept_id) |
note_nlp_modifier_date |
date | Date value, if the modifier is temporal |
note_nlp_modifier_string |
varchar | String value, if the modifier is textual |
note_nlp_modifier_concept_id |
int (FK) | Concept value, if the modifier maps to a standard concept |
note_nlp_modifier_number |
int | Numeric value, if the modifier is quantitative |
nlp_execution_id |
int (FK) | Provenance link to the execution that produced this modifier |
Why an intermediate step?
The span model (Layer 2) is optimized for NLP output fidelity — preserving exactly what the model found, where, and with what confidence. The CDM domain tables (Layer 4) are optimized for clinical analytics — person-level observations with dates and standard concepts.
The note_nlp_modifier table provides the structured, typed attribute representation needed to map between these two paradigms. It decomposes complex span annotations into individual modifier fields that can be systematically transformed into CDM rows.
Layer 4: OMOP CDM _DERIVED Tables
NLP-extracted observations land in tables suffixed with _DERIVED, which mirror the schema of their standard CDM counterparts:
| DERIVED Table | Standard CDM Counterpart | Example Use Case |
|---|---|---|
measurement_DERIVED |
measurement |
Lab values extracted from notes (e.g., "A1c 7.2%") |
death_DERIVED |
death |
Death dates extracted from discharge summaries |
condition_DERIVED |
condition_occurrence |
Diagnoses mentioned in clinical notes |
drug_DERIVED |
drug_exposure |
Medication mentions extracted from notes |
procedure_DERIVED |
procedure_occurrence |
Procedures referenced in operative notes |
observation_DERIVED |
observation |
Social determinants, symptoms, and other observations |
Design rationale
Ensure clear separation between the clinical tables, and the derived elements from NLP for those tables. Coupled with the upstream linkage to NLP metadata, researchers can choose which NLP-derived data is acceptable to join to the standard clinical table.
measurementholds discrete lab values from the EHR (high confidence, structured source)measurement_DERIVEDholds NLP-extracted lab values (variable confidence, unstructured source)
By populating the upstream NLP metadata about the pipeline, researchers can select specific runs, by specific NLP implementers, filtering by execution date, system version, or confidence threshold before choosing to incorporate derived data into their analyses.
— Enterprise OMOP Implementation Team
Emory Extended Columns
Beyond the standard OMOP 5.4 columns, _DERIVED tables include Emory-specific columns for direct provenance linkage:
| Column | Type | Description |
|---|---|---|
note_span_id |
int (FK) | Direct link to the source span in note_span |
execution_id |
int (FK) | Direct link to the NLP execution in nlp_execution |
_source_primary_key |
varchar | Caret-delimited note_nlp_modifier_id values that produced this row (e.g., 1000^1001) |
_source_primary_key_source |
varchar | Fixed value: note_nlp_modifier |
condition_type_concept_id |
int (FK) | Set to the NLP-Derived concept to mark provenance (applies similarly as *_type_concept_id in other _DERIVED tables) |
Backward Compatibility
The _DERIVED tables include all columns from their standard CDM counterparts, plus the Emory extended columns described above:
- Existing OHDSI tools (ATLAS, CohortDiagnostics, FeatureExtraction) can target
_DERIVEDtables with minimal configuration changes — OHDSI tools ignore unknown columns, so the extra provenance columns do not break compatibility - A
UNION ALLview can combine standard and derived tables when a researcher decides NLP-derived data meets their quality threshold - The standard CDM tables remain untouched — no schema modifications required
Full Provenance Chain
Every NLP-derived observation can be traced back through the full chain:
measurement_DERIVED row
→ note_nlp_modifier (what modifier attributes were applied)
→ note_span (exact text, character offsets, confidence)
→ note_mapping (resolves source_primary_key → OMOP note_id)
→ note (the original clinical text)
→ note_span_execution (which execution produced this span)
→ nlp_execution (when, which pipeline)
→ pipeline + pipeline_component (which components, what config)
→ nlp_system (which system, what version)
| Layer | Concern | Audience |
|---|---|---|
| Process Metadata | What NLP infrastructure ran | NLP engineers, MLOps |
| Output Metadata | What was extracted from text | NLP researchers, annotators |
| Intermediate | How to map spans to CDM | ETL developers |
_DERIVED Tables |
Clinical analytics on NLP data | Researchers, clinicians |
Comparison with Standard note_nlp
| Capability | Standard note_nlp |
Proposed Architecture |
|---|---|---|
| Pipeline provenance | Single nlp_system string field |
Full normalized hierarchy (system, pipeline, component, execution) |
| Span representation | offset + snippet (substring) |
span_start / span_end character offsets + span_text |
| Confidence scores | Not supported | probability on spans and assertions |
| Typed extractions | Not supported | Dedicated nlp_date, nlp_quantity tables |
| Contextual assertions and inter-span relationships | Not supported | note_span_assertion with typed labels |
| Separation from discrete data | NLP data lands in standard CDM tables | _DERIVED suffix tables with explicit provenance |
| Modifier representation | Free-text term_modifiers string |
Structured note_nlp_modifier with typed fields |
Implementation Guidance
Pipeline Registration
Before any NLP processing occurs, register the infrastructure:
- Register the NLP system in
nlp_systemwith a unique name and version - Register components in
Component— each tokenizer, model, post-processor gets an entry - Define pipelines in
pipelineand link components viapipeline_component, including per-component configuration
This registration is a one-time setup per pipeline version. When a component is updated (e.g., a new model checkpoint), increment the component version and create a new pipeline entry.
Execution and Annotation Flow
For each NLP run:
- Create an
nlp_executionrecord capturing the system, pipeline, worker version, and date - Process notes through the pipeline, producing span-level annotations
- Write
note_spanrows withsource_primary_key_sourceandsource_primary_keyidentifying the source note, along with character offsets, extracted text, and confidence scores - Write typed extractions to
note_span_concept,nlp_date,nlp_quantity, andnote_span_assertionas appropriate - Link spans to executions via
note_span_execution
Source key pattern
The bronze ingestion layer (FastAPI service) accepts source system identifiers rather than OMOP note_id values. NLP practitioners reference notes using source_primary_key_source (the fully qualified source path, e.g., clarity.dbo.hno_notes.note_csn_id) and source_primary_key (the value in that source column). Resolution from source keys to OMOP note_id is handled downstream by dbt via the note_mapping table. This follows the existing Emory _source_primary_key_source + _source_primary_key pattern used for visit_occurrence_mapping, care_site_mapping, and other identity resolution tables in the Enterprise OMOP pipeline.
Translation to _DERIVED Tables
The translation from span-based output to CDM-compatible rows is a distinct ETL step:
- Filter negated spans — spans with a negation assertion in
note_span_assertiondo not produce modifier rows. They are retained in Layer 2 for auditability but excluded from downstream translation - Generate
note_nlp_modifierrows from non-negated span annotations — decomposing each span into its modifier attributes (concept, date, string, number) - Collapse modifiers to
_DERIVEDrows — multiple modifiers for the same span + execution merge into a single_DERIVEDrow (e.g., a condition span with separate date and concept modifiers produces onecondition_DERIVEDrow) - Map modifiers to CDM domain tables — for each span with a mapped concept, determine the target domain (measurement, condition, drug, etc.) based on the concept's
domain_idin the OMOP vocabulary - Write
_DERIVEDrows with appropriate linkage back tonote_nlp_modifierfor provenance
Confidence Thresholds
Recommended thresholds (calibrate per system)
| Tier | Probability Range | Recommendation |
|---|---|---|
| High confidence | ≥ 0.90 | Auto-populate _DERIVED tables |
| Medium confidence | 0.70 – 0.89 | Populate with flag; available for research opt-in |
| Low confidence | < 0.70 | Retain in note_span only; do not translate to _DERIVED |
Schema Deployment
| Zone | Tables | Schema Recommendation |
|---|---|---|
| NLP Operations | nlp_system, pipeline, Component, pipeline_component, nlp_execution, note_span_execution | Dedicated nlp_ops schema |
| NLP Output | note_span, note_span_concept, nlp_date, nlp_quantity, note_span_assertion, note_nlp_modifier | Dedicated nlp_output schema |
| CDM Extended | measurement_DERIVED, death_DERIVED, condition_DERIVED, drug_DERIVED, procedure_DERIVED, observation_DERIVED | Same schema as CDM (or cdm_derived schema) |
Research Patterns
| Question | How to Query |
|---|---|
| NLP-extracted A1c values from a specific pipeline | Join measurement_DERIVED through note_nlp_modifier → note_span → note_span_execution → nlp_execution → nlp_system, filter by system name and version |
| All medication mentions with confidence > 0.85 | Query note_span joined to note_span_concept where probability >= 0.85 and concept is in Drug domain |
| Compare NLP-derived conditions to discrete diagnoses | UNION ALL of condition_occurrence and condition_DERIVED, with a source flag column |
| Audit which NLP pipeline produced a specific finding | Trace note_span_execution → nlp_execution → pipeline → pipeline_component → Component |
Attribution
This architecture integrates work from three sources:
- IOMED NLP — NLP process metadata tables (
nlp_system,pipeline,Component,pipeline_component,nlp_execution,note_span_execution) - OHDSI NLP Working Group — Span-based annotation model and typed extraction tables (
note_span,note_span_concept,nlp_date,nlp_quantity,note_span_assertion) - Emory Enterprise OMOP Team —
_DERIVEDtable pattern,note_nlp_modifierintermediate translation layer, operational deployment guidance, and column modifications
References
- Uzuner Ö, South BR, Shen S, DuVall SL. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association. 2011;18(5):552–556. PMC3168320
- Harkema H, Dowling JN, Thornblade T, Chapman WW. ConText: an algorithm for determining negation, experiencer, and temporal status from clinical reports. Journal of Biomedical Informatics. 2009;42(5):839–851. PMC2757457
- n2c2 NLP Clinical Challenges — successor to i2b2, continuing shared tasks on clinical NLP including context classification. n2c2.dbmi.hms.harvard.edu
Future Directions
Annotation Review and Validation Framework
NLP pipeline output is high-recall, variable-precision. Every extraction is a candidate, not a fact. The architecture must support human review, annotation adjudication, and gold-standard corpus management to close the quality loop.
The Brain Health NLP pilot (v1.1) demonstrated this directly: a "family" assertion on an EEG note was flagged by the ConText algorithm, but human review using the note_span_snippet context window revealed a false positive — the patient's seizure activity interfered with interactions with a family member present during monitoring, not a family history of the condition.
Proposed Schema: nlp_annotation_review
A review table capturing human judgments on NLP-extracted spans:
| Column | Type | Description |
|---|---|---|
review_id |
int (PK) | Unique identifier for the review |
note_span_id |
int (FK) | The span being reviewed |
nlp_execution_id |
int (FK) | The execution that produced the span (ties review to a specific pipeline version) |
reviewer_id |
varchar | Identifier for the human reviewer |
reviewer_role |
varchar | Role of the reviewer (e.g., "clinician", "annotator", "domain_expert") |
review_date |
timestamptz | When the review was performed |
original_label |
varchar | The NLP pipeline's concept_source_value for the span |
original_assertion |
varchar | The NLP pipeline's assertion (e.g., "negation", "possible", "family") |
reviewer_judgment |
varchar | Human verdict: correct, incorrect, uncertain, partial |
corrected_assertion |
varchar | If incorrect, what the assertion should be (e.g., "family" → NULL for affirmed) |
corrected_label |
varchar | If incorrect, what the entity label should be |
review_notes |
varchar | Free-text annotation from the reviewer |
Design Principles
-
Non-destructive: Reviews do not modify NLP output. The original spans, assertions, and
_derivedrows remain unchanged. The review table is an overlay that records human judgments alongside machine output. -
Execution-linked: Each review references an
nlp_execution_id, so when a pipeline is re-run with updated rules, prior reviews can be compared against new output to measure improvement. -
Multi-annotator support: Multiple reviewers can review the same span. Inter-annotator agreement can be computed by grouping reviews by
note_span_idand comparingreviewer_judgmentacross reviewers. -
Gold-standard corpus: Spans with
reviewer_judgment = 'correct'or reviewer-corrected labels form a gold-standard corpus for future model evaluation. This corpus grows organically as researchers validate_derivedtable output.
Workflow
Researcher queries _derived table
→ Finds unexpected or questionable result
→ Joins to note_span_snippet for context window
→ Records judgment in nlp_annotation_review
→ Aggregated reviews inform pipeline rule improvements
→ Reviewed spans become gold-standard evaluation data
Validation Metrics
Once nlp_annotation_review is populated, standard NLP evaluation metrics can be computed per entity category, per assertion type, and per pipeline version:
| Metric | Computed from |
|---|---|
| Precision | correct / (correct + incorrect) per category |
| Recall | Requires a separate gold-standard annotation pass (entities the NLP missed) |
| F1 | Harmonic mean of precision and recall |
| Inter-annotator agreement | Cohen's kappa or Fleiss' kappa across reviewer pairs |
| Assertion accuracy | correct assertion / total reviewed per assertion type |
These metrics link back to the NLP process metadata (Layer 1) via nlp_execution_id, enabling longitudinal tracking of pipeline quality across versions.
Model Performance Metadata
Linking execution records to evaluation metrics (precision, recall, F1) from validation runs. This extends nlp_annotation_review with aggregate performance tables:
| Column | Type | Description |
|---|---|---|
evaluation_id |
int (PK) | Unique identifier |
nlp_execution_id |
int (FK) | The execution being evaluated |
entity_category |
varchar | The category evaluated (e.g., "cerebral_atrophy") |
assertion_type |
varchar | The assertion type evaluated (e.g., "negation", "possible") |
precision |
double | Precision score |
recall |
double | Recall score (if gold-standard available) |
f1 |
double | F1 score |
n_reviewed |
int | Number of spans reviewed |
evaluation_date |
timestamptz | When the evaluation was computed |
Additional Future Directions
- OMOP CDM proposal — Submitting the
_DERIVEDtable pattern and span-based model as a formal OMOP CDM extension - Cross-site federation — Validating the schema for federated NLP analyses across OHDSI network sites
Related Pages
- NLP Glossary — Terminology reference for clinical NLP concepts
- Note NLP (OMOP Primer) — Standard
note_nlptable reference - Data Mapping — How source data flows into OMOP
- Data Quality Design — DataOps framework for pipeline quality