A discharge summary reads: "Patient presents with worsening shortness of breath, elevated troponin at 0.42 ng/mL, and a history of poorly controlled type 2 diabetes."
An NLP pipeline reads that sentence and extracts three structured facts: a condition (dyspnea), a measurement (troponin 0.42 ng/mL), and a comorbidity (type 2 diabetes). These facts flow into research databases, power cohort definitions, and inform clinical decision support.
The field has advanced rapidly. Rule-based systems, statistical NER, transformer models, and now large language models can all perform clinical extraction with increasing accuracy. Institutions across the OHDSI network are deploying NLP pipelines against millions of clinical notes.
But there is a problem growing quietly beneath the surface.
In 2023, Sunyang Fu and colleagues at Mayo Clinic published a landmark scoping review examining how NLP-assisted observational studies report their methods. They reviewed 50 studies published between 2009 and 2021.
What they found was alarming.
Only 12% of studies reported all three essential definitions: the model used, the normalization vocabulary, and the context parameters (negation, temporality, experiencer). The rest left readers with no way to reproduce, validate, or even understand the NLP that generated the clinical data they were analyzing.
"The absence of detailed reporting guidelines may create ambiguity in the use of NLP-derived content, knowledge gaps in the current research reporting practices, and reproducibility challenges."
— Fu et al., Clinical and Translational Science, 2023 (Figure 6 of the paper breaks down the reporting gap)
This isn't a minor documentation gap. When a researcher builds a cohort using NLP-extracted conditions, they are making clinical decisions based on outputs from a system whose configuration, training data, confidence thresholds, and versioning are invisible.
The arrival of GPT-4, Claude, and domain-specific clinical LLMs has supercharged NLP capabilities. Extraction tasks that once required custom-trained models now work with a prompt. The barrier to deploying NLP against clinical notes has never been lower.
But the reporting problem has gotten worse. When the model is a black-box API, the provenance chain collapses entirely. What prompt was used? What version of the model? What temperature? Was the output post-processed? How was the concept mapped? These details rarely survive past the developer who wrote the script.
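One way to keep those details from evaporating is to capture them at extraction time, in a structured record stored alongside the output. A minimal sketch, with illustrative field names rather than any published schema:

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class LLMExtractionRecord:
    """Provenance for one LLM-based extraction run (illustrative fields only)."""
    model_name: str        # e.g. "gpt-4"
    model_version: str     # the exact API/model snapshot identifier
    prompt_template: str   # the full prompt text, not a summary
    temperature: float
    post_processing: str   # any cleanup applied to the raw output
    concept_mapping: str   # how free text was mapped to a vocabulary
    executed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

record = LLMExtractionRecord(
    model_name="gpt-4",
    model_version="hypothetical-2024-05-snapshot",
    prompt_template="Extract conditions as JSON: {note_text}",
    temperature=0.0,
    post_processing="JSON parse; malformed items dropped",
    concept_mapping="string match against SNOMED CT preferred terms",
)
# asdict() yields a row ready to store next to the extracted facts
row = asdict(record)
```

The point is not this particular shape; it is that every question in the paragraph above has a named, queryable slot instead of living only in the developer's script.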
The community has noticed. In January 2025, TRIPOD-LLM was published in Nature Medicine — a 19-item, 50-subitem checklist specifically for reporting studies that use large language models in healthcare. A month later, FUTURE-AI appeared in The BMJ — a 30-recommendation lifecycle framework for trustworthy AI built by 117 experts from 50 countries, organized around six principles: Fairness, Universality, Traceability, Usability, Robustness, and Explainability.
These are important steps. But checklists and frameworks solve the publication and governance problem, not the production problem. A TRIPOD-LLM–compliant paper and a FUTURE-AI–aligned development process still don't help a downstream researcher who joins your NLP-extracted conditions to their cohort six months later and needs to know: which pipeline, which version, which confidence threshold, which execution date?
Imagine two rows in a condition_occurrence table. Both say the patient has type 2 diabetes. One came from an ICD-10 code entered by a physician during an encounter. The other was extracted from a radiology report by an NLP pipeline you've never heard of, running a model version that may no longer exist, with a confidence score that was rounded before storage.
These two rows look identical. They sit in the same table, share the same schema, and will both be included in your cohort query. But they have fundamentally different provenance, fundamentally different confidence levels, and fundamentally different implications for your research.
This is the core problem: NLP-derived data is structurally indistinguishable from discrete clinical data once it lands in the CDM. And the metadata needed to distinguish them — the pipeline, the model, the execution, the confidence — has no standard place to live.
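The indistinguishability is easy to demonstrate. In the sketch below (a toy schema, not the actual OMOP DDL), two identical condition rows can only be told apart through a separate, invented provenance side table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER
);
-- Illustrative side table; the CDM has no standard home for this today.
CREATE TABLE nlp_provenance (
    condition_occurrence_id INTEGER,
    pipeline TEXT, model_version TEXT, confidence REAL
);
""")
# Row 1: physician-entered code. Row 2: NLP-extracted from a radiology report.
conn.execute("INSERT INTO condition_occurrence VALUES (1, 42, 201826)")
conn.execute("INSERT INTO condition_occurrence VALUES (2, 42, 201826)")
conn.execute("INSERT INTO nlp_provenance VALUES (2, 'radiology-nlp', 'v0.9', 0.81)")

# A cohort query sees two identical facts...
rows = conn.execute(
    "SELECT condition_concept_id FROM condition_occurrence WHERE person_id = 42"
).fetchall()
# ...and only the provenance join reveals which one is NLP-derived.
nlp_rows = conn.execute("""
    SELECT c.condition_occurrence_id, p.pipeline, p.confidence
    FROM condition_occurrence c
    JOIN nlp_provenance p USING (condition_occurrence_id)
""").fetchall()
```

Without the side table, nothing in the cohort query distinguishes the two rows; with it, the second row carries a pipeline, a version, and a confidence.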
The question isn't whether NLP works. It's whether the people who consume NLP outputs have the metadata they need to decide if those outputs are appropriate for their specific research question.
This is why NLP methods centralization matters. Not centralization of which model you use — use any model, any framework, any LLM. But centralization of how you record what happened. Open, structured, queryable metadata that follows the data from extraction through to the research table.
At Emory, we are building an NLP infrastructure for OMOP that treats metadata as a first-class citizen. Every extracted fact carries a provenance chain from the research table back to the original note:
NLP-extracted data lands in _DERIVED suffix tables — condition_DERIVED, measurement_DERIVED, drug_DERIVED — that mirror the OMOP schema but are structurally separated from discrete clinical data. Researchers choose when and how to join them.
The infrastructure is model-agnostic. MedSpaCy, BioBERT, GPT-4, a custom LSTM — it doesn't matter. What matters is that the system, the pipeline, the components, the execution, and the confidence are all recorded in a standard, queryable schema.
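Model-agnosticism falls out naturally when the execution record, not the model, defines the schema. A toy sketch, with invented field names, showing a rule-based system and an LLM landing in the same queryable log:

```python
executions = []

def record_execution(system, pipeline, components, model, confidence_source):
    """Append one execution to a shared, model-agnostic log (illustrative)."""
    executions.append({
        "system": system, "pipeline": pipeline, "components": components,
        "model": model, "confidence_source": confidence_source,
    })

# Very different systems, one schema.
record_execution("medspacy", "notes-v1", ["sectionizer", "context"],
                 "rule-based", "context-rule certainty")
record_execution("openai-api", "notes-v2", ["prompt", "json-parser"],
                 "gpt-4", "post-hoc calibration score")

# Queryable: which pipeline ran which model?
by_model = {e["model"]: e["pipeline"] for e in executions}
```

In practice this would be database tables rather than a Python list, but the invariant is the same: every execution, whatever the model, answers the same questions.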
This is what moves NLP from "it works on my laptop" to "I can defend this in a methods section, and a downstream researcher can audit it two years from now."
Our NOTE and NLP Infrastructure whitepaper describes the 4-layer, 13-table schema that makes this possible — from pipeline registration through _DERIVED tables.