TRIPOD-LLM Alignment

DRAFT — Internal Review Only · Not for distribution in wide release

This document is a working draft intended for internal review by the Enterprise OMOP implementation team and our growing network of collaborators. It has not been finalized, peer-reviewed, or approved for external distribution.

See releases and roadmap for details.

Context

See Notes and NLP Design for the problem overview and Architecture for the full schema specification.

Two halves of the same problem

In January 2025, TRIPOD-LLM was published in Nature Medicine as a consensus reporting guideline for studies using large language models in healthcare (Gallifant et al., 2025). It provides 19 main items and 50 subitems covering everything from title and abstract through methods, results, and discussion — a comprehensive checklist for what to write in a paper.

The Enterprise OMOP NLP infrastructure addresses the other half: what to record in the database. TRIPOD-LLM tells authors how to describe their NLP pipeline in a methods section. Our architecture makes that same metadata queryable, traceable, and attached to every row of extracted data.

Neither replaces the other. A TRIPOD-LLM–compliant paper without infrastructure metadata leaves downstream researchers unable to audit the data. Infrastructure metadata without publication-level reporting leaves the scientific community unable to evaluate the approach. Both are necessary.

Mapping TRIPOD-LLM to the NLP infrastructure

The table below maps TRIPOD-LLM checklist items to the corresponding tables and fields in the Enterprise OMOP NLP architecture. Items are grouped by where the metadata lives: in the infrastructure (queryable), in the publication (prose), or requiring both.

Model and pipeline identity

These items describe what system produced the NLP output.

TRIPOD-LLM Item	Description	Infrastructure Table	Infrastructure Fields
6a	Report the LLM name, version, and last date of training	`nlp_system`	`name`, `version`
6b	Report architecture, training, fine-tuning, alignment strategy	`nlp_system`, `Component`	`name`, `version`, `data`
6c	Report prompt engineering, inference settings (seed, temperature, max tokens)	`pipeline`, `pipeline_component`	`pipeline_name`, `config`

Infrastructure advantage

When a researcher queries `measurement_DERIVED`, they can join through `note_span_execution` → `nlp_execution` → `nlp_system` to recover the exact system name, version, and pipeline configuration that produced every row, without consulting the original paper.
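The join path described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual schema: the key columns (`note_span_id`, `nlp_execution_id`, `nlp_system_id`), the `value_as_number` column, and all sample rows are hypothetical. Only the table names and the `name`, `version`, and `nlp_date` fields come from this document. The sketch uses Python's built-in `sqlite3` module so it runs without a database server.

```python
import sqlite3

# In-memory database with a deliberately simplified schema.
# Key column names are assumptions; the Architecture page holds the real ones.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nlp_system (
    nlp_system_id INTEGER PRIMARY KEY,
    name TEXT,
    version TEXT
);
CREATE TABLE nlp_execution (
    nlp_execution_id INTEGER PRIMARY KEY,
    nlp_system_id INTEGER REFERENCES nlp_system(nlp_system_id),
    nlp_date TEXT
);
CREATE TABLE note_span_execution (
    note_span_id INTEGER,
    nlp_execution_id INTEGER REFERENCES nlp_execution(nlp_execution_id)
);
CREATE TABLE measurement_DERIVED (
    measurement_id INTEGER PRIMARY KEY,
    note_span_id INTEGER,
    value_as_number REAL
);
-- Invented sample rows.
INSERT INTO nlp_system VALUES (1, 'example-extractor', '2.1.0');
INSERT INTO nlp_execution VALUES (10, 1, '2025-03-01');
INSERT INTO note_span_execution VALUES (100, 10);
INSERT INTO measurement_DERIVED VALUES (1000, 100, 55.0);
""")

# Walk from each derived row to the system that produced it.
rows = conn.execute("""
SELECT m.measurement_id, s.name, s.version, e.nlp_date
FROM measurement_DERIVED AS m
JOIN note_span_execution AS nse ON nse.note_span_id = m.note_span_id
JOIN nlp_execution AS e ON e.nlp_execution_id = nse.nlp_execution_id
JOIN nlp_system AS s ON s.nlp_system_id = e.nlp_system_id
""").fetchall()

for row in rows:
    print(row)
```

Each result row pairs a derived measurement with the system name, version, and execution date that produced it, which is the information TRIPOD-LLM item 6a asks authors to report in prose.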

Data and preprocessing

These items describe what data the NLP system processed.

TRIPOD-LLM Item	Description	Infrastructure Table	Infrastructure Fields
5a	Sources of training, tuning, and evaluation data	`nlp_system`, `Component`	`data` (model artifact references)
5b	Quantitative and qualitative description of the dataset	Publication only	—
5c	Date of oldest and newest data used	`nlp_execution`	`nlp_date`
5d	Data preprocessing and quality checking	Publication + `pipeline_component`	`config`
5e	Missing and imbalanced data handling	Publication only	—

Execution and reproducibility

These items describe when and how the NLP system ran.

TRIPOD-LLM Item	Description	Infrastructure Table	Infrastructure Fields
12	Compute or proxies thereof (time, number of machines, inference time)	`nlp_execution`	`nlp_date`, `worker_version`
6d	Initial and postprocessed output (probabilities, classification)	`note_span`	`probability`
6e	Classification rationale and thresholds	`note_span`, confidence tiers	`probability` thresholds
14f	Availability of code to reproduce results	Publication only	—

Infrastructure advantage

TRIPOD-LLM item 12 asks authors to report compute details in prose. The infrastructure captures `nlp_execution` records with execution dates and worker versions automatically: every run, not just the one described in the paper.
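As a rough illustration of what "every run" makes possible, the sketch below summarizes execution history per worker version. It assumes `nlp_date` is stored as an ISO 8601 text date and that `nlp_execution_id` is the primary key; both are assumptions for the example, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nlp_execution (
    nlp_execution_id INTEGER PRIMARY KEY,  -- hypothetical key name
    nlp_date TEXT,                         -- ISO 8601 strings sort chronologically
    worker_version TEXT
);
-- Invented sample rows.
INSERT INTO nlp_execution VALUES (1, '2025-01-10', '1.0.0');
INSERT INTO nlp_execution VALUES (2, '2025-02-14', '1.0.0');
INSERT INTO nlp_execution VALUES (3, '2025-03-02', '1.1.0');
""")

# One summary row per worker version: run count, first run, last run.
summary = conn.execute("""
SELECT worker_version, COUNT(*) AS runs, MIN(nlp_date), MAX(nlp_date)
FROM nlp_execution
GROUP BY worker_version
ORDER BY worker_version
""").fetchall()

for worker_version, runs, first_run, last_run in summary:
    print(worker_version, runs, first_run, last_run)
```

A paper would typically report only one of these runs; the table keeps all of them available for later audit.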

Output quality and evaluation

These items describe how well the NLP system performed.

TRIPOD-LLM Item	Description	Infrastructure Table	Infrastructure Fields
7a	Metrics: consistency, relevance, accuracy, errors vs gold standards	Publication + future `nlp_evaluation`	Planned extension
7b	Outcome metrics' relevance to deployment	Publication only	—
7c	How predictions were calculated (formula, code, API)	`pipeline`, `pipeline_component`	`config`, component chain
7d	Annotator qualifications, interassessor agreement	Publication only	—
7e	Comparison to other LLMs, humans, benchmarks	Publication only	—

Future direction: evaluation metadata

The current architecture does not include tables for model performance metrics (precision, recall, F1). This is identified as a future direction — linking execution records to evaluation results from validation runs would close the gap between TRIPOD-LLM items 7a–7e and queryable infrastructure.
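To make the planned extension concrete, the sketch below computes the three metrics named above from confusion counts and attaches them to an execution identifier. It is purely illustrative: no `nlp_evaluation` table exists yet, and the `nlp_execution_id` key and the counts are hypothetical sample values.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion counts.

    tp, fp, fn are true positives, false positives, and false negatives
    counted against a gold-standard annotation set.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical evaluation record: the id and counts are invented.
metrics = precision_recall_f1(tp=80, fp=20, fn=10)
evaluation_row = {"nlp_execution_id": 10, **metrics}

print({key: round(value, 3) for key, value in metrics.items()})
```

Keying the record to an execution rather than to a system version is one possible design; it would let a validation run be tied to the exact configuration that produced the evaluated output.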

Annotation and prompting

These items describe how annotation was conducted, how prompts were designed, and how input data was prepared.

TRIPOD-LLM Item	Description	Infrastructure Table	Infrastructure Fields
8a	Annotation guidelines and labeling methodology	Publication only	—
8b	Number of annotators, interannotator agreement	Publication only	—
8c	Annotator background and experience	Publication only	—
9a	Prompt design, curation, and selection processes	`pipeline_component`	`config` (prompt stored as component config)
9b	Data used to develop prompts	Publication only	—
10	Preprocessing of data before summarization	`pipeline_component`	`config`, component chain
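The rows that map to `pipeline_component.config` (items 9a and 10 here, and 6c earlier) imply that a prompt and its inference settings can be read back from the stored configuration. The sketch below assumes `config` is stored as JSON text with keys named `prompt`, `temperature`, `seed`, and `max_tokens`. The storage format, the key names, and the sample values are all assumptions, not part of the documented schema.

```python
import json

# Hypothetical `config` value for one pipeline_component row.
config_text = json.dumps({
    "prompt": "Extract the left ventricular ejection fraction from the note.",
    "temperature": 0.0,
    "seed": 42,
    "max_tokens": 256,
})


def inference_settings(config_json):
    """Return the prompt and inference settings recorded in a component config."""
    config = json.loads(config_json)
    keys = ("prompt", "temperature", "seed", "max_tokens")
    # Missing keys come back as None so gaps in the record stay visible.
    return {key: config.get(key) for key in keys}


settings = inference_settings(config_text)
print(settings["temperature"], settings["seed"], settings["max_tokens"])
```

Returning `None` for absent keys, rather than raising, makes it easy to flag components whose configuration would not satisfy the corresponding checklist item.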

Provenance and separation

These concerns cover how to distinguish NLP-derived data from discrete clinical data. This separation is the core contribution of the Enterprise OMOP architecture, and TRIPOD-LLM does not address it.

Concern	TRIPOD-LLM Coverage	Infrastructure Coverage
Separating NLP data from discrete EHR data	Not addressed	`_DERIVED` suffix tables
Tracing an extracted fact back to the source note	Not addressed	Full provenance chain: `_DERIVED` → `note_nlp_modifier` → `note_span` → `note`
Filtering by pipeline version or confidence	Not addressed	`nlp_execution` join + `probability` field
Supporting multiple NLP systems on the same notes	Not addressed	`nlp_system` + `pipeline` + `nlp_execution` hierarchy
Downstream researcher audit capability	Not addressed	All metadata is queryable SQL, not prose
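The provenance chain and confidence filter in the table above can be sketched together. As before, this is a simplified illustration: the key columns (`note_nlp_modifier_id`, `note_span_id`, `note_id`), the `note_title` column, the 0.9 threshold, and the sample rows are hypothetical. The table names and the `probability` field are taken from this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE note (note_id INTEGER PRIMARY KEY, note_title TEXT);
CREATE TABLE note_span (
    note_span_id INTEGER PRIMARY KEY,
    note_id INTEGER REFERENCES note(note_id),
    probability REAL
);
CREATE TABLE note_nlp_modifier (
    note_nlp_modifier_id INTEGER PRIMARY KEY,
    note_span_id INTEGER REFERENCES note_span(note_span_id)
);
CREATE TABLE condition_DERIVED (
    condition_id INTEGER PRIMARY KEY,
    note_nlp_modifier_id INTEGER REFERENCES note_nlp_modifier(note_nlp_modifier_id)
);
-- Invented sample rows: one high-confidence and one low-confidence extraction.
INSERT INTO note VALUES (1, 'Discharge summary'), (2, 'Progress note');
INSERT INTO note_span VALUES (11, 1, 0.97), (12, 2, 0.62);
INSERT INTO note_nlp_modifier VALUES (21, 11), (22, 12);
INSERT INTO condition_DERIVED VALUES (31, 21), (32, 22);
""")


def trace_to_notes(connection, min_probability):
    """Return derived conditions at or above a confidence threshold,
    each traced back to its source note."""
    sql = """
    SELECT c.condition_id, n.note_id, n.note_title, ns.probability
    FROM condition_DERIVED AS c
    JOIN note_nlp_modifier AS mod
      ON mod.note_nlp_modifier_id = c.note_nlp_modifier_id
    JOIN note_span AS ns ON ns.note_span_id = mod.note_span_id
    JOIN note AS n ON n.note_id = ns.note_id
    WHERE ns.probability >= ?
    ORDER BY c.condition_id
    """
    return connection.execute(sql, (min_probability,)).fetchall()


high_confidence = trace_to_notes(conn, 0.9)
for row in high_confidence:
    print(row)
```

Passing the threshold as a query parameter lets each downstream study choose its own confidence cutoff without changing the stored data.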

The gap TRIPOD-LLM cannot fill

TRIPOD-LLM is a publication checklist: it standardizes what authors write in a methods section. But a downstream researcher who joins `condition_DERIVED` to their cohort two years later does not read the original paper. They need the metadata in the database, attached to the data, queryable with SQL.

This is the fundamental gap the Enterprise OMOP NLP infrastructure addresses.

Summary

Dimension	TRIPOD-LLM	Enterprise OMOP NLP Infrastructure
Audience	Paper authors, reviewers, editors	NLP engineers, researchers, downstream data consumers
Format	Prose in a manuscript	Structured, queryable database tables
Scope	One study, one publication	Every execution, every pipeline, every extracted fact
Lifecycle	Written once at publication time	Updated with every pipeline run
Provenance depth	Describes the pipeline in a methods section	Links every derived row to the exact execution, system, and source note
Data separation	Not addressed	`_DERIVED` tables structurally separate NLP data from discrete EHR data
Reproducibility	Enables reproduction from the paper	Enables audit from the data

References

Gallifant J, Afshar M, Ameen S, et al. The TRIPOD-LLM reporting guideline for studies using large language models. Nature Medicine. 2025;31:60–69. doi:10.1038/s41591-024-03425-5
Fu S, Wang L, Moon S, et al. Recommended practices and ethical considerations for natural language processing–assisted observational research: A scoping review. Clinical and Translational Science. 2023;16(3):398–411. doi:10.1111/cts.13463
TRIPOD-LLM interactive checklist: tripod-llm.vercel.app

Notes and NLP Design — The problem and why NLP metadata matters
Architecture — The 4-layer, 13-table schema specification
Entity Relationship Diagram — Visual schema reference