This document summarizes the extraction schema defined in information_extraction/schema.py and provides an idealized example StudyRecord JSON.
ExtractedValue[T]is the wrapper for extracted leaf fields. It includes:value: the extracted value.evidence: list of 1+ evidence spans whenvalueis present.unit: unit for numeric values (e.g., "ms", "s", "years", "Tesla").missing_reason: optional reason for missing values (e.g., "not_reported").note,scope,confidence: optional metadata.
- Container fields (e.g.,
StudyRecord,StudyLinks, entity lists) are plain objects/lists, notExtractedValue. - List fields that require per-item evidence use
List[ExtractedValue[T]]. - Evidence is optional in the schema; pipelines may enforce non-null evidence when values exist.
- Extraction metadata is stored in
json_schema_extraviaextraction_meta:extraction_type,extraction_prompt,scope_hint,extraction_phase,allow_note,inference_policy.
EvidenceSpanincludessource,section,extraction_text,char_interval,alignment_status,extraction_index,group_index,document_id, and optionallocator.- Entities are listed separately and linked with explicit edges to form a graph.
StudyRecord: top-level container.StudyMetadataModel: study-wide objective, criteria, study type.GroupBase: participant groups with demographics and group-specific criteria.TaskBase: tasks, conditions, and cognitive descriptors.Condition: named conditions for task contrasts.ModalityBase: imaging modalities and acquisition parameters.AnalysisBase: reported analyses/contrasts.StudyLinks: explicit edges between entities.
group_task: which groups performed which tasks.task_modality: which modalities were used for tasks.group_modality: which groups underwent which modalities (supports subsets).analysis_task: which tasks an analysis tests.analysis_group: which groups are included in an analysis.analysis_condition: which conditions are referenced in an analysis.- Edges may include optional per-edge evidence spans.
task_nameis a short label; no narrative details.task_descriptionis narrative; avoid timing or total duration.design_detailsis timing/run/block details; avoid narrative.task_durationis only total duration.- Tasks can be interventions (e.g., stimulation/drug); variants like left/right/both or on/off belong in
conditions. task_categorytags the task type (e.g., CognitiveTask, Intervention, Exposure, RestingState) only when explicitly stated.conceptsare raw phrases;domain_tagsare normalized tags.cohort_labelcan be descriptive;medical_conditionis diagnosis only.population_roleis role only (patient/control/etc.).age_rangeis reported sample range;age_minimum/age_maximumare explicit minimum/maximum if stated.- Demographics are captured at the group level only.
{
"study": {
"study_objective": {
"value": "Test whether conflict monitoring deficits in schizophrenia are associated with altered task-evoked fMRI responses.",
"evidence": [
{
"section": "Abstract",
"extraction_text": "We investigated conflict monitoring in schizophrenia using a Stroop task during fMRI.",
"char_interval": {"start_pos": 120, "end_pos": 210},
"alignment_status": "match_exact",
"extraction_index": 1,
"group_index": 0,
"document_id": "doc_abc123"
}
]
},
"study_type": {
"value": "OriginalResearch",
"evidence": [
{
"section": "Title",
"extraction_text": "An fMRI study of Stroop performance in schizophrenia",
"char_interval": {"start_pos": 0, "end_pos": 58},
"alignment_status": "match_exact",
"extraction_index": 2,
"group_index": 0,
"document_id": "doc_abc123"
}
]
},
"inclusion_criteria": [
{
"value": "Diagnosis of schizophrenia",
"evidence": [
{
"section": "Methods/Participants",
"extraction_text": "Patients met DSM-IV criteria for schizophrenia.",
"char_interval": {"start_pos": 980, "end_pos": 1035},
"alignment_status": "match_exact",
"extraction_index": 3,
"group_index": 0,
"document_id": "doc_abc123"
}
]
}
]
},
"demographics": {
"groups": [
{
"id": "G1",
"cohort_label": {
"value": "Schizophrenia patients",
"evidence": [
{
"section": "Methods/Participants",
"extraction_text": "Patients with schizophrenia (n=22)",
"char_interval": {"start_pos": 920, "end_pos": 955},
"alignment_status": "match_exact",
"extraction_index": 4,
"group_index": 0,
"document_id": "doc_abc123"
}
]
},
"count": {
"value": 22,
"evidence": [
{
"section": "Methods/Participants",
"extraction_text": "Patients with schizophrenia (n=22)",
"char_interval": {"start_pos": 920, "end_pos": 955},
"alignment_status": "match_exact",
"extraction_index": 5,
"group_index": 0,
"document_id": "doc_abc123"
}
]
}
}
]
},
"tasks": [
{
"id": "T1",
"task_name": {
"value": "Stroop task",
"evidence": [
{
"section": "Methods/Task",
"extraction_text": "Participants performed a color-word Stroop task.",
"char_interval": {"start_pos": 1500, "end_pos": 1555},
"alignment_status": "match_exact",
"extraction_index": 6,
"group_index": 0,
"document_id": "doc_abc123"
}
]
},
"task_design": [
{
"value": "EventRelated",
"evidence": [
{
"section": "Methods/Task",
"extraction_text": "Event-related design with 120 trials.",
"char_interval": {"start_pos": 1622, "end_pos": 1665},
"alignment_status": "match_exact",
"extraction_index": 7,
"group_index": 0,
"document_id": "doc_abc123"
}
]
}
],
"conditions": [
{
"id": "C1",
"condition_label": {
"value": "Incongruent",
"evidence": [
{
"section": "Methods/Task",
"extraction_text": "Incongruent trials were defined as...",
"char_interval": {"start_pos": 1755, "end_pos": 1795},
"alignment_status": "match_exact",
"extraction_index": 8,
"group_index": 0,
"document_id": "doc_abc123"
}
]
}
}
]
}
],
"modalities": [
{
"id": "M1",
"modality_type": {
"family": {
"value": "fMRI",
"evidence": [
{
"section": "Methods/Imaging",
"extraction_text": "Functional MRI data were acquired...",
"char_interval": {"start_pos": 2000, "end_pos": 2045},
"alignment_status": "match_exact",
"extraction_index": 9,
"group_index": 0,
"document_id": "doc_abc123"
}
]
}
},
"manufacturer": {
"value": "Siemens",
"evidence": [
{
"section": "Methods/Imaging",
"extraction_text": "Images were collected on a Siemens 3T scanner.",
"char_interval": {"start_pos": 2090, "end_pos": 2140},
"alignment_status": "match_exact",
"extraction_index": 10,
"group_index": 0,
"document_id": "doc_abc123"
}
]
}
}
],
"analyses": [
{
"id": "A1",
"contrast_formula": {
"value": "Incongruent > Congruent",
"evidence": [
{
"section": "Methods/Analysis",
"extraction_text": "The primary contrast was Incongruent > Congruent.",
"char_interval": {"start_pos": 2725, "end_pos": 2775},
"alignment_status": "match_exact",
"extraction_index": 11,
"group_index": 0,
"document_id": "doc_abc123"
}
]
}
}
],
"links": {
"group_task": [
{
"group_id": "G1",
"task_id": "T1",
"evidence": [
{
"section": "Methods/Task",
"extraction_text": "All participants completed the Stroop task.",
"char_interval": {"start_pos": 1480, "end_pos": 1515},
"alignment_status": "match_exact",
"extraction_index": 12,
"group_index": 0,
"document_id": "doc_abc123"
}
]
}
]
},
"extraction_notes": [
{
"value": "Demographics reported in Table 1.",
"evidence": [
{
"section": "Table 1",
"extraction_text": "Participant demographics are summarized in Table 1.",
"char_interval": {"start_pos": 4100, "end_pos": 4150},
"alignment_status": "match_exact",
"extraction_index": 13,
"group_index": 0,
"document_id": "doc_abc123"
}
]
}
]
}