Commit fdd84af

Add CDA Parser Documentation (#93)
* Added docs for cda parser
* Update READEME.md
* README
1 parent ed7c69f commit fdd84af

2 files changed (+113 / -22 lines)

README.md

Lines changed: 34 additions & 22 deletions

```diff
@@ -21,7 +21,7 @@ First time here? Check out our [Docs](https://dotimplement.github.io/HealthChain
 
 ## Features
 - [x] 🛠️ Build custom pipelines or use [pre-built ones](https://dotimplement.github.io/HealthChain/reference/pipeline/pipeline/#prebuilt) for your healthcare NLP and ML tasks
-- [x] 🏗️ Add built-in CDA and FHIR parsers to connect your pipeline to interoperability standards
+- [x] 🏗️ Add built-in [CDA and FHIR parsers](https://dotimplement.github.io/HealthChain/reference/utilities/cda_parser/) to connect your pipeline to interoperability standards
 - [x] 🧪 Test your pipelines in full healthcare-context aware [sandbox](https://dotimplement.github.io/HealthChain/reference/sandbox/sandbox/) environments
 - [x] 🗃️ Generate [synthetic healthcare data](https://dotimplement.github.io/HealthChain/reference/utilities/data_generator/) for testing and development
 - [x] 🚀 Deploy sandbox servers locally with [FastAPI](https://fastapi.tiangolo.com/)
```

```diff
@@ -33,7 +33,7 @@ First time here? Check out our [Docs](https://dotimplement.github.io/HealthChain
 - **Built by health tech developers, for health tech developers** - HealthChain is tech stack agnostic, modular, and easily extensible.
 
 ## Pipeline
-Pipelines provide a flexible way to build and manage processing pipelines for NLP and ML tasks that can easily interface with parsers and connectors to integrate with EHRs.
+Pipelines provide a flexible way to build and manage processing pipelines for NLP and ML tasks that can easily integrate with complex healthcare systems.
 
 ### Building a pipeline
 
```

````diff
@@ -70,21 +70,39 @@ result = nlp(Document("Patient has a history of heart attack and high blood pres
 
 print(f"Entities: {result.entities}")
 ```
+
+#### Adding connectors
+Connectors give your pipelines the ability to interface with EHRs.
+
+```python
+from healthchain.io import CdaConnector
+from healthchain.models import CdaRequest
+
+cda_connector = CdaConnector()
+
+pipeline.add_input(cda_connector)
+pipeline.add_output(cda_connector)
+
+pipe = pipeline.build()
+
+cda_data = CdaRequest(document="<CDA XML content>")
+output = pipe(cda_data)
+```
+
 ### Using pre-built pipelines
+Pre-built pipelines are use case specific end-to-end workflows that already have connectors and models built-in.
 
 ```python
-from healthchain.io.containers import Document
 from healthchain.pipeline import MedicalCodingPipeline
+from healthchain.models import CdaRequest
 
-# Load the pre-built MedicalCodingPipeline
 pipeline = MedicalCodingPipeline.load("./path/to/model")
 
-# Create a document to process
-result = pipeline(Document("Patient has a history of myocardial infarction and hypertension."))
-
-print(f"Entities: {result.entities}")
+cda_data = CdaRequest(document="<CDA XML content>")
+output = pipeline(cda_data)
 ```
 
+
 ## Sandbox
 
 Sandboxes provide a staging environment for testing and validating your pipeline in a realistic healthcare context.
````
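
The connector hunk above wires a `CdaConnector` in with `pipeline.add_input()` and `pipeline.add_output()` before `pipeline.build()`. As an illustration only, here is a self-contained sketch of that input/output pattern with HealthChain left out. `Doc`, `EchoConnector`, `MiniPipeline`, `add_step`, and `tag_hypertension` are all hypothetical, and the sketch assumes (rather than documents) that a connector turns an incoming request into a document on the way in and turns the processed document back into a response on the way out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a pipeline document container."""
    text: str
    entities: List[str] = field(default_factory=list)


class EchoConnector:
    """Hypothetical connector: request -> Doc on input, Doc -> response on output."""

    def input(self, request: dict) -> Doc:
        return Doc(text=request["document"])

    def output(self, doc: Doc) -> dict:
        return {"document": doc.text, "entities": doc.entities}


class MiniPipeline:
    """Hypothetical pipeline with pluggable input/output connectors."""

    def __init__(self) -> None:
        self.steps: List[Callable[[Doc], Doc]] = []
        self.input_connector: Optional[EchoConnector] = None
        self.output_connector: Optional[EchoConnector] = None

    def add_step(self, step: Callable[[Doc], Doc]) -> None:
        self.steps.append(step)

    def add_input(self, connector: EchoConnector) -> None:
        self.input_connector = connector

    def add_output(self, connector: EchoConnector) -> None:
        self.output_connector = connector

    def build(self) -> Callable[[dict], dict]:
        def run(request: dict) -> dict:
            doc = self.input_connector.input(request)  # request -> Doc
            for step in self.steps:  # processing steps only ever see Doc objects
                doc = step(doc)
            return self.output_connector.output(doc)  # Doc -> response
        return run


def tag_hypertension(doc: Doc) -> Doc:
    # Toy processing step: a keyword match standing in for an NLP model
    if "hypertension" in doc.text.lower():
        doc.entities.append("hypertension")
    return doc


pipeline = MiniPipeline()
connector = EchoConnector()
pipeline.add_step(tag_hypertension)
pipeline.add_input(connector)
pipeline.add_output(connector)
pipe = pipeline.build()
print(pipe({"document": "History of hypertension."}))
# {'document': 'History of hypertension.', 'entities': ['hypertension']}
```

Keeping format conversion in the connector means the processing steps never need to know whether the request arrived as CDA, FHIR, or plain text.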

````diff
@@ -102,7 +120,7 @@ Sandboxes provide a staging environment for testing and validating your pipeline
 ```python
 import healthchain as hc
 
-from healthchain.pipeline import Pipeline
+from healthchain.pipeline import SummarizationPipeline
 from healthchain.use_cases import ClinicalDecisionSupport
 from healthchain.models import Card, CdsFhirData, CDSRequest
 from healthchain.data_generator import CdsDataGenerator
````

````diff
@@ -111,25 +129,19 @@ from typing import List
 @hc.sandbox
 class MyCDS(ClinicalDecisionSupport):
     def __init__(self) -> None:
-        self.pipeline = Pipeline.load("./path/to/model")
+        self.pipeline = SummarizationPipeline.load("./path/to/model")
         self.data_generator = CdsDataGenerator()
 
     # Sets up an instance of a mock EHR client of the specified workflow
-    @hc.ehr(workflow="patient-view")
+    @hc.ehr(workflow="encounter-discharge")
     def ehr_database_client(self) -> CdsFhirData:
         return self.data_generator.generate()
 
     # Define your application logic here
     @hc.api
-    def my_service(self, data: CDSRequest) -> List[Card]:
+    def my_service(self, data: CDSRequest) -> CDSRequest:
         result = self.pipeline(data)
-        return [
-            Card(
-                summary="Welcome to our Clinical Decision Support service.",
-                detail=result.summary,
-                indicator="info"
-            )
-        ]
+        return result
 ```
 
 ### Clinical Documentation
````

````diff
@@ -145,7 +157,7 @@ import healthchain as hc
 
 from healthchain.pipeline import MedicalCodingPipeline
 from healthchain.use_cases import ClinicalDocumentation
-from healthchain.models import CcdData, ProblemConcept, Quantity,
+from healthchain.models import CcdData, CdaRequest, CdaResponse
 
 @hc.sandbox
 class NotereaderSandbox(ClinicalDocumentation):
````

````diff
@@ -161,8 +173,8 @@ class NotereaderSandbox(ClinicalDocumentation):
         return CcdData(cda_xml=xml_string)
 
     @hc.api
-    def my_service(self, ccd_data: CcdData) -> CcdData:
-        annotated_ccd = self.pipeline(ccd_data)
+    def my_service(self, data: CdaRequest) -> CdaResponse:
+        annotated_ccd = self.pipeline(data)
         return annotated_ccd
 ```
 ### Running a sandbox
````

Lines changed: 79 additions & 0 deletions

@@ -1 +1,80 @@

# CDA Parser

The `CdaAnnotator` class is responsible for parsing and annotating CDA (Clinical Document Architecture) documents. It extracts information about problems, medications, allergies, and notes from a CDA document, and lets you add new information to it.

The CDA parser is used in the [CDA Connector](../pipeline/connectors/cdaconnector.md) module, but can also be used independently.

Internally, `CdaAnnotator` parses CDA documents from XML strings to a dictionary-based representation using `xmltodict` and uses Pydantic for data validation. New problems are added to the CDA document using a template-based approach. It's currently not super configurable, but we're working on it.
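
`xmltodict` is a third-party package, so to show the kind of dictionary representation this produces, here is a minimal stdlib-only sketch that follows `xmltodict`'s default conventions: attributes become `@`-prefixed keys, element text becomes `#text` (or the value itself for a text-only element), empty elements become `None`, and repeated child tags are collected into a list. `xml_to_dict` and `element_to_dict` are hypothetical helpers, not part of `CdaAnnotator`, and unlike `xmltodict` they drop namespace information entirely.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def _local(tag: str) -> str:
    """Strip the '{namespace-uri}' prefix that ElementTree adds to qualified names."""
    return tag.rsplit("}", 1)[-1]


def element_to_dict(elem: ET.Element) -> Any:
    """Convert one element using xmltodict-style conventions (hypothetical helper)."""
    node: Dict[str, Any] = {f"@{_local(k)}": v for k, v in elem.attrib.items()}
    for child in elem:
        key = _local(child.tag)
        value = element_to_dict(child)
        if key in node:
            # A repeated child tag becomes a list of values
            if not isinstance(node[key], list):
                node[key] = [node[key]]
            node[key].append(value)
        else:
            node[key] = value
    text = (elem.text or "").strip()  # tail text of mixed content is ignored
    if text:
        if not node:
            return text  # text-only element collapses to its string value
        node["#text"] = text
    return node or None  # empty element -> None


def xml_to_dict(xml_string: str) -> Dict[str, Any]:
    root = ET.fromstring(xml_string)
    return {_local(root.tag): element_to_dict(root)}


cda_snippet = (
    '<ClinicalDocument xmlns="urn:hl7-org:v3">'
    '<code code="34133-9" codeSystem="2.16.840.1.113883.6.1"/>'
    "<title>Continuity of Care Document</title>"
    "</ClinicalDocument>"
)
parsed = xml_to_dict(cda_snippet)
print(parsed["ClinicalDocument"]["code"]["@code"])  # 34133-9
print(parsed["ClinicalDocument"]["title"])  # Continuity of Care Document
```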

You exchange data with the `CdaAnnotator` through `Concept` data models, which are designed to be a system-agnostic intermediary between FHIR and CDA data representations.

([CdaAnnotator API Reference](../../api/cda_parser.md) | [Concept API Reference](../../api/data_models.md#healthchain.models.data.concept))
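
To illustrate why a system-agnostic intermediary helps: FHIR identifies a code system by URI (e.g. `http://snomed.info/sct`), while CDA identifies the same system by OID (e.g. `2.16.840.1.113883.6.96`). The sketch below uses a hypothetical `SimpleConcept` dataclass that stores a code once and renders it into either shape. It is not HealthChain's `Concept` model (see the Concept API Reference for its actual fields), and the lookup table covers only two code systems for brevity.

```python
from dataclasses import dataclass
from typing import Dict

# The same code system is identified by a URI in FHIR and by an OID in CDA.
CODE_SYSTEMS: Dict[str, Dict[str, str]] = {
    "SNOMED CT": {"fhir": "http://snomed.info/sct", "cda": "2.16.840.1.113883.6.96"},
    "RxNorm": {"fhir": "http://www.nlm.nih.gov/research/umls/rxnorm", "cda": "2.16.840.1.113883.6.88"},
}


@dataclass
class SimpleConcept:
    """Hypothetical system-agnostic concept; not HealthChain's Concept model."""

    name: str
    code: str
    code_system: str = "SNOMED CT"

    def to_fhir_coding(self) -> Dict[str, str]:
        """Render as a FHIR Coding (system URI, code, display)."""
        return {
            "system": CODE_SYSTEMS[self.code_system]["fhir"],
            "code": self.code,
            "display": self.name,
        }

    def to_cda_code(self) -> Dict[str, str]:
        """Render as attributes of a CDA <code> element (code, codeSystem OID, displayName)."""
        return {
            "code": self.code,
            "codeSystem": CODE_SYSTEMS[self.code_system]["cda"],
            "displayName": self.name,
        }


concept = SimpleConcept(name="Hypertension", code="38341003")
print(concept.to_fhir_coding()["system"])  # http://snomed.info/sct
print(concept.to_cda_code()["codeSystem"])  # 2.16.840.1.113883.6.96
```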

## Usage

### Parsing CDA documents

Parse a CDA document from an XML string:

```python
from healthchain.cda_parser import CdaAnnotator

# Read the CDA document into an XML string (placeholder path)
with open("./path/to/cda.xml") as f:
    cda_xml_string = f.read()

cda = CdaAnnotator.from_xml(cda_xml_string)

problems = cda.problem_list
medications = cda.medication_list
allergies = cda.allergy_list
note = cda.note

print([problem.name for problem in problems])
print([medication.name for medication in medications])
print([allergy.name for allergy in allergies])
print(note)
```

The parsed data is available on the `CdaAnnotator` instance: `problem_list`, `medication_list`, and `allergy_list` each return a list of `Concept` data models, and `note` contains the note content.

### Adding new information to the CDA document

The methods currently available for adding new information to the CDA document are:

| Method | Description |
|--------|-------------|
| `.add_to_problem_list()` | Adds a list of [ProblemConcept](../../api/data_models.md#healthchain.models.data.concept.ProblemConcept) |
| `.add_to_medication_list()` | Adds a list of [MedicationConcept](../../api/data_models.md#healthchain.models.data.concept.MedicationConcept) |
| `.add_to_allergy_list()` | Adds a list of [AllergyConcept](../../api/data_models.md#healthchain.models.data.concept.AllergyConcept) |

The `overwrite` parameter of the `add_to_*_list()` methods determines whether to replace the existing list or append to it. If `overwrite` is `True`, the existing list is replaced with the new list. If `overwrite` is `False`, the new list is appended to the existing list.

Depending on your use case, you may not need to return the original information from the CDA document you receive. Overwriting is mostly useful during development, when you don't want the eye-strain of a lengthy CDA document.
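
The `overwrite` behaviour described above comes down to replace versus append. A minimal sketch on plain Python lists: `merge_entries` is a hypothetical helper, not part of `CdaAnnotator`, whose real methods operate on CDA sections rather than Python lists.

```python
from typing import List, TypeVar

T = TypeVar("T")


def merge_entries(existing: List[T], new: List[T], *, overwrite: bool) -> List[T]:
    """Hypothetical helper mirroring the documented `overwrite` semantics."""
    if overwrite:
        return list(new)  # replace: the original entries are dropped
    return existing + new  # append: original entries first, then the new ones


current = ["Asthma"]
print(merge_entries(current, ["Hypertension"], overwrite=False))  # ['Asthma', 'Hypertension']
print(merge_entries(current, ["Hypertension"], overwrite=True))  # ['Hypertension']
```

Making `overwrite` keyword-only here is a deliberate choice for the sketch, so a caller can't flip between replacing and appending by accident with a positional `True`.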

### Exporting the CDA document

```python
xml_string = cda.export(pretty_print=True)
```

The `pretty_print` parameter is optional and defaults to `True`. If `pretty_print` is `True`, the XML string will be formatted with newlines and indentation.
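
What "formatted with newlines and indentation" means can be sketched with the standard library's `xml.etree.ElementTree`. `export_xml` is a hypothetical helper, not `CdaAnnotator.export()`, and `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def export_xml(root: ET.Element, pretty_print: bool = True) -> str:
    """Hypothetical export helper mirroring the documented `pretty_print` flag."""
    if pretty_print:
        # Inserts newline + indentation whitespace into text/tail; mutates the tree
        ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


print(export_xml(ET.fromstring("<section><title>Problems</title></section>"), pretty_print=False))
# <section><title>Problems</title></section>
print(export_xml(ET.fromstring("<section><title>Problems</title></section>")))
# <section>
#   <title>Problems</title>
# </section>
```

With real CDA documents, which use the `urn:hl7-org:v3` default namespace, ElementTree would also emit generated `ns0:` prefixes unless you call `ET.register_namespace("", "urn:hl7-org:v3")` before serializing.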

## Example

```python
from healthchain.cda_parser import CdaAnnotator
from healthchain.models import ProblemConcept, MedicationConcept, AllergyConcept

# Read the CDA document into an XML string (placeholder path)
with open("./path/to/cda.xml") as f:
    cda_xml_string = f.read()

cda = CdaAnnotator.from_xml(cda_xml_string)

new_problems = [ProblemConcept(name="New Problem", code="123456")]
new_medications = [MedicationConcept(name="New Medication", code="789012")]
new_allergies = [AllergyConcept(name="New Allergy", code="345678")]

# Add new problems, medications, and allergies (overwrite=True replaces the existing entries)
cda.add_to_problem_list(new_problems, overwrite=True)
cda.add_to_medication_list(new_medications, overwrite=True)
cda.add_to_allergy_list(new_allergies, overwrite=True)

# Export the modified CDA document
modified_cda_xml = cda.export()
```

The CDA parser is a work in progress. I'm just gonna be real with you: CDAs are the bane of my existence. If you, for some reason, love working with XML-based documents, please get [in touch](https://discord.gg/UQC6uAepUz)! We plan to add more functionality in the future, including configurable templates, methods for more CDA sections, and LLMs as a fallback parsing method.
