Commit 032f07e

Add pipeline framework (#61)
* Added pipelines WIP
* Added pipeline and io components
* Added validation and tests
* Tidied up typing, added property utils to pipeline, updated tests
* Fix component name string in stages property
* Changed model name to be generic
* Added methods to data containers
* Add simple preprocessing and postprocessing components
* Update dependencies
* Remove print statement
* Fix preprocessor name
* Remove configs from pre and postprocessors
* Fix Discord link
* Update documentation
* Make pipeline wrapper callable method less verbose
* Fail removing/replacing non-existing components louder
* Update README.md
* Added built-in .build() when pipeline is first called
* Update docs with usage
* README.md
* README.md - link
1 parent cdfabb9 commit 032f07e
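
The commit message says two things about pipelines: they now call `.build()` automatically the first time they are called, and they fail more loudly when you remove or replace a component that does not exist. The sketch below illustrates that behaviour in plain Python. It assumes components are plain callables registered by name. `SketchPipeline` and its `add`/`remove`/`replace`/`build` methods are hypothetical stand-ins, not HealthChain's actual `Pipeline` API.

```python
from typing import Callable, Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


class SketchPipeline(Generic[T]):
    """Hypothetical minimal pipeline: an ordered list of named callables."""

    def __init__(self) -> None:
        self._components: List[Tuple[str, Callable[[T], T]]] = []
        self._built: Optional[Callable[[T], T]] = None

    def add(self, name: str, component: Callable[[T], T]) -> None:
        self._components.append((name, component))
        self._built = None  # any change invalidates a previous build

    def _index(self, name: str) -> int:
        for i, (existing, _) in enumerate(self._components):
            if existing == name:
                return i
        # Fail loudly instead of silently ignoring an unknown component name
        raise ValueError(f"Component '{name}' not found in pipeline")

    def remove(self, name: str) -> None:
        del self._components[self._index(name)]
        self._built = None

    def replace(self, name: str, component: Callable[[T], T]) -> None:
        self._components[self._index(name)] = (name, component)
        self._built = None

    def build(self) -> Callable[[T], T]:
        # Snapshot the current components so the built runner is fixed
        components = [component for _, component in self._components]

        def run(data: T) -> T:
            for component in components:
                data = component(data)
            return data

        self._built = run
        return run

    def __call__(self, data: T) -> T:
        # Build lazily the first time the pipeline is called
        if self._built is None:
            self.build()
        return self._built(data)


pipe = SketchPipeline[str]()
pipe.add("lower", str.lower)
pipe.add("strip", str.strip)
print(pipe("  Hypertension "))  # hypertension
```

Raising `ValueError` on an unknown name surfaces typos immediately. Resetting the cached build after every change keeps a lazily built pipeline from running a stale component list.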

49 files changed (+4682 / -1127 lines)

.gitignore

Lines changed: 2 additions & 0 deletions
```diff
@@ -160,5 +160,7 @@ cython_debug/
 #.idea/
 
 output/
+scrap/
 .DS_Store
 .vscode/
+.ruff_cache/
```

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -26,7 +26,7 @@ If you're a developer, there are many ways you can contribute code:
 
 ## Join Our Discord
 
-Are you a domain expert with valuable insights? We encourage you to join our [Discord community](https://discord.gg/4v6XgGBZ) and share your wisdom. Your expertise can help shape the future of the project and guide us in making informed decisions.
+Are you a domain expert with valuable insights? We encourage you to join our [Discord community](https://discord.gg/UQC6uAepUz) and share your wisdom. Your expertise can help shape the future of the project and guide us in making informed decisions.
 
 We believe that every contribution, big or small, makes a difference. Thank you for being a part of our community!
 
```

README.md

Lines changed: 108 additions & 63 deletions
````diff
@@ -10,138 +10,183 @@
 
 </div>
 
-Simplify testing and evaluating AI and NLP applications in a healthcare context 💫 🏥.
+Simplify developing, testing and validating AI and NLP applications in a healthcare context 💫 🏥.
 
-Building applications that integrate in healthcare systems is complex, and so is designing reliable, reactive algorithms involving unstructured data. Let's try to change that.
+Building applications that integrate with electronic health record systems (EHRs) is complex, and so is designing reliable, reactive algorithms involving unstructured data. Let's try to change that.
 
 ```bash
 pip install healthchain
 ```
-First time here? Check out our [Docs](dotimplement.github.io/HealthChain/) page!
+First time here? Check out our [Docs](https://dotimplement.github.io/HealthChain/) page!
 
 ## Features
-- [x] 🍱 Create sandbox servers and clients that comply with real EHRs API and data standards.
-- [x] 🗃️ Generate synthetic FHIR resources or load your own data as free-text.
-- [x] 💾 Save generated request and response data for each sandbox run.
-- [x] 🎈 Streamlit dashboard to inspect generated data and responses.
-- [x] 🧪 Experiment with LLMs in an end-to-end HL7-compliant pipeline from day 1.
+- [x] 🛠️ Build custom pipelines or use [pre-built ones](https://dotimplement.github.io/HealthChain/reference/pipeline/pipeline/#prebuilt) for your healthcare NLP and ML tasks
+- [x] 🏗️ Add built-in CDA and FHIR parsers to connect your pipeline to interoperability standards
+- [x] 🧪 Test your pipelines in full healthcare-context aware [sandbox](https://dotimplement.github.io/HealthChain/reference/sandbox/sandbox/) environments
+- [x] 🗃️ Generate [synthetic healthcare data](https://dotimplement.github.io/HealthChain/reference/utilities/data_generator/) for testing and development
+- [x] 🚀 Deploy sandbox servers locally with [FastAPI](https://fastapi.tiangolo.com/)
 
 ## Why use HealthChain?
-- **Scaling EHR integrations is a manual and time-consuming process** - HealthChain abstracts away complexities so you can focus on AI development, not EHR configurations.
-- **Evaluating the behaviour of AI in complex systems is a difficult and labor-intensive task** - HealthChain provides a framework to test the real-world resilience of your whole system, not just your models.
-- **[Most healthcare data is unstructured](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6372467/)** - HealthChain is optimised for real-time AI/NLP applications that deal with realistic healthcare data.
+- **EHR integrations are manual and time-consuming** - HealthChain abstracts away complexities so you can focus on AI development, not EHR configurations.
+- **It's difficult to track and evaluate multiple integration instances** - HealthChain provides a framework to test the real-world resilience of your whole system, not just your models.
+- [**Most healthcare data is unstructured**](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6372467/) - HealthChain is optimized for real-time AI and NLP applications that deal with realistic healthcare data.
 - **Built by health tech developers, for health tech developers** - HealthChain is tech stack agnostic, modular, and easily extensible.
 
````
````diff
-## Clinical Decision Support (CDS)
+## Pipeline
+Pipelines provide a flexible way to build and manage NLP and ML processing workflows that can easily interface with parsers and connectors to integrate with EHRs.
+
+### Building a pipeline
+
+```python
+from healthchain.io.containers import Document
+from healthchain.pipeline import Pipeline
+from healthchain.pipeline.components import TextPreProcessor, Model, TextPostProcessor
+
+# Initialize the pipeline
+nlp_pipeline = Pipeline[Document]()
+
+# Add TextPreProcessor component
+preprocessor = TextPreProcessor(tokenizer="spacy")
+nlp_pipeline.add(preprocessor)
+
+# Add Model component (assuming we have a pre-trained model)
+model = Model(model_path="path/to/pretrained/model")
+nlp_pipeline.add(model)
+
+# Add TextPostProcessor component
+postprocessor = TextPostProcessor(
+    postcoordination_lookup={
+        "heart attack": "myocardial infarction",
+        "high blood pressure": "hypertension"
+    }
+)
+nlp_pipeline.add(postprocessor)
+
+# Build the pipeline
+nlp = nlp_pipeline.build()
+
+# Use the pipeline
+result = nlp(Document("Patient has a history of heart attack and high blood pressure."))
+
+print(f"Entities: {result.entities}")
+```
+### Using pre-built pipelines
+
+```python
+from healthchain.io.containers import Document
+from healthchain.pipeline import MedicalCodingPipeline
+
+# Load the pre-built MedicalCodingPipeline
+pipeline = MedicalCodingPipeline.load("./path/to/model")
+
+# Create a document and process it
+result = pipeline(Document("Patient has a history of myocardial infarction and hypertension."))
+
+print(f"Entities: {result.entities}")
+```
+
````
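
The README example chains a pre-processor, a model, and a post-processor over a `Document`. The sketch below shows what each stage might contribute without needing spaCy or a trained model. `SketchDocument`, the regex tokenizer, and the exact-phrase matcher are hypothetical stand-ins for `Document`, `TextPreProcessor(tokenizer="spacy")`, and `Model`. Only the `postcoordination_lookup` mapping is taken from the README.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a document container."""
    text: str
    tokens: List[str] = field(default_factory=list)
    entities: List[str] = field(default_factory=list)


def preprocess(doc: SketchDocument) -> SketchDocument:
    # Lower-case regex tokenizer standing in for a spaCy tokenizer
    doc.tokens = re.findall(r"[a-z]+", doc.text.lower())
    return doc


def make_phrase_matcher(phrases: List[str]) -> Callable[[SketchDocument], SketchDocument]:
    # Stand-in for a trained model: exact matching of known phrases over tokens
    phrase_tokens = [phrase.split() for phrase in phrases]

    def model(doc: SketchDocument) -> SketchDocument:
        for i in range(len(doc.tokens)):
            for tokens in phrase_tokens:
                if doc.tokens[i : i + len(tokens)] == tokens:
                    doc.entities.append(" ".join(tokens))
        return doc

    return model


def make_postprocessor(lookup: Dict[str, str]) -> Callable[[SketchDocument], SketchDocument]:
    # Map lay terms to preferred clinical terms; unknown entities pass through
    def postprocess(doc: SketchDocument) -> SketchDocument:
        doc.entities = [lookup.get(entity, entity) for entity in doc.entities]
        return doc

    return postprocess


lookup = {
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
}
steps = [preprocess, make_phrase_matcher(list(lookup)), make_postprocessor(lookup)]

doc = SketchDocument("Patient has a history of heart attack and high blood pressure.")
for step in steps:
    doc = step(doc)
print(doc.entities)  # ['myocardial infarction', 'hypertension']
```

Each stage takes and returns the same container, which is what lets stages be added, swapped, or removed independently.
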
````diff
+## Sandbox
+
+Sandboxes provide a staging environment for testing and validating your pipeline in a realistic healthcare context.
+
+### Clinical Decision Support (CDS)
 [CDS Hooks](https://cds-hooks.org/) is an [HL7](https://cds-hooks.hl7.org) published specification for clinical decision support.
 
 **When is this used?** CDS hooks are triggered at certain events during a clinician's workflow in an electronic health record (EHR), e.g. when a patient record is opened, when an order is selected.
 
-**What information is sent**: the context of the event and FHIR resources that are requested by your service, for example, the patient ID and information on the encounter and conditions they are being seen for.
+**What information is sent**: the context of the event and [FHIR](https://hl7.org/fhir/) resources that are requested by your service, for example, the patient ID and information on the encounter and conditions they are being seen for.
 
 **What information is returned**: “cards” displaying text, actionable suggestions, or links to launch a [SMART](https://smarthealthit.org/) app from within the workflow.
 
-**What you need to decide**: What data do I want my EHR client to send, and how will my service process this data.
-
 
 ```python
 import healthchain as hc
 
+from healthchain.pipeline import Pipeline
 from healthchain.use_cases import ClinicalDecisionSupport
 from healthchain.models import Card, CdsFhirData, CDSRequest
-from healthchain.data_generator import DataGenerator
-
+from healthchain.data_generator import CdsDataGenerator
 from typing import List
 
-# Decorate class with sandbox and pass in use case
 @hc.sandbox
-class myCDS(ClinicalDecisionSupport):
+class MyCDS(ClinicalDecisionSupport):
     def __init__(self) -> None:
-        self.data_generator = DataGenerator()
+        self.pipeline = Pipeline.load("./path/to/model")
+        self.data_generator = CdsDataGenerator()
 
     # Sets up an instance of a mock EHR client of the specified workflow
     @hc.ehr(workflow="patient-view")
     def ehr_database_client(self) -> CdsFhirData:
-        self.data_generator.generate()
-        return self.data_generator.data
+        return self.data_generator.generate()
 
     # Define your application logic here
     @hc.api
-    def my_service(self, request: CdsRequest) -> List[Card]:
-        result = "Hello " + request["patient_name"]
-        return result
-
-if __name__ == "__main__":
-    cds = myCDS()
-    cds.start_sandbox()
-```
-
-Then run:
-```bash
-healthchain run mycds.py
+    def my_service(self, data: CDSRequest) -> List[Card]:
+        result = self.pipeline(data)
+        return [
+            Card(
+                summary="Welcome to our Clinical Decision Support service.",
+                detail=result.summary,
+                indicator="info"
+            )
+        ]
 ```
````
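
The CDS Hooks specification sets the shape of each card a service returns:

- `summary` is required and must be under 140 characters.
- `indicator` is required and must be `info`, `warning`, or `critical`.
- `source` is required and is an object with a `label`.
- `detail` is optional Markdown.

The service replies with a JSON object containing a `cards` list. Below is a plain-dict sketch of that shape, independent of HealthChain's `Card` model. The `make_card` and `make_response` helpers are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional

VALID_INDICATORS = {"info", "warning", "critical"}


def make_card(summary: str, indicator: str, source_label: str,
              detail: Optional[str] = None) -> Dict[str, Any]:
    """Build a CDS Hooks card as a plain dict, checking a few basic spec rules."""
    if len(summary) >= 140:
        raise ValueError("summary must be shorter than 140 characters")
    if indicator not in VALID_INDICATORS:
        raise ValueError(f"indicator must be one of {sorted(VALID_INDICATORS)}")
    card: Dict[str, Any] = {
        "summary": summary,
        "indicator": indicator,
        "source": {"label": source_label},
    }
    if detail is not None:
        card["detail"] = detail  # optional Markdown shown when the card is expanded
    return card


def make_response(cards: List[Dict[str, Any]]) -> str:
    # A CDS service responds with a JSON object holding a list of cards
    return json.dumps({"cards": cards})


card = make_card(
    summary="Patient summary",
    indicator="info",
    source_label="Example CDS service",
    detail="Hypertension noted on the problem list.",
)
print(make_response([card]))
```
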
````diff
-This will populate your EHR client with the data generation method you have defined, send requests to your server for processing, and save the data in `./output` by default.
 
-## Clinical Documentation
+### Clinical Documentation
 
-The ClinicalDocumentation use case implements a real-time Clinical Documentation Improvement (CDI) service. It helps convert free-text medical documentation into coded information that can be used for billing, quality reporting, and clinical decision support.
+The `ClinicalDocumentation` use case implements a real-time Clinical Documentation Improvement (CDI) service. It helps convert free-text medical documentation into coded information that can be used for billing, quality reporting, and clinical decision support.
 
 **When is this used?** Triggered when a clinician opts in to a CDI functionality (e.g. Epic NoteReader) and signs or pends a note after writing it.
 
-**What information is sent**: A [CDA (Clinical Document Architecture)](https://www.hl7.org/implement/standards/product_brief.cfm?product_id=7) document which contains continuity of care data and free-text data, e.g. a patient's problem list and the progress note that the clinician has entered in the EHR.
-
-**What information is returned**: A CDA document which contains additional structured data extracted and returned by your CDI service.
+**What information is sent**: A [CDA (Clinical Document Architecture)](https://www.hl7.org.uk/standards/hl7-standards/cda-clinical-document-architecture/) document which contains continuity of care data and free-text data, e.g. a patient's problem list and the progress note that the clinician has entered in the EHR.
 
 ```python
 import healthchain as hc
 
+from healthchain.pipeline import MedicalCodingPipeline
 from healthchain.use_cases import ClinicalDocumentation
 from healthchain.models import CcdData, ProblemConcept, Quantity
 
 @hc.sandbox
 class NotereaderSandbox(ClinicalDocumentation):
     def __init__(self):
-        self.cda_path = "./resources/uclh_cda.xml"
+        self.pipeline = MedicalCodingPipeline.load("./path/to/model")
 
     # Load an existing CDA file
     @hc.ehr(workflow="sign-note-inpatient")
     def load_data_in_client(self) -> CcdData:
-        with open(self.cda_path, "r") as file:
+        with open("/path/to/cda/data.xml", "r") as file:
             xml_string = file.read()
 
         return CcdData(cda_xml=xml_string)
 
-    # Define application logic
     @hc.api
     def my_service(self, ccd_data: CcdData) -> CcdData:
-        # Apply method from ccd_data.note and access existing entries from ccd.problems
-
-        new_problem = ProblemConcept(
-            code="38341003",
-            code_system="2.16.840.1.113883.6.96",
-            code_system_name="SNOMED CT",
-            display_name="Hypertension",
-        )
-        ccd_data.problems.append(new_problem)
-        return ccd_data
+        annotated_ccd = self.pipeline(ccd_data)
+        return annotated_ccd
 ```
````
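
In a CDA document, coded problems typically appear as `value` elements inside `observation` entries in the `urn:hl7-org:v3` namespace. SNOMED CT is identified there by the OID `2.16.840.1.113883.6.96`, the code system used in the removed `ProblemConcept` example. The sketch below extracts those codes with the standard library's `xml.etree.ElementTree`. The trimmed sample is illustrative only and is not a schema-valid CDA document. `extract_snomed_values` is a hypothetical helper, not part of `CcdData` or `MedicalCodingPipeline`.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NS = {"cda": "urn:hl7-org:v3"}
SNOMED_CT_OID = "2.16.840.1.113883.6.96"


def extract_snomed_values(cda_xml: str) -> List[Dict[str, str]]:
    """Collect SNOMED CT-coded observation values from a CDA document."""
    root = ET.fromstring(cda_xml)
    results = []
    for value in root.iterfind(".//cda:observation/cda:value", NS):
        # Keep only values coded in SNOMED CT, identified by its OID
        if value.get("codeSystem") == SNOMED_CT_OID:
            results.append({
                "code": value.get("code", ""),
                "display_name": value.get("displayName", ""),
            })
    return results


# Trimmed, illustrative fragment (not schema-valid CDA)
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <component><structuredBody><component><section>
    <entry><act><entryRelationship><observation>
      <value xsi:type="CD" code="38341003"
             codeSystem="2.16.840.1.113883.6.96"
             codeSystemName="SNOMED CT" displayName="Hypertension"/>
    </observation></entryRelationship></act></entry>
  </section></component></structuredBody></component>
</ClinicalDocument>"""

print(extract_snomed_values(SAMPLE))
# [{'code': '38341003', 'display_name': 'Hypertension'}]
```
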
````diff
+### Running a sandbox
 
+To run the sandbox, add the following to your `mycds.py` file:
 
-### Streamlit dashboard
-Note this is currently not meant to be a frontend to the EHR client, so you will have to run it separately from the sandbox application.
+```python
+cds = MyCDS()
+cds.run_sandbox()
+```
+This will populate your EHR client using the data generation method you have defined, send requests to your server for processing, and save the data in the `./output` directory.
 
+Then run:
 ```bash
-pip install streamlit
-streamlit streamlit-demo/app.py
+healthchain run mycds.py
 ```
-
+By default, the server runs at `http://127.0.0.1:8000`, and you can interact with the exposed endpoints at `/docs`.
 ## Road Map
-- [x] 📝 Adding Clinical Documentation use case
-- [ ] 🎛️ Version and test different EHR backend configurations
-- [ ] 🤖 Integrations with popular LLM and NLP libraries
-- [ ] ❓ Evaluation framework for pipelines and use cases
+- [ ] 🎛️ Versioning and artifact management for pipelines and sandbox EHR configurations
+- [ ] 🤖 Integrations with other pipeline libraries such as spaCy, HuggingFace, LangChain etc.
+- [ ] ❓ Testing and evaluation framework for pipelines and use cases
+- [ ] 🧠 Multi-modal pipelines that have built-in NLP to utilize unstructured data
 - [ ] ✨ Improvements to synthetic data generator methods
-- [ ] 👾 Frontend demo for EHR client
+- [ ] 👾 Frontend UI for EHR client and visualization features
 - [ ] 🚀 Production deployment options
 
 ## Contribute
````
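
The "Running a sandbox" section describes the loop a sandbox performs. The EHR client method generates request data, the `@hc.api` method processes it, and both sides are saved to an output directory. Below is a toy model of that loop that assumes nothing about HealthChain's internals. The `ehr`/`api` decorators and `run_sandbox` are hypothetical and only tag methods with attributes. A temporary directory stands in for `./output`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def ehr(workflow: str) -> Callable:
    """Hypothetical decorator: tag a method as the mock EHR client."""
    def decorator(func: Callable) -> Callable:
        func.sandbox_role = "ehr"  # type: ignore[attr-defined]
        func.workflow = workflow   # type: ignore[attr-defined]  # recorded, unused here
        return func
    return decorator


def api(func: Callable) -> Callable:
    """Hypothetical decorator: tag a method as the service endpoint."""
    func.sandbox_role = "api"  # type: ignore[attr-defined]
    return func


def run_sandbox(app: Any, output_dir: str) -> Dict[str, Any]:
    """Call the tagged client, pass its data to the tagged service, save both."""
    methods = [getattr(app, name) for name in dir(app) if not name.startswith("_")]
    client = next(m for m in methods if getattr(m, "sandbox_role", None) == "ehr")
    service = next(m for m in methods if getattr(m, "sandbox_role", None) == "api")

    request = client()           # the mock EHR client generates the request data
    response = service(request)  # the application logic processes it

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "request.json").write_text(json.dumps(request, indent=2))
    (out / "response.json").write_text(json.dumps(response, indent=2))
    return response


class DemoSandbox:
    @ehr(workflow="patient-view")
    def ehr_database_client(self) -> Dict[str, Any]:
        return {"hook": "patient-view", "context": {"patientId": "123"}}

    @api
    def my_service(self, request: Dict[str, Any]) -> Dict[str, Any]:
        patient_id = request["context"]["patientId"]
        return {"cards": [{"summary": f"Seen patient {patient_id}",
                           "indicator": "info",
                           "source": {"label": "sketch"}}]}


output_dir = tempfile.mkdtemp()  # stands in for ./output in this sketch
response = run_sandbox(DemoSandbox(), output_dir)
print(response["cards"][0]["summary"])  # Seen patient 123
```
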

docs/api/component.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -0,0 +1,6 @@
+# Component
+
+::: healthchain.pipeline.components.basecomponent
+::: healthchain.pipeline.components.preprocessors
+::: healthchain.pipeline.components.models
+::: healthchain.pipeline.components.postprocessors
```

docs/api/containers.md

Lines changed: 3 additions & 0 deletions
```diff
@@ -0,0 +1,3 @@
+# Containers
+
+::: healthchain.io.containers
```

docs/api/pipeline.md

Lines changed: 3 additions & 0 deletions
```diff
@@ -0,0 +1,3 @@
+# Pipeline
+
+::: healthchain.pipeline.basepipeline
```

docs/community/contribution_guide.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+# Contribution Guide
```

docs/community/resources.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -0,0 +1 @@
+# Resources
```

docs/cookbook/cds_sandbox.md

Lines changed: 48 additions & 0 deletions
````diff
@@ -0,0 +1,48 @@
+# Build a CDS sandbox
+
+A CDS sandbox which uses `gpt-4o` to summarise patient information from synthetically generated FHIR resources received from the `patient-view` CDS hook.
+
+```python
+import healthchain as hc
+
+from healthchain.use_cases import ClinicalDecisionSupport
+from healthchain.data_generators import CdsDataGenerator
+from healthchain.models import Card, CdsFhirData, CDSRequest
+
+from langchain_openai import ChatOpenAI
+from langchain_core.prompts import PromptTemplate
+from langchain_core.output_parsers import StrOutputParser
+
+from typing import List
+
+@hc.sandbox
+class CdsSandbox(ClinicalDecisionSupport):
+    def __init__(self):
+        self.chain = self._init_llm_chain()
+        self.data_generator = CdsDataGenerator()
+
+    def _init_llm_chain(self):
+        prompt = PromptTemplate.from_template(
+            "Extract conditions from the FHIR resource below and summarize in one sentence using simple language \n'''{text}'''"
+        )
+        model = ChatOpenAI(model="gpt-4o")
+        parser = StrOutputParser()
+
+        chain = prompt | model | parser
+        return chain
+
+    @hc.ehr(workflow="patient-view")
+    def load_data_in_client(self) -> CdsFhirData:
+        data = self.data_generator.generate()
+        return data
+
+    @hc.api
+    def my_service(self, request: CDSRequest) -> List[Card]:
+        result = self.chain.invoke(str(request.prefetch))
+        return [Card(
+            summary="Patient summary",
+            indicator="info",
+            source={"label": "openai"},
+            detail=result,
+        )]
+```
````
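
In the cookbook, `prompt | model | parser` uses LangChain's expression language. There the `|` operator chains runnables so that each one's output becomes the next one's input. The toy `Step` class below mimics only that composition idea, using Python's `__or__` operator. It is not LangChain's implementation. The fake model, which upper-cases the prompt, is a placeholder so the example runs without an API key.

```python
from typing import Any, Callable


class Step:
    """Toy composable step: `a | b` runs a, then feeds its output to b."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # Compose left to right: other receives this step's output
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


prompt = Step(lambda text: f"Summarize in one sentence: '''{text}'''")
fake_model = Step(lambda p: {"content": p.upper()})  # placeholder for a chat model
parser = Step(lambda message: message["content"])    # extract the string output

chain = prompt | fake_model | parser
print(chain.invoke("hi"))  # SUMMARIZE IN ONE SENTENCE: '''HI'''
```
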

docs/cookbook/index.md

Lines changed: 6 additions & 1 deletion
```diff
@@ -1 +1,6 @@
-# Cookbook
+# Examples
+
+The best way to learn is by example! Here are some to get you started:
+
+- [Build a CDS sandbox](./cds_sandbox.md): Build a clinical decision support (CDS) system that uses the *patient-view* hook to summarise patient information with an LLM.
+- [Build a Clinical Documentation sandbox](./notereader_sandbox.md): Build a NoteReader system which extracts problem, medication, and allergy concepts from free-text clinical notes.
```
