
Commit 6e80363

MedicalCodingPipeline and SummarizationPipeline implementations (#95)
* Added connector modules
* Fix typo
* Added processing of io connectors in pipelines
* Refactored CDA related processing in use case to connectors
* Added tests
* Added CdsFhirConnector
* first pass at adding spacy and hf integrations
* Updated use case functions and tests
* WIP connector usage in pipelines and components
* Fix model import name in docs
* Update Bundle validator method to dynamically import nested resource types
* Update CdsFhirConnector input method validations
* Add create method to CdsFhirData
* Fixed CdsResponse should return list of actions
* Added tests
* Added pipeline tests
* fix pyproject
* adding langchain and modifying document
* added testing
* Changed .add() -> .add_node() to make more explicit and use convention of BaseObject and base.py in modules
* Update documentation to reflect changes in this PR
* adding docs
* finish docs
* fix test
* fix test2
* WIP
* skip transformers test
* fix tests
* adding magicmock for iterable
* respond to feedback
* Refactor and update document container and ccddata design
* Fix tests
* Add docstrings
* Replace Model with ModelRouter
* Fix docs ci
* Add method to add concepts in spacy component
* Refactor Document container
* Update pipeline load method to dynamically read from string paths
* Fix tests
* Change load method to use source parameter
* Renamed integration components
* Remove spacy from preprocessor component and allow callable instead
* Pass kwargs to integration components
* Added CdsCardCreator implementation
* Updated tests for prebuilt pipelines
* Added tests for pipeline loading method and modelrouter
* Update test for spacy integration
* Tweak fixture
* Use Mixin for ModelRouter
* Clean up __init__ imports
* Fix resourceType not showing up by explicitly passing it in when called in data generator
* Parse text from DocumentReference in cdsfhir
* Add delimiter to create multiple cards and basic text cleaner for templates to card reader
* Make model loading more explicit and added langchain routing
* Update prebuilt pipeline initialization methods
* Update tests
* Added cookbook
* Moved default mapping initialization inside data generator
* Split .load method to from_model_id and from_local_model and added template path as init option
* Update tests and docs
* Tidy up docstrings and .load usage
* Update tests
* Update documentation
* Add cookbook examples
* Update dependencies

---------

Co-authored-by: Adam Kells <[email protected]>
1 parent 0bd1fd2 commit 6e80363


79 files changed: +4009 / -1614 lines changed


README.md

Lines changed: 20 additions & 8 deletions
````diff
@@ -19,6 +19,8 @@ pip install healthchain
 ```
 First time here? Check out our [Docs](https://dotimplement.github.io/HealthChain/) page!
 
+Came here from NHS RPySOC 2024 ✨? [CDS sandbox walkthrough](https://dotimplement.github.io/HealthChain/cookbook/cds_sandbox/)
+
 ## Features
 - [x] 🛠️ Build custom pipelines or use [pre-built ones](https://dotimplement.github.io/HealthChain/reference/pipeline/pipeline/#prebuilt) for your healthcare NLP and ML tasks
 - [x] 🏗️ Add built-in [CDA and FHIR parsers](https://dotimplement.github.io/HealthChain/reference/utilities/cda_parser/) to connect your pipeline to interoperability standards
@@ -40,7 +42,7 @@ Pipelines provide a flexible way to build and manage processing pipelines for NL
 ```python
 from healthchain.io.containers import Document
 from healthchain.pipeline import Pipeline
-from healthchain.pipeline.components import TextPreProcessor, Model, TextPostProcessor
+from healthchain.pipeline.components import TextPreProcessor, SpacyNLP, TextPostProcessor
 
 # Initialize the pipeline
 nlp_pipeline = Pipeline[Document]()
@@ -50,8 +52,8 @@ preprocessor = TextPreProcessor(tokenizer="spacy")
 nlp_pipeline.add_node(preprocessor)
 
 # Add Model component (assuming we have a pre-trained model)
-model = Model(model_path="path/to/pretrained/model")
-nlp_pipeline.add_node(model)
+spacy_nlp = SpacyNLP.from_model_id("en_core_sci_md", source="spacy")
+nlp_pipeline.add_node(spacy_nlp)
 
 # Add TextPostProcessor component
 postprocessor = TextPostProcessor(
@@ -68,7 +70,7 @@ nlp = nlp_pipeline.build()
 # Use the pipeline
 result = nlp(Document("Patient has a history of heart attack and high blood pressure."))
 
-print(f"Entities: {result.entities}")
+print(f"Entities: {result.nlp.spacy_doc.ents}")
 ```
 
 #### Adding connectors
@@ -96,7 +98,13 @@ Pre-built pipelines are use case specific end-to-end workflows that already have
 from healthchain.pipeline import MedicalCodingPipeline
 from healthchain.models import CdaRequest
 
-pipeline = MedicalCodingPipeline.load("./path/to/model")
+# Load from model ID
+pipeline = MedicalCodingPipeline.from_model_id(
+    model="blaze999/Medical-NER", task="token-classification", source="huggingface"
+)
+
+# Or load from local model
+pipeline = MedicalCodingPipeline.from_local_model("./path/to/model", source="spacy")
 
 cda_data = CdaRequest(document="<CDA XML content>")
 output = pipeline(cda_data)
@@ -129,7 +137,9 @@ from typing import List
 @hc.sandbox
 class MyCDS(ClinicalDecisionSupport):
     def __init__(self) -> None:
-        self.pipeline = SummarizationPipeline.load("./path/to/model")
+        self.pipeline = SummarizationPipeline.from_model_id(
+            "facebook/bart-large-cnn", source="huggingface"
+        )
         self.data_generator = CdsDataGenerator()
 
     # Sets up an instance of a mock EHR client of the specified workflow
@@ -162,7 +172,9 @@ from healthchain.models import CcdData, CdaRequest, CdaResponse
 @hc.sandbox
 class NotereaderSandbox(ClinicalDocumentation):
     def __init__(self):
-        self.pipeline = MedicalCodingPipeline.load("./path/to/model")
+        self.pipeline = MedicalCodingPipeline.from_model_id(
+            "en_core_sci_md", source="spacy"
+        )
 
     # Load an existing CDA file
     @hc.ehr(workflow="sign-note-inpatient")
@@ -192,9 +204,9 @@ Then run:
 healthchain run mycds.py
 ```
 By default, the server runs at `http://127.0.0.1:8000`, and you can interact with the exposed endpoints at `/docs`.
+
 ## Road Map
 - [ ] 🎛️ Versioning and artifact management for pipelines sandbox EHR configurations
-- [ ] 🤖 Integrations with other pipeline libraries such as spaCy, HuggingFace, LangChain etc.
 - [ ] ❓ Testing and evaluation framework for pipelines and use cases
 - [ ] 🧠 Multi-modal pipelines that that have built-in NLP to utilize unstructured data
 - [ ] ✨ Improvements to synthetic data generator methods
````
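The README now builds a `Pipeline[Document]` by registering components with `add_node()` and compiling them with `build()`. As a dependency-free illustration of that pattern only, here is a minimal sketch under stated assumptions: `ToyPipeline`, `ToyDocument`, `tokenize` and `drop_stopwords` are hypothetical names, not HealthChain classes, and each node is assumed to be a callable that takes the container and returns it, run in insertion order.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class ToyPipeline(Generic[T]):
    """Hypothetical stand-in for the add_node()/build() pattern, not HealthChain's Pipeline."""

    def __init__(self) -> None:
        self._nodes: List[Callable[[T], T]] = []

    def add_node(self, node: Callable[[T], T]) -> None:
        # Nodes are stored in insertion order and run in that order.
        self._nodes.append(node)

    def build(self) -> Callable[[T], T]:
        # Snapshot the node list so later add_node() calls don't affect this callable.
        nodes = list(self._nodes)

        def run(data: T) -> T:
            for node in nodes:
                data = node(data)
            return data

        return run


class ToyDocument:
    """Minimal container: raw text plus a token list filled in by the nodes."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.tokens: List[str] = []


STOPWORDS = {"a", "and", "has", "of"}


def tokenize(doc: ToyDocument) -> ToyDocument:
    # Naive whitespace tokenizer standing in for a preprocessing component.
    doc.tokens = doc.text.lower().rstrip(".").split()
    return doc


def drop_stopwords(doc: ToyDocument) -> ToyDocument:
    # Stand-in for a postprocessing component.
    doc.tokens = [t for t in doc.tokens if t not in STOPWORDS]
    return doc


pipeline = ToyPipeline[ToyDocument]()
pipeline.add_node(tokenize)
pipeline.add_node(drop_stopwords)
nlp = pipeline.build()

doc = nlp(ToyDocument("Patient has a history of heart attack and high blood pressure."))
print(doc.tokens)
# ['patient', 'history', 'heart', 'attack', 'high', 'blood', 'pressure']
```

Snapshotting the node list in `build()` means nodes added afterwards do not change an already-built callable; whether HealthChain's own `build()` behaves the same way is not shown in this diff.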
Lines changed: 67 additions & 0 deletions
```diff
@@ -0,0 +1,67 @@
+import healthchain as hc
+
+from healthchain.pipeline import SummarizationPipeline
+from healthchain.use_cases import ClinicalDecisionSupport
+from healthchain.models import CdsFhirData, CDSRequest, CDSResponse
+from healthchain.data_generators import CdsDataGenerator
+
+from langchain_huggingface.llms import HuggingFaceEndpoint
+from langchain_huggingface import ChatHuggingFace
+
+from langchain_core.prompts import PromptTemplate
+from langchain_core.output_parsers import StrOutputParser
+
+import getpass
+import os
+
+
+if not os.getenv("HUGGINGFACEHUB_API_TOKEN"):
+    os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass.getpass("Enter your token: ")
+
+
+def create_summarization_chain():
+    hf = HuggingFaceEndpoint(
+        repo_id="HuggingFaceH4/zephyr-7b-beta",
+        task="text-generation",
+        max_new_tokens=512,
+        do_sample=False,
+        repetition_penalty=1.03,
+    )
+    model = ChatHuggingFace(llm=hf)
+    template = """
+    You are a bed planner for a hospital. Provide a concise, objective summary of the input text in short bullet points separated by new lines,
+    focusing on key actions such as appointments and medication dispense instructions, without using second or third person pronouns.\n'''{text}'''
+    """
+    prompt = PromptTemplate.from_template(template)
+    return prompt | model | StrOutputParser()
+
+
+@hc.sandbox
+class DischargeNoteSummarizer(ClinicalDecisionSupport):
+    def __init__(self):
+        # Initialize pipeline and data generator
+        chain = create_summarization_chain()
+        self.pipeline = SummarizationPipeline.load(
+            chain, source="langchain", template_path="templates/cds_card_template.json"
+        )
+        self.data_generator = CdsDataGenerator()
+
+    @hc.ehr(workflow="encounter-discharge")
+    def load_data_in_client(self) -> CdsFhirData:
+        # Generate synthetic FHIR data for testing
+        data = self.data_generator.generate(
+            free_text_path="data/discharge_notes.csv", column_name="text"
+        )
+        return data
+
+    @hc.api
+    def my_service(self, request: CDSRequest) -> CDSResponse:
+        # Process the request through our pipeline
+        result = self.pipeline(request)
+        return result
+
+
+if __name__ == "__main__":
+    # Start the sandbox server
+    summarizer = DischargeNoteSummarizer()
+    summarizer.start_sandbox()
```
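`create_summarization_chain()` wires the prompt, the chat model and the output parser together with LangChain's `|` operator. To show just the data flow, here is a dependency-free sketch with a hypothetical `Step` class that overloads `__or__`; it is a simplification, not LangChain's `Runnable` (which also offers batching, streaming and async). `FakeMessage` and the fake model are assumptions standing in for `ChatHuggingFace` and its message output, and the prompt step fills `{text}` with plain `str.format`.

```python
from dataclasses import dataclass
from typing import Any, Callable


class Step:
    """Hypothetical, simplified stand-in for a LangChain Runnable."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # `a | b` returns a new Step that feeds a's output into b.
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x: Any) -> Any:
        return self.fn(x)


@dataclass
class FakeMessage:
    # Stands in for the chat model's message object.
    content: str


TEMPLATE = "Summarise the input text in short bullet points.\n'''{text}'''"

prompt = Step(lambda inputs: TEMPLATE.format(**inputs))
# Fake "model": echoes the last prompt line instead of calling an LLM.
fake_model = Step(lambda prompt_text: FakeMessage(content=prompt_text.splitlines()[-1]))
# Parser: pull the plain string out of the message, as an output parser would.
parser = Step(lambda message: message.content)

chain = prompt | fake_model | parser
print(chain.invoke({"text": "Chest X-ray next Thursday at 2:30 PM"}))
# '''Chest X-ray next Thursday at 2:30 PM'''
```

Because each `|` returns a new `Step`, the composed chain is itself a single callable unit, which is what lets the cookbook hand the whole chain to `SummarizationPipeline.load()` as one object.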
Lines changed: 39 additions & 0 deletions
```diff
@@ -0,0 +1,39 @@
+import healthchain as hc
+
+from healthchain.pipeline import SummarizationPipeline
+from healthchain.use_cases import ClinicalDecisionSupport
+from healthchain.models import CdsFhirData, CDSRequest, CDSResponse
+from healthchain.data_generators import CdsDataGenerator
+
+import getpass
+import os
+
+
+if not os.getenv("HUGGINGFACEHUB_API_TOKEN"):
+    os.environ["HUGGINGFACEHUB_API_TOKEN"] = getpass.getpass("Enter your token: ")
+
+
+@hc.sandbox
+class DischargeNoteSummarizer(ClinicalDecisionSupport):
+    def __init__(self):
+        self.pipeline = SummarizationPipeline.from_model_id(
+            "google/pegasus-xsum", source="huggingface", task="summarization"
+        )
+        self.data_generator = CdsDataGenerator()
+
+    @hc.ehr(workflow="encounter-discharge")
+    def load_data_in_client(self) -> CdsFhirData:
+        data = self.data_generator.generate(
+            free_text_path="data/discharge_notes.csv", column_name="text"
+        )
+        return data
+
+    @hc.api
+    def my_service(self, request: CDSRequest) -> CDSResponse:
+        result = self.pipeline(request)
+        return result
+
+
+if __name__ == "__main__":
+    summarizer = DischargeNoteSummarizer()
+    summarizer.start_sandbox()
```
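This variant loads a Hugging Face model by ID, passing `source="huggingface"` and `task="summarization"` to `from_model_id()`, while the README examples pass `source="spacy"`. The commit message mentions a `ModelRouter` that selects the integration, but its implementation is not in this excerpt, so the following is only a hypothetical sketch of dispatching on a `source` string. `LOADERS`, `load_spacy`, `load_huggingface` and the descriptor strings they return are invented for illustration; real loaders would import spaCy or `transformers` and return model objects.

```python
from typing import Any, Callable, Dict


# Hypothetical loaders keyed by `source`. Real ones would load a spaCy or
# transformers model; these just return a descriptor string.
def load_spacy(model_id: str, **kwargs: Any) -> str:
    return f"spacy:{model_id}"


def load_huggingface(model_id: str, task: str = "", **kwargs: Any) -> str:
    return f"huggingface:{task}:{model_id}"


LOADERS: Dict[str, Callable[..., str]] = {
    "spacy": load_spacy,
    "huggingface": load_huggingface,
}


def from_model_id(model_id: str, source: str, **kwargs: Any) -> str:
    """Pick a loader by `source`, mirroring the shape of the cookbook calls."""
    try:
        loader = LOADERS[source.lower()]
    except KeyError:
        raise ValueError(
            f"Unsupported source {source!r}; expected one of {sorted(LOADERS)}"
        ) from None
    # Extra keyword arguments (e.g. task=...) are forwarded to the chosen loader.
    return loader(model_id, **kwargs)


print(from_model_id("google/pegasus-xsum", source="huggingface", task="summarization"))
# huggingface:summarization:google/pegasus-xsum
```

Keeping the dispatch in a dictionary means supporting another source is one new entry rather than another `if` branch, and unknown sources fail early with a clear error.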

cookbook/data/discharge_notes.csv

Lines changed: 4 additions & 0 deletions
```diff
@@ -0,0 +1,4 @@
+text
+"Your hospital stay for pneumonia is now complete and you are cleared for discharge home today. During your 5-day admission, you received intravenous antibiotics which have now been changed to oral Co-amoxiclav 625mg three times daily for 5 more days, and you should complete this full course. Take regular Paracetamol 1g four times daily as needed for fever or discomfort, and continue using your regular inhalers as prescribed. You should rest at home for at least 7 days and gradually increase your activity level as you feel able. Use 2-3 pillows when sleeping to help with breathing and try to drink at least 6-8 glasses of water daily. Call your GP or return to hospital immediately if you develop increased shortness of breath, chest pain, fever above 38°C, or coughing up blood. Schedule a follow-up appointment with your GP within 7 days, and attend your chest X-ray appointment scheduled for next Thursday at 2:30 PM to confirm the pneumonia has cleared. The district nurse will visit you at home tomorrow to check your progress and oxygen levels.",
+"73-year-old male post CVA ready for discharge to Cedar House Rehabilitation facility tomorrow. Transport booked for 1100hrs - requires bariatric ambulance and 2 crew members (confirmed). Medication reconciliation completed - pharmacy preparing discharge medications (Apixaban 5mg, Baclofen 20mg MR, new anticoagulation card) for collection by daughter at 0900. Patient requires hoisting and pressure-relieving equipment - ward to arrange hospital bed and mattress from equipment library before transfer. Outstanding tasks: 1) Final INR check due 0800 tomorrow, 2) SALT assessment scheduled 0830 tomorrow - must be completed prior to transfer, 3) MAR charts and medication administration record to be faxed to Cedar House before 1000. Social services have confirmed funding for 6-week rehabilitation placement. Daughter (NOK) aware of discharge plan and will bring clothing. Follow-up arrangements needed: Stroke clinic in 4 weeks, SALT outpatient review in 7 days - appointments pending. Current location: Stroke Unit bed 12, side room (previous MRSA). Deep clean required post-discharge. Patient requires NIL BY MOUTH status until SALT assessment completed. Obs stable: BP 135/82, HR 72, afebrile. Ward clerk to notify bed management once patient leaves ward. GP summary to be completed and sent with transfer documentation.",
+"Mr. Thompson's discharge from the Stroke Unit to Cedar House Rehabilitation Centre has been approved for tomorrow morning contingent on three requirements: the pharmacy must prepare his modified-release Baclofen 20mg and new anticoagulation medication pack (Apixaban 5mg) for collection by his daughter before 9am, hospital transport must be confirmed for an 11am pickup (bariatric ambulance and two crew members required due to hoisting needs), and the rehabilitation centre must receive his completed medication administration record by fax before accepting admission. The ward needs to arrange collection of his pressure-relieving mattress from the equipment library for transport with him, and his current hospital bed must be deep-cleaned after discharge due to previous MRSA status. Prior to discharge, the Stroke Early Supported Discharge team must complete their initial assessment at 8:30am, and his daughter needs to bring appropriate clothing as hospital gowns cannot be taken to the rehabilitation facility. The patient requires two additional outpatient appointments to be booked: a swallowing assessment with Speech and Language Therapy within 7 days, and a follow-up with the Stroke Consultant in 4 weeks. The social worker must confirm that the family has received the rehabilitation centre's payment schedule and admission documentation. Additionally, the ward must ensure his discharge summary is sent to both his GP and the rehabilitation centre, with a copy of his anticoagulation monitoring booklet and most recent INR results."
```
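Both cookbooks point `CdsDataGenerator.generate()` at this file with `column_name="text"`. Rows 2 and 3 end with a trailing comma, so they parse as two fields against a one-column header. How the data generator reads the file is not part of this diff; as an assumption, the sketch below uses the standard-library `csv.DictReader`, which stores surplus fields under the `None` key (its default `restkey`), so selecting the `text` column still returns each note intact. The two sample rows are fragments of the notes above.

```python
import csv
import io
from typing import List, TextIO

# Sample in the same shape as discharge_notes.csv: a "text" header, one quoted
# note per row, and a trailing comma after the first row.
SAMPLE = (
    "text\n"
    '"Schedule a follow-up appointment with your GP within 7 days.",\n'
    '"Transport booked for 1100hrs."\n'
)


def read_column(handle: TextIO, column_name: str) -> List[str]:
    # Surplus fields (e.g. the empty one after a trailing comma) are stored by
    # DictReader under the key None, so row[column_name] holds only the note.
    return [row[column_name] for row in csv.DictReader(handle)]


notes = read_column(io.StringIO(SAMPLE), "text")
print(notes)
# ['Schedule a follow-up appointment with your GP within 7 days.', 'Transport booked for 1100hrs.']

first_row = next(csv.DictReader(io.StringIO(SAMPLE)))
print(first_row[None])
# ['']
```

Other CSV readers may treat a ragged trailing column differently, so it is worth checking the generator's actual parsing if notes come through with unexpected values.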
Lines changed: 6 additions & 0 deletions
```diff
@@ -0,0 +1,6 @@
+{
+    "summary": "Action Required",
+    "indicator": "info",
+    "source": {{ default_source | tojson }},
+    "detail": "{{ model_output }}"
+}
```
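This Jinja-style card template places `{{ model_output }}` inside a JSON string literal. If model output is substituted verbatim, any newline or double quote in it (and the LangChain cookbook prompt explicitly asks for bullet points separated by new lines) yields invalid JSON. The commit message mentions a basic text cleaner for templates, but its details are not in this excerpt. The sketch below reproduces the issue with the standard library only: `render()` is a hypothetical hand-rolled substitution of the two placeholders (plain `str.replace`, not Jinja), `json.dumps` stands in for Jinja's `tojson` filter, and the `{"label": ...}` source dictionary is an invented example.

```python
import json
from typing import Any, Dict

# The card template from the diff above, with its two Jinja-style placeholders.
CARD_TEMPLATE = """{
    "summary": "Action Required",
    "indicator": "info",
    "source": {{ default_source | tojson }},
    "detail": "{{ model_output }}"
}"""


def render(default_source: Dict[str, Any], model_output: str, escape: bool) -> str:
    """Hypothetical stand-in for template rendering (str.replace, not Jinja).

    json.dumps plays the role of the `tojson` filter. With escape=True the model
    output is JSON-escaped (quotes, backslashes, newlines) before insertion.
    """
    detail = json.dumps(model_output)[1:-1] if escape else model_output
    return CARD_TEMPLATE.replace(
        "{{ default_source | tojson }}", json.dumps(default_source)
    ).replace("{{ model_output }}", detail)


output = "- Chest X-ray next Thursday at 2:30 PM\n- GP follow-up within 7 days"

try:
    json.loads(render({"label": "HealthChain"}, output, escape=False))
except json.JSONDecodeError as err:
    # A raw newline inside a JSON string is an invalid control character.
    print("verbatim output breaks the card:", err.msg)

card = json.loads(render({"label": "HealthChain"}, output, escape=True))
print(card["detail"] == output)  # True
```

Jinja's real `tojson` filter additionally escapes HTML-sensitive characters such as `<` and `&`, which this stand-in does not; either way, escaping the model output before it lands inside quotes is what keeps the rendered card parseable.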

docs/api/component.md

Lines changed: 2 additions & 1 deletion
```diff
@@ -1,6 +1,7 @@
 # Component
 
 ::: healthchain.pipeline.components.base
+::: healthchain.pipeline.components.integrations
 ::: healthchain.pipeline.components.preprocessors
-::: healthchain.pipeline.components.model
 ::: healthchain.pipeline.components.postprocessors
+::: healthchain.pipeline.components.cdscardcreator
```
