If this is helpful, we'd appreciate a star ⭐ on the CocoIndex GitHub repository.
This example shows how to use BAML to extract structured data from patient intake PDFs. BAML provides type-safe structured data extraction with native PDF support.
- BAML Schema (baml_src/patient.baml) - Defines the data structure and extraction function
- CocoIndex Flow (main.py) - Wraps BAML in a custom function and provides the flow to process files incrementally
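To make "process files incrementally" concrete, here is a minimal, standard-library-only sketch of the idea: re-run the extractor only for files whose content changed. It is not CocoIndex's implementation and does not use the CocoIndex or BAML APIs. `IncrementalExtractor` and `fake_extract` are hypothetical names invented for this illustration, and `fake_extract` stands in for the real BAML extraction call.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class IncrementalExtractor:
    """Re-runs `extract` only for files whose bytes changed since the last run."""

    extract: Callable[[bytes], dict]
    # filename -> (content fingerprint, cached extraction result)
    _cache: Dict[str, Tuple[str, dict]] = field(default_factory=dict, init=False)

    def update(self, files: Dict[str, bytes]) -> Dict[str, dict]:
        results = {}
        for name, content in files.items():
            fingerprint = hashlib.sha256(content).hexdigest()
            cached = self._cache.get(name)
            if cached is None or cached[0] != fingerprint:
                # New or changed file: call the (expensive) extractor.
                cached = (fingerprint, self.extract(content))
                self._cache[name] = cached
            results[name] = cached[1]
        # Forget files that no longer exist in the source.
        for name in set(self._cache) - set(files):
            del self._cache[name]
        return results


# Records one entry per extractor call so the savings are visible.
calls: List[int] = []


def fake_extract(pdf_bytes: bytes) -> dict:
    """Hypothetical stand-in for the BAML extraction function (no LLM call)."""
    calls.append(len(pdf_bytes))
    return {"size_bytes": len(pdf_bytes)}
```

Calling `update` twice with the same files invokes `fake_extract` only on the first pass; changing one file triggers exactly one more call. With an LLM-backed extractor, that is where the cost savings would come from.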
- Install Postgres if you don't have one.
- Install dependencies:
    pip install -U cocoindex baml-py
- Generate BAML client code (required step!):
    baml generate
  This generates the baml_client/ directory with Python code to call your BAML functions.
- Create a .env file. You can copy it from .env.example first:
    cp .env.example .env
  Then edit the file to fill in your GEMINI_API_KEY.
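As an optional sanity check before running the flow, the sketch below verifies two of the setup steps above: that `baml_client/` was generated and that `GEMINI_API_KEY` has a value. It uses only the Python standard library. The helper names `read_env_file` and `preflight` are hypothetical, and the parser assumes plain `KEY=VALUE` lines, which may not cover everything a real `.env` loader accepts.

```python
import os
from pathlib import Path


def read_env_file(path: Path) -> dict:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def preflight(project_dir: Path) -> list:
    """Return a list of human-readable problems; empty means ready to run."""
    problems = []
    if not (project_dir / "baml_client").is_dir():
        problems.append("baml_client/ is missing: run `baml generate`")
    env = read_env_file(project_dir / ".env")
    key = env.get("GEMINI_API_KEY") or os.environ.get("GEMINI_API_KEY")
    if not key:
        problems.append("GEMINI_API_KEY is not set in .env or the environment")
    return problems


if __name__ == "__main__":
    for problem in preflight(Path.cwd()):
        print("NOT READY:", problem)
```

Run from the example directory, the script prints nothing when both checks pass.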
Update index:
    cocoindex update main
I used CocoInsight (free beta now) to troubleshoot the index generation and understand the data lineage of the pipeline. It connects to your local CocoIndex server, with zero pipeline data retention. Run the following command to start CocoInsight:
    cocoindex server -ci main
Then open the CocoInsight UI at https://cocoindex.io/cocoinsight.