Extract structured data from patient intake forms with BAML

We appreciate a star ⭐ at CocoIndex GitHub if this is helpful.

This example shows how to use BAML to extract structured data from patient intake PDFs. BAML provides type-safe structured data extraction with native PDF support.

  • BAML Schema (baml_src/patient.baml) - Defines the data structure and extraction function
  • CocoIndex Flow (main.py) - Wraps BAML in a custom function and defines the flow that processes files incrementally.
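To make "processes files incrementally" concrete, here is a minimal conceptual sketch in plain Python. It is not CocoIndex's implementation and does not use the CocoIndex or BAML APIs: the `Patient` fields, `update_index`, and `fake_extract` are hypothetical names chosen for illustration, and the content-hash cache is just one simple way to skip unchanged files.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Patient:
    # Hypothetical fields; the real schema lives in baml_src/patient.baml.
    name: str
    date_of_birth: str


def fingerprint(content: bytes) -> str:
    """Content hash used to detect whether a file changed since the last run."""
    return hashlib.sha256(content).hexdigest()


def update_index(
    files: Dict[str, bytes],
    extract: Callable[[bytes], Patient],
    cache: Dict[str, Tuple[str, Patient]],
) -> int:
    """Re-extract only new or changed files; return how many were processed."""
    processed = 0
    for filename, content in files.items():
        digest = fingerprint(content)
        cached = cache.get(filename)
        if cached is not None and cached[0] == digest:
            continue  # unchanged since the last run, keep the cached result
        cache[filename] = (digest, extract(content))
        processed += 1
    # Drop entries for files that no longer exist in the source.
    for filename in list(cache):
        if filename not in files:
            del cache[filename]
    return processed


def fake_extract(content: bytes) -> Patient:
    # Stand-in for the BAML call, which would send the PDF to an LLM.
    return Patient(name=content.decode(), date_of_birth="unknown")
```

Calling `update_index` twice on the same files runs the extractor only on the first call, which is why incremental processing matters here: each extraction is an LLM request.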

Prerequisites

  1. Install Postgres if you don't already have it.

  2. Install dependencies

    pip install -U cocoindex baml-py

  3. Generate BAML client code (required step!)

    baml generate

    This generates the baml_client/ directory with Python code to call your BAML functions.

  4. Create a .env file. You can copy it from .env.example first:

    cp .env.example .env

    Then edit the file to fill in your GEMINI_API_KEY.
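If you want to verify the setup before running, the checks above can be sketched as a small preflight script. This is an illustrative helper, not part of the example repository: `parse_env` and `preflight` are hypothetical names, and the `.env` parsing handles only simple `KEY=VALUE` lines.

```python
from pathlib import Path
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove surrounding whitespace and optional quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def preflight(project_dir: Path) -> List[str]:
    """Return a list of setup problems; an empty list means ready to run."""
    problems = []
    if not (project_dir / "baml_client").is_dir():
        problems.append("baml_client/ is missing; run `baml generate`")
    env_file = project_dir / ".env"
    if not env_file.is_file():
        problems.append(".env is missing; copy it from .env.example")
    elif not parse_env(env_file.read_text()).get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set in .env")
    return problems


if __name__ == "__main__":
    # Run from the example directory; prints nothing when setup is complete.
    for problem in preflight(Path(".")):
        print(problem)
```

The script only checks that the generated client directory and the key are present; it does not validate the key or the Postgres connection.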

Run

Update index:

cocoindex update main

CocoInsight

I used CocoInsight (free beta now) to troubleshoot the index generation and understand the data lineage of the pipeline. It connects to your local CocoIndex server and retains no pipeline data. Run the following command to start CocoInsight:

cocoindex server -ci main

Then open the CocoInsight UI at https://cocoindex.io/cocoinsight.