An automated multi-agent workflow for filling out forms: it extracts data from documents using OCR, maps that data to form fields, and generates filled PDFs. The pipeline uses a W-9 tax form as its example and can be extended to accommodate other forms.
We use:
- CrewAI (agentic design) for multi-agent orchestration
- Datalab (document conversion & form filling) for OCR and form filling
- Streamlit for an interactive UI
- MiniMax-M2.1 (via OpenRouter) as the LLM for the agents
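
To make the "map it to form fields" step concrete, here is a minimal sketch of the idea. It is not how the agents in this project do the mapping: a plain alias lookup stands in for the LLM-driven step, and the schema layout, field names, and `map_to_form_fields` helper are all hypothetical.

```python
# Hypothetical schema: form field name -> labels that might appear in OCR output.
# The real project reads its schema from YAML; a dict keeps this sketch dependency-free.
W9_SCHEMA = {
    "name": ["name", "full name", "legal name"],
    "business_name": ["business name", "dba"],
    "address": ["address", "street address"],
    "city_state_zip": ["city, state, zip", "city/state/zip"],
    "tin": ["tin", "ssn", "ein", "taxpayer identification number"],
}


def map_to_form_fields(
    extracted: dict[str, str], schema: dict[str, list[str]]
) -> tuple[dict[str, str], list[str]]:
    """Return (filled fields, names of schema fields with no match)."""
    # Normalize OCR labels so "Full Name" and "full name" compare equal.
    normalized = {key.strip().lower(): value.strip() for key, value in extracted.items()}
    filled: dict[str, str] = {}
    missing: list[str] = []
    for field, aliases in schema.items():
        # Take the first alias that is present with a non-empty value.
        match = next((normalized[a] for a in aliases if normalized.get(a)), None)
        if match is None:
            missing.append(field)
        else:
            filled[field] = match
    return filled, missing


# Illustrative values only, not real data.
extracted = {"Full Name": "Jane Doe", "Street Address": "12 Elm St", "SSN": "000-00-0000"}
filled, missing = map_to_form_fields(extracted, W9_SCHEMA)
print(filled)
print(missing)
```

Returning the unmatched fields alongside the filled ones makes it easy to see which parts of the form still need a value before a PDF is generated.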
Follow these steps one by one:
Create a .env file in the root directory of your project with the following content:

    OPENROUTER_API_KEY=<your_openrouter_api_key>
    DATALAB_API_KEY=<your_datalab_api_key>

Install the dependencies and activate the virtual environment. On macOS/Linux:

    uv sync
    source .venv/bin/activate

On Windows (PowerShell):

    uv sync
    .venv\Scripts\activate

This installs all required dependencies (CrewAI, Datalab SDK, Streamlit, etc.).
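
Before running anything, it can help to confirm that both keys are actually set. The following is a small standalone sketch, not part of the project: `parse_env_text` and `missing_keys` are hypothetical helpers, and the parser only handles simple KEY=VALUE lines.

```python
import os
from pathlib import Path

# The two keys this README asks you to put in .env.
REQUIRED_KEYS = ("OPENROUTER_API_KEY", "DATALAB_API_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still <placeholders>."""
    missing = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.append(key)
    return missing


if __name__ == "__main__":
    env_path = Path(".env")
    file_values = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    # Variables already exported in the shell take precedence over the file.
    exported = {k: os.environ[k] for k in REQUIRED_KEYS if k in os.environ}
    problems = missing_keys({**file_values, **exported})
    print("Missing or placeholder keys:", problems if problems else "none")
```

Run it from the project root; it only reports which keys look unset and never prints their values.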
To run the form-filling workflow from the command line (e.g. with the bundled W-9 example):

    python main.py

You can also use the workflow programmatically via run_form_flow() in main.py, passing paths to your source document, blank form PDF, and form schema (YAML).
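
A programmatic call might look like the sketch below. The exact signature of run_form_flow() is not documented here, so passing the three paths positionally in the order listed above is an assumption; check main.py before relying on it. The file paths and the `check_inputs` helper are hypothetical.

```python
from pathlib import Path


def check_inputs(source_doc: str, blank_form: str, schema: str) -> list[str]:
    """Return human-readable problems; an empty list means the inputs look usable."""
    problems = []
    labelled = (
        ("source document", source_doc),
        ("blank form", blank_form),
        ("schema", schema),
    )
    for label, raw in labelled:
        if not Path(raw).is_file():
            problems.append(f"{label} not found: {raw}")
    if Path(blank_form).suffix.lower() != ".pdf":
        problems.append(f"blank form should be a PDF: {blank_form}")
    if Path(schema).suffix.lower() not in {".yaml", ".yml"}:
        problems.append(f"schema should be YAML: {schema}")
    return problems


if __name__ == "__main__":
    # Hypothetical paths; replace them with your own files.
    paths = ("data/source.pdf", "forms/w9_blank.pdf", "schemas/w9.yaml")
    problems = check_inputs(*paths)
    if problems:
        print("\n".join(problems))
    else:
        # Assumption: run_form_flow takes the three paths positionally in this order.
        from main import run_form_flow

        run_form_flow(*paths)
```

Validating the paths first gives a clear message for a mistyped file name instead of a failure partway through the pipeline.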
To run the Streamlit interface:

    streamlit run app.py

This starts the web UI where you can upload documents, choose a form schema, and run the pipeline. Use the URL shown in the terminal (e.g. http://localhost:8501) to open the app in your browser.
Contributions are welcome! Feel free to fork this repository and submit pull requests with your improvements.
