Team: DABB.ai
Members:
- Birajit Saikia
- Devaansh Kathuria
- Abhey Dua
- Bhavya Jain
DABB.ai is a contract risk analysis system that combines a classical clause classifier with a grounded assistant layer for structured legal-style reporting.
Milestone 2 keeps the Milestone 1 machine-learning baseline intact and extends it with:
- Clause-level assistant explanations backed by a local guidance corpus
- PDF report export for submission and demo use
- Multi-contract comparison for spotting repeated risk patterns
- Deployment-ready Streamlit entrypoints for public hosting
The application accepts PDF or TXT contracts and returns:
- Clause segmentation and clause IDs
- Predicted clause type
- Risk severity: Low, Medium, or High
- Risk score from 0 to 100
- Highlighted high-risk clauses
- CSV and JSON exports
- Structured legal assistance report
- Optional PDF report download
- Multi-contract comparison summary
This tool is informational only and is not legal advice. Always consult a qualified legal professional before making legal decisions.
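As an illustration of the first output above (clause segmentation and clause IDs), here is a minimal sketch using only the standard library. It is not the project's actual segmenter: the function name `segment_clauses`, the numbering pattern, and the `C001`-style ID format are all assumptions made for this example.

```python
import re

# Hypothetical pattern: a clause starts on a line that begins with a
# number followed by "." or ")", for example "1." or "2)" or "3.1.".
CLAUSE_START = re.compile(r"^[ \t]*\d+(?:\.\d+)*[.)][ \t]+", re.MULTILINE)


def segment_clauses(text: str) -> list[dict]:
    """Split contract text into clauses and assign sequential IDs."""
    starts = [m.start() for m in CLAUSE_START.finditer(text)]
    if not starts:
        # No numbering found: fall back to blank-line-separated paragraphs.
        chunks = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    else:
        bounds = starts + [len(text)]
        chunks = [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]
        # Keep any text that precedes the first numbered clause.
        preamble = text[: starts[0]].strip()
        if preamble:
            chunks.insert(0, preamble)
    # The ID format is an assumption; the real app may use a different scheme.
    return [
        {"clause_id": f"C{i:03d}", "text": chunk}
        for i, chunk in enumerate(chunks, start=1)
    ]
```

Each returned dictionary pairs a stable ID with the clause text, which is the shape a downstream classifier or export step would typically need.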
flowchart TD
A[Upload PDF/TXT] --> B[Text Extraction]
B --> C[Preprocess + Clause Segmentation]
C --> D[TF-IDF Vectorization]
D --> E[Clause Classifier]
E --> F[Risk Mapping Table]
F --> G[Severity + Risk Score]
G --> H[Streamlit UI]
H --> I[Assistant Report + Clause Drill-down]
I --> J[PDF Export + Multi-Contract Comparison]
J --> K[CSV / JSON Export]
L[Bundled Legal Guidance Corpus] --> M[Local Retrieval Index]
M --> I
N[Training CSV] --> O[Train / Refresh Model]
O --> P[models/model.joblib]
P --> E
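The "Risk Mapping Table" and "Severity + Risk Score" steps in the flowchart can be pictured with a small lookup sketch. The clause types, scores, thresholds, and names below (`RISK_TABLE`, `DEFAULT_SCORE`, `assess`) are hypothetical values chosen for illustration, not the project's real table.

```python
from dataclasses import dataclass

# Hypothetical mapping table: clause type -> rule-based risk score (0 to 100).
RISK_TABLE = {
    "indemnification": 85,
    "termination": 60,
    "confidentiality": 40,
    "governing_law": 15,
}
DEFAULT_SCORE = 50  # assumed score for clause types missing from the table


def severity_for(score: int) -> str:
    """Bucket a 0-100 score into Low, Medium, or High (assumed thresholds)."""
    if score >= 70:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"


@dataclass
class RiskResult:
    clause_type: str
    score: int
    severity: str


def assess(clause_type: str) -> RiskResult:
    """Look up the predicted clause type and derive score and severity."""
    score = RISK_TABLE.get(clause_type.lower(), DEFAULT_SCORE)
    score = max(0, min(100, score))  # keep the score inside 0 to 100
    return RiskResult(clause_type, score, severity_for(score))
```

Because the score comes from a fixed table rather than from the classifier's confidence, the result is deterministic for a given predicted clause type.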
DABB.ai/
├── app.py
├── streamlit_app.py
├── data/
├── docs/
├── models/
├── reports/
├── scripts/
├── src/contract_risk/
└── tests/
- Python 3.10+
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
streamlit run streamlit_app.py
streamlit_app.py is the recommended hosted entrypoint. app.py remains the reusable application module for local development and tests.
PYTHONPATH=src python -m contract_risk.cli train --csv data/raw/legal_docs_modified.csv
If data/raw/legal_docs_modified.csv is unavailable, the app falls back to data/demo/sample_training.csv.
PYTHONPATH=src python -m contract_risk.cli eval --csv data/raw/legal_docs_modified.csv --reports-dir reports
- Upload a PDF or TXT contract, or enable the bundled demo contract.
- Review the clause table, severity filters, and highlighted sections.
- Generate the legal assistance report for clause explanations and mitigation guidance.
- Export CSV, JSON, or PDF artifacts as needed.
- Upload multiple contracts to compare repeated risk patterns.
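The last step, comparing contracts for repeated risk patterns, could be approximated as in the sketch below. It assumes each contract has already been reduced to a list of rows with `clause_type` and `severity` keys; those key names and the `repeated_high_risk` function are hypothetical, not the app's actual data model.

```python
from collections import Counter


def repeated_high_risk(
    contracts: dict[str, list[dict]], min_contracts: int = 2
) -> dict[str, int]:
    """Count in how many contracts each clause type is flagged High."""
    counts: Counter = Counter()
    for rows in contracts.values():
        # A set ensures a clause type counts at most once per contract.
        high_types = {r["clause_type"] for r in rows if r["severity"] == "High"}
        counts.update(high_types)
    # Keep only the patterns that repeat across enough contracts.
    return {t: n for t, n in counts.items() if n >= min_contracts}
```

Counting per contract rather than per clause keeps one long contract with many similar clauses from dominating the comparison.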
- Create a new app and point the main file at streamlit_app.py.
- Keep requirements.txt at the repository root.
- Use the default public deployment settings from .streamlit/config.toml.
- Create a new Space with the Streamlit SDK.
- Set the app entrypoint to streamlit_app.py.
- Push this repository and allow the Space to install dependencies from requirements.txt.
- Use the included render.yaml or Procfile.
- Deploy the repository as a Python web service.
- Start the app with streamlit run streamlit_app.py.
The app supports the following optional environment variables:
- DABB_MODEL_PATH
- DABB_REPORTS_DIR
- DABB_TRAINING_CSV
- DABB_FALLBACK_TRAINING_CSV
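One way such variables are commonly read is shown in this sketch. The four variable names come from the list above; the default paths are taken from paths mentioned elsewhere in this README, but treating them as the app's real defaults is an assumption, as are the `load_settings` and `pick_training_csv` helpers.

```python
import os
from pathlib import Path

# Assumed defaults, based on paths mentioned elsewhere in this README.
DEFAULTS = {
    "DABB_MODEL_PATH": "models/model.joblib",
    "DABB_REPORTS_DIR": "reports",
    "DABB_TRAINING_CSV": "data/raw/legal_docs_modified.csv",
    "DABB_FALLBACK_TRAINING_CSV": "data/demo/sample_training.csv",
}


def load_settings(env=None) -> dict[str, Path]:
    """Resolve each setting from the environment, else use the default."""
    env = os.environ if env is None else env
    return {name: Path(env.get(name, default)) for name, default in DEFAULTS.items()}


def pick_training_csv(settings: dict[str, Path]) -> Path:
    """Prefer the primary training CSV; use the fallback if it is missing."""
    primary = settings["DABB_TRAINING_CSV"]
    if primary.exists():
        return primary
    return settings["DABB_FALLBACK_TRAINING_CSV"]
```

Passing a plain dictionary as `env` makes the resolution easy to exercise in tests without touching the real process environment.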
python3 -m compileall app.py streamlit_app.py src tests
python3 -m pytest -q
The repository also includes GitHub Actions CI under .github/workflows/ci.yml.
- Clause labels still depend on training data quality and class balance.
- Risk scores are rule-based and should be treated as screening guidance.
- Scanned or image-only PDFs may not extract clean text.
- The assistant report is grounded in the bundled corpus, but it is still informational only.
- Multi-contract comparison surfaces repeated patterns rather than making legal judgments.
- reports/ contains the milestone report artifacts and presentation material.
- docs/deployment.md documents the public-host startup path and smoke checks.
- streamlit_app.py is the submission-ready public entrypoint.
PYTHONPATH=src python -m contract_risk.cli train
PYTHONPATH=src python -m contract_risk.cli eval
streamlit run streamlit_app.py
python3 -m pytest -q