scripts/

Optional helpers. The kit is instruction-only at its core; nothing in scripts/ is required to use it from an LLM. These tools exist for two purposes:

Generic helpers that anyone can run against any tracker CSV (validate_tracker.py, deadline_watch.py). No third-party dependencies, no API keys.
A local-ops pipeline for the workstation use case where the patient drops mail into an inbox/ folder and the scripts route, OCR, match, and draft dispute letters automatically. This pipeline uses Azure OpenAI for vision OCR and field extraction. It expects a specific folder layout under a personal Health_Bills/ directory.

Generic helpers

`validate_tracker.py`

Validates a tracker CSV against the TOML schemas in ../schemas/. Returns exit code 0 if the file conforms, 1 if it has structural problems, 2 on usage errors. Checks header row, ISO 8601 dates, decimal-parseable amounts, boolean conformance, enum membership, and semicolon-separated findings vocabulary.

python scripts/validate_tracker.py my_tracker_2026-05-18.csv
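
The per-field checks listed above can be sketched as follows. This is a minimal illustration, not the script's actual code: the `RULES` table shape, the `findings` column name and vocabulary, and the `Y`/`N` boolean tokens are assumptions made for the example, and the real rules come from the TOML schemas in ../schemas/.

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def is_iso_date(value: str) -> bool:
    """True if the value parses as an ISO 8601 calendar date."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_decimal_amount(value: str) -> bool:
    """True if the value parses as a finite decimal number."""
    try:
        return Decimal(value).is_finite()
    except InvalidOperation:
        return False


def check_row(row: dict, rules: dict, bool_tokens=("Y", "N")) -> list[str]:
    """Return the problems found in one CSV row; an empty list means it conforms."""
    problems = []
    for field, rule in rules.items():
        value = row.get(field, "")
        if value == "":
            continue  # required-field checks are left out of this sketch
        kind = rule["type"]
        if kind == "date" and not is_iso_date(value):
            problems.append(f"{field}: not an ISO 8601 date: {value!r}")
        elif kind == "amount" and not is_decimal_amount(value):
            problems.append(f"{field}: not a decimal amount: {value!r}")
        elif kind == "bool" and value not in bool_tokens:
            problems.append(f"{field}: not a boolean token: {value!r}")
        elif kind == "enum" and value not in rule["values"]:
            problems.append(f"{field}: not an allowed value: {value!r}")
        elif kind == "vocab_list":
            # Semicolon-separated list; every term must be in the vocabulary.
            terms = [t.strip() for t in value.split(";")]
            unknown = [t for t in terms if t not in rule["values"]]
            if unknown:
                problems.append(f"{field}: unknown terms: {unknown}")
    return problems


# Hypothetical rule table for illustration only.
RULES = {
    "next_action_due": {"type": "date"},
    "current_balance": {"type": "amount"},
    "status": {"type": "enum", "values": {"gathering_evidence", "ready_to_dispute", "disputed"}},
    "findings": {"type": "vocab_list", "values": {"duplicate_cpt", "late_fee"}},
}
```

A caller would map an empty problem list across all rows to exit code 0 and any non-empty list to exit code 1.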

`deadline_watch.py`

Reports overdue and upcoming bill actions from a tracker CSV. Groups bills into Overdue / Due soon / Upcoming based on next_action_due. Exits 1 if anything is overdue. Useful in Task Scheduler / cron for weekly check-ins.

With --sol --state <CODE>, the script also reports accounts whose written-contract statute of limitations has expired or expires soon. SOL data comes from references/sol_by_state.md and the bundled state table in this script (keep both in sync). The SOL group is informational; the patient should not act on a near-SOL flag without first researching the state's re-aging rule (acknowledging the debt or paying restarts the clock in most states).

python scripts/deadline_watch.py my_tracker_2026-05-18.csv
python scripts/deadline_watch.py my_tracker_2026-05-18.csv --window 14
python scripts/deadline_watch.py my_tracker_2026-05-18.csv --as-of 2026-06-01
python scripts/deadline_watch.py my_tracker_2026-05-18.csv --sol --state TN
python scripts/deadline_watch.py my_tracker_2026-05-18.csv --sol --state FL --sol-facility-rule
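
The Overdue / Due soon / Upcoming grouping can be sketched like this. It is an illustration under assumptions: the default window of 7 days and the treatment of a bill due exactly on the as-of date (counted as due soon, not overdue) are guesses, and `classify_due` and `exit_code` are hypothetical names.

```python
from datetime import date, timedelta


def classify_due(next_action_due: date, as_of: date, window_days: int = 7) -> str:
    """Place one bill into a group based on its next_action_due date."""
    if next_action_due < as_of:
        return "Overdue"
    if next_action_due <= as_of + timedelta(days=window_days):
        return "Due soon"
    return "Upcoming"


def exit_code(due_dates: list[date], as_of: date) -> int:
    """Exit 1 if anything is overdue, else 0 (matches the behavior described above)."""
    return 1 if any(d < as_of for d in due_dates) else 0
```

The nonzero exit on overdue items is what makes the script useful in a scheduler: the job runner can alert on failure without parsing the report.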

Local-ops pipeline (workstation-specific)

The scripts below form a state-machine pipeline that takes scanned medical-bill mail, OCRs it, organizes it into a per-biller folder structure, links bills to EOBs, and drafts the right next letter based on the evidence we currently have.

The key design rule: never draft a substantive dispute letter until we have BOTH the bill's EOB from the insurer AND the bill's itemized statement from the provider. If either is missing, the script drafts a request letter to obtain it instead. This prevents premature disputes that the biller can dismiss for lack of evidence.

The full chain, in order:

classify_rename_medical_bills.py    intake — split mixed inbox/ into Billers/ and EOB/
text-extraction step                 (out of scope for this kit — use ai-toolkit's file_management
                                      Stage 5 `extract_documents.py` or any equivalent that produces
                                      `<file>.extracted.txt` sidecars next to each source file)
restructure_to_billers_eob.py       one-time migration if older `providers/` layout exists
index_bills_and_claims.py           per-folder _bills.csv and _claims.csv via Azure
match_claims_to_bills.py            link each EOB claim to a bill it adjudicates
fetch_price_benchmarks.py           per-folder _benchmarks.csv vs Medicare PFS rates
audit_billing_errors.py             per-folder _audit.csv flagging duplicates, NCCI unbundling, late fees
check_completeness.py               derive per-bill has_eob / has_itemization / benchmarks gates, cluster encounters
draft_letters_by_state.py           draft the next letter for each dispute group, with encounter context
log_interaction.py                  append a phone call / mailing / response to the action log
bundle_evidence.py                  zip the full artifact set per dispute group for offsite backup

`restructure_to_billers_eob.py`

One-time migration from the older providers/<biller_slug>/ layout (which mixed bills and EOBs in one folder per biller) into the two-track layout Billers/<biller_slug>/ + EOB/<biller_slug>/. Detects EOBs by the explicit "Explanation of Benefits" text marker (not by "THIS IS NOT A BILL", which appears on hospital itemizations as well). Old DISPUTE_LETTER.md drafts get archived to _archive_old_letters/ so the new state-machine drafts fresh ones.

python scripts/restructure_to_billers_eob.py --dry-run
python scripts/restructure_to_billers_eob.py

`index_bills_and_claims.py`

Reads every <file>.extracted.txt sidecar produced by the text extractor and uses Azure OpenAI gpt-5.2 (text-only, no image render) to extract structured fields. Writes _bills.csv per Billers/<slug>/ (one row per bill PDF) and _claims.csv per EOB/<slug>/ (one row per claim line, so a multi-claim EOB produces N rows). Idempotent: each sidecar's content hash is recorded in its row, so re-runs only call Azure for new or changed files.
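
The hash-based idempotency check can be sketched as below. SHA-256 and the `recorded` mapping from sidecar name to stored hash are assumptions for the example; the script may use a different hash or key.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Hex digest of a sidecar's bytes (SHA-256 is assumed here)."""
    return hashlib.sha256(data).hexdigest()


def needs_extraction(name: str, data: bytes, recorded: dict[str, str], force: bool = False) -> bool:
    """True when the sidecar is new, has changed since the last run, or --force is set."""
    if force:
        return True
    return recorded.get(name) != content_hash(data)
```

Only files for which `needs_extraction` returns True would trigger an Azure call; everything else keeps its existing row.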

Computes the has_itemization flag using the peer-reviewed heuristic:

Count of distinct dated charge lines (a line with both a service date and a charge amount), threshold ≥ 3.
Override-to-false when payment-ledger keywords dominate ("payment received", "balance forward", "contractual adjustment") or when EOB-style fields are prominent (claim_number / allowed / coinsurance / deductible).
Override-to-true when UB-04 form headers (revenue code, service units, total charges) or CMS-1500 form headers (place of service, CPT/HCPCS, modifiers, days/units) are present.

The heuristic is conservative: false positives (claiming itemized when it isn't) are worse than false negatives, because the next step is to mail a dispute letter assuming the evidence is in hand.
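
A rough sketch of the heuristic follows. Several details are assumptions, because the text above does not pin them down: the date and amount regexes, what "dominate" means (ledger keyword hits outnumber dated charge lines), what "prominent" means (at least three of the four EOB field names appear), how many form headers must be present, and the precedence (override-to-false is checked first, in keeping with the conservative stance).

```python
import re

DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
AMOUNT_RE = re.compile(r"\d[\d,]*\.\d{2}\b")
LEDGER_KEYWORDS = ("payment received", "balance forward", "contractual adjustment")
EOB_FIELDS = ("claim number", "allowed", "coinsurance", "deductible")
UB04_HEADERS = ("revenue code", "service units", "total charges")
CMS1500_HEADERS = ("place of service", "cpt/hcpcs", "modifiers", "days/units")


def count_dated_charge_lines(text: str) -> int:
    """Count distinct lines that carry both a service date and a charge amount."""
    lines = {line.strip() for line in text.splitlines()}
    return sum(1 for line in lines if DATE_RE.search(line) and AMOUNT_RE.search(line))


def has_itemization(text: str, threshold: int = 3) -> bool:
    lower = text.lower()
    dated = count_dated_charge_lines(text)

    # Override-to-false: payment ledgers and EOB-style documents.
    ledger_hits = sum(lower.count(k) for k in LEDGER_KEYWORDS)
    eob_hits = sum(1 for f in EOB_FIELDS if f in lower)
    if ledger_hits > dated or eob_hits >= 3:
        return False

    # Override-to-true: standard claim-form headers.
    ub04 = all(h in lower for h in UB04_HEADERS)
    cms1500 = sum(1 for h in CMS1500_HEADERS if h in lower) >= 3
    if ub04 or cms1500:
        return True

    return dated >= threshold
```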

python scripts/index_bills_and_claims.py
python scripts/index_bills_and_claims.py --force   # re-extract every file

`match_claims_to_bills.py`

Links each EOB claim row to the bill it adjudicates, in two stages:

Deterministic (no API call): same biller_slug + amount within $0.50 + DOS overlap (or claim DOS within 60 days of the bill's statement date if no bill DOS).
Azure OpenAI fallback when deterministic returns multiple candidates or none: gpt-5.2 sees the claim row and the candidate bills with a strict "respond UNKNOWN if not confident" prompt. False positives are worse than false negatives.

Output: <log-dir>/matches.csv with one row per attempted match. Match types: deterministic, azure, azure_unknown, unmatched, bill_only (no claim for this slug), claim_only (no bill for this slug). The log directory defaults to ~/.medbill-dispute-kit/tracker/; override via $HEALTHBILLS_LOG_DIR.

python scripts/match_claims_to_bills.py
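
The deterministic stage can be sketched as follows. The `Bill` and `Claim` field names are hypothetical, and the sketch assumes that "within 60 days" is measured in either direction from the statement date. A single candidate is a match; zero or several fall through to the Azure fallback (represented here by returning None).

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class Claim:
    biller_slug: str
    amount: Decimal
    dos_start: date
    dos_end: date


@dataclass
class Bill:
    bill_id: str
    biller_slug: str
    amount: Decimal
    statement_date: date
    dos_start: Optional[date] = None
    dos_end: Optional[date] = None


def is_candidate(claim: Claim, bill: Bill,
                 tolerance: Decimal = Decimal("0.50"), window_days: int = 60) -> bool:
    """Same slug, amount within tolerance, and overlapping (or nearby) dates."""
    if claim.biller_slug != bill.biller_slug:
        return False
    if abs(claim.amount - bill.amount) > tolerance:
        return False
    if bill.dos_start is not None and bill.dos_end is not None:
        # Two date ranges overlap when each starts before the other ends.
        return claim.dos_start <= bill.dos_end and bill.dos_start <= claim.dos_end
    # No bill DOS: fall back to distance from the statement date.
    return abs((bill.statement_date - claim.dos_start).days) <= window_days


def match_deterministic(claim: Claim, bills: list[Bill]) -> Optional[Bill]:
    """Return the single candidate, or None when there are zero or several."""
    candidates = [b for b in bills if is_candidate(claim, b)]
    return candidates[0] if len(candidates) == 1 else None
```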

`audit_billing_errors.py`

Scans each bill sidecar for the common billing errors Marshall Allen catalogues in "Never Pay the First Bill" and produces Billers/<slug>/_audit.csv with one row per finding. Detected categories:

Duplicate CPT same bill: same code billed two or more times with a positive charge on the same bill.
NCCI unbundling: comprehensive code billed alongside an included sub-code (e.g., CMP 80053 alongside BMP 80048). Pairs are loaded from references/ncci_pairs_common.csv (~70 common pairs ship; extensible without touching the script).
Modifier-25 stacking: modifier 25 keyword present and at least one E/M code billed alongside another procedure on the same DOS.
Late fees / finance charges: keyword detection on the sidecar text; most states cap or prohibit these on medical debt.
Service-not-received hints: "no-show", "cancelled", "left AMA", "refused" language in the sidecar; prompt to obtain the medical record under templates/letter_records_request_hipaa.md.
Quantity inflation: line items with units or qty >= 10 are flagged for chart cross-check.

The audit script makes no network calls. The dispute drafter pulls audit findings into its prompt context so substantive letters cite the structured findings rather than re-extracting them from the sidecar.

python scripts/audit_billing_errors.py
python scripts/audit_billing_errors.py --slug a_specific_biller
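
Three of the checks above reduce to simple set and count operations once the bill's lines are parsed. The sketch below assumes each line has already been reduced to a `(code, charge, units)` tuple and that NCCI pairs arrive as `(comprehensive, component)` tuples; both shapes and all function names are hypothetical.

```python
from collections import Counter
from decimal import Decimal


def find_duplicate_cpts(lines: list[tuple[str, Decimal, int]]) -> list[str]:
    """Codes billed two or more times with a positive charge on the same bill."""
    counts = Counter(code for code, charge, _units in lines if charge > 0)
    return sorted(code for code, n in counts.items() if n >= 2)


def find_ncci_unbundling(codes: list[str], pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Pairs where the comprehensive code and its included sub-code both appear."""
    present = set(codes)
    return [(comp, sub) for comp, sub in pairs if comp in present and sub in present]


def find_quantity_inflation(lines: list[tuple[str, Decimal, int]], threshold: int = 10) -> list[str]:
    """Codes on lines whose unit count meets the threshold, for chart cross-check."""
    return [code for code, _charge, units in lines if units >= threshold]
```

Each finding is a prompt for manual review, not proof of an error: a duplicate code can be legitimate when a service was performed twice.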

`fetch_price_benchmarks.py`

Walks every Billers/<slug>/_bills.csv and extracts each bill's CPT/HCPCS codes plus the dollar amount appearing next to them in the sidecar text. Joins each code against references/medicare_pfs_common.csv (a curated CY2025 national-rate lookup that ships with the kit) and writes Billers/<slug>/_benchmarks.csv with the ratio of billed to Medicare allowable. Also emits a FAIR Health Consumer URL and a Healthcare Bluebook URL per code so the patient can look up commercial fair-market ranges manually if they want a second benchmark.

This script makes no network calls. The Medicare lookup is bundled public-domain data. Codes not in the bundled file appear in the output with blank Medicare data and a ratio of "". The patient can extend references/medicare_pfs_common.csv over time as new codes show up in their bills.
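
The billed-to-Medicare ratio, including the blank result for codes missing from the bundled file, can be sketched like this. The two-decimal rounding and the `benchmark_ratio` name are assumptions for the example.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional


def benchmark_ratio(billed: Decimal, medicare_allowable: Optional[Decimal]) -> str:
    """Ratio of billed amount to Medicare allowable, or "" when no rate is on file."""
    if medicare_allowable is None or medicare_allowable == 0:
        return ""
    ratio = billed / medicare_allowable
    return str(ratio.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
```

A line billed at 300.00 against an allowable of 100.00 yields "3.00", meaning the charge is three times the Medicare rate.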

Marshall Allen's UCC § 2-305 "open price term" argument needs evidence of fair market value. Medicare allowable is the most defensible benchmark a patient can cite back to a provider. This script produces that evidence as a structured artifact that check_completeness.py reads to gate the negotiated-counter-offer track and draft_letters_by_state.py reads to render the line-item table inside the counter-offer letter.

python scripts/fetch_price_benchmarks.py
python scripts/fetch_price_benchmarks.py --slug a_specific_biller

`check_completeness.py`

Joins the per-folder CSVs with matches.csv and writes the master tracker.csv to the log directory (default ~/.medbill-dispute-kit/tracker/, override via $HEALTHBILLS_LOG_DIR). Each bill row carries:

has_eob (Y/N, derived from matches.csv)
has_itemization (Y/N, from _bills.csv)
benchmarks_available (Y/N, derived from _benchmarks.csv; Y means at least one CPT is billed at ≥ 150% of the Medicare allowable, which gates the counter-offer track)
status (gathering_evidence | ready_to_dispute | disputed | escalated | settled | closed | superseded)
next_action (request_eob | request_itemization | negotiate_counter_offer | draft_dispute | file_doi_complaint | file_small_claims | etc.)

Manual columns the user fills in after mailing each letter (eob_request_sent_date, eob_request_tracking, counter_offer_sent_date, doi_complaint_sent_date, small_claims_filed_date, etc.) are preserved across runs; the script never overwrites a value the user has entered.
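
Preserving manual columns across runs amounts to a merge in which user-entered values win. The sketch below lists only the manual columns named above, and `merge_row` is a hypothetical helper name.

```python
from typing import Optional

MANUAL_COLUMNS = (
    "eob_request_sent_date",
    "eob_request_tracking",
    "counter_offer_sent_date",
    "doi_complaint_sent_date",
    "small_claims_filed_date",
)


def merge_row(derived: dict, previous: Optional[dict]) -> dict:
    """Start from freshly derived values, then carry over anything the user entered."""
    merged = dict(derived)
    if previous:
        for col in MANUAL_COLUMNS:
            if previous.get(col):
                merged[col] = previous[col]
    return merged
```

Derived columns such as has_eob are recomputed on every run, so only the manual columns need this protection.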

python scripts/check_completeness.py

`draft_letters_by_state.py`

For each dispute group (bills with the same biller_slug + account_number), the script selects the canonical bill (latest statement_date) and drafts whichever letter the state machine calls for:

has_eob = N and eob_request_sent_date empty → draft LETTER_REQUEST_EOB.md
has_itemization = N and itemization_request_sent_date empty → draft LETTER_REQUEST_ITEMIZATION.md
All three gates green (EOB + itemization + benchmark-overpriced) and no main letter sent → draft LETTER_COUNTER_OFFER.md using templates/letter_negotiation_counter_offer.md
Both evidence gates green but no overpriced line items, and no dispute letter sent → draft DISPUTE_LETTER.md using the appropriate kit template (NSA, FDCPA, dental dispute, ERISA appeal, initial dispute)
Any main letter drafted or sent and no DOI complaint yet → draft COMPLAINT_DOI.md for parallel pressure
30-day warning sent and no small-claims filing yet → draft SMALL_CLAIMS_CIVIL_WARRANT.md
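
The branches above can be read as a function from one tracker row to a list of letters. The sketch below is an interpretation, not the drafter's code: it assumes the branches are cumulative rather than first-match-wins, and the column names `dispute_letter_sent_date` and `warning_30_day_sent_date` are hypothetical (the other column names appear in this README).

```python
def letters_to_draft(row: dict) -> list[str]:
    """Return the letters the state machine would draft for one canonical bill row."""

    def filled(col: str) -> bool:
        return bool(row.get(col, "").strip())

    has_eob = row.get("has_eob") == "Y"
    has_itemization = row.get("has_itemization") == "Y"
    overpriced = row.get("benchmarks_available") == "Y"
    main_sent = filled("counter_offer_sent_date") or filled("dispute_letter_sent_date")
    main_drafted = filled("drafted_counter_offer") or filled("drafted_dispute_letter")

    letters = []
    if not has_eob and not filled("eob_request_sent_date"):
        letters.append("LETTER_REQUEST_EOB.md")
    if not has_itemization and not filled("itemization_request_sent_date"):
        letters.append("LETTER_REQUEST_ITEMIZATION.md")
    if has_eob and has_itemization:
        # Substantive letters are only reachable with both evidence gates green.
        if overpriced and not main_sent:
            letters.append("LETTER_COUNTER_OFFER.md")
        elif not overpriced and not filled("dispute_letter_sent_date"):
            letters.append("DISPUTE_LETTER.md")
    if (main_drafted or main_sent) and not filled("doi_complaint_sent_date"):
        letters.append("COMPLAINT_DOI.md")
    if filled("warning_30_day_sent_date") and not filled("small_claims_filed_date"):
        letters.append("SMALL_CLAIMS_CIVIL_WARRANT.md")
    return letters
```

Note how the key design rule falls out of the structure: with either evidence gate closed, only request letters can be produced.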

Output files land in Billers/<slug>/<bill_id>_LETTER_*.md (or _COMPLAINT_DOI.md, _SMALL_CLAIMS_CIVIL_WARRANT.md). The path is recorded back into tracker.csv columns drafted_eob_request, drafted_itemization_request, drafted_dispute_letter, drafted_counter_offer, drafted_doi_complaint, drafted_small_claims_civil_warrant.

The counter-offer letter auto-computes counter_offer_amount as 200% of the sum of Medicare allowables for the bill's CPT codes (with a fallback to 20% of current_balance when no codes have Medicare data on file). The user can override counter_offer_amount directly in tracker.csv and re-run with --force to use a different anchor.
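
As a worked example of the arithmetic, a minimal sketch follows. Rounding to cents with half-up rounding is an assumption, and `counter_offer_amount` here is an illustrative function, not the drafter's implementation.

```python
from decimal import Decimal, ROUND_HALF_UP

CENTS = Decimal("0.01")


def counter_offer_amount(medicare_allowables: list[Decimal], current_balance: Decimal) -> Decimal:
    """200% of summed Medicare allowables, or 20% of the balance when none are on file."""
    if medicare_allowables:
        total = sum(medicare_allowables, Decimal("0"))
        return (total * 2).quantize(CENTS, rounding=ROUND_HALF_UP)
    return (current_balance * Decimal("0.20")).quantize(CENTS, rounding=ROUND_HALF_UP)
```

For allowables of 100.00 and 50.25 the anchor is 2 × 150.25 = 300.50; with no Medicare data and a balance of 1000.00 the fallback is 200.00.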

Per-folder overrides exist for cases where the dispute template is known regardless of OCR signals (e.g., quantum_radiology → NSA, humana → dental_dispute, labcorp → FDCPA). The model fills placeholders from real bill/EOB content; if a field isn't visible in evidence, the model is instructed to leave the placeholder rather than invent a value.

Encounter context: when check_completeness.py clusters multiple bills into the same encounter_id (e.g., a hospital + ER physician + radiology + anesthesia all on the same DOS), the drafter passes a sibling-summary block to the LLM. The model then references the full encounter when applying NSA ancillary-provider protection across providers, rather than treating each bill in isolation.

Additional template keys available for FOLDER_TEMPLATE_OVERRIDES:

records_request_hipaa → templates/letter_records_request_hipaa.md
good_faith_estimate_request → templates/letter_good_faith_estimate_request.md
ppdr_initiate → templates/letter_ppdr_initiate.md
challenge_hospital_lien → templates/letter_challenge_hospital_lien.md
subrogation_response → templates/letter_subrogation_response.md
credit_report_dispute_fcra → templates/letter_credit_report_dispute_fcra.md
request_insurer_initiate_idr → templates/letter_request_insurer_initiate_idr.md
dispute_reply → templates/letter_dispute_reply.md (second written dispute when the first reply did not address the substance)
erisa_502c_penalty → templates/letter_erisa_502c_penalty.md (statutory penalty for plan-document non-production)

These do not have automatic state-machine gates because they are user-initiated (records review, GFE/PPDR for self-pay patients, accident-related lien and subrogation, credit-reporting and IDR escalations, second written dispute, ERISA penalty claim). Set them in FOLDER_TEMPLATE_OVERRIDES to drive the drafter when the trigger condition applies for a specific biller.

Automatic state-machine branches added in v0.13.0:

WC / auto-medpay routing: the canonical bill's sidecar text is keyword-scanned for work-related-injury or motor-vehicle-accident markers; matching bills get LETTER_WC_CARRIER_REDIRECT.md or LETTER_AUTO_MED_PAY.md drafted alongside (not instead of) the regular dispute flow.
Encounter-combined dispute: encounters with 4+ distinct billers (a hospital-admission signature) and at least one EOB on file produce a single LETTER_ENCOUNTER_COMBINED.md anchored to the alphabetically-first bill_id in the encounter, addressing every provider in the encounter at once.
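
The encounter-combined trigger and its anchor choice can be sketched as below. Representing each bill as a dict with `bill_id`, `biller_slug`, and `has_eob` keys is an assumption for the example, as is the function name.

```python
from typing import Optional


def encounter_combined_anchor(bills: list[dict], min_billers: int = 4) -> Optional[str]:
    """Return the bill_id that anchors the combined letter, or None if not triggered."""
    distinct_billers = {b["biller_slug"] for b in bills}
    any_eob = any(b.get("has_eob") == "Y" for b in bills)
    if len(distinct_billers) < min_billers or not any_eob:
        return None
    # Alphabetically-first bill_id in the encounter.
    return min(b["bill_id"] for b in bills)
```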

python scripts/draft_letters_by_state.py
python scripts/draft_letters_by_state.py --dry-run
python scripts/draft_letters_by_state.py --force

`log_interaction.py`

Appends one row to <log-dir>/actions.csv (default ~/.medbill-dispute-kit/tracker/actions.csv). Marshall Allen's discipline: every phone call, every email, every in-person encounter gets logged. The action log is the paper trail that turns a dispute into evidence.

Rows are append-only and follow schemas/action.toml. Action IDs auto-increment as A-YYYY-NNN. The script refuses to log against an unknown bill_id so you don't silently miss an entry by typoing the ID.
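
The A-YYYY-NNN auto-increment can be sketched as follows. Whether the counter restarts each year is not stated above; this sketch assumes it does, and `next_action_id` is a hypothetical name.

```python
import re

ACTION_ID_RE = re.compile(r"^A-(\d{4})-(\d{3,})$")


def next_action_id(existing_ids: list[str], year: int) -> str:
    """Next ID for the given year: one more than the highest sequence number seen."""
    highest = 0
    for action_id in existing_ids:
        match = ACTION_ID_RE.match(action_id)
        if match and int(match.group(1)) == year:
            highest = max(highest, int(match.group(2)))
    return f"A-{year}-{highest + 1:03d}"
```

Taking the maximum rather than the row count keeps IDs unique even if a row was ever removed by hand from the append-only log.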

python scripts/log_interaction.py \
    --bill-id B-abc1234567 \
    --action phone_call_to_billing \
    --recipient "Acme Hospital billing dept" \
    --note "Spoke with Jane (rep ID 4421); promised callback by Friday"

python scripts/log_interaction.py \
    --bill-id B-abc1234567 \
    --action records_request_sent \
    --recipient "Acme Hospital HIM dept" \
    --tracking 9405511899223345678901 \
    --template templates/letter_records_request_hipaa.md \
    --response-due 2026-06-20

Phone-call protocols and rep-side scripts live in references/phone_call_scripts.md. The kit is mail-first by default; the phone scripts are for users who choose to call.

`bundle_evidence.py`

Zips the complete artifact set per dispute group into <HEALTHBILLS_ROOT>/_bundles/<bill_id>_<YYYYMMDD>.zip for offsite backup or court-exhibit packaging. Each bundle contains the original bill PDF and sidecar, the matched EOB PDF(s) and sidecars, every drafted letter for the group, the benchmark and audit rows for the bill, the action log entries for the bill, the tracker rows for the group (canonical + superseded), and a MANIFEST.md describing what's in the bundle and what's missing.

python scripts/bundle_evidence.py                       # bundle every group
python scripts/bundle_evidence.py --bill-id B-abc12345
python scripts/bundle_evidence.py --slug a_specific_biller

Bundles are timestamped per run and earlier bundles are never deleted, so re-bundling is non-destructive.

`bundle_to_cloud.py`

Pushes evidence bundles to an encrypted offsite destination via rclone. The kit does not bundle credentials or assume a backend; configure rclone once with rclone config (Backblaze B2, Wasabi, S3, etc., all work; pair with rclone's crypt backend for client-side encryption), then set HEALTHBILLS_CLOUD_REMOTE and HEALTHBILLS_CLOUD_PATH env vars and run:

python scripts/bundle_to_cloud.py
python scripts/bundle_to_cloud.py --bundle B-abc12345_20260521.zip
python scripts/bundle_to_cloud.py --since 2026-05-01
python scripts/bundle_to_cloud.py --dry-run

Uses rclone's --immutable so remote files are never overwritten. The script does not delete local copies; use rclone's own retention features or a separate cleanup script if you want pruning.

`fetch_mrf.py`

Pulls a hospital's machine-readable price file (45 CFR Part 180) and extracts per-CPT gross, cash, min/max negotiated, and per-payer rates for the codes on a specific bill. Standard-library-only; no network credentials needed (most hospital MRFs are public URLs). Content-sniffs the format from a small read of the file's first bytes:

CMS template JSON (post-July 2024): top-level standard_charge_information[].
CMS template CSV: wide-format with standard_charge|gross, standard_charge|<payer>|<plan>|negotiated_dollar columns.
Turquoise / TransparentRx flat CSV: cpt_hcpcs_code + gross_charge + negotiated_rate_*.
TransparentRx legacy JSON: nested PriceTransparency.Items[].
Epic-native wide CSV: BILLABLE_CODE + CODE_TYPE + LIST_PRICE.

See references/mrf_vendor_adapters.md for format details and where to find a hospital's MRF URL.
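
Content-sniffing of this kind can be sketched as a function over the decoded head of the file. The marker strings come from the list above; the returned labels, the function name, and the matching order are assumptions. One caveat of this simplified version: in a large JSON file the distinguishing key may sit past the bytes that were read, in which case it reports "unknown".

```python
def sniff_mrf_format(head: str) -> str:
    """Guess the MRF format from the first part of the file's text."""
    stripped = head.lstrip("\ufeff \t\r\n")  # drop a BOM and leading whitespace
    if stripped.startswith(("{", "[")):
        if "standard_charge_information" in stripped:
            return "cms_json"
        if "PriceTransparency" in stripped:
            return "transparentrx_json"
        return "unknown"
    lower = stripped.lower()
    if "standard_charge|gross" in lower:
        return "cms_csv"
    if "cpt_hcpcs_code" in lower and "gross_charge" in lower:
        return "flat_csv"
    if "billable_code" in lower and "list_price" in lower:
        return "epic_csv"
    return "unknown"
```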

python scripts/fetch_mrf.py \
    --url https://example.com/standard-charges.json \
    --hospital-slug example_general \
    --cpts 99284,99285,71046,80053

`parse_spd.py`

Reads a Summary Plan Description PDF via Azure OpenAI gpt-5.2 and emits a structured plan profile JSON for use by the ERISA appeal, subrogation response, IDR request, and 502(c) penalty templates. The profile includes funding status (self-funded vs fully insured), in-network cost-sharing, claim and appeal deadlines, subrogation language (with made-whole and common-fund disclaimer flags), and the plan's NSA-ancillary implementation. See references/spd_parsing_guide.md for the field set and use cases.

python scripts/parse_spd.py --pdf path/to/spd.pdf --plan-slug acme_ppo_2026
python scripts/parse_spd.py --pdf path/to/spd.pdf --plan-slug acme_ppo_2026 --max-pages 80

Output: <HEALTHBILLS_ROOT>/_spd_profiles/<plan_slug>.json.

`classify_rename_medical_bills.py`

Intake stage. Walks inbox/, calls Azure OpenAI vision on each file, renames per the file_management v1.1 convention <contents_summary>_<category>_<YYYY>_<MM>_v<N>.<ext>, splits multi-bill PDFs by page range, and routes each output to:

Billers/<biller_slug>/ for bills, itemizations, collection notices
EOB/<biller_slug>/ for Explanation of Benefits documents
other/ for financial/personal non-medical documents

Provider-alias map handles the common biller-name variants (TriStar/Southern Hills/HCA, Labcorp/Laboratory Corporation, Premier Radiology variants, etc.). Re-running on the same inbox/ is safe; files already routed are not re-processed.

python scripts/classify_rename_medical_bills.py
python scripts/classify_rename_medical_bills.py --dry-run

Workstation configuration (`kit_config.toml`)

Three pipeline overrides load from a TOML config file outside the kit's source tree: which biller slugs to always close as correspondence-only, which slugs to route to a specific dispute template, and which slugs to load an additional state pack for. Default location: <HEALTHBILLS_ROOT>/kit_config.toml. Override path via the MEDBILL_KIT_CONFIG_FILE env var. Missing file is fine; the kit ships with empty defaults.

Schema:

[always_skip_slugs]
# Slugs whose folders should always derive status = closed rather
# than triggering dispute actions. Used for insurer / agency /
# coverage correspondence that is not a billable provider claim.
slugs = ["my_insurer_correspondence_slug", "my_state_medicaid_slug"]

[folder_template_overrides]
# Map biller_slug -> dispute template key (from TEMPLATE_PATHS in
# draft_letters_by_state.py). Drives the dispute-letter branch when
# both gates are open. Valid keys: itemization, initial_dispute,
# no_surprises, fdcpa, erisa_appeal, dental_dispute, counter_offer,
# request_eob, records_request_hipaa, good_faith_estimate_request,
# ppdr_initiate, challenge_hospital_lien, subrogation_response,
# credit_report_dispute_fcra, request_insurer_initiate_idr,
# auto_med_pay, wc_carrier_redirect, dispute_reply,
# erisa_502c_penalty, encounter_combined.
my_dental_insurer_slug = "dental_dispute"
my_collector_slug      = "fdcpa"

[biller_state_overrides]
# Map biller_slug -> two-letter US state code (lowercase). Used when
# services were rendered in a different state from the patient's
# residence; the drafter loads the additional state pack so letters
# can cite that state's statutes.
out_of_state_hospital_slug = "ga"

The config file is per-workstation and must never be committed to any repo. The public kit ships with empty defaults so the pipeline runs without modification on a fresh checkout.

Privacy notes

The local-ops scripts upload bill / EOB images and extracted text to Azure OpenAI. They also write index CSVs and the master tracker to your log directory (default ~/.medbill-dispute-kit/tracker/, override via $HEALTHBILLS_LOG_DIR) containing patient name, provider name, claim numbers, dates of service, and dollar amounts. Treat that directory as sensitive: keep it on local disk (not synced to multi-user storage), and back up encrypted.

The Azure deployment reads credentials from a workstation .env file. The default location is ~/.medbill-dispute-kit/.env; override via $MEDBILL_KIT_ENV_FILE. The .env file must contain AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, and AZURE_OPENAI_DEPLOYMENT. Do not commit this file to any repo.

Same rule for kit_config.toml (see the "Workstation configuration" section above): per-workstation, never committed.

Requirements

Python 3.11+
PyMuPDF (fitz) for PDF rendering, used by classify_rename_medical_bills.py
openai for the Azure-compatible client
Azure OpenAI deployment with vision support (the workstation default uses gpt-5.2)
Workstation .env with AZURE_OPENAI_API_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_DEPLOYMENT
Tesseract OCR (optional, on PATH): the text extractor falls back to vision OCR for image-only PDFs, so Tesseract is not required by these scripts

validate_tracker.py and deadline_watch.py have no third-party dependencies and need no API keys.

When to extend

If you have a new biller pattern that the alias map doesn't cover, edit BILLER_ALIASES near the top of classify_rename_medical_bills.py. If you have a new dispute scenario (e.g., a new state's medical-debt protection law), add a template under ../templates/ and a corresponding entry in the DISPUTE_TEMPLATE_PICKER or FOLDER_TEMPLATE_OVERRIDES map in draft_letters_by_state.py.

The kit's roadmap (../roadmap.json) is the source of truth for what's planned next. Open an issue or PR.
