`texbizct` scraper not working

## Summary

  The `texbizct` (Texas Business Court) scraper has been failing intermittently since 2026-05-01 and has ingested no opinions since 2026-05-18.

  - Source: https://www.txcourts.gov/businesscourt/opinions/
  - Last ingested cluster: Plains Pipeline v. Arrowhead Gulf Coast Holdings (created 2026-05-18)

## Root cause

  `_get_approximate_date` issues a HEAD request per opinion link and parses the `Last-Modified` header with no status-code check and no `None` guard:

    resp = await self.request["session"].head(url, follow_redirects=True, timeout=30)
    lm = resp.headers.get("Last-Modified")
    dt = parser.parse(lm)  # ← lm is None → TypeError

  `www.txcourts.gov` is behind an Azure Front Door WAF. Its 403 block page ("The request is blocked.", `x-azure-ref` header) carries no `Last-Modified` header. In prod, the GET of the listing page still succeeds (the crash happens later, inside `_process_html`), but the HEAD requests to the PDFs are being blocked or served without the header. Because the exception is raised inside `_process_html`, the whole scrape dies on the first affected link and nothing is ingested for that run.
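
  A minimal sketch of the missing guard, assuming the fix stays inside `_get_approximate_date`. The helper name `parse_last_modified` is hypothetical, and it swaps in the standard library's `email.utils.parsedate_to_datetime` for the `parser.parse` call so the example runs without third-party packages. It returns `None` for a non-200 response or an absent or malformed header, leaving the choice of fallback date to the caller:

```python
from datetime import date
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def parse_last_modified(status_code: int, headers: Mapping[str, str]) -> Optional[date]:
    """Return the Last-Modified date, or None when it cannot be trusted.

    Hypothetical helper, not the scraper's actual code. `headers` is assumed
    to be case-insensitive (as HTTP client header objects usually are); a
    plain dict must use the exact key "Last-Modified".
    """
    # A WAF block page (403) or any other non-200 response says nothing
    # about the PDF itself, so ignore its headers entirely.
    if status_code != 200:
        return None

    raw = headers.get("Last-Modified")
    if not raw:
        return None

    try:
        return parsedate_to_datetime(raw).date()
    except (TypeError, ValueError):
        # Malformed header value: treat it like a missing one.
        return None
```

  With this shape, a blocked HEAD yields `None` for that one link instead of raising inside `_process_html`. Whether to skip the opinion or fall back to another date is a separate decision.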

  The blocking appears to be escalating: from a residential IP, even the listing GET now returns 403 (with curl and with full browser headers).
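
  To surface such blocks as a clear error rather than a downstream `TypeError`, the scraper could check for the block page explicitly. The sketch below uses only the signals reported in this issue (403 status, `x-azure-ref` header, the body text "The request is blocked."). The function name is hypothetical, and these markers are observations from this incident, not a documented Azure contract:

```python
from typing import Mapping


def looks_like_front_door_block(
    status_code: int, headers: Mapping[str, str], body: str = ""
) -> bool:
    """Heuristic check for the WAF block page described in this issue.

    Hypothetical helper. HEAD responses have no body, so the header check
    is the only signal available for them; `body` helps only for GETs.
    """
    if status_code != 403:
        return False

    # Compare header names case-insensitively so a plain dict also works.
    header_names = {name.lower() for name in headers}
    return "x-azure-ref" in header_names or "The request is blocked." in body
```

  A caller could log or raise a dedicated error when this returns `True`, which would make a WAF block distinguishable from an ordinary parsing bug in Sentry.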

## Impact

  - Intermittent failures May 1–18 (some runs survived: clusters created May 4, 8, 11, 13, 14, 15, 18).
  - Zero opinions since 2026-05-18 (~2.5 weeks dark). At the May cadence (~2–3 opinions/week), roughly 5–8 opinions are likely missing.
  - tex and texapp use the same domain but don't do HEAD requests, so they're not hit by this bug — though the WAF escalation could affect them next.

| date_created | date_filed | case_name |
|--------------|------------|-----------|
| 2026-05-18   | 2026-05-16 | Plains Pipeline v. Arrowhead Gulf Coast Holdings |


Sentry Issue: [COURTLISTENER-CY2](https://freelawproject.sentry.io/issues/7454292044/?referrer=github_integration)

```
TypeError: Parser must be a string or character stream, not NoneType
(7 additional frame(s) were not displayed)
...
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 514, in handle
    async_to_sync(self.parse_and_scrape_site)(mod, options)
  File "cl/scrapers/management/commands/cl_scrape_opinions.py", line 477, in parse_and_scrape_site
    site = await mod.Site(save_response_fn=save_response).parse()
```
