TX scraper intermittent problems

Texas appears to have put its site behind Cloudflare, which blocks the scraper most of the time.

Periodically, however, the scraper starts working again; it's happened twice. I'm backburnering this to work on less-dumb things.

With modifications to a library, I've got a Zyte implementation that can get me the main file to scrape, but not the Excel files I need. Possibly the Excel downloads can be fixed by adding a Referer header to the request, but I don't know.
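A minimal sketch of the Referer idea, untested against the Texas site. It uses only the Python standard library; the function name, URLs, and User-Agent string are placeholders, not anything from the existing scraper. Whether Cloudflare actually checks this header for the Excel downloads is unknown.

```python
import urllib.request


def build_excel_request(file_url: str, referring_page: str) -> urllib.request.Request:
    """Build a GET request for an Excel file that claims to come from the listing page.

    Both arguments are hypothetical: file_url is the Excel download link and
    referring_page is the page the link was scraped from.
    """
    headers = {
        # The HTTP header is spelled "Referer" (one "r" in the middle).
        "Referer": referring_page,
        # Placeholder User-Agent; the real scraper may already set its own.
        "User-Agent": "Mozilla/5.0 (compatible; example-scraper)",
    }
    return urllib.request.Request(file_url, headers=headers)


# Placeholder URLs for illustration only; nothing is fetched here.
req = build_excel_request(
    "https://example.invalid/reports/file.xlsx",
    "https://example.invalid/reports/",
)
```

Passing `req` to `urllib.request.urlopen` would send it. If the download still fails with the header set, that points back toward the real-browser route.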

Most likely this would require a real browser implementation, possibly using Zyte as an HTTP proxy. (And we don't have existing proxy code.) The existing example of a real-browser approach is the Virginia scraper.
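Since there is no proxy code yet, here is a rough sketch of just the proxy wiring, using the standard library rather than a real browser. The host, port, and the "API key as username, empty password" credential format are assumptions to check against Zyte's documentation, and the helper names are made up for this example. A real browser would need to be pointed at the same proxy URL through its own settings.

```python
import urllib.request
from urllib.parse import quote


def make_proxy_url(api_key: str, host: str, port: int) -> str:
    """Return an HTTP proxy URL with the key as username and an empty password.

    The credential layout is an assumption, not a confirmed Zyte format.
    """
    return f"http://{quote(api_key, safe='')}:@{host}:{port}"


def make_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that routes both http and https requests through the proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder values; nothing is fetched here.
proxy_url = make_proxy_url("EXAMPLE_KEY", "proxy.example.invalid", 8011)
opener = make_proxied_opener(proxy_url)
```

Calling `opener.open(...)` with a request would send it through the proxy. HTTPS traffic through a proxy like this may also need the provider's CA certificate installed, which is not covered here.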

If this thing fixes itself twice a week, I'm inclined to worry about other stuff more.

TX scraper intermittent problems #767
