TX scraper intermittent problems #767

@stucka

Texas appears to have implemented Cloudflare, which most of the time is killing the scraper.

Periodically, however, the scraper starts working again; it's happened twice. I'm backburnering this to work on less-dumb things.

With modifications to a library, I've got a Zyte implementation that can get me the main file to scrape -- but not the Excel files I need. Possibly the Excel downloads can be fixed by adding a Referer header to the requests, but I don't know.
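A minimal sketch of the Referer idea, untested against the Texas site. The URLs and the User-Agent value below are placeholders rather than the real ones, and it uses only the standard library instead of whatever HTTP client the scraper actually uses:

```python
import urllib.request

# Hypothetical URLs for illustration only; the real page and file URLs are not in this issue.
MAIN_PAGE_URL = "https://example.com/warn-notices"
EXCEL_URL = "https://example.com/files/warn-notices.xlsx"


def build_excel_request(excel_url: str, referer: str) -> urllib.request.Request:
    """Build a GET request for an Excel file that names the listing page as its referrer."""
    headers = {
        # HTTP spells this header "Referer" (historical misspelling of "referrer").
        "Referer": referer,
        # Assumption: a browser-like User-Agent helps; this value is a placeholder.
        "User-Agent": "Mozilla/5.0 (compatible; example-scraper)",
    }
    return urllib.request.Request(excel_url, headers=headers)


request = build_excel_request(EXCEL_URL, MAIN_PAGE_URL)
# To actually download the file:
# data = urllib.request.urlopen(request, timeout=30).read()
```

If Cloudflare is checking more than headers (cookies from a challenge page, TLS fingerprint), this alone likely won't be enough.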

Most likely this would require a real browser implementation, possibly using Zyte as an HTTP proxy. (And we don't have existing proxy code.) The existing example of a real-browser approach is the Virginia scraper.
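For the proxy piece, a rough standard-library sketch of routing requests through an authenticated HTTP proxy. The host, port, and the "API key as username, empty password" scheme are assumptions here, not confirmed Zyte details; check the provider's documentation before relying on any of it:

```python
import urllib.request
from urllib.parse import quote


def build_proxy_opener(proxy_host: str, api_key: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS traffic through an authenticated proxy."""
    # Assumption: the proxy accepts the API key as the username with an empty password.
    proxy_url = f"http://{quote(api_key, safe='')}:@{proxy_host}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder host and key; substitute the values from the proxy provider.
opener = build_proxy_opener("proxy.example.com:8011", "YOUR_API_KEY")
# To fetch a page through the proxy:
# html = opener.open("https://example.com/", timeout=30).read()
```

A browser-automation tool could be pointed at the same proxy, which would cover the "real browser" part; that wiring is not shown here.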

If this thing fixes itself twice a week, I'm inclined to worry about other stuff more.
