Add a scraper check utility

Currently, we rely on various objects in scraperlib to:
- create the ZIM
- re-encode videos and images
- cache these assets on the optimization cache

We might consider adding a mechanism to perform sanity checks on scraper behavior:
- did we cache all re-encoded images / videos when a cache is present?
- did we remove temporary files from the filesystem once they were added to the ZIM? (while we prefer in-memory/streaming approaches, many scrapers still use the temporary file approach, and some situations have to rely on it)
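
To make the two checks above more concrete, here is a minimal sketch of what such a tracker could look like. Everything in it is hypothetical: `ScraperChecks`, its `record_*` methods and the string asset keys are names invented for illustration and are not part of scraperlib. It assumes that something (the scraper or, ideally, scraperlib itself) records each re-encoded asset, each cache upload and each temporary file.

```python
import pathlib
from dataclasses import dataclass, field


@dataclass
class ScraperChecks:
    """Hypothetical tracker for the two sanity checks; not a scraperlib API."""

    cache_enabled: bool = False
    # Keys of assets that were re-encoded during this run.
    reencoded: set[str] = field(default_factory=set)
    # Keys of assets that were pushed to the optimization cache.
    cached: set[str] = field(default_factory=set)
    # Temporary files that should be gone once their content is in the ZIM.
    temp_files: set[pathlib.Path] = field(default_factory=set)

    def record_reencoded(self, key: str) -> None:
        self.reencoded.add(key)

    def record_cached(self, key: str) -> None:
        self.cached.add(key)

    def record_temp_file(self, path: pathlib.Path) -> None:
        self.temp_files.add(pathlib.Path(path))

    def problems(self) -> list[str]:
        """Return one human-readable message per failed check."""
        issues: list[str] = []
        # Check 1: only meaningful when a cache is configured.
        if self.cache_enabled:
            for key in sorted(self.reencoded - self.cached):
                issues.append(f"re-encoded asset not cached: {key}")
        # Check 2: any recorded temporary file still on disk is a leftover.
        for path in sorted(self.temp_files):
            if path.exists():
                issues.append(f"temporary file not removed: {path}")
        return issues
```

A scraper (or scraperlib) would call `problems()` at the end of the run, or periodically, and decide what to do with the returned messages.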

What I do not yet know:
- should we make the scraper fail if these checks fail?
- is there any chance we can automate these checks? (i.e. no need to modify the scrapers, or as little as possible; at least not make a call to "check_i_m_ok" mandatory, because scraper developers might forget about it as well. I doubt this is possible because there are many kinds of situations)
- can we do these checks early? (so that we fail the scraper ASAP instead of wasting time and resources)
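
One possible direction for the first two questions, as a sketch only: wrap the scraper run in a context manager so that the checks run automatically on exit, with a flag deciding whether a failed check is fatal or just logged. `checked_run`, `ScraperCheckError`, the `strict` flag and the `collect_problems` callable are all hypothetical names, not existing scraperlib objects; `collect_problems` is assumed to return a list of messages, one per failed check.

```python
import logging
from contextlib import contextmanager
from typing import Callable, Iterator, List

logger = logging.getLogger("scraper_checks")  # hypothetical logger name


class ScraperCheckError(RuntimeError):
    """Raised in strict mode when at least one sanity check fails (hypothetical)."""


@contextmanager
def checked_run(
    collect_problems: Callable[[], List[str]], strict: bool = False
) -> Iterator[None]:
    """Run the wrapped scraper code, then evaluate the checks once it finishes."""
    # If the scraper itself raises, that error propagates and the checks are skipped.
    yield
    problems = collect_problems()
    for problem in problems:
        logger.warning("scraper check failed: %s", problem)
    if problems and strict:
        raise ScraperCheckError(f"{len(problems)} scraper check(s) failed")
```

With this shape, the scraper developer writes `with checked_run(my_checks, strict=True): run_scraper()` once and cannot forget a final call. It does not answer the "early" question by itself: for that, the same `collect_problems` callable would have to be invoked after each item is added rather than only at the end.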


Add a scraper check utility #124
