
Blacklist requests that are duplicates of existing resources or bound to fail #28

Open
@Popolechien


Following openzim/zimit#113, we should think about implementing an easily editable list (hosted on drive.kiwix.org?) of blacklisted sites that cannot be requested on zimit, e.g.:

  • kiwix.org subdomains (download and library);
  • very large corporate websites (e.g. Facebook, Twitter, Reddit, YouTube, etc.);
  • websites where past scrape attempts have failed.
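
To make the idea concrete, here is a minimal sketch of how a request could be checked against such a list. It is only an illustration: `BLOCKED_DOMAINS` and `is_blocked` are hypothetical names, the entries are hard-coded examples taken from the bullets above, and a real version would load the list from wherever it ends up being hosted.

```python
from urllib.parse import urlsplit

# Hypothetical, hard-coded entries for illustration only; the real list
# would be loaded from the editable file discussed above.
BLOCKED_DOMAINS = {
    "download.kiwix.org",
    "library.kiwix.org",
    "facebook.com",
    "twitter.com",
    "reddit.com",
    "youtube.com",
}


def is_blocked(url, blocked=BLOCKED_DOMAINS):
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    # urlsplit only recognises the host when a scheme or a leading "//" is present.
    if "://" not in url:
        url = "//" + url
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    # Match the domain itself or any subdomain, but not lookalikes
    # such as "notfacebook.com".
    return any(host == d or host.endswith("." + d) for d in blocked)
```

Matching on the hostname suffix rather than on the raw URL string means `www.facebook.com` is caught by the `facebook.com` entry without having to list every subdomain.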

This is probably a matter for a separate ticket, but requests for websites we already have a scraper for (Wikipedia, Stack Overflow, etc.) should also be soft-blocked, with the user offered a direct link to the ZIM file.
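
A rough sketch of what the soft block could look like, assuming a simple mapping from domain to an existing ZIM link. Everything here is hypothetical: `EXISTING_ZIMS`, `soft_block_message`, and the `example.org` URLs are placeholders, not real download links.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from sites that already have a dedicated scraper
# to a link for the existing ZIM file. The URLs are placeholders.
EXISTING_ZIMS = {
    "wikipedia.org": "https://example.org/zim/wikipedia",
    "stackoverflow.com": "https://example.org/zim/stackoverflow",
}


def soft_block_message(url, existing=EXISTING_ZIMS):
    """Return a message pointing to an existing ZIM, or None if the request can go ahead."""
    # urlsplit only recognises the host when a scheme or a leading "//" is present.
    if "://" not in url:
        url = "//" + url
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    for domain, zim_link in existing.items():
        if host == domain or host.endswith("." + domain):
            return f"A ZIM file for {domain} already exists: {zim_link}"
    return None
```

Returning a message instead of raising an error keeps this a soft block: the caller decides whether to show the link and stop, or let the user proceed anyway.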
