Context
Eric flagged that bots crawl the faceted search pages heavily, tying up server resources. Because facets can be combined freely, the URL space is combinatorial, which gives crawlers a near-infinite surface to explore.
Problem
Faceted search generates many URL permutations (subject × language × format × ...) that:
- Bots crawl exhaustively, consuming server CPU and database capacity
- robots.txt can't practically block all combinations
- Each facet page triggers DB queries that are expensive at scale
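To get a feel for the size of the crawl surface, here is a rough back-of-the-envelope sketch. The facet names come from the list above, but the value counts are made-up placeholders, and the model assumes each facet is either unset or set to exactly one value (multi-select facets would make the number larger).

```python
from math import prod

# Hypothetical facet sizes; real numbers would come from the catalog.
facet_sizes = {"subject": 200, "language": 50, "format": 6}


def count_facet_urls(sizes):
    """Count distinct URLs when each facet is either unset or set to one value."""
    # Each facet contributes (number of values + 1) choices; the +1 is "unset".
    return prod(n + 1 for n in sizes.values())


print(count_facet_urls(facet_sizes))  # 201 * 51 * 7 = 71757
```

Even with only three facets and these modest placeholder counts, a crawler following every link has tens of thousands of distinct pages to fetch, and each additional facet multiplies that total.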
Proposed approach
- Identify which faceted search views/URLs exist
- Evaluate whether any real users depend on them
- Remove or simplify to reduce the crawl surface
- Consider replacing with a simpler search that doesn't generate combinatorial URLs
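The first two steps could start from the access logs. The sketch below is only an illustration: the facet parameter names, the user-agent markers, and the `(url, user_agent)` input shape are all assumptions, not something taken from the codebase, and substring matching on the user agent is a crude heuristic that will miss bots posing as browsers.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Assumed query parameter names for facets; adjust to the real URL scheme.
FACET_PARAMS = {"subject", "language", "format"}
# Crude user-agent heuristic; self-identified crawlers usually contain one of these.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_faceted(url):
    """Return True if the URL's query string sets at least one facet parameter."""
    query = parse_qs(urlsplit(url).query)
    return any(param in FACET_PARAMS for param in query)


def tally_facet_traffic(requests):
    """Count faceted requests by client type, given (url, user_agent) pairs."""
    counts = Counter()
    for url, user_agent in requests:
        if not is_faceted(url):
            continue
        ua = user_agent.lower()
        kind = "bot" if any(marker in ua for marker in BOT_MARKERS) else "other"
        counts[kind] += 1
    return counts


# Made-up sample rows, standing in for parsed access-log lines.
sample = [
    ("/search?subject=history&language=fr", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    ("/search?q=dickens", "Mozilla/5.0 Firefox"),
    ("/search?format=epub", "Mozilla/5.0 Firefox"),
]
print(tally_facet_traffic(sample))
```

If the "other" count is close to zero over a representative period, that would support removing or simplifying the faceted views; a meaningful share of non-bot traffic would argue for a narrower change.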
Related