Relax AI scrape policy #510

Merged
SuperQ merged 1 commit into master from superq/ai_permissive on May 6, 2026

SuperQ (Collaborator) commented May 6, 2026

Switch to the permissive scrape policy to allow more well-behaved AI crawlers.

Signed-off-by: SuperQ <superq@gmail.com>
SuperQ requested review from nthmost and patrickod on May 6, 2026 02:25

SuperQ (Collaborator, Author) commented May 6, 2026

This should implement something similar to #451.

nthmost (Member) commented May 6, 2026

Do we need to make some updates to robots.txt with this?

I'm admittedly getting a little head-swirly with the number of layers of bot detection here.

Right now robots.txt only blocks ClaudeBot and Amazonbot, but not GPTBot, Google-Extended, CCBot, etc.

SuperQ (Collaborator, Author) commented May 6, 2026

Currently I only get this:

$ curl https://www.noisebridge.net/robots.txt
User-agent: *
Disallow: /wiki/86
Disallow: /86
Disallow: /index.php?page=86
Noindex: /wiki/86
Noindex: /86
Noindex: /index.php?page=86
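To make the gap discussed here concrete, the robots.txt returned above can be checked per user agent with Python's standard-library `urllib.robotparser`. This is a minimal sketch, assuming the file contents stay exactly as shown; the user-agent list is taken from the discussion, while the `allowed_agents` helper and the `/wiki/Main_Page` path are hypothetical, chosen only for illustration. `Noindex` is not part of the robots.txt standard (RFC 9309), so the parser ignores those lines.

```python
from urllib.robotparser import RobotFileParser

# robots.txt body as returned by the curl command above.
# "Noindex" is not a standard directive; RobotFileParser silently ignores it.
LIVE_ROBOTS_TXT = """\
User-agent: *
Disallow: /wiki/86
Disallow: /86
Disallow: /index.php?page=86
Noindex: /wiki/86
Noindex: /86
Noindex: /index.php?page=86
"""

# AI crawler user agents mentioned in the review discussion.
AI_AGENTS = ["ClaudeBot", "Amazonbot", "GPTBot", "Google-Extended", "CCBot"]


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Hypothetical helper: report, per user agent, whether robots.txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    base = "https://www.noisebridge.net"
    # /wiki/Main_Page is an illustrative path outside every Disallow prefix.
    for path in ["/wiki/Main_Page", "/wiki/86"]:
        print(path, allowed_agents(LIVE_ROBOTS_TXT, AI_AGENTS, base + path))
```

With this file every listed agent falls through to the `User-agent: *` group, so each one is allowed everywhere except the three `86` prefixes; none of the named AI crawlers is singled out. To check the live file instead of a pasted copy, `RobotFileParser.set_url()` followed by `read()` fetches and parses it directly.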

nthmost (Member) commented May 6, 2026

> Currently I only get this:

Oh, I see: there's an existing robots_txt playbook from April 2024 that maybe hasn't been run?

(Sorry, I'm still finding a lot of this confusing and didn't actually look at the live robots.txt first.)

SuperQ (Collaborator, Author) commented May 6, 2026

Yeah, it was probably just never deployed.

nthmost (Member) left a comment

Should we clean up the robots_txt playbook, though? (Separately.)

SuperQ (Collaborator, Author) commented May 6, 2026

Yes, we should probably do a round of cleanup of the robots.txt and actually deploy it this time. :)

SuperQ merged commit b5b7dfc into master on May 6, 2026
1 check passed
SuperQ deleted the superq/ai_permissive branch on May 6, 2026 12:36

2 participants