In what world do AI scrapers actually pay for data? #24

quat1024 · 2025-06-26T13:48:07Z

AI scrapers are known to:

Routinely DDoS websites.
Use residential IP addresses.
Use misleading User-Agents.
Randomize their User-Agent.
Use these evasive measures more frequently if they believe they are being blocked or served garbage.

Previously, when I was serving the Bee Movie, they just visited a few URLs they found links to somewhere else on the internet, and then they left - for a bit, then came back from residential IPs and unidentifiable user agents.

This is the ecosystem in which projects like Anubis and go-away exist. Users of these tools are not necessarily against AI training on their website's content. But in the current ecosystem, it is not practical to allow AI scrapers anywhere within ten feet of their website, because this is how they behave.

They behave like this because it is cheap. Violating robots.txt is cheap, violating unwritten rules about User-Agents is cheap, and renting residential IPs is an acceptable cost of doing business. Any and all social barriers to delicious content are ignored; the only reason they don't bother trying to pass Anubis challenges is that doing so is economically expensive.

Given this is the ecosystem we are in, what is the economic reason for an AI scraping outfit to give a single shit about:

Direct Contribution: You must provide monetary or in-kind support to the Declaring Party for their development and maintenance of the assets, based on a good faith valuation taking into account your use of the assets and your financial means.

Why bother paying when you already have the data? (Seemingly worked out for Meta!)
Why bother paying when your competitors who don't pay will be at an advantage?
Why bother paying when you can just reroll your IP and User-Agent, and the Declaring Party won't know who you are?

Quoth the report:

[...] However, we see many reasons to believe that uptake is likely.

For one thing, there is precedent. Although adherence hasn’t always been perfect, robots.txt functioned for many years as a way to encode normative expectations about—and help maintain the social contract for—machine reuse of content on the web.

There is also precedent for the opposite: AI scrapers deciding robots.txt doesn't apply to them. Compare WIRED, 2024:

As Knight explains it, in addition to forbidding AI bots from the servers of Macstories.net, a site on which he works, by utilizing a robots.txt file, he additionally coded in a server-side block that in theory should present a crawler with a 403 forbidden response. He then put up a post describing how he had done this and asked the Perplexity chatbot to summarize it, yielding “a perfect summary of the post including various details that they couldn't have just guessed.”

They do not care. None of this will make them care.
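For reference, this is roughly what honoring robots.txt looks like from the crawler's side, sketched with Python's standard `urllib.robotparser`. The rules, the `ExampleAIBot` user agent, and the example.com URL are made up for illustration; the helper `may_fetch` is likewise hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that bans one AI crawler and allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str) -> bool:
    """Return True if the rules above permit this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/post"
    # A crawler that identifies itself honestly is told to stay out.
    print(may_fetch("ExampleAIBot", url))
    # The same crawler with a browser-like User-Agent is waved through.
    print(may_fetch("Mozilla/5.0", url))
```

The check runs entirely in the crawler's own code. Changing the User-Agent string, or simply never calling the check, costs nothing.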

quat1024 (Author) · 2025-06-26T13:52:56Z

Basically, this is an impossible problem to solve, and no amount of floundering with signals will make them care at all.

0 replies

TimidRobot (Maintainer) · 2025-06-26T15:45:38Z

Links related to preventing AI crawling:

TecharoHQ/anubis: Weighs the soul of incoming HTTP requests to stop AI crawlers
git/go-away: Self-hosted abuse detection and rule enforcement against low-effort mass AI scraping and bots. - GammaSpectra.Live Git
Pay up or stop scraping: Cloudflare program charges bots for each crawl - Ars Technica

2 replies

CJS-Design Jun 26, 2025

That we need defenses comparable to those used against cyberattacks and other bad actors just to protect our work from AI is, IMHO, a very good symbol of how wrong all this is.

TimidRobot Jul 1, 2025
Maintainer

(Replaced Cloudflare article with newer one)
