Bruh, when I said “you misunderstand why scrapers use a common user agent,” I didn’t require further proof.
Requests that follow an obvious bulk-scraper pattern, with user agents that almost certainly don’t belong to regular humans, are trivially easy to handle using decades-old techniques, which is why scrapers will not start using curl user agents.
“I’m not saying it won’t block some scrapers”
See, the thing with blocking AI scraping is that you can actually watch it work in the logs. I’m guessing you don’t run any sites that get much traffic, or you’d be able to see this too. Its efficacy is obvious.
Sure, scrapers could start keeping extra state or brute-forcing hashes, but at the scale they’re working at, that becomes painfully expensive, and raising the challenge difficulty takes minimal effort if it becomes apparent that scrapers are getting through. Which will be very obvious if it happens.
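The post doesn’t say which tool is in use, so as a hedged illustration of why brute-forcing gets expensive, here’s a minimal hashcash-style proof-of-work sketch in Python. The challenge format, the function names, and the difficulty values are all hypothetical and not any particular tool’s implementation. The key property: each extra bit of difficulty roughly doubles the expected number of hashes a client must try (about 2^difficulty), while the server checks a submitted answer with a single hash.

```python
import hashlib
import secrets


def leading_zero_bits(digest: bytes) -> int:
    """Count how many leading bits of a hash digest are zero."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # For a nonzero byte, 8 - bit_length() is the number of zero high bits.
        bits += 8 - byte.bit_length()
        break
    return bits


def issue_challenge() -> str:
    """Server side (hypothetical): hand the client a random, unguessable challenge."""
    return secrets.token_hex(16)


def solve(challenge: str, difficulty: int) -> int:
    """Client side: brute-force a nonce. Expected work is ~2**difficulty hashes."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: checking an answer costs exactly one hash."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


if __name__ == "__main__":
    challenge = issue_challenge()
    for difficulty in (8, 12, 16):
        nonce = solve(challenge, difficulty)
        assert verify(challenge, nonce, difficulty)
        print(f"difficulty={difficulty}: nonce={nonce}, expected ~{2 ** difficulty} hashes")
```

Bumping `difficulty` by one is a one-line change for the site operator, but it doubles the expected work per page for a scraper fetching thousands of them, while a human loading a handful of pages barely notices.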
“once it’s in a training set, all additional protection is just wasted energy.”
Presumably you haven’t had much experience with AI scrapers. They’re not a “one run and done” type of thing, especially for sites with frequently changing content, like this one.
I don’t want to seem rude, but you appear to be speaking from a position of considerable ignorance, dismissing the work of people who actually have skin in the game and have demonstrated effective techniques for dealing with the problem. Maybe a little more research on the issue would help.
Innocuous-looking, vaguely snake-oil-scented paper: “Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents”
The conclusions aren’t entirely surprising: LLMs tend to go off the rails over the long term, regardless of their context window size. That suggests the much-vaunted future of autonomous agents might actually be a bad idea, because LLMs are fundamentally unreliable and only a complete idiot would trust them to do useful work.
What’s slightly more entertaining are the transcripts.
You tell ’em, Claude. I’m happy for you to send these sorts of messages backed by my credit card. The future looks awesome!