this post was submitted on 15 Jun 2025
TechTakes
New article from Axios: Publishers facing existential threat from AI, Cloudflare CEO says
Baldur Bjarnason has given his commentary:
Anyways, personal sidenote/prediction: I suspect the Internet Archive's gonna have a much harder time archiving blogs/websites going forward.
Up until this point, the Archive enjoyed easy access to large swathes of the 'Net - site owners had no real incentive to block new crawlers by default, but the prospect of getting onto search results gave them a strong incentive to actively welcome search engine robots, safe in the knowledge that they'd respect robots.txt and keep their server load to a minimum.
Thanks to the AI bubble and the AI crawlers it's unleashed upon the 'Net, that has changed significantly.
Now, allowing crawlers by default risks AI scraper bots descending upon your website and stealing everything that isn't nailed down, overloading your servers and attacking FOSS work in the process. And you can forget about reining them in with robots.txt - they'll just ignore it and steal anyways, they'll lie about who they are, they'll spam new scrapers when you block the old ones, they'll threaten to exclude you from search results, they'll try every dirty trick they can because these fucks feel entitled to steal your work and fundamentally do not respect you as a person.
Add in the fact that the main upside of allowing crawlers (turning up in search results) has been completely undermined by those very same AI corps, as "AI summaries" (like Google's) steal your traffic through stealing your work, and blocking all robots by default becomes the rational decision to make.
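For anyone wondering what "blocking all robots by default" looks like in practice, here's a rough sketch: a default-deny robots.txt with a short allowlist, checked with Python's standard urllib.robotparser. The bot names (ExampleSearchBot, ExampleArchiveBot, SomeUnknownScraper), the example.com URL and the helper functions are all made up for illustration, so swap in whatever crawlers you actually trust. And as noted above, this only binds crawlers that choose to honor robots.txt; it does nothing against scrapers that ignore it or lie about their user agent.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical default-deny policy: two made-up "trusted" crawlers are
# allowed everywhere, every other user agent is disallowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: ExampleArchiveBot
Allow: /

User-agent: *
Disallow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return what a robots.txt-respecting crawler would conclude for this URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    page = "https://example.com/blog/some-post"  # placeholder URL
    for agent in ("ExampleSearchBot", "ExampleArchiveBot/1.0", "SomeUnknownScraper"):
        print(agent, is_allowed(parser, agent, page))
```

The catch is the one this whole comment is about: any crawler not on the allowlist, archival or otherwise, lands in the catch-all "Disallow: /" group unless the site owner goes out of their way to add it.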
This all kinda goes without saying, but this change in Internet culture all but guarantees the Archive gets caught in the crossfire, crippling its efforts to preserve the web as site owners and bloggers alike treat any and all scrapers as guilty (of AI fuckery) until proven innocent, and the web becomes less open as a whole as people protect themselves from the AI robber barons.
On a wider front, I expect this will cripple any future attempts at making new search engines, too. In addition to AI making it piss-easy to spam search systems with SEO slop, any new start-ups in web search will struggle with quality websites blocking their crawlers by default, whilst slop and garbage will actively welcome their crawlers, leading to your search results inevitably being dogshit and nobody wanting to use your search engine.
FWIW, due to recent developments, I've found myself increasingly turning to non-search-engine sources for reliable web links, such as Wikipedia source lists, blog posts, podcast notes or even Reddit. This almost feels like a return to the early days of the internet, just in reverse and - sadly - with little hope for improvement in the future.
Searching Reddit has really become standard practice for me, a testament to how inhuman the web as a whole has gotten. What a shame.
Sucks that a lot of Reddit is also being botted. But yes, Reddit is still good. It's still fucked that bots take a Reddit post as input, rewrite it into LLM garbage, and those rewrites then get a high Google ranking, while Google only lists one or two Reddit pages.