AI scrapers illegally harvesting data are destroying smaller and open source projects. Copyright law is not the only victim
https://thelibre.news/foss-infrastructure-is-under-attack-by-ai-companies/
That article is overblown. People need to configure their websites to be more robust against traffic spikes, news at 11.
Disrespecting robots.txt is bad netiquette, but honestly this sort of gentleman's agreement is always prone to cheating. At the end of the day, when you put something on the net for people to access, you have to assume anyone (or anything) can try to access it.
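For context on why it's only a gentleman's agreement: robots.txt compliance is entirely opt-in on the client side. A minimal sketch of how a well-behaved crawler checks it, using Python's standard library (the URL and user-agent string here are made up for illustration):

```python
from urllib import robotparser

# Fetch and parse the site's robots.txt (this does a network request).
rp = robotparser.RobotFileParser()
rp.set_url("https://example.org/robots.txt")
rp.read()

# A polite crawler asks before fetching; nothing enforces this.
if rp.can_fetch("ExampleBot", "https://example.org/private/page"):
    print("allowed to fetch")
else:
    print("disallowed by robots.txt")
```

Nothing in the protocol stops a client from skipping this check entirely, which is exactly the cheating problem.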
You think Red Hat & friends are just all bad sysadmins? SourceHut, maybe...
I think there's a bit of both: poorly optimized/antiquated sites and a gigantic spike in unexpected and persistent bot traffic. The typical mitigations do not work anymore.
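To illustrate what a "typical mitigation" looks like and why it fails, here is a minimal per-IP token-bucket rate limiter sketch (the rates, names, and structure are hypothetical, not taken from any particular project):

```python
import time
from collections import defaultdict

RATE = 5.0    # tokens refilled per second, per IP (illustrative value)
BURST = 20.0  # bucket capacity, i.e. allowed burst size (illustrative value)

# One bucket per client IP, created on first sight.
_buckets = defaultdict(lambda: {"tokens": BURST, "ts": time.monotonic()})

def allow(ip: str) -> bool:
    """Return True if a request from `ip` is within its rate budget."""
    bucket = _buckets[ip]
    now = time.monotonic()
    # Refill tokens proportionally to elapsed time, capped at BURST.
    bucket["tokens"] = min(BURST, bucket["tokens"] + (now - bucket["ts"]) * RATE)
    bucket["ts"] = now
    if bucket["tokens"] >= 1.0:
        bucket["tokens"] -= 1.0
        return True
    return False
```

This works fine against one greedy client, but a scraper rotating through thousands of residential IPs stays comfortably under every per-IP budget, which is why these mitigations stop working against the new bot traffic.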
Not every site is, and not every site should have to be, optimized for hundreds of thousands of requests a day or more. Just because they can be doesn't mean it's worth the time, effort, or cost.
In this case they just need to publish the code as a torrent. You wouldn't set up a crawler if all the data was already available in a torrent swarm.
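A rough sketch of what publishing a repo snapshot as a torrent could look like, using libtorrent's Python bindings (this assumes the `libtorrent` package is installed; the tracker URL and paths are placeholders, not a recommendation):

```python
import libtorrent as lt

# Collect every file under the snapshot directory into the torrent.
fs = lt.file_storage()
lt.add_files(fs, "repo_snapshot")

t = lt.create_torrent(fs)
t.add_tracker("udp://tracker.example.org:6969/announce")  # placeholder tracker

# Hash the file contents; second argument is the parent of the added path.
lt.set_piece_hashes(t, ".")

# Write out the .torrent metadata for seeding and distribution.
with open("repo_snapshot.torrent", "wb") as f:
    f.write(lt.bencode(t.generate()))
```

Seeding a periodic snapshot like this shifts bulk downloads onto the swarm instead of the project's own infrastructure.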