submitted 2 months ago by [email protected] to c/[email protected]
top 10 comments
[-] [email protected] 4 points 2 months ago

About 50% of traffic to programming.dev is bots that have marked their user-agents as such. I'm pretty confident the actual number is higher, but I haven't spent time validating.
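
Not the actual methodology, but a minimal sketch of how you could check a claim like this yourself, assuming an nginx/Apache combined-format access.log (the file path and the marker regex are illustrative, not programming.dev's real setup):

```python
import re

# crude check in the spirit of the comment above: count requests whose
# user-agent self-identifies as a bot/crawler
BOT_MARKERS = re.compile(r"bot|crawl|spider|scrape", re.IGNORECASE)

total = bots = 0
with open("access.log") as log:  # assumed combined log format
    for line in log:
        total += 1
        # the user-agent is the last double-quoted field in combined format
        parts = line.rsplit('"', 2)
        user_agent = parts[-2] if len(parts) == 3 else ""
        if BOT_MARKERS.search(user_agent):
            bots += 1

if total:
    print(f"{bots}/{total} requests ({100 * bots / total:.1f}%) self-identified as bots")
```

This only catches bots that announce themselves; anything spoofing a browser user-agent slips through, which is why the real number is plausibly higher.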

[-] [email protected] 3 points 2 months ago
[-] [email protected] 4 points 2 months ago

Snowe is the sysadmin of programming.dev...

So source: Snowe

[-] [email protected] 2 points 2 months ago

Oh thanks lol

[-] [email protected] 2 points 2 months ago

while others could be executing real-time searches when users ask AI assistants for information.

WTF? Is this even considered AI anymore? Sounds more like a Just-In-Time search engine.

The frequency of these crawls is particularly telling. Schubert observed that AI crawlers "don't just crawl a page once and then move on. Oh, no, they come back every 6 hours because lol why not." This pattern suggests ongoing data collection rather than one-time training exercises, potentially indicating that companies are using these crawls to keep their models' knowledge current.

What's telling is that these scrapers aren't just downloading the git repos and parsing those. They aren't targeted in any way; they're probably doing something primitive like following every link they see and getting caught in loops. If the labyrinth solution works, then that confirms it.
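
For context, the "labyrinth solution" presumably refers to link-maze tarpits in the vein of Nepenthes or Cloudflare's AI Labyrinth: every generated page links only to more generated pages, so a crawler that blindly follows every link never runs out of "new" URLs. A toy sketch using Flask (the route and slug scheme are made up for illustration):

```python
import random
import string

from flask import Flask

app = Flask(__name__)

def random_slug() -> str:
    return "".join(random.choices(string.ascii_lowercase, k=8))

@app.route("/maze/<slug>")
def maze(slug: str):
    # every visit generates fresh links, so there is no fixed link graph
    # for a follow-every-link crawler to exhaust
    items = "".join(
        f'<li><a href="/maze/{random_slug()}">{random_slug()}</a></li>'
        for _ in range(5)
    )
    return f"<html><body><ul>{items}</ul></body></html>"

# a targeted scraper that only fetched the git repos would never enter
# /maze/ at all; only indiscriminate link-followers get stuck here
```

Real tarpits also throttle responses and pad pages with filler text to waste the crawler's time and training data, but the trap itself is just this: links that only lead to more links.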

[-] [email protected] 1 points 2 months ago

Lol the article 403s with my VPN on.

[-] [email protected] 1 points 2 months ago

you evil AI you! /s

[-] [email protected] -4 points 2 months ago* (last edited 2 months ago)

Maybe these open source sites should move off the public internet and use alternative DNS servers with signup, plus alternative TLDs. Something like OpenNIC, but with signup. Or go straight to darknets like Tor and I2P. Maybe I2P would be better, as it's slower and crawlers would probably time out just trying to access content.

Anti Commercial-AI license

[-] [email protected] 4 points 2 months ago* (last edited 2 months ago)

Unless you continuously change your IP, I don't see how locking DNS resolution behind a signup would solve it. You only need to resolve once, and then you know the mapping of domain to IP and can use it anywhere (see the sketch below). That mapping doesn't change often for hosted services.

Any wall you build up will also apply to regular users you want to reach.
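
To make the resolve-once point concrete, a stdlib-only sketch (example.org stands in for the hypothetical gated domain; plain HTTP to sidestep TLS/SNI details):

```python
import http.client
import socket

host = "example.org"                  # placeholder for the gated domain
ip = socket.gethostbyname(host)       # one successful lookup is enough
print("cached mapping:", host, "->", ip)

# from here on DNS never enters the picture: connect straight to the IP
# and present the original hostname in the Host header
conn = http.client.HTTPConnection(ip, 80, timeout=10)
conn.request("GET", "/", headers={"Host": host})
print(conn.getresponse().status)
```

So the gated resolver only has to be beaten once per domain, and the scraper can share that cached mapping with every other machine it runs.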

[-] [email protected] 0 points 2 months ago

That's a good point. Using alternative DNS servers and alternative TLDs might be useful until they cotton on. It could even stress OpenNIC 🤔

I2P could be better.

Anti Commercial-AI license

this post was submitted on 26 Mar 2025
37 points (100.0% liked)

Opensource


A community for discussion about open source software! Ask questions, share knowledge, share news, or post interesting stuff related to it!

Credits: Icon base by Lorc under CC BY 3.0, with modifications to add a gradient



founded 2 years ago