this post was submitted on 30 Mar 2025
130 points (97.8% liked)

Selfhosted

45388 readers
616 users here now

A place to share alternatives to popular online services that can be self-hosted without giving up privacy or locking you into a service you don't control.

Rules:

  1. Be civil: we're here to support and learn from one another. Insults won't be tolerated. Flame wars are frowned upon.

  2. No spam posting.

  3. Posts have to be centered around self-hosting. There are other communities for discussing hardware or home computing. If it's not obvious why your post topic revolves around selfhosting, please include details to make it clear.

  4. Don't duplicate the full text of your blog or github here. Just post the link for folks to click.

  5. Submission headline should match the article title (don’t cherry-pick information from the title to fit your agenda).

  6. No trolling.

Resources:

Any issues on the community? Report it using the report flag.

Questions? DM the mods!

founded 2 years ago
MODERATORS
 

Lemmy newb here, not sure if this is right for this /c.

An article I found from someone who hosts their own website and micro-social network, and their experience with web-scraping robots who refuse to respect robots.txt, and how they deal with them.

you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 10 points 3 days ago (1 children)

I found a very large botnet in Brazil mainly and several other countries. And abuseipdb.com is not marking those IPs are a thread. We need a better solution.

I think a honeypot is a good way. Another way is to use proof of work basically on the client side. Or we need a better place to share all stupid web scraping bot IPs.

[–] [email protected] 5 points 3 days ago (1 children)

I love the idea of abuseipdb and I even contributed to it briefly. Unfortunately, even as a contributor, I don't get enough API resources to actually use it for my own purposes without having to pay. I think the problem is simply that if you created a good enough database of abusive IPs then you'd be overwhelmed in traffic trying to pull that data out.

[–] [email protected] 7 points 3 days ago

Not really.. We do have this wonderful list(s): https://github.com/firehol/blocklist-ipsets

And my firewall is using for example the Spamhaus drop list source: https://raw.githubusercontent.com/firehol/blocklist-ipsets/refs/heads/master/spamhaus_drop.netset

So I know its possible. And hosting in a git repo like that, will scale a lot. Since tons of people using this already that way.