this post was submitted on 21 Mar 2025
168 points (98.3% liked)

Selfhosted


I just started using this myself, seems pretty great so far!

Clearly it doesn't stop all AI crawlers, but it does stop a significant chunk of them.

top 50 comments
[–] [email protected] 13 points 10 hours ago (1 children)

Why SHA-256? Practically every modern processor has hardware crypto acceleration and will pass the challenge easily, and datacenter servers have beefy CPUs. This is only effective against no-JS scrapers.

[–] [email protected] 4 points 2 hours ago* (last edited 2 hours ago)

It requires a bunch of browser features that non-user browsers don't have, and the proof-of-work part is arguably the least relevant piece here; it only gets invoked once a week or so to generate a unique cookie.

I sometimes have the feeling that as soon as crypto-currency-related features are mentioned, people shut off part of their brain, either because they hate crypto-currencies or because crypto-currency scammers have trained them to look only at technical implementation details and miss the larger picture that they're being scammed.
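
Roughly, the PoW piece works like this; a minimal sketch in Go, not Anubis's actual protocol (the challenge token and difficulty here are made up):

```go
// Minimal sketch of an Anubis-style proof-of-work challenge
// (illustrative only; the real protocol, parameters, and cookie
// handling differ). The client searches for a nonce such that
// SHA-256(challenge + nonce) starts with `difficulty` zero hex digits.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// solve brute-forces a nonce; expected cost grows 16x per extra digit.
func solve(challenge string, difficulty int) (uint64, string) {
	prefix := strings.Repeat("0", difficulty)
	for n := uint64(0); ; n++ {
		sum := sha256.Sum256([]byte(challenge + strconv.FormatUint(n, 10)))
		h := hex.EncodeToString(sum[:])
		if strings.HasPrefix(h, prefix) {
			return n, h
		}
	}
}

func main() {
	// "example-challenge-token" stands in for the random per-visitor
	// token a server would issue.
	nonce, hash := solve("example-challenge-token", 4)
	fmt.Printf("nonce=%d hash=%s\n", nonce, hash)
}
```

A hardware-accelerated CPU clears difficulty 4 in milliseconds, which is the grandparent's point, and also why the PoW is a speed bump rather than the main gate.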

[–] [email protected] 8 points 12 hours ago (1 children)

Found the FF14 fan lol
The release names are hilarious

[–] [email protected] 1 points 12 minutes ago

What's the FFXIV reference here?

Anubis is from Egyptian mythology.

[–] [email protected] 13 points 14 hours ago (1 children)

I think the maze approach is better; this seems like it hurts valid users of the web more than it would hurt a company.

[–] [email protected] 14 points 8 hours ago* (last edited 59 minutes ago) (1 children)

For those not aware, Nepenthes is an example of the approach mentioned above!

[–] [email protected] 2 points 1 hour ago

This looks like it can actually fuck up some models, but the unnecessary CPU load it generates means most websites won't use it, unfortunately.

[–] [email protected] 15 points 16 hours ago* (last edited 5 hours ago) (3 children)

I did not find any instructions on the source page for how to actually deploy this. That would be a nice touch imho.

[–] [email protected] 1 points 1 hour ago

The Docker image page has it.

[–] [email protected] 2 points 7 hours ago

Even a quick link to the relevant portion of the docs would be cool.

[–] [email protected] 9 points 15 hours ago

There are some detailed instructions on the docs site, though I agree it'd be nice to have them in the readme, too.

Sounds like the dev was not expecting this much interest in the project out of nowhere, so there will definitely be gaps.

[–] [email protected] 38 points 19 hours ago* (last edited 19 hours ago) (2 children)

It's a clever solution, but I did see one recently that IMO was more elegant for noscript users. I can't remember the name, but it would create a dummy link that human users won't touch but web crawlers will naturally navigate into, which then generates an infinitely deep tree of super-basic HTML, forcing bots to endlessly trawl a cheap-to-serve portion of your web server instead of something heavier. It might even have integrated with fail2ban to pick out obvious bots and keep them off your network for good. Sketch of the idea below.
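
Something like this, in hypothetical Go (not the actual project, and real tarpits such as Nepenthes also throttle their responses and generate filler text for the bot to chew on):

```go
// Minimal sketch of a link-maze tarpit: every path under /maze/
// returns a cheap page of links one level deeper, for any depth.
package main

import (
	"fmt"
	"log"
	"net/http"
)

func maze(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/html")
	fmt.Fprint(w, "<html><body>")
	// Several child links per page make the tree arbitrarily wide
	// as well as infinitely deep.
	for i := 0; i < 5; i++ {
		fmt.Fprintf(w, "<a href=\"%s%d/\">continue</a> ", r.URL.Path, i)
	}
	fmt.Fprint(w, "</body></html>")
}

func main() {
	http.HandleFunc("/maze/", maze)
	// A link to /maze/ that humans never see (e.g. display:none)
	// funnels crawlers in; disallow it in robots.txt so polite
	// bots stay out, and feed repeat offenders to fail2ban.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```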

[–] [email protected] 5 points 10 hours ago (1 children)

generates an infinitely deep tree

Wouldn't the bot simply limit the depth of its search?

[–] [email protected] 2 points 5 hours ago

It could be infinitely wide too if they desired; that shouldn't be hard to do. I would suspect crawlers limit the time a chain can use so they eventually escape, but the maze still protects the site because it buries the legitimate data the bot wants. The goal isn't to trap them forever; it's to keep them from getting anything useful.

[–] [email protected] 12 points 19 hours ago (3 children)

If you remember the project I would be interested to see it!

But I've seen some AI-poisoning sinkholes before too, which are a novel concept as well. I haven't heard of any real-world experience with them yet.

[–] [email protected] 19 points 16 hours ago

I'm assuming they're thinking of this:

A pseudonymous coder has created and released an open source “tar pit” to indefinitely trap AI training web crawlers in an infinitely, randomly-generating series of pages to waste their time and computing power. The program, called Nepenthes after the genus of carnivorous pitcher plants which trap and consume their prey, can be deployed by webpage owners to protect their own content from being scraped or can be deployed “offensively” as a honeypot trap to waste AI companies’ resources.

Which was posted here a while back

[–] [email protected] 19 points 21 hours ago (2 children)

Meaning it wastes time and power such that it gets expensive on a large scale? Or does it mine crypto?

[–] [email protected] 25 points 21 hours ago* (last edited 21 hours ago) (12 children)

Yes, Anubis uses proof of work, like some cryptocurrencies do, to slow down/mitigate mass-scale crawling by making crawlers perform expensive computation.

https://lemmy.world/post/27101209 has a great article attached to it about this.

--

Edit: Just to be clear, this doesn't mine any crypto; it just uses the same idea to slow down the requests.
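
The trick is that the check is asymmetric; a minimal sketch (same made-up challenge format as the solver sketch upthread):

```go
// Sketch of the asymmetry that makes this work: the server checks a
// solution with a single SHA-256 call, while the client had to try
// ~16^difficulty hashes on average to find it. No coins involved.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

func verify(challenge string, nonce uint64, difficulty int) bool {
	sum := sha256.Sum256([]byte(challenge + strconv.FormatUint(nonce, 10)))
	return strings.HasPrefix(hex.EncodeToString(sum[:]),
		strings.Repeat("0", difficulty))
}

func main() {
	// A random nonce will almost certainly fail; only a client that
	// did the work can present a passing one. (Values are made up.)
	fmt.Println(verify("example-challenge-token", 12345, 4))
}
```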

[–] [email protected] 12 points 20 hours ago (2 children)

It's a rather brilliant idea really, but when you consider the environmental implications of requiring proof of work for web requests to function, this effectively burns more coal for every site that implements it.

[–] [email protected] 22 points 18 hours ago

You have a point here.

But when you consider the current world's web traffic, this isn't actually the case today. For example, when the GNOME project was forced to start using this on their GitLab, 97% of the traffic could not complete the PoW challenge.

I.e., serving their GitLab now requires only a fraction of the computational cost, which saves a lot of resources, coal, and most importantly, the time of hundreds of real humans.

(Source for numbers)

Hopefully in the future we can move back to proper netiquette and just a plain old robots.txt file!
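
For reference, the netiquette version is just a couple of lines in robots.txt (GPTBot is OpenAI's documented crawler; the catch is that nothing forces a scraper to honor it):

```
User-agent: GPTBot
Disallow: /
```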

[–] [email protected] 7 points 16 hours ago

I don't think AI companies care, and I wholeheartedly support any and all FOSS projects using PoW when serving their websites. I'd rather have that than have them go down.

[–] [email protected] 5 points 20 hours ago

Upvote for the name and tag line alone!

[–] [email protected] 2 points 18 hours ago (1 children)

Anubis is provided to the public for free in order to help advance the common good. In return, we ask (but not demand, these are words on the internet, not word of law) that you not remove the Anubis character from your deployment.
If you want to run an unbranded or white-label version of Anubis, please contact Xe to arrange a contract.

This is icky to me. Cool idea, but this is weird.

[–] [email protected] 15 points 18 hours ago (2 children)

...Why? It's just telling companies they can get support + white-labeling for a fee, and asking that you keep their silly little character in a tongue-in-cheek manner.
Just like they say, you can modify the code and remove it for free if you really want; they're not forbidding you from doing so or anything.

[–] [email protected] 6 points 12 hours ago

Just like they say, you can modify the code and remove it for free if you really want; they're not forbidding you from doing so or anything.

True, but I think you are discounting the risk that the actual god Anubis will take displeasure at such an act, potentially dooming one's real-life soul.

[–] [email protected] 3 points 17 hours ago (1 children)

Yeah, it seems entirely optional. It's not like manually removing the Anubis character will revoke your access to the code. However, I still do find it a bit weird that they're asking for that.

I just can't imagine most companies implementing Anubis and keeping the character or paying for the service, given that it's open source. It just seems unprofessional for the first impression of a company's website to be the Anubis devs' manga OC...

[–] [email protected] 7 points 17 hours ago* (last edited 17 hours ago) (1 children)

It is very different from the usual flat corporate style, yes, but this is just their branding. Their blog is full of anime characters like that.

And it's not like you're looking at a literal ad for their company, or anything with their name on it. In that sense it is subtle, though a bit unusual.

[–] [email protected] 3 points 17 hours ago

I don't think it's necessarily a bad thing. Subtle but unusual is a good way to describe it.

However, I would like to point out that if it is their branding, then the character appearing is an advertisement for the service. It's just not very conventional or effective advertising, and they're not making money from the vast majority of implementations, so it's not very egregious anyway.
