How to poison AI data to accelerate model collapse? (programming.dev)

submitted 3 months ago by ghodawalaaman@programming.dev to c/fuck_ai@lemmy.world


hello,

I have a domain, and I'm seeing ChatGPT hit my server very hard. I'm thinking about replacing my website content with gibberish so that when ChatGPT reads it, it will hopefully mess up their training data.

all 15 comments


florge@feddit.uk 20 points 3 months ago

ChatGPT trains on so much data that skewing what's on your server probably won't amount to much.

mech@feddit.org 19 points 3 months ago

Everyone is already poisoning the data faster than you ever could.
The web is more than 50% AI content now, which introduces errors when used as training data.

SpacePanda@mander.xyz 10 points 3 months ago* (last edited 3 months ago)

I was like, no way it's 50%. So I looked it up and you're spot on; not only that, 74% of new webpages contain AI content. FML, I'm going to go back and live in a cave like my ancestors before me lol

hendrik@palaver.p3x.de 15 points 3 months ago

There are several tarpits, software that claims to poison LLM training data or genAI image models... But poisoning isn't effective. It's mainly a waste of time, as the models and the training process have changed and adapted. They'll curate the datasets and just get rid of the outlier information. The crawler itself may already make some decisions to cope. You can do it if you like, but be aware this is mostly for your own entertainment. It won't change anything.

What I do is block their address ranges and be done with it. That can be done with some access/deny rules in the web server config, or at the firewall.
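A minimal sketch of the blocking approach, assuming the site is served by a Python WSGI app: a middleware that answers 403 when the client address falls in a blocked range or the User-Agent contains a known crawler token. The two networks below are hypothetical placeholders (RFC 5737 documentation blocks), not real crawler addresses; substitute whatever ranges the crawler operator actually publishes. "GPTBot" and "ChatGPT-User" are user-agent tokens OpenAI uses; treat the list as an assumption to extend.

```python
import ipaddress
from typing import Callable, Iterable

# Hypothetical placeholder ranges (RFC 5737 documentation blocks), NOT real
# crawler addresses. Replace with the ranges the crawler operator publishes.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# User-agent substrings to refuse (assumed list; extend as needed).
BLOCKED_UA_TOKENS = ("GPTBot", "ChatGPT-User")


def is_blocked(remote_addr: str, user_agent: str) -> bool:
    """Return True if the client address or user agent matches a deny rule."""
    try:
        addr = ipaddress.ip_address(remote_addr)
    except ValueError:
        addr = None  # unparseable address: fall through to the UA check
    # An IPv6 address is never "in" an IPv4 network, so mixed families are safe.
    if addr is not None and any(addr in net for net in BLOCKED_NETWORKS):
        return True
    return any(token in user_agent for token in BLOCKED_UA_TOKENS)


class DenyCrawlers:
    """WSGI middleware that answers blocked clients with 403 Forbidden."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        remote = environ.get("REMOTE_ADDR", "")
        agent = environ.get("HTTP_USER_AGENT", "")
        if is_blocked(remote, agent):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Everyone else reaches the wrapped application unchanged.
        return self.app(environ, start_response)
```

Wrap an existing app as `app = DenyCrawlers(app)`. User-agent strings are trivially spoofed, so the address check is the stronger of the two; doing the same thing in the web server or firewall stops the request before it ever reaches Python.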

hexagonwin@lemmy.today 6 points 3 months ago

Not sure how effective it would be in reality, but maybe something like this? https://zadzmo.org/code/nepenthes/

I've seen a few other similar projects but can't find them atm.

spicehoarder@lemmy.zip 5 points 3 months ago

I don't know what kind of content you host on your site, and none of the people in the comments are any fun. I'd say if traffic is detected to be coming from ChatGPT, serve them fake but plausible info, and cite legit institutions or publications as sources, but give fake/dead links.

Bluegrass_Addict@lemmy.ca 4 points 3 months ago

Why not send it right back into ChatGPT? Is that possible? Make it loop on itself so it hurts itself.

gmtom@lemmy.world 4 points 3 months ago

Yeah, that's not actually a thing. Dataset preprocessing is a whole specialisation in and of itself, and any major developer will be able to easily filter out junk data and "poisoned" data too.

ghodawalaaman@programming.dev 1 points 3 months ago

fuck, that's what I was thinking too :(

I guess the odds are against us, comrades!

theunknownmuncher@lemmy.world 3 points 3 months ago

The whole model collapse idea is not becoming a reality and has already been solved as an obstacle. Sorry.

ghodawalaaman@programming.dev 1 points 3 months ago

How dare you tell me the truth!!! /jk

30p87@feddit.org 2 points 3 months ago

iocaine

Fizz@lemmy.nz 1 points 3 months ago

It's cleaned up by humans and other AI now, so I think it's unlikely we'll be able to get enough poisoned data in to make a difference.

If it were possible to poison a dataset that way, people wouldn't buy the Discord dataset, because that is filled to the brim with the lowest-quality, borderline-poisoned data.

Better to keep your website for human expression than to ruin it trying to destroy a clanker.

jadetoffee@lemmy.blahaj.zone 1 points 3 months ago

firebomb data centers lol

this post was submitted on 23 Apr 2026

84 points (94.7% liked)

Fuck AI


"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

AI, in this case, refers to LLMs, GPT technology, and anything listed as "AI" meant to increase market valuations.

founded 2 years ago
