56
you are viewing a single comment's thread
view the rest of the comments
view the rest of the comments
this post was submitted on 24 Feb 2026
56 points (92.4% liked)
Opensource
5649 readers
361 users here now
A community for discussion about open source software! Ask questions, share knowledge, share news, or post interesting stuff related to it!
⠀
founded 2 years ago
MODERATORS
You went digging through my Reddit comments to find a two-month-old thread, that must have taken a lot of effort. But I'm afraid I don't see what the relevance of it is, aside from a general "it's about AI". The bulk of the comments I wrote there were about water usage.
I'm genuinely puzzled. Are you saying that deduplicating data is "hiding unethical behaviour?" It's actually intended for improving the model's performance, having a model spit out exact copies of its training data means you've produced a hugely expensive and wasteful re-implementation of copy-and-paste rather than a generative AI. The whole point of generative AI is to produce novel outputs.