this post was submitted on 28 Aug 2023
Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ
The Books3 corpus would like you to know that all the data in it comes from copyrighted books. It has reportedly been widely used to train closed-source LLMs. "Rules for thee, not for me" shit. They'll break copyright and then copyright what they made from it.
https://huggingface.co/datasets/the_pile_books3
Books3 is literally everything from the Bibliotik private tracker for books.
So yeah, fuckin roll out the cannons, mateys, let's sink these hypocritical fuckers.
This has the same vibe as GitHub (owned by Microsoft) training its AI Copilot on repositories under the GPL license, which specifically forbids any work based on it from being made proprietary. Literally a blatant disregard for the license, but it's OK because it's a mega-corporation doing it.
You're allowed to train on copyrighted works, it isn't illegal for anybody. This article by Kit Walsh does a good job of breaking it down. She's a senior staff attorney at the EFF.
I didn't say it was illegal, I said it was hypocritical.
Oh, my bad.