this post was submitted on 04 Sep 2024
950 points (98.6% liked)

Technology

[–] [email protected] 178 points 1 month ago (10 children)

If OpenAI can get away with going through copyrighted material, then the answer to piracy is simple: round up a bunch of talented devs from the internet who are writing and training AI models, and let's make a fantastic model trained on what the Internet Archive has. Tell you what, let Mistral's engineers lead that charge, and put an AGPL license on the project so that companies can't fuck us over.

I refuse to believe that nobody has thought of this yet.

[–] [email protected] 2 points 1 month ago (4 children)

We get it, y’all hate LLMs and the companies who make them.

This comparison is disingenuous and I have to think you’re smart enough to know that, making this disinformation.

If/when an LLM like ChatGPT spits out a full copy of training text, that’s considered a bug and is remediated fairly quickly. It’s not a feature.

What IA was doing was sharing the full text as a feature.

As far as I know, there are some court cases pending to determine whether companies like OpenAI are guilty of copyright infringement, but I haven’t seen any rulings against them yet (happy to be corrected here).

All that said, I love IA and have a Warrior container scheduled to run nightly to help contribute.

[–] [email protected] 4 points 1 month ago (1 children)

Hmm, true. IA wouldn't be as supported if we couldn't get the full text of the source.

Can you tell me more about the "warrior container"?

[–] [email protected] 4 points 1 month ago

It’s mentioned in the OP but it’s this:

https://wiki.archiveteam.org/index.php/ArchiveTeam_Warrior

Basically, distributed collection.
