this post was submitted on 17 Aug 2023
195 points (100.0% liked)

[–] [email protected] 28 points 1 year ago (9 children)

NPR reported that a "top concern" is that ChatGPT could use The Times' content to become a "competitor" by "creating text that answers questions based on the original reporting and writing of the paper's staff."

That's something that can currently be done by a human and is generally considered fair use. All a language model really does is drive the cost of doing that from tens or hundreds of dollars down to pennies.

To defend its AI training models, OpenAI would likely have to claim "fair use" of all the web content the company sucked up to train tools like ChatGPT. In the potential New York Times case, that would mean proving that copying the Times' content to craft ChatGPT responses would not compete with the Times.

A fair use defense does not have to include noncompetition. That's just one factor in a fair use defense, and the other factors may be enough on their own.

I think it'll come down to how "the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes" and "the amount and substantiality of the portion used in relation to the copyrighted work as a whole" are interpreted by the courts. Do we judge a language model by the model itself or by its output? Can a model itself be non-infringing and still be able to produce infringing content?

[–] [email protected] 10 points 1 year ago* (last edited 1 year ago) (2 children)

I think there's a good case that it's entirely transformative. It doesn't just spit out NYT articles. I feel like saying they "stole IP" from the NYT doesn't really hold up, because that would mean anyone who read the NYT and then wrote any kind of article at some point also engaged in IP theft, since their consumption of the NYT almost certainly influenced their writing in some way. (I think the same argument holds, to a weaker degree, for generative image AI. That just seems a bit different, since it sometimes directly copies the actual brushstrokes etc. of real artists. There are also only so many ways to arrange words.)

It is, however, an entirely new thing, so for now it's up to judges to rule on how that works.

[–] [email protected] 7 points 1 year ago

I have it on good authority that the writers of the NYT have also read other newspapers before. This blatant IP theft goes deeper than we could have ever imagined.
