this post was submitted on 20 Oct 2023
[–] [email protected] 2 points 1 year ago* (last edited 1 year ago)

Heads up, this is a long fucking comment. I don't care if you love or hate AI art, what it represents, or how it's trained. I'm here to inform, refine your understanding of the tools (and how exactly they might fit into the current legal landscape), and nothing more. I make no judgements about whether you should or shouldn't like AI art or generative AI in general. You may disagree with some of the legal standpoints too, but please be aware of how the tools actually work, because grossly oversimplifying them creates serious confusion and frustration when discussing them.

Just know that, because these tools are open source and publicly available to use offline, Pandora's box has been opened.

> copying concepts is also copyright infringement

Except it really isn't in many cases, and even in the cases where it could be, there can be rather important exceptions. How this all applies to AI tools/companies themselves is honestly still up for debate.

Copyright protects actual works (aka "specific expression"), not mere ideas.

The concept of a descending-blocks puzzle game isn't copyrighted, but the very specific mechanics of Tetris are. The concept of a cartoon mouse isn't copyrighted, but Mickey Mouse's visual design is. The concept of a brown-haired girl with wolf ears/tail and red eyes is not copyrighted, but the exact depiction of Holo from Spice and Wolf is (though that's more complicated due to weaker trademark and stronger copyright laws in Japan). A particular chord progression is not copyrightable (or at least it shouldn't be), but a song or performance created with it is.

A mere concept is not copyrightable. Once the concept is specific enough and you have copyrighted visual depictions of it, then you start to run into trademark law territory and to gain a copyright case. I really feel like these cases are kinda exceptions though, at least for the core models like Stable Diffusion itself, because there's just so much existing art (both official and, even more so, copyright/trademark-infringing fan art) of characters like Mickey Mouse anyways.

The thing the AI does is distill concepts, and interactions between concepts, shared between many input images, and it can do so in a generalized way that allows concepts never before seen together to be mixed easily. You aren't getting transformations of specific images out of the AI, or even small pieces of each trained image; you're getting transformations of learned concepts shared across many, many works. This is why the shredding analogy just doesn't work. The AI generally doesn't, and is not designed to, mimic individual training images. A single image changes the weights of the AI by such a minuscule amount, and those exact same weights are also changed by many other images the AI trains on. Generative AI is very distinctly different from tracing, from distributing mass information that's precisely specific enough to pirate content, or from transforming copyrighted works to make them less detectable.

To drive the point home, I'd like to expand on how the AI and its training are actually implemented, because I think that might clear some things up for anyone reading. I feel like the actual way in which the training uses images matters.

A diffusion model, which is what current AI art uses, is a giant neural network that we train to guess the noise pattern in an image. To train it on an image, we add some random amount of noise to the whole image (could be a small amount like film grain, or it could be enough to turn the image into pure noise, but it's random each time), then pass that noisy image and its caption through the AI to get the noise pattern the AI guesses is in the image. Now we take the difference between the noise pattern it guessed and the noise pattern we actually added to calculate the error. Finally, we tweak the AI's weights based on that error. Of note, we don't tweak the AI to perfectly guess the noise pattern or reduce the error to zero; we barely tweak the AI to guess ever so slightly better (like, 0.001% better). Because the AI is never supposed to see the same image many times, it has to learn to interpret the captions (and thus concepts) provided alongside each image to direct its noise guesses. The AI still ends up being really bad at guessing high or completely random noise anyways, which is yet another reason why it generally can't reproduce existing trained images from nothing.
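To make that loop concrete, here's a toy numpy sketch. Everything in it is a made-up stand-in: a real diffusion model is a conditioned U-Net with billions of weights, and `predict_noise`, the single linear layer, and the learning rate here are all invented for illustration. Only the shape of the loop matters: add noise, guess it, measure the error, nudge the weights a tiny amount.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the denoising network: one linear layer.
W = rng.normal(scale=0.01, size=(16, 16))

def predict_noise(noisy_image, caption_embedding):
    # Hypothetical model: real networks condition on the caption via
    # cross-attention; here we just mix the embedding in additively.
    return noisy_image @ W + caption_embedding

def training_step(image, caption_embedding, lr=1e-4):
    global W
    # 1. Pick a random noise level (film grain ... pure noise).
    noise_level = rng.uniform(0.0, 1.0)
    noise = rng.normal(size=image.shape)
    noisy = (1 - noise_level) * image + noise_level * noise
    # 2. Ask the model to guess the noise that was added.
    guess = predict_noise(noisy, caption_embedding)
    # 3. Error = difference between guessed and actual noise.
    error = guess - noise
    # 4. Nudge the weights a tiny amount toward a better guess
    #    (proportional to the gradient of the mean-squared error).
    grad = noisy.T @ error / error.size
    W -= lr * grad
    return float((error ** 2).mean())
```

With a learning rate this small, one image shifts each weight by a vanishingly tiny amount, which is the point made above: no single training image leaves a retrievable copy of itself in the weights.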

Now let's talk about generation (aka "inference"). So we have an AI that's decent at guessing the noise pattern in existing images as long as we provide captions, and this works even for images it didn't train on. That's great for denoising and upscaling existing images, but how do we get it to generate new, unique images? By giving it a caption and asking it to denoise pure random noise! It's still really shitty at this though; the image just looks like some blobby splotches of color with no form (else it probably wouldn't work at denoising existing images anyways). We have a hack though: add some random noise back into the generated image and send it through the AI again. Every time we do this, the image gets sharper and more refined, and looks more and more like the caption we provided. After doing this 10-20 times we end up with a completely original image that isn't identifiable in the training set but looks conceptually similar to existing images that share similar concepts.

The AI has learned not to copy images while training, but has actually learned visual concepts. Concepts which are generally not copyrighted. Some very specific depictions it learns are technically copyrighted, e.g. Mickey Mouse's character design, but the problem with that claim too is that there are fair use exceptions, legitimate use cases, which can often cover someone who uses the AI in this capacity (parody, educational, not for profit, etc). Whether providing a tool that can just straight up allow anyone to create infringing depictions of common characters or designs is legal is up for debate, but when you use generative AI it's up to you to know the legality of publishing the content you create with it, just like with hand-made art. And besides, if you ask an AI model or another artist to draw Mickey Mouse for you, you know what you're asking for, it's not a surprise, and many artists would be happy to oblige so long as their work doesn't get construed as official Disney company art.
(I guess that's sorta a point of contention about this whole topic though, isn't it? If artists could get takedowns on their Mickey Mouse art, why wouldn't an AI model get takedowns too for trivially being able to create it?)
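The generation loop described above can be sketched in the same toy style. `predict_noise` here is a fake stand-in for the trained network (a real model is a neural net conditioned on the prompt text), and the step sizes are invented; the shape of the loop is what matters: start from pure noise, remove guessed noise, re-inject a shrinking amount of fresh noise, repeat.

```python
import numpy as np

rng = np.random.default_rng(1)

def predict_noise(image, caption):
    # Fake stand-in for the trained network. Pretend the caption
    # embedding already encodes the "ideal" image the prompt asks for,
    # so the "noise" is whatever differs from it. A real model learned
    # this mapping from concepts across millions of captioned images,
    # not from any stored copy of them.
    return image - caption

def generate(caption, steps=20):
    # Start from pure random noise: no training image is copied in.
    image = rng.normal(size=caption.shape)
    for step in range(steps):
        # 1. Ask the model which part of the image it thinks is noise.
        guessed = predict_noise(image, caption)
        # 2. Remove a chunk of that guessed noise.
        image = image - 0.5 * guessed
        # 3. The hack: re-inject a shrinking amount of fresh noise and
        #    go again; each pass sharpens the image toward the caption.
        image = image + rng.normal(scale=0.5 * (1 - step / steps),
                                   size=image.shape)
    return image
```

After 10-20 passes the result has converged toward what the caption encodes while still carrying the randomness it started from, which is why the output is original rather than a lookup of a training image.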

Anyways, if you want this sort of training or model release to be a copyright violation, as many do, I'm unconvinced current copyright/IP laws could handle it gracefully, because even if the precise methods by which AIs and humans learn and execute are different, the end result is basically the same. We have to draw new, more specific lines on what is and isn't allowed and decide how AI tools should be regulated while taking care not to harm real artists, and few will agree on where the lines should be drawn.

Also though, Stable Diffusion and its many, many descendants are already publicly released and open source (same with Llama for text generation), and they've been disseminated to so many people that you can no longer stop them from existing. That fact doesn't give StabilityAI a pass, nor do other AI companies who keep their models private get a pass, but it's still worth remembering that Pandora's box has already been opened.