Technology

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech-related content.
Be excellent to one another!
Mod-approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts; they are OK as comments.
Only approved bots from the list below may post. To ask whether your bot can be added, please contact us.
Check for duplicates before posting; duplicates may be removed.

Approved Bots

OpenAI’s new AI image generator pushes the limits in detail and prompt fidelity (arstechnica.com)

submitted 1 year ago by [email protected] to c/[email protected]

On Wednesday, OpenAI announced DALL-E 3, the latest version of its AI image synthesis model that features full integration with ChatGPT. DALL-E 3 renders images by closely following complex descriptions and handling in-image text generation (such as labels and signs), which challenged earlier models. Currently in research preview, it will be available to ChatGPT Plus and Enterprise customers in early October.

Like its predecessor, DALL-E 3 is a text-to-image generator that creates novel images based on written descriptions called prompts. Although OpenAI released no technical details about DALL-E 3, the AI model at the heart of previous versions of DALL-E was trained on millions of images created by human artists and photographers, some of them licensed from stock websites like Shutterstock. It's likely DALL-E 3 follows this same formula, but with new training techniques and more computational training time.

Judging by the samples provided by OpenAI on its promotional blog, DALL-E 3 appears to be a radically more capable image synthesis model than anything else available in terms of following prompts. While OpenAI's examples have been cherry-picked for their effectiveness, they appear to follow the prompt instructions faithfully and convincingly render objects with minimal deformations. Compared to DALL-E 2, OpenAI says that DALL-E 3 refines small details like hands more effectively, creating engaging images by default with "no hacks or prompt engineering required."

[–] [email protected] 5 points 1 year ago* (last edited 1 year ago) (4 children)

I will be convinced when they learn to draw hands correctly, which they seem to boast about here.

[–] [email protected] 16 points 1 year ago (2 children)

Here's an example image from the article.

https://cdn.arstechnica.net/wp-content/uploads/2023/09/plategirl-980x560.jpg

[–] [email protected] 29 points 1 year ago (1 children)

from the article

Well no wonder they couldn't find this example.

[–] [email protected] 8 points 1 year ago

For a system where the intent is to read, learn, or be entertained (and kill time), people seem unwilling to do the first to accomplish the latter.

[–] [email protected] 20 points 1 year ago* (last edited 1 year ago) (2 children)

Was the prompt “Woman from China”?

Edit: I feel like the nuance of this joke may have been lost on some. Whether or not I read the article is irrelevant, since this was not a genuine question but a play on the double meaning of “china”: “A woman from (the country) China” and “A woman (emerging) from china (porcelain)”.

I’ll get my coat.

[–] [email protected] 12 points 1 year ago (1 children)

The prompt is on the picture in the article:

A DALL-E 3 image provided by OpenAI with the prompt: "A middle-aged woman of Asian descent, her dark hair streaked with silver, appears fractured and splintered, intricately embedded within a sea of broken porcelain. The porcelain glistens with splatter paint patterns in a harmonious blend of glossy and matte blues, greens, oranges, and reds, capturing her dance in a surreal juxtaposition of movement and stillness. Her skin tone, a light hue like the porcelain, adds an almost mystical quality to her form."

Why do we need AI creating text, when nobody is reading?

[–] [email protected] 1 points 1 year ago

Whoosh

[–] [email protected] 1 points 1 year ago (1 children)

You might want to put it in all lowercase next time

[–] [email protected] 2 points 1 year ago

The next time I make the same joke?

I reckon I’ll just keep it to myself instead. I already feel ridiculous for having to explain it. Lemmy is harder than real life.

[–] [email protected] 4 points 1 year ago

Making the context window larger likely helps with stuff; however, it still has the issue of "background breaking".

[–] [email protected] 2 points 1 year ago* (last edited 1 year ago) (1 children)

Seems to be about 50/50: quite a few good-looking hands, but still plenty of crooked fingers with some prompts. I think they might need training on video or 3D models; the structure of hands is probably difficult to figure out just from 2D images.

[–] [email protected] 1 points 1 year ago

Yup, that's the thing with most generative AI models: they have no implicit 3D model of the world. So depending on perspective, a real 2D image may give the impression that there are only 2 or 3 fingers, but the model doesn't know that that's just because of perspective.

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

The reason AI struggles with hands is that real artists struggle with them too.

[–] [email protected] 15 points 1 year ago* (last edited 1 year ago)

While there is some truth in this, humans and AI do not make the same type of mistakes with hands.

Humans will rebuild the topological structure of the hand (5 fingers protruding from a base) and get the proportions wrong, while the topology stays credible.

AI will rebuild the image of a hand from the 2D appearance of a hand (a variable number of flesh-colored, parallel stripes) and improvise from that.

While both can get it wrong, the errors are not similar.