this post was submitted on 11 Feb 2025
523 points (98.7% liked)

Technology

[–] [email protected] 12 points 2 days ago* (last edited 2 days ago) (2 children)

They are, however, able to inaccurately summarize it in GLaDOS's voice, which is a strong point in their favor.

[–] [email protected] 2 points 1 day ago* (last edited 1 day ago)

Yeah, of all the generative AI fields, voice generation is about 95% of the way to producing convincing speech, even with consumer-level tech like ElevenLabs. That last 5% might not even be solvable right now: it's the moments where it gets the feeling, intonation, or pronunciation wrong because the only context you give it is text input, which is why anything purely automated tends to fall apart quite fast.

Especially voice cloning - the DRG Cortana Mission Control mod is one of the examples I like to use.

[–] [email protected] 3 points 2 days ago (1 children)

Surely you'd need TTS for that one, too? Which one do you use, is it open weights?

[–] [email protected] 1 points 2 days ago* (last edited 2 days ago) (1 children)

Zonos just came out, seems sick:

https://huggingface.co/Zyphra

There are also some “native” TTS LLMs like GLM 9B, which “capture” more information in the output than pure text input.

[–] [email protected] 2 points 2 days ago* (last edited 2 days ago) (1 children)

A website with zero information, and barely anything on their Hugging Face page. What’s exciting about this?

Ahh, you should link to the model

https://www.zyphra.com/post/beta-release-of-zonos-v0-1

[–] [email protected] 1 points 2 days ago (1 children)

Whoops, yeah, should have linked the blog.

I didn't want to link the individual models because I'm not sure whether the hybrid or the pure transformer is better.

[–] [email protected] 1 points 1 day ago

Looks pretty interesting, thanks for sharing it