this post was submitted on 27 Jan 2025
652 points (97.7% liked)

Technology

[–] [email protected] 19 points 1 day ago (4 children)

So if the Chinese version is so efficient, and is open source, then couldn't OpenAI and Anthropic run the same model on their huge hardware and get enormous capacity out of it?

[–] [email protected] 9 points 1 day ago (1 children)

OpenAI could use less hardware to get similar performance if they used the Chinese version, but they already have enough hardware to run their model.

Theoretically the best move for them would be to train their own, larger model using the same techniques (so as to still fully utilize their hardware), but this is easier said than done.
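The capacity point above is just arithmetic: if the same fleet spends less compute per token, it serves proportionally more tokens. A back-of-envelope sketch, with every number invented purely for illustration:

```python
# Back-of-envelope serving capacity (all figures hypothetical).
# If a more efficient model needs ~10x less compute per token,
# the same GPU fleet can serve ~10x the traffic.

gpus = 10_000                 # hypothetical fleet size
tokens_per_gpu_old = 1_000    # assumed tokens/sec per GPU, old model
efficiency_gain = 10          # assumed per-token compute reduction

old_capacity = gpus * tokens_per_gpu_old
new_capacity = old_capacity * efficiency_gain

print(f"old: {old_capacity:,} tok/s, new: {new_capacity:,} tok/s")
```

The gain multiplies serving throughput; it says nothing about the model getting smarter, which is the point made further down the thread.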

[–] [email protected] 2 points 1 day ago (1 children)

Just ask the AI to assimilate the model?

[–] [email protected] 10 points 1 day ago (2 children)

Not necessarily... if I gave you my "faster car" to run on your private 7-lane highway, you could definitely squeeze out every last bit of speed the car gives, but no more.

DeepSeek works as intended on 1% of the hardware the others allegedly "require" (allegedly, remember this is all a super-hype bubble)... if you run it on more powerful machines, it will perform better, but only to a certain extent... it will not suddenly develop more/better qualities just because the hardware it runs on is better.

[–] [email protected] 2 points 1 day ago

Didn't DeepSeek solve some of the data-wall problems by creating good chain-of-thought data with an intermediate RL model? That approach should work with the tried-and-tested scaling laws, just using much more compute.
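The "just add compute" claim rests on the usual power-law scaling picture: loss keeps falling with compute, but with diminishing returns. A toy sketch in that spirit; the functional form is the standard one, but every coefficient here is invented for illustration, not fit to any real training run:

```python
# Toy Chinchilla-style scaling law: loss falls as a power of
# training compute C, on top of an irreducible floor.
# Coefficients are made up for illustration only.

def loss(compute: float, a: float = 10.0, b: float = 0.05,
         floor: float = 1.7) -> float:
    """loss(C) = floor + a * C**(-b), a hypothetical power law."""
    return floor + a * compute ** -b

small_run = loss(1e21)   # hypothetical compute budget
big_run = loss(1e23)     # 100x more compute: lower loss, smaller step

print(f"small run: {small_run:.3f}, big run: {big_run:.3f}")
```

Under a curve like this, more compute always helps, which is why the same recipe plus a bigger budget is the expected play; how far the real curve holds is an empirical question.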

[–] [email protected] 4 points 1 day ago (1 children)

This makes sense, but it would still allow a hundred times more people to use the model without running into limits, no?

[–] [email protected] 3 points 1 day ago

hence certain tech grifters going "oh shitt...."

[–] [email protected] 8 points 1 day ago

Yes but have you considered that "china bad"?

[–] [email protected] 3 points 1 day ago (1 children)

It's not multimodal, so I'd have to imagine it wouldn't be worth pursuing in that regard.

[–] [email protected] 1 points 1 day ago

Doesn't DeepSeek work on that, though, with their Janus models?