DeepSeek roundup: banned by governments, no guard rails, lied about its training costs (pivot-to-ai.com)

submitted 4 months ago by [email protected] to c/[email protected]

56 comments

[email protected] 36 points 4 months ago

Even if they greatly underreported costs and their services are banned: the models are out there, open source and way more efficient than anything Meta and OpenAI could produce.

So it's pretty obvious that the tech giants are burning money for mediocre output.

[email protected] 15 points 4 months ago

you do know that you don’t have to be a pliant useful idiot like this, right? doing the free “open source” pr repetition (when it’s none of that)? shit’s more like shareware (if that at all - certainly doesn’t have the same spiritual roots as shareware. for them it’s some shit thrown over the wall to keep the rabble quiet)

(it’d be nice if we could popularise something like how the kernel will go “tainted”, but unfortunately the entire fucking llm field is so tainted we’d need a stronger word)

[email protected] 6 points 4 months ago

Look, I get your perspective, but zooming out there is a context that nobody's mentioning, and the thread deteriorated into name-calling instead of looking for insight.

In theory, a training pass needs one read-through of the input data, and we know of existing systems that achieve that, from well-trodden n-gram models to the wholly hypothetical large Lempel-Ziv models. Viewed that way, most modern training methods are extremely wasteful: Transformers, Mamba, RWKV, etc. are trading time for space to try to make relatively small models, and it's an expensive tradeoff.
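To make the "one read-through" point concrete, here is a minimal sketch assuming the simplest possible case: a whitespace-tokenized bigram model. The function names `train_bigram` and `next_token_probs` are hypothetical, not from any library. Every count is updated exactly once per token in a single streaming pass, with no epochs and no gradient steps. It is not a Lempel-Ziv model and says nothing about output quality versus a Transformer; it only illustrates the single-pass property.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def train_bigram(tokens: Iterable[str]) -> Dict[str, Counter]:
    """Count bigram transitions in a single pass over the token stream."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    prev = None
    for tok in tokens:  # each token is read exactly once; no epochs, no gradients
        if prev is not None:
            counts[prev][tok] += 1
        prev = tok
    return counts


def next_token_probs(counts: Dict[str, Counter], prev: str) -> Dict[str, float]:
    """Maximum-likelihood estimate of P(next | prev) from the raw counts."""
    following = counts.get(prev, Counter())  # .get does not trigger the defaultdict factory
    total = sum(following.values())
    return {tok: n / total for tok, n in following.items()} if total else {}


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran".split()
    model = train_bigram(corpus)
    print(next_token_probs(model, "the"))  # → {'cat': 0.6666666666666666, 'mat': 0.3333333333333333}
```

Training cost here is linear in the corpus length by construction; the open question the comment raises is whether anything this cheap can be scaled up to compete, which this toy does not address.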

From that perspective, we should expect somebody to eventually demonstrate that the Transformers paradigm sucks. Mamba and RWKV are good examples of modifying old ideas about RNNs to take advantage of GPUs, but are still stuck in the idea that having a GPU perform lots of gradient descent is good. If you want to critique something, critique the gradient worship!

I swear, it's like whenever Chinese folks do anything the rest of the blogosphere goes into panic. I'm not going to insult anybody directly but I'm so fucking tired of mathlessness.

Also, point of order: Meta open-sourced Llama so that their employees would stop using BitTorrent to leak it! Not to "keep the rabble quiet" but to appease their own developers.

[email protected] 7 points 4 months ago

> Look, I get your perspective, but zooming out there is a context that nobody’s mentioning

I'm aware of that yeah, but it's not a field I'm actively engaged in atm and not likely to be any time soon either (from no desire to work in it follows no desire to wade through the pool of scum). but also not really the place to be looking for insight. it is the place wherein to ridicule the loons and boosters

> we should expect somebody to eventually demonstrate that the Transformers paradigm sucks

been wondering whether that or the next winter will get here first.

> If you want to critique something, critique the gradient worship

did that a couple of years ago already, part of why I was already nice and burned out on so much of this nonsense when midjourney/stablediffusion started kicking around

> it’s like whenever Chinese folks do anything the rest of the blogosphere goes into panic

[insert condensed comment about mentality of US/SFBA-influenced tech sector (and, really, it is US specifically; eurozone's a somewhat different beast), american exceptionalism, sinophobia, and too-fucking-many years of "founder" stories]

it really is tedious though, yeah. when it happens, I try to just avoid some feeds. limited spoons.

> but I’m so fucking tired of mathlessness

as you know, the bayfucker way (for getting on close to 20y now) is to get big piles of money and try to outspend your competition. why bother optimising or thinking about things if you can just throw another 87345243 computers at the problem? (I do still agree with you, but see above re desire and intent)

re the open source thing: it's a wider problem than just that, and admittedly I'm peeved about it from this larger scope. I didn't expound on it in my previous comment because (as above) largely not really the place. that said, soapbox:

there's a thing I've been noticing as a creeping trend lately. I call it "open source veneer", which is still a bit imprecise[0] but I think you'll get what I mean. it's the phenomenon of shit like this. of "projects" on github that are no more than a fancy readme and some "contributors" and whatnot, but no actual code (or ability to make full use of what is provided). of companies that build "open source" and then as soon as something (usually VC-/"earnings"-related decisions) happens, the entire project gets deeply buried (links disappear off main sites, leaving product/service only), actively hobbled ("oh you want to set this up yourself? glhf gfy", done in oh so many ways[1]), or often even entirely disappeared[2]

[0] - still working through the thought, should probably write about it soon

[1] - backend codebases lagging because "not feature priority", entirely missing documentation, wholly missing key sections of code which are "conveniently" left out, etc etc; examples off the top of my head: zotero, signal, firefox weave for a while. there's plenty more if you look

[2] - been noticing this especially frequently with some security stuff, but it's hardly the only example set

[email protected] -3 points 4 months ago

The model is MIT licensed.

Of course you're free to go full Stallman, but that's an open source license.

[email protected] 24 points 4 months ago

the build artifact is distributed MIT-licensed, that's substantially different (and intentionally subversive). there is no reproducibility. which, you know, hint hint nudge nudge that thing that I already said

I realize that outsourced thinking is why you want LLMs, but it clearly still doesn't help. maybe you should try the old brainmeat. just stop huffing your farts first, those are bad for you

[email protected] -9 points 4 months ago

So in that thinking, Wikipedia is not open source, if the editor used a proprietary browser?

Maybe you should try not to act like a complete asshole. You're pedantic in all the wrong places and extremely arrogant. I know, living in your lonely world makes a bitter person, but you're still wrong and you're still an asshole.

[email protected] 18 points 4 months ago

my fucking god how have you missed the point this hard. fuck off

[email protected] 15 points 4 months ago

also:

> So in that thinking, Wikipedia is not open source, if the editor used a proprietary browser?

fucking no! how in fuck do you manage to misunderstand LLMs so much that you think the weights not being reproducible is at all comparable to… editing Wikipedia from a proprietary browser???? this shit isn’t even remotely exotic from an open source standpoint — it’s a binary blob loaded by an open source framework, like how binary blob modules taint the Linux kernel (you glided right past this reference when our other poster made it, weird that) or how loading a proprietary ROM in an open source emulator doesn’t make the ROM open source. the weights being permissively licensed doesn’t make them open source (or really make any sense at all) if the source literally isn’t available.

[email protected] 12 points 4 months ago

literally begging people to relearn the terms shareware and freeware

[email protected] 7 points 4 months ago

cringe

[email protected] -2 points 4 months ago

I’m very confused by this; I had the same discussion with my coworker. I understand what the benchmarks are saying about these models, but have any of y’all actually used deepseek? I’ve been running it since it came out and it hasn’t managed to solve a single problem yet (70b param model; I have downloaded the 600b param model but haven’t tested it yet). It essentially compares to gpt-3 for me, which only cost OpenAI like $4-9 million to train (can’t remember the exact number right now).

I just do not see the “efficiency” here.

[email protected] 19 points 4 months ago

what if none of it’s good, all of it’s fraud (especially the benchmarks), and having a favorite grifter in this fuckhead industry is just too precious

[email protected] -5 points 4 months ago

well, it’s free to download and run locally so i struggle to see what the grift is

[email protected] 12 points 4 months ago

fuck off promptfan

[email protected] 5 points 4 months ago

Customer acquisition cost for a future service, which is ~fixed after training costs, assuming we can consider distribution costs as marginal. Reasonably impressive accomplishment, if one is taking the perspective of SV SaaS-financing brain.*

*I don't recommend you do this for too long, that's how some of the people currently prominent in the news got to be the way that they are

[email protected] -11 points 4 months ago

The 70b model is a distillation of Llama3.3, that is to say it replicates the output of Llama3.3 while using the deepseekR1 architecture for better processing efficiency. So any criticism of the capability of the model is just criticism of Llama3.3 and not deepseekR1.

[email protected] 12 points 4 months ago

Thank you for shedding light on the matter. I never realized that 69b model is a pisstillation of Lligma peepee point poopoo, that is to say it complicates the outpoop of Lligma4.20 while using the creepbleakR1 house design for better processing deficiency. Now I finally realize that any criticism of Kraftwerk's 1978 hit Das Model is just criticism of Sugma80085 and not deepthroatR1.

[email protected] 9 points 4 months ago

[to the tune of Fort Minor's Remember The Name]

10% senseless, 20% post
15% concentrated spirit of boast
5% reading, 50% pain
and a 100% reason to not post here again

this post was submitted on 08 Feb 2025

93 points (100.0% liked)

TechTakes

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 2 years ago
