Technology

Proton's very biased article on Deepseek (lemmy.ml)

submitted 1 day ago by [email protected] to c/[email protected]


Article: https://proton.me/blog/deepseek

Calls it "Deepsneak", failing to make it clear that the reason people love Deepseek is that you can download it and run it securely on any of your own private devices or servers - unlike most of the competing SOTA AIs.

I can't speak for Proton, but the last couple weeks are showing some very clear biases coming out.


[–] [email protected] 1 points 1 day ago (1 children)

You should try the comparison between the larger models and the distilled models yourself before you pass judgment. I suspect you're going to be surprised by the output.

All of the models are basically generating possible outcomes based on noise. So if you ask the same model the same question five times in five different sessions, you're going to get five different variations on an answer.

You will find that an x-out-of-five score is not significantly different between models.
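To make the "x out of five" comparison concrete, here is a minimal sketch of how much a five-run score can separate two models. It assumes each run is independently "good" with a fixed per-run probability, which is a simplification, and the function names and any rates you plug in are hypothetical illustrations, not measurements of any real model.

```python
from math import comb


def score_distribution(p_good: float, runs: int = 5) -> list[float]:
    """Probability of each score 0..runs, assuming every run is
    independently 'good' with probability p_good (binomial model)."""
    return [
        comb(runs, k) * p_good**k * (1 - p_good) ** (runs - k)
        for k in range(runs + 1)
    ]


def prob_a_outscores_b(p_a: float, p_b: float, runs: int = 5) -> float:
    """Probability that model A gets a strictly higher x-out-of-runs
    score than model B when both are scored on `runs` attempts."""
    dist_a = score_distribution(p_a, runs)
    dist_b = score_distribution(p_b, runs)
    return sum(
        dist_a[i] * dist_b[j]
        for i in range(runs + 1)
        for j in range(i)  # only scores j strictly below i
    )


if __name__ == "__main__":
    # Hypothetical per-run rates, chosen only for illustration.
    larger, smaller = 0.8, 0.7
    print(f"P(larger scores higher):  {prob_a_outscores_b(larger, smaller):.3f}")
    print(f"P(smaller scores higher): {prob_a_outscores_b(smaller, larger):.3f}")
```

With only five runs per model, the two probabilities tend to sit fairly close together unless the underlying rates differ a lot, so a single x-out-of-five comparison says little either way; more runs are needed to tell a real gap from sampling noise.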

For certain cases larger models are advantageous. If you need a model to return a substantial amount of content, for example if you're asking it to write you a chapter of a story, larger models will definitely give you better output and better variation.

But if you're asking it to help you with a piece of code or explain some historical event to you, the average 14B model that will fit on any computer with a video card will give you a perfectly serviceable answer.

[–] [email protected] 1 points 20 hours ago

I have tried them, and to be honest I was not surprised. The hosted service was better at longer code snippets, and in particular I found that it was consistently better at producing valid chain-of-thought reasoning (I've found that a lot of simpler models, including the distills, tend to produce shallow reasoning chains, even when they get the answer to a question right).

I'm aware of how these models work; I work in this field and have been developing a benchmark for reasoning capabilities in LLMs. The distills are certainly still technically impressive and it's nice that they exist, but the gap between them and the hosted version is unfortunately nontrivial.