this post was submitted on 16 Sep 2024

372 points (98.2% liked)

Technology

This is the official technology community of Lemmy.ml for all news related to creation and use of technology, and to facilitate civil, meaningful discussion around it.

Ask in DM before posting product reviews or ads. All such posts otherwise are subject to removal.

Rules:

1: All Lemmy rules apply

2: Do not post low effort posts

3: NEVER post naziped*gore stuff

4: Always post article URLs or their archived version URLs as sources, NOT screenshots. Help the blind users.

5: personal rants of Big Tech CEOs like Elon Musk are unwelcome (does not include posts about their companies affecting wide range of people)

6: no advertisement posts unless verified as legitimate and non-exploitative/non-consumerist

7: crypto related posts, unless essential, are disallowed

founded 5 years ago

"participants who had access to an AI assistant wrote significantly less secure code" and "were also more likely to believe they wrote secure code" - 2023 Stanford University study published at CCS23 (arxiv.org)

submitted 1 month ago by [email protected] to c/[email protected]

38 comments

[+] [email protected] -29 points 1 month ago* (last edited 1 month ago) (22 children)

2023? Like last year? Like when LLMs were just a curiosity more than anything useful?

They should be doing these studies continuously...

Edit: Oh no, I forgot Lemmy hates LLMs. Oh well, can't blame you guys, hate is the basic manifestation towards what scares you, and it's revealing.

[–] [email protected] 18 points 1 month ago (10 children)

Unlike this year when LLMs are more of a huge scam.

[–] [email protected] -2 points 1 month ago (2 children)

Curious why your perspective is that they're more of a scam when, by all metrics, they've only improved in accuracy?

[–] [email protected] 3 points 1 month ago (1 children)

Source?

[–] [email protected] -2 points 1 month ago (1 children)

OlympicArena analysis; OpenAI analyses

Compare the GPT increase from their V2 GPT-4o model to their o1-preview reasoning model. The jumps from last year's GPT-3.5 -> GPT-4 were also quite large. Secondly, if you want to take OpenAI's own research into account, that's in the second image.

[–] [email protected] 6 points 1 month ago (1 children)

if you want to take OpenAI’s own research into account

No thank you.

OlympicArena validation set (text-only)

"Our extensive evaluations reveal that even advanced models like GPT-4o only achieve a 39.97% overall accuracy (28.67% for mathematics and 29.71% for physics)"

The OlympicArena analysis that you cited.

[–] [email protected] -2 points 1 month ago (1 children)

The jump from GPT-4o -> o1 (the preview, not the full release) was a 20% cumulative knowledge jump. If that's not an improvement in accuracy, I'm not sure what is.

[–] [email protected] 3 points 1 month ago* (last edited 1 month ago) (1 children)

One of the first things they teach you in Experimental Physics is that you can't derive a curve from just 2 data points.

You can just as easily fit an exponential growth curve to 2 points like that, one 20% above the other, as you can a sinusoidal curve, a linear one, an inverted parabola (which actually grows to a peak and then eventually goes down again), and any of the many curves where growth has ever-diminishing returns and can't go beyond a certain point (literally "with a limit").

I think the point that many are making is that LLM growth in precision is the latter kind of curve: growing, but ever slower, and tending to a limit which is much less than 100%. It might even be more like the curve that peaks (in that it might actually go down) if the output of LLM models ends up polluting the training sets of the models, which is a real risk.

You showing that there was some growth between two versions of GPT (so, 2 data points, a before and an after) doesn't disprove this hypothesis. It doesn't prove it either: as I said, 2 data points aren't enough to derive a curve.
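To make the two-data-points argument concrete, here is a minimal Python sketch. The two scores (40% and 48%) are hypothetical stand-ins for an "older" and a "newer" model version, not figures from OlympicArena or OpenAI, and the 60% ceiling and the curvature value are likewise arbitrary assumptions. Four different curve families pass exactly through both points, yet they disagree as soon as you extrapolate to later versions.

```python
"""Four curve families that pass exactly through the same two points.

The points are HYPOTHETICAL accuracy scores for an "older" (x=0) and a
"newer" (x=1) model version; they are not taken from any benchmark.
"""

X0, Y0 = 0.0, 0.40  # hypothetical: older version scores 40%
X1, Y1 = 1.0, 0.48  # hypothetical: newer version scores 48%


def linear(x: float) -> float:
    """Straight line through both points."""
    slope = (Y1 - Y0) / (X1 - X0)
    return Y0 + slope * (x - X0)


def exponential(x: float) -> float:
    """y = Y0 * g**(x - X0), with growth factor g chosen so y(X1) == Y1."""
    g = (Y1 / Y0) ** (1.0 / (X1 - X0))
    return Y0 * g ** (x - X0)


def saturating(x: float, ceiling: float = 0.60) -> float:
    """Approaches an assumed ceiling: y = C - (C - Y0) * r**(x - X0).

    r < 1 is chosen so the curve hits Y1 at X1; each later gain is smaller.
    """
    r = ((ceiling - Y1) / (ceiling - Y0)) ** (1.0 / (X1 - X0))
    return ceiling - (ceiling - Y0) * r ** (x - X0)


def peaked(x: float, curvature: float = -0.02) -> float:
    """Inverted parabola: adding curvature * (x - X0) * (x - X1) leaves both
    points unchanged; a negative curvature makes the curve peak, then decline."""
    return linear(x) + curvature * (x - X0) * (x - X1)


if __name__ == "__main__":
    models = {"linear": linear, "exponential": exponential,
              "saturating": saturating, "peaked": peaked}
    for name, f in models.items():
        # Every model reproduces both observed points (up to float rounding)...
        assert abs(f(X0) - Y0) < 1e-12 and abs(f(X1) - Y1) < 1e-12
        # ...but their extrapolations to later versions disagree.
        preds = ", ".join(f"x={x}: {f(x):.3f}" for x in (2, 3, 5))
        print(f"{name:>11}: {preds}")
```

All four fit the observed data equally well, so the data alone can't choose between them. At x = 5 they predict roughly 0.800 (linear), 0.995 (exponential), 0.584 (saturating) and 0.400 (peaked, which tops out at x = 2.5).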

If you do look at the past growth of precision for LLMs, whilst improvement is still happening, the rate of improvement has been going down, which does support the idea that there is a limit to how good they can get.
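A rough way to check the "rate of improvement has been going down" claim across more than two versions is to look at the successive gains. This is a hedged sketch: the five scores are invented for illustration and are not real benchmark results. Shrinking gains are consistent with a limit, but on their own they don't say where that limit sits.

```python
"""Check whether successive gains shrink across several model versions.

The scores are HYPOTHETICAL, invented for illustration only.
"""

scores = [0.20, 0.35, 0.45, 0.51, 0.54]  # hypothetical, oldest to newest


def successive_gains(values: list[float]) -> list[float]:
    """Improvement from each version to the next."""
    return [b - a for a, b in zip(values, values[1:])]


def gains_shrinking(values: list[float]) -> bool:
    """True if every gain is strictly smaller than the one before it."""
    gains = successive_gains(values)
    return all(later < earlier for earlier, later in zip(gains, gains[1:]))


if __name__ == "__main__":
    print([round(g, 2) for g in successive_gains(scores)])  # [0.15, 0.1, 0.06, 0.03]
    print(gains_shrinking(scores))  # True
```

This only flags the pattern; fitting candidate curves over many real data points would be the stronger test.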

[–] [email protected] 1 points 1 month ago

which does support the idea that there is a limit to how good they can get.

I absolutely agree. I'm not necessarily one to say LLMs will become these incredible general-intelligence-level AIs. I'm really just disagreeing with people's negative sentiment: the claim that they're becoming worse / scams is not true at the moment.

It doesn't prove it either: as I said, 2 data points aren't enough to derive a curve

Yeah, the only reason I didn't include more is that it's a pain in the ass pulling together multiple research papers / results over the span of GPT-2, 3, 3.5, 4, o1, etc.

[–] [email protected] 2 points 1 month ago (1 children)

One or two models have increased in accuracy. Meanwhile, all the grifters have caught on and there are 1000x more AI companies out there that are just reselling ChatGPT with some new paint.

[–] [email protected] 1 points 1 month ago

That's definitely valid, but just because a tool is used for scams doesn't inherently mean it's a scam. I don't call the cellphone a scam because most of my calls are.
