this post was submitted on 07 Dec 2024
17 points (90.5% liked)

LocalLLaMA

People are talking about the new Llama 3.3 70B release, which has generally better performance than Llama 3.1 70B (approaching Llama 3.1 405B's performance): https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_3

However, something to note:

Llama 3.3 70B is provided only as an instruction-tuned model; a pretrained version is not available.

Is this the end of open-weight pretrained models from Meta, or is Llama 3.3 70b instruct just a better-instruction-tuned version of a 3.1 pretrained model?

Comparing the model cards:

- 3.1: https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md
- 3.3: https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md

The same knowledge cutoff, the same amount of training data, and the same training time give me hope that it's just a better finetune, maybe of Llama 3.1 405B.
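
If anyone wants to check for themselves, here's a minimal sketch of how the two cards could be diffed. The raw.githubusercontent.com paths are just assumed raw mirrors of the GitHub links above, so they may need adjusting if Meta reorganizes the repo.

```python
# Minimal sketch: fetch both model cards and diff them to see what actually changed.
# The raw.githubusercontent.com paths are assumed mirrors of the GitHub links above.
import difflib
import urllib.request

RAW = "https://raw.githubusercontent.com/meta-llama/llama-models/main/models"
CARDS = {
    "3.1": f"{RAW}/llama3_1/MODEL_CARD.md",
    "3.3": f"{RAW}/llama3_3/MODEL_CARD.md",
}

def fetch(url: str) -> list[str]:
    # Download a model card and split it into lines for difflib.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8").splitlines()

old, new = fetch(CARDS["3.1"]), fetch(CARDS["3.3"])

# Anything that was copy-pasted between the cards (knowledge cutoff, token
# counts, GPU hours) simply won't show up in the diff output.
for line in difflib.unified_diff(old, new, fromfile="llama3_1", tofile="llama3_3", lineterm=""):
    print(line)
```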

[email protected] 4 points 2 weeks ago

AFAIK it is still a tune of Llama 3[.1]; the new base models will come with the release of Llama 4, and the "Training Data" section of both model cards is basically a copy-paste.

Honestly, before reading this post I hadn't even considered that they might stop releasing base models, and even now I don't think that's the case. I searched the announcement posts for anything that would suggest it's a possibility, but nothing came up.

It is true that they released base models with 3.2, but there they had added a new projection layer on top, so the starting point was actually different. And 3.1 did supersede 3...

So I went and checked the 3.3 hardware section and compared it with the 3, 3.1, and 3.2 ones.

| | 3 | 3.1 | 3.2 | 3.3 |
|---|---|---|---|---|
| Training GPU hours | 7.7M | 39.3M | 2.02M | 39.3M |

So yeah, I'm pretty sure the base of 3.3 is just 3.1: they just renamed the model in the card and added the functional differences. The instruct and base versions of the models have the same numbers in the HW section; I'll link them at the end just because.
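
For anyone who wants to reproduce that comparison, here's a crude sketch that just prints the GPU-hour lines from the two cards linked above so the hardware sections can be eyeballed side by side. The raw.githubusercontent.com paths are assumed mirrors of the GitHub links, and the "GPU hours" string match is a guess at how the cards word it, so adjust as needed.

```python
# Crude sketch: print the training-footprint lines from the two linked model
# cards so the hardware sections can be compared directly. The raw paths are
# assumed mirrors of the GitHub links; the "GPU hours" filter is a guess at
# the cards' wording and may need tweaking.
import urllib.request

RAW = "https://raw.githubusercontent.com/meta-llama/llama-models/main/models"
CARDS = {
    "3.1": f"{RAW}/llama3_1/MODEL_CARD.md",
    "3.3": f"{RAW}/llama3_3/MODEL_CARD.md",
}

for version, url in CARDS.items():
    with urllib.request.urlopen(url) as resp:
        lines = resp.read().decode("utf-8").splitlines()
    print(f"--- Llama {version} ---")
    for line in lines:
        if "GPU hours" in line:
            print(line)
```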

All these words to say: I have no real proof, but I will be quite surprised if they don't release a base version of Llama 4.

Mark Zuckerberg on Threads (link to post on Threads)
zuck a day ago
Last big AI update of the year:
• Meta AI now has nearly 600M monthly actives
• Releasing Llama 3.3 70B text model that performs similarly to our 405B
• Building 2GW+ data center to train future Llama models
Next stop: Llama 4. Let's go! 🚀

Meta for Developers (link to post on Facebook)
Today we're releasing Llama 3.3 70B which delivers similar performance to Llama 3.1 405B allowing developers to achieve greater quality and performance on text-based applications at a lower price point.
Download from Meta: --

Small note: I deleted my previous post because I had messed up the links and had to recheck them, whoops.