22
submitted 2 years ago by [email protected] to c/[email protected]

Is it just memory bandwidth? Or is it that AMD is not supported well enough by PyTorch for most products? Or some combination of those?

[-] [email protected] 7 points 2 years ago

Ultimately, it is all about data throughput to the CPU caches, because the tensors are so large. The M2 claims a 128-bit bus. The ARM instruction support built into llama.cpp is weak compared to x86. If you want to run big models that require lots of memory without spending five figures, find an Intel chip that supports AVX-512 and 96+ GB of RAM. AVX-512 and its related extensions are directly supported in llama.cpp, and that gets you 512-bit instructions. Apple can't match that.
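To put a rough number on the throughput point, here is a back-of-the-envelope sketch in Python. It is only an estimate under stated assumptions: the 6400 MT/s transfer rate and the 40 GB model size are illustrative values I picked, not figures from this thread, and the tokens/s bound assumes a dense model where every generated token has to read all the weights from memory once.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfers_per_sec / 1e9


def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on tokens/s if each token reads all weights exactly once."""
    return bandwidth_gb_s / model_size_gb


# 128-bit bus as claimed above; 6400 MT/s is an assumed transfer rate.
bandwidth = peak_bandwidth_gb_s(bus_width_bits=128, transfers_per_sec=6400e6)

# 40 GB is a hypothetical model size, chosen only for illustration.
ceiling = max_tokens_per_sec(bandwidth, model_size_gb=40)

print(f"peak bandwidth: {bandwidth:.1f} GB/s")
print(f"token rate ceiling: {ceiling:.2f} tokens/s")
```

With those assumed inputs the bus tops out at about 102.4 GB/s, so a model that size could not go much past a couple of tokens per second no matter how wide the vector instructions are. Real numbers will be lower, since this ignores caches, compute time, and everything else sharing the bus.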

If you want a laptop, get something with a 3080 Ti. It specifically needs to be the Ti version, which has 16 GB of VRAM and came in several 2022 models.

Run Fedora on it. They have Nvidia support, including a slick script that automatically rebuilds the GPU driver from source on every kernel update and keeps Secure Boot working the whole time.
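If you are already on Linux and want to know whether a given x86 CPU reports AVX-512 at all, a minimal sketch like this should do it. It assumes the usual `/proc/cpuinfo` layout with a `flags` line, and the function names are mine, not from llama.cpp or any library.

```python
from pathlib import Path


def avx512_flags(cpuinfo_text: str) -> set[str]:
    """Return the AVX-512 feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Every core lists the same flags, so the first match is enough.
            return {flag for flag in value.split() if flag.startswith("avx512")}
    return set()


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the cpuinfo file, or return an empty string if it is missing."""
    cpuinfo = Path(path)
    return cpuinfo.read_text() if cpuinfo.exists() else ""


found = avx512_flags(read_cpuinfo())
print(sorted(found) if found else "no AVX-512 flags reported")
```

An empty result only means the kernel does not report the flags. It says nothing about how a particular llama.cpp build was compiled.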

[-] [email protected] 2 points 2 years ago

I run exllama on a 24GB GPU right now, and I'm just seeing what's feasible for larger models. So an Intel CPU with lots of RAM would in theory outperform an AMD iGPU with the same amount of RAM allocated as VRAM? (I'm looking at APUs/iGPUs solely because you can configure the amount of VRAM allocated to them.)

[-] [email protected] 3 points 2 years ago

I'm pretty sure it is not super relevant. VRAM on a GPU is not the same thing as system memory on a CPU. On x86, system memory is mostly reached through virtual addressing. I haven't played in this space in a while, so my memory is rusty, but the system memory is not directly accessible over an address bus. That creates a major bottleneck when you need to access a lot of information at once. It is more of a large storage system built to move chunks of data that are limited in size. If you want more info, read about address buses and physical/virtual addressing: https://en.m.wikipedia.org/wiki/Physical_Address_Extension

In a GPU, the goal is to move data in parallel, with most of the memory available at the same time. This avoids the extra overhead of complicated memory management systems. Each small processor directly addresses the memory it needs. With a GPU, more memory usually means more physical compute hardware.

If you ever feel motivated to build vintage computing hardware like Ben Eater's 8-bit breadboard computer project on YouTube, or his 6502 stuff, you'll see a lot of this first hand. In the early 8-bit era, the memory bus and address space were a major design aspect, and they are much easier to understand there because they are configured manually in hardware external to the processor.

[-] [email protected] 1 points 2 years ago

As per the link (YouTube) in the other thread, it seems like an iGPU with an increased VRAM allocation is better than using the CPU, though it also seems APUs max out at 16GB. Maybe that's something AMD can improve in the future...

this post was submitted on 25 Aug 2023