This doesn't seem to take CPU MoE into account, which can make a huge difference: running a bigger MoE model is better than running a small model that fits in your GPU, if you have the CPU resources. I run Qwen3.6 (the 30b/e4b version) in MoE at around 40 tok/s on my 5070 + Ryzen 9 5950X, and it's way better than that tool's recommended 9b.
Interesting. Unfortunately I only have 8GB of VRAM, so I can't run anything particularly useful for my purposes 😔 The Gemma 4 E4B is quite good, but I'd like to run the 31B one.
This feels useless. At least for homelabbers, ollama's model page gives more useful info. And if a newbie goes there, they'll be misled.
Also, a lot of people run models on CPUs, and the tool doesn't list anything about them at all. For example, I can't fit Gemma 4 on my GPU, but ollama offloads part of it to the CPU, and even with small GPUs you can get good performance.
And for nearly all small models, it recommends the RTX 5060, which is a very stupid choice.
What do you mean by "small GPU"?
I haven't tried that yet. Do you have any guidance? Or does "small GPU" still mean a >500€ GPU?
By small, I mean outdated GPUs, laptop GPUs, or GPUs with only 4GB or 6GB of VRAM.
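For anyone wondering how much of a model would end up on the CPU with a small GPU, here is a rough back-of-the-envelope sketch. It is only an estimate under stated assumptions: the 4.5 bits per weight figure (roughly a 4-bit quantization with overhead) and the 1.5 GB of VRAM reserved for context and the desktop are both guesses, the function names are made up for illustration, and it says nothing about the resulting speed.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def split_across_devices(model_gb: float, vram_gb: float, reserve_gb: float = 1.5):
    """Return (GB kept on the GPU, GB offloaded to system RAM).

    reserve_gb is an assumed allowance for the KV cache, buffers and
    whatever else is already using VRAM; tune it for your setup.
    """
    usable = max(vram_gb - reserve_gb, 0.0)
    on_gpu = min(model_gb, usable)
    return on_gpu, model_gb - on_gpu


if __name__ == "__main__":
    # Assumed example: a 30B-parameter model at ~4.5 bits per weight on an 8 GB card.
    size = weight_size_gb(30, 4.5)
    gpu, cpu = split_across_devices(size, vram_gb=8)
    print(f"weights: {size:.1f} GB, on GPU: {gpu:.1f} GB, offloaded to CPU: {cpu:.1f} GB")
```

With those assumptions, most of a 30B model's weights would sit in system RAM on an 8 GB card, so you need enough free RAM for the offloaded part. How usable the result is depends on the model (MoE vs dense) and your CPU and memory, so treat the numbers as a starting point only.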