
Check which models you can run and how many tokens per second you would get... It has examples for many models and quantization levels. Huge resource!

[-] Hexarei@beehaw.org 3 points 3 hours ago

This doesn't seem to take CPU MoE into account, which can make a huge difference: if you have the CPU resources, running a bigger MoE model is better than running a small model that fits entirely in your GPU. I run Qwen3.6 (the 30b/e4b version) with CPU MoE at around 40 tok/s on my 5070 + Ryzen 9 5950X, and it's way better than the 9b that tool recommends.
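As a rough illustration of why a larger MoE model can stay usable with part of it in system RAM, here is a back-of-envelope sketch. It assumes decoding is limited by how fast the weights touched per token can be read from memory, and that an MoE model only reads its active parameters for each token. The function name and every number below (30B total, 3B active, 4.5 bits per weight, 50 GB/s) are hypothetical placeholders, not measurements from the setup above, and the result is an upper bound rather than a prediction.

```python
def estimate_decode_tok_per_s(params_read_billion: float,
                              bits_per_weight: float,
                              mem_bandwidth_gb_s: float) -> float:
    """Upper-bound decode speed if reading weights is the only cost.

    params_read_billion: parameters read per generated token, in billions.
        For a dense model this is the whole model; for an MoE model it is
        only the active parameters (assumption of this sketch).
    bits_per_weight: average bits per weight after quantization.
    mem_bandwidth_gb_s: sustained memory bandwidth in GB/s.
    """
    bytes_per_token = params_read_billion * 1e9 * bits_per_weight / 8
    return mem_bandwidth_gb_s * 1e9 / bytes_per_token


# Hypothetical figures, chosen only to show the shape of the comparison.
BANDWIDTH_GB_S = 50.0   # assumed system RAM bandwidth
BITS_PER_WEIGHT = 4.5   # assumed ~4-bit quantization with overhead

dense_30b = estimate_decode_tok_per_s(30.0, BITS_PER_WEIGHT, BANDWIDTH_GB_S)
moe_3b_active = estimate_decode_tok_per_s(3.0, BITS_PER_WEIGHT, BANDWIDTH_GB_S)

print(f"dense 30B from RAM:       {dense_30b:.1f} tok/s (upper bound)")
print(f"MoE, 3B active, from RAM: {moe_3b_active:.1f} tok/s (upper bound)")
```

Under these assumptions the MoE model comes out about ten times faster than a dense model of the same total size, because only a tenth of the weights are read per token. Real throughput also depends on compute, which layers sit on the GPU, and context length, so treat this as intuition only.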

this post was submitted on 03 Jun 2026
-4 points (40.0% liked)

HomeLab


A homelab is a server or multi-server setup that resides in your home, where you host several applications and virtualized systems for testing and development.

It's a sandbox environment where you can experiment, break things, and fix them, with no repercussions while it's down.

This is a community where you can share, discuss, or post news relating to homelabs
