FitMyLLM — Independent benchmarks for self-hosted AI (www.fitmyllm.com)

submitted 18 hours ago by anzo@programming.dev to c/homelab@programming.dev

Check which models you can run and at what rate, in tokens per second... It has examples for many models and quantization levels. Huge resource!

Hexarei@beehaw.org 3 points 3 hours ago

This doesn't seem to take CPU MoE into account, which can make a huge difference: if you have the CPU resources, running a bigger MoE model is better than running a small model that fits entirely in your GPU. I run Qwen3.6 (the 30b/e4b version) with CPU MoE at around 40 tok/s on my 5070 + Ryzen 9 5950X, and it's way better than the 9b that tool recommends.
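For anyone wondering why that works, here is a rough back-of-envelope sketch, not a benchmark. It assumes decode speed is limited by how fast the weights used for each token can be read from memory, and that a MoE model only reads its active parameters per token. Every number and function name below is a made-up illustration, not a measurement from my setup or from the tool.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size in GB of a set of weights at a given quantization."""
    return params_billion * bits_per_weight / 8


def decode_tok_per_s(gpu_gb_per_token: float, cpu_gb_per_token: float,
                     gpu_bw_gbs: float, cpu_bw_gbs: float) -> float:
    """Bandwidth-only upper bound on decode speed.

    Assumes each generated token reads its active weights exactly once,
    partly from VRAM and partly from system RAM, with no overlap between
    the two and no other overhead (compute, KV cache, PCIe transfers).
    """
    seconds = gpu_gb_per_token / gpu_bw_gbs + cpu_gb_per_token / cpu_bw_gbs
    return 1.0 / seconds


# Hypothetical hardware: bandwidths in GB/s are assumptions for illustration.
GPU_BW = 600.0
CPU_BW = 50.0
BITS = 4.5  # assumed average bits per weight for a 4-bit style quant

# Hypothetical dense 9B model held entirely in VRAM:
# every weight is read for every token.
dense = decode_tok_per_s(weights_gb(9, BITS), 0.0, GPU_BW, CPU_BW)

# Hypothetical MoE with ~3B active parameters per token:
# assume 1B (attention and shared layers) sits in VRAM and
# 2B (the selected experts) is read from system RAM.
moe = decode_tok_per_s(weights_gb(1, BITS), weights_gb(2, BITS), GPU_BW, CPU_BW)

print(f"dense 9B in VRAM:       ~{dense:.0f} tok/s upper bound")
print(f"MoE, experts in RAM:    ~{moe:.0f} tok/s upper bound")
```

The point is only that the CPU side has to stream the few active experts per token, not the whole model, so the bigger model stays usable even though most of it lives in system RAM. Real numbers will be lower than these bounds.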

this post was submitted on 03 Jun 2026

-4 points (40.0% liked)

HomeLab

A homelab is a server or multi-server setup that resides in your home, where you host several applications and virtualized systems for testing and development.

It's a sandbox environment where you can experiment, break things, and fix them with no repercussions while it's down.

This is a community where you can share, discuss, or post news relating to homelabs

founded 2 years ago

MODERATORS

anzo@programming.dev