FitMyLLM — Independent benchmarks for self-hosted AI
(www.fitmyllm.com)
Benchmarking token throughput is useful, but whether a model is viable to self-host often depends on memory bandwidth rather than raw compute, especially for quantized models. Have you evaluated how different quantization levels affect inference latency on consumer-grade GPUs, and how those results compare with the reported tokens-per-second figures?
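To make the bandwidth point concrete, here is a rough back-of-the-envelope sketch. It assumes single-stream decoding in which every weight is read from GPU memory once per generated token, and it ignores the KV cache, activations, and kernel overhead, so it gives only an upper bound. The function names, the 7B parameter count, and the 500 GB/s bandwidth are illustrative assumptions, not figures from the site.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (weights only, no KV cache or activations)."""
    return params_billion * bits_per_weight / 8


def decode_tokens_per_sec_ceiling(
    params_billion: float, bits_per_weight: float, bandwidth_gb_s: float
) -> float:
    """Upper bound on decode speed, assuming all weights are read once per token."""
    return bandwidth_gb_s / model_size_gb(params_billion, bits_per_weight)


if __name__ == "__main__":
    # Hypothetical setup: 7B-parameter model, GPU with 500 GB/s memory bandwidth.
    for bits in (16, 8, 4):
        size = model_size_gb(7, bits)
        ceiling = decode_tokens_per_sec_ceiling(7, bits, 500)
        print(f"{bits:>2}-bit: ~{size:.1f} GB weights, ceiling ~{ceiling:.0f} tok/s")
```

If a measured tokens-per-second figure sits well below this ceiling at a given quantization level, something other than memory bandwidth (dequantization cost, CPU offload, a small batch of long prompts) is probably the limiting factor.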