How to calculate cost-per-token output of a local model compared to enterprise model API access

submitted 4 days ago* (last edited 4 days ago) by [email protected] to c/[email protected]


Recently I've been experimenting with Claude and feeling the burn on the premium API usage. I wanted to know how much cheaper my local LLM was in terms of cost per output token.

Claude Sonnet is a good reference at $15 per 1 million output tokens, so I wanted to know how many tokens $15 worth of electricity powering my rig would generate by comparison.

(These calculations cover only raw token generation, by the way. In the real world there's also the cost of the initial hardware, ongoing maintenance as parts fail, and the human time to set things up, all of which are much harder to factor into the equation.)

So how does one even calculate such a thing? Well, you need to know:

how many watts your inference rig consumes at load
how many tokens per second (TPS) it can generate on average while inferencing (with the context relatively filled up, since we want conservative estimates)
the price of electricity on your utility bill, in dollars per kilowatt-hour (kWh)

Once you have those constants, you can work out how many kilowatt-hours $15 of electricity buys, convert that into hours of runtime at your rig's load wattage, then figure out the total number of tokens you would expect to generate over that time at your TPS.
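The steps above can be sketched in a few lines of Python. This is only a rough illustration, not the calculator from the screenshot: the function names and the example inputs (200 W at load, 20 TPS, $0.15 per kWh) are made-up assumptions, so plug in your own measurements.

```python
def tokens_for_budget(budget_usd, price_per_kwh, load_watts, tokens_per_second):
    """Estimate how many tokens a rig generates before its electricity
    bill reaches budget_usd. Covers electricity only, not hardware."""
    kwh_bought = budget_usd / price_per_kwh       # energy the budget buys
    hours = kwh_bought / (load_watts / 1000)      # runtime at load, in hours
    return hours * 3600 * tokens_per_second       # seconds * tokens/second


def cost_per_million_tokens(price_per_kwh, load_watts, tokens_per_second):
    """Electricity cost in dollars to generate 1 million tokens."""
    hours = 1_000_000 / tokens_per_second / 3600  # time to emit 1M tokens
    return hours * (load_watts / 1000) * price_per_kwh


if __name__ == "__main__":
    # Hypothetical example values, NOT the numbers from the screenshot.
    tokens = tokens_for_budget(15.0, 0.15, 200, 20)
    cost = cost_per_million_tokens(0.15, 200, 20)
    print(f"Tokens for $15 of electricity: {tokens:,.0f}")
    print(f"Electricity cost per 1M tokens: ${cost:.2f}")
```

With those assumed inputs, $15 buys 100 kWh, which is 500 hours at 200 W, or roughly 36 million tokens at 20 TPS. That works out to about $0.42 of electricity per million tokens.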

The numbers shown in the screenshot are for a model fully loaded into VRAM on the ol' 1070 Ti 8GB. But even with partially offloaded numbers for 22-32B models at 1-3 TPS, it's still a better deal overall.
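Whether 1-3 TPS still beats the API depends on your own wattage and electricity rate, so it can help to compute the break-even speed directly. The sketch below is an assumption-laden illustration: `break_even_tps` is a hypothetical helper name, and the wattage and price inputs are invented examples rather than figures from the screenshot.

```python
def break_even_tps(api_usd_per_million, price_per_kwh, load_watts):
    """Tokens/second at which electricity cost per token equals the API
    price per output token. Above this speed, local generation is
    cheaper on electricity alone."""
    usd_per_second = (load_watts / 1000) * price_per_kwh / 3600
    api_usd_per_token = api_usd_per_million / 1_000_000
    return usd_per_second / api_usd_per_token


if __name__ == "__main__":
    # Hypothetical rigs and rates; measure your own with a wattage meter.
    for watts, price in [(200, 0.15), (300, 0.30)]:
        tps = break_even_tps(15.0, price, watts)
        print(f"{watts} W at ${price:.2f}/kWh: break-even at {tps:.2f} TPS")
```

Under these assumptions, a 200 W rig at $0.15 per kWh breaks even against $15 per million tokens at roughly 0.56 TPS, while a 300 W rig at $0.30 per kWh needs about 1.67 TPS. A slow, partially offloaded model is therefore a better deal on a modest rig, but the margin shrinks as power draw or electricity prices rise.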

I plan to offer the calculator as a tool on my site and release it under an open-source license like the GPL if anyone is interested.


[-] [email protected] 3 points 4 days ago

Neat, would like to toss the numbers of my 3090 and 3080 in there.

[-] [email protected] 1 points 4 days ago* (last edited 4 days ago)

I would recommend you get a cheap wattage meter for $10-15 that plugs in between the wall outlet and the PSU powering your cards (the $30 name-brand Kill A Watts are overpriced and unneeded IMO). You can get rough approximations by adding together your cards' listed TDP specs, but that doesn't account for the motherboard, CPU, RAM, drives, and so on, or for the real change between idle and load. With a meter you can just watch the total power draw with all that stuff factored in, and take note of how much it increases and where it maxes out as your rig runs inference for a bit. That way you can be reasonably confident in the actual numbers. Then you can plug the values into the calculation.

this post was submitted on 10 Jul 2025
