
Anthropic/OpenAI may be spending more than $1000 for every $100 you pay them (ea.rna.nl)

submitted 1 month ago by Trilogy3452@lemmy.world to c/technology@lemmy.world


[-] Wildmimic@anarchist.nexus 3 points 1 month ago

I like my local LLM too, but it's one thing to use my existing VRAM for a model that fits in it for fault-tolerant tasks, and a whole other thing to use current frontier models, which rack up an energy bill comparable to running a group of space heaters in a building that had to be designed for them, with no guarantee that the output isn't useless.

[-] FaceDeer@fedia.io 1 points 1 month ago

Right, which is why I said 90% and not 100%, and called out the challenge of deciding which tasks to send to which AIs. A lot of the interesting work I'm seeing in AI right now is in the agentic frameworks and harnesses that call the LLMs rather than just the LLMs themselves; these are the things that will break big complicated tasks down into more focused sub-tasks that cheaper LLMs can handle.

Given how some of the big providers like Gemini and Anthropic have been cranking up their API costs in recent weeks I expect we'll see a lot more effort being put into rolling those sorts of features out.

[-] Wildmimic@anarchist.nexus 4 points 1 month ago

It's not even where to send it - you cannot predict how much any given task is going to cost you in tokens, which is the deciding factor in which model to use. The "cranking up" part has not even started yet, and we already have stories like Uber which blew through their complete AI budget for the year, what was it, 2 months ago? Uber is very pro-AI, so that budget was probably very generous. And to top it off, I haven't seen or heard about anything new at Uber that would be even worth mentioning.

If you read the article, this project started from a clean slate and is 40k lines of code, so it's peanuts in terms of complexity compared to what is out there in companies, and the author had to use the maximum power available to him to let Claude keep up. There still was no guarantee that the output was useable (and there can't be such a guarantee, since hallucinations are a statistical fact, increasing in occurrence with smaller amounts of training data available).

If you extrapolate this to an average IT stack, which has quirks and issues that are unique to it, you will never get anywhere you wouldn't get by employing more engineers, who will get better over time and have fixed costs you can budget.

Remember, this is the "killer" application for LLMs. It looks a lot worse in EVERY other area except probably translation.

[-] FaceDeer@fedia.io 1 points 1 month ago

You can predict how much a task will take in tokens. The accuracy of the prediction may not be perfect, but if you can ballpark it that can tell you a lot about what models to make use of.

Also, not all tokens are the same. Different models require different amounts and kinds of computing power to run. Using a very large context costs more per token because you need a computer with a lot of memory to fit it all. If you need it fast, that's more expensive than if you can take your time. Does the task involve vision or audio? Does the context need to be saved for an ongoing chat? Does it need to wait for tool calls to return between rounds? There are a lot of variables that can be tweaked to change what an AI call costs, and a lot of those variables can be predicted without having to actually run the whole thing first.
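To make the ballpark idea concrete, here is a minimal Python sketch of that kind of estimate. Everything in it is an assumption for illustration: the `ModelPricing` fields, the long-context surcharge, the batch discount, and the prices are made up and do not reflect any real provider's rates. It also simplifies tool-call rounds by assuming each round resends the same input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    """Hypothetical prices in dollars per million tokens."""
    name: str
    input_per_mtok: float
    output_per_mtok: float
    long_context_threshold: int = 200_000  # made-up cutoff, in input tokens
    long_context_multiplier: float = 1.0   # made-up surcharge above the cutoff


def estimate_cost(pricing, input_tokens, output_tokens, rounds=1, batch_discount=0.0):
    """Ballpark the dollar cost of a task before running it.

    rounds: number of tool-call rounds, each assumed to resend the same input.
    batch_discount: fraction taken off when the task can wait (0.0 = none).
    """
    if not 0.0 <= batch_discount < 1.0:
        raise ValueError("batch_discount must be in [0, 1)")
    per_round = (
        input_tokens * pricing.input_per_mtok
        + output_tokens * pricing.output_per_mtok
    ) / 1_000_000
    # Very large contexts are assumed to cost more per token.
    if input_tokens > pricing.long_context_threshold:
        per_round *= pricing.long_context_multiplier
    return per_round * rounds * (1.0 - batch_discount)


# Illustrative numbers only.
big = ModelPricing("big-hypothetical", 10.0, 50.0, 200_000, 2.0)
print(estimate_cost(big, input_tokens=100_000, output_tokens=10_000, rounds=3))
```

The estimate is only as good as the token guesses fed into it, but even a rough figure is enough to compare models before committing to one.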

> The "cranking up" part has not even started yet, and we already have stories like Uber which blew through their complete AI budget for the year,

This is exactly what I'm talking about. Current LLM usage patterns tend to be pretty inefficient because people just throw tasks at the biggest and bestest models. Those models handle them, sure, because they're the biggest and bestest. But most tasks don't need that much.

I've used coding agents a fair bit along with the various other AI applications I've fiddled with, and often I ask them to do things that are dead simple. Create a function to sort some data and select whatever fits certain criteria. Add type checking to a file. Create a unit test for a function. Stuff like that could easily be done by a small local model, but the coding agent sends it off to Opus or whatever just like every other task. That can change.
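A rough Python sketch of what that routing could look like inside an agent harness. The task labels, the model names `local-small` and `frontier`, and the context limit are all hypothetical, and a real harness would need some way to classify tasks in the first place, which is the hard part.

```python
# Hypothetical labels for the "dead simple" tasks mentioned above.
SIMPLE_TASKS = {"sort_and_filter", "add_type_checking", "unit_test"}


def pick_model(task_kind: str, estimated_tokens: int,
               local_context_limit: int = 8_000) -> str:
    """Send simple, small tasks to a local model and everything else upstream.

    local_context_limit is a made-up figure for what the local model can hold.
    """
    if task_kind in SIMPLE_TASKS and estimated_tokens <= local_context_limit:
        return "local-small"
    return "frontier"


for kind, tokens in [("unit_test", 2_000), ("unit_test", 50_000), ("refactor_module", 2_000)]:
    print(kind, tokens, "->", pick_model(kind, tokens))
```

Defaulting to the big model for anything unrecognized keeps the failure mode the same as today's behaviour: more expensive, but not worse output.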

> There still was no guarantee that the output was useable (and there can't be such a guarantee, since hallucinations are a statistical fact, increasing in occurrence with smaller amounts of training data available).

I don't think you've used modern coding AIs much.

Or, for that matter, worked with human coders.

> Remember, this is the "killer" application for LLMs.

There is no one single "killer" application for LLMs. They're about as general a computing platform as you can get.

[-] Wildmimic@anarchist.nexus 2 points 1 month ago

I used to think like you, and I am still pro local LLMs - I use them as tutors for areas I don't know much about, and since I use the output just as a guide and implement it on my own, I quickly realize if something isn't right.

We will see what the real costs are when OpenAI and Anthropic rush towards IPO this year, which became very likely once SpaceX upped the tempo. If this article and others I've read in the last year are correct, and the prices have to go up 10x to break even, then we are in for a wild ride. I'm only grateful that for now they don't get lumped into the index funds.

this post was submitted on 07 Jun 2026
