23

Site has extremely detailed stats by day/week for every model. Programming is by far the largest consumer of tokens, and in fact entire token growth in 2025 was only from programming. Other categories very flat. It is also a category where you would pay for better performance.

IMO, its relevant to this sub in that one of the top models, minimax, fits in under 256gb, but also that the trends are for cost effectiveness rather than "the absolute best". There is a tangent insight as to whether US datacenter frenzy is needed.

kimi k2.5 being free on openclaw is a big reason for its total dominance. In week of Feb 2, minimax was only other top model to increase token usage. Opus 4.6 release seems to be extremely flat in reception.

Agentic trend tends to make LLM models disposable, since better ones are released every week, and the agents/platforms that can switch on the fly while keeping context, is something you can invest in improving while not being obsolete next month.

top 12 comments
sorted by: hot top new old
[-] SuspciousCarrot78@lemmy.world 4 points 4 days ago* (last edited 4 days ago)

I really like Claude, but the way that it chews thru tokens def cements it as a "rich man's" AI. Codex surprised me at how capable it is vs how much (little) it costs to operate. Previously, I'd been trying to use ChatGPT + web + project containers...with really sub-par refactoring results.

Tbf, I've only really used Claude Opus 4.5 and GPT Codex5.3 for code, so pardon my ignorance.

How well do open weight models like Kimi et al stack up? Can I call them via VsCodium to reason over local mirror of files on my repo? I'm hardware bound with limited compute. I've played around a bit with Open Router before, so have passing familiarity with things like TNG Deepseek R1T2, mimo-v2-flash etc.

[-] humanspiral@lemmy.ca 2 points 4 days ago

opencode is well worth having. It has a better priced Zen gateway that is limited to top models, but priced as you go, and can point to same folder/container as your other tools. Access to openrouter is useful, if only that some models are free. Antigravity is good to have for generous use of gemini. If VsCodium can't access open models, then other tools can work on same project, and you just reload files they change.

Many open models at 1/10th the cost or lower, are far better than 1/10th of opus 4.6. The popularity reflects much better value. They are especially better if not doing python/js, but functional programming, even if all models are generally bad so far. agents/skills (opencode/antigravity) for models that are strong at instruction following and polyglot software (minimax pretty impressive) actually scored better than raw opus 4.6 on my benchmark, and investing in skills/agents means promise for improving whatever model is released next week.

[-] pkjqpg1h@lemmy.zip 1 points 4 days ago

GLM-5 and Kimi-K2.5 is really good.

ArtificialAnalysis Intellegence vs Cost

[-] SuspciousCarrot78@lemmy.world 2 points 4 days ago* (last edited 4 days ago)

Woof - the axes on that chart LOL. Suffice it to say, they're all pretty dang close. Interesting. Maybe the easter bunny can bring me something with >8GB VRAM so I can actually run em locally. I'm guessing Kimi-2 eats about what...500GB+ for 128K context?

[-] pkjqpg1h@lemmy.zip 2 points 4 days ago

The real reason is LLMs are still using the same architecture and there is no breakthrough at the end of the day their intelligence will become so close to each other, when this happens they will have to decrease the prices to compete with open-weight models and even with these prices they don't generate revenue so instead of just scaling they will have to focus on optimization and innovation

[-] construct3116@sh.itjust.works 4 points 4 days ago

I think Chinese researchers are very crafty. They optimized the model for what hardware they have. I think they will surpass the west soon of not already with unreleased models

[-] humanspiral@lemmy.ca 1 points 4 days ago

US corruption means going big on Skynet to buy datacenter time, while somehow giving Nvidia oligarchy the profits of diverting H200 supply to China, so that Skynet becomes more essential as China AI is able to be served/compete better, and Skynet is more expensive to build in US. Chinese governments may be supporting the market share gains, but models are focused on being useful/value today instead of expensive Skynet tomorrow. AI that focuses on more manufacturing is going to make a society stronger than ones that make insurance/banking/other services fire more people. The datacenter overinvestment in US can both destroy AI company valuations, while still creating mass unemployment elsewhere, and too big to fail unsustainable deficit financing of Skynet as a policy response.

[-] pkjqpg1h@lemmy.zip 4 points 4 days ago* (last edited 4 days ago)

General ranking (weekly) (higlighted models are open-weight)

General ranking (weekly)

  1. Kimi K2.5 - 1.45T tokens
  2. Gemini 3 Flash Preview - 737B tokens
  3. DeepSeek V3.2 - 711B tokens
  4. Claude Sonnet 4.5 - 678B tokens
  5. MiniMax M2.1 - 454B tokens
  6. Gemini 2.5 Flash - 449B tokens
  7. Grok 4.1 Fast - 421B tokens
  8. Trinity Large Preview - 388B tokens
  9. Gemini 2.5 Flash Lite - 358B tokens
  10. Claude Opus 4.5 - 345B tokens
  11. Grok Code Fast 1 - 314B tokens
  12. Claude Opus 4.6 - 275B tokens
  13. gpt-oss-120b - 266B tokens
  14. GPT-5 Nano - 265B tokens
  15. Gemini 2.0 Flash - 175B tokens
  16. GLM 4.7 - 171B tokens
  17. Gemini 3 Pro Preview - 169B tokens
  18. Pony Alpha (GLM-5) - 147B tokens
  19. GPT-5.2 - 145B tokens
  20. Claude Haiku 4.5 - 132B tokens

Wow 46% of tokens are now going through open-weight models thats amazing.

[-] humanspiral@lemmy.ca 2 points 4 days ago

and the just the top 3 open ones are 50% in programming subsection, which I still think is most relevant to "performance/value". More than these models growth, I was impressed by downtrend in big US provider models.

[-] vermaterc@lemmy.ml 4 points 5 days ago

Who exactly is using OpenRouter? Is this used for coding? Bots? Casual conversations? Because that could tell what exactly those top models are good at.

I wish GitHub Copilot shared such data. To see what models do programmers use for work.

[-] pkjqpg1h@lemmy.zip 4 points 4 days ago

OpenRouter is basically the place for third-party AI stuff: tools, research, benchmarks...

Sure, it won't tell you what regular users are doing, but it shows where professionals actually spend money. And since it's pricey (seriously, once you're used to free ChatGPT or Gemini, paying hurts :D), it reveals which models are actually worth it.

[-] humanspiral@lemmy.ca 4 points 4 days ago

I think it's the largest aggregator/portal of models. 10T tokens in week of feb 2. there's a list of Openclaw, claudcode, kilocode are among the apps that opt in for tracking (listed at bottom of page). https://openrouter.ai/apps?url=https%3A%2F%2Fkilocode.ai%2F Since Jan 29th, top 2-3 models have been Chinese every day. minimax, glm k2. ponyalpha is glm5 preview.

opencode and antigravity and most other coding apps have an openrouter "gateway" connection. You can also connect by web chat. Some models get promotional/free periods or tokens for partners which can boost stats significantly, but that includes fast/flash versions of US models.

It's the only dashboard for AI. And exceptionally exhaustive/interactive. Even if open source share rise is result of promotions, its significant that token counts for US models is trending down. Models are no longer sticky, even if for non programming applications, mindshare publicity can get the casuals.

this post was submitted on 11 Feb 2026
23 points (100.0% liked)

LocalLLaMA

4416 readers
20 users here now

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Lets explore cutting edge open source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive constructive way.

Rules:

Rule 1 - No harassment or personal character attacks of community members. I.E no namecalling, no generalizing entire groups of people that make up our community, no baseless personal insults.

Rule 2 - No comparing artificial intelligence/machine learning models to cryptocurrency. I.E no comparing the usefulness of models to that of NFTs, no comparing the resource usage required to train a model is anything close to maintaining a blockchain/ mining for crypto, no implying its just a fad/bubble that will leave people with nothing of value when it burst.

Rule 3 - No comparing artificial intelligence/machine learning to simple text prediction algorithms. I.E statements such as "llms are basically just simple text predictions like what your phone keyboard autocorrect uses, and they're still using the same algorithms since <over 10 years ago>.

Rule 4 - No implying that models are devoid of purpose or potential for enriching peoples lives.

founded 2 years ago
MODERATORS