759
you are viewing a single comment's thread
view the rest of the comments
[-] BlackLaZoR@lemmy.world 38 points 13 hours ago

Just to make things clear: API access to most models is charged per input tokens + output tokens. It means that the longer your conversation is, the more you pay for every new answer. Single prompt with no context and 100 tokens of answer is cheap. Single prompt with 100k tokens of context and 100 tokens of answer is NOT cheap.

Extremely long conversations with most expensive top of the line models can absolutely demolish your budget.

[-] perviouslyiner@lemmy.world 11 points 12 hours ago

does it give the full history to the LLM each time?

Last time I tried implementing something like this, it suggested to have a rolling window of history so that it takes into account your last X messages but not the entire conversation.

(I guess this is what ollama calls "context length"?)

[-] BlackLaZoR@lemmy.world 2 points 55 minutes ago

does it give the full history to the LLM each time?

It's limited to the context size supported by given model. You can give the model 100k tokens of history but if it's configured for less, it will just truncate it before processing (usually by removing oldest tokens first)

[-] percent@infosec.pub 7 points 11 hours ago

Most agent harnesses do something called "compaction." For example, here's how Pi does compaction

[-] Sabata11792@ani.social 7 points 12 hours ago* (last edited 12 hours ago)

You send the entire history for that conversation every time and likely more if its getting info from tools. If its not in the context the model dose not see it unless you have a memory system that dose something like feeding in summaries of past conversations that also takes up tokens and context. Rolling drops old messages to not reach context limits but you can lose important info or get odd results. If the history gets bigger than the context things break or slow way down.

[-] perviouslyiner@lemmy.world 9 points 11 hours ago

presumably this is why Claude periodically writes its conclusions so far into a text file that it can read later instead of having to remember everything. Sounds like an interesting approach.

this post was submitted on 29 May 2026
759 points (98.7% liked)

Technology

84998 readers
3578 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
  9. Check for duplicates before posting, duplicates may be removed
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 3 years ago
MODERATORS