1
2
submitted 24 minutes ago by [email protected] to c/[email protected]
2
1
submitted 13 minutes ago by [email protected] to c/[email protected]

In modern LLM applications like RAG and Agents, the model is constantly fed new context. For example, in RAG, we retrieve relevant documents and stuff them into the prompt.

The issue is that this dynamically retrieved context doesn't always appear at the beginning of the input sequence. Traditional KV caching only reuses a "common prefix," so if the new information isn't at the very start, the cache hit rate plummets, and your GPU ends up recomputing the same things over and over.
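To make the cache-hit problem concrete, here is a toy sketch that counts how many tokens of a second request could be served from cache. The prompts, the chunk layout, and both helper names (`prefix_hit_tokens`, `chunk_hit_tokens`) are made up for illustration and are not part of any real serving stack. The second counter ignores position entirely, which a plain prefix cache cannot do.

```python
from typing import List, Sequence


def prefix_hit_tokens(cached: Sequence[str], request: Sequence[str]) -> int:
    """Count leading tokens the request shares with a previously cached sequence."""
    hits = 0
    for old, new in zip(cached, request):
        if old != new:
            break  # prefix caching stops at the first mismatch
        hits += 1
    return hits


def chunk_hit_tokens(cached_chunks: List[List[str]], request_chunks: List[List[str]]) -> int:
    """Count tokens in request chunks whose content was seen before, at any position."""
    seen = {tuple(chunk) for chunk in cached_chunks}
    return sum(len(chunk) for chunk in request_chunks if tuple(chunk) in seen)


def flatten(chunks: List[List[str]]) -> List[str]:
    return [token for chunk in chunks for token in chunk]


# Hypothetical RAG prompts: same system prompt and documents, different retrieval order.
system = ["You", "are", "helpful"]
doc_a = ["A1", "A2", "A3", "A4"]
doc_b = ["B1", "B2", "B3", "B4"]

first_request = [system, doc_a, doc_b, ["What", "is", "A?"]]
second_request = [system, doc_b, doc_a, ["Compare", "A", "B"]]

total = len(flatten(second_request))
prefix_hits = prefix_hit_tokens(flatten(first_request), flatten(second_request))
chunk_hits = chunk_hit_tokens(first_request, second_request)

print(f"prefix reuse: {prefix_hits}/{total} tokens")
print(f"chunk reuse:  {chunk_hits}/{total} tokens")
```

In this toy case only the shared system prompt counts as a prefix hit, even though both documents were already processed in the first request.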

CacheBlend addresses this by allowing pre-computed KV caches to be reused regardless of their position in the input sequence.

This makes it possible to reach a KV cache hit rate of up to 100% in applications like RAG. The performance gains are significant:

  • Faster Time-To-First-Token (TTFT): Get your initial response much quicker.
  • More Throughput: Serve significantly more users with the same hardware.
  • Almost Lossless Output Quality: All of this is achieved with little degradation in the model's generation quality.

CacheBlend works by intelligently handling the two main challenges of reusing non-prefix caches:

  • Positional Encoding Update: It efficiently updates positional encodings to ensure the model always knows the correct position of each token, even when we're stitching together cached and new data.
  • Selective Attention Recalculation: Instead of recomputing everything, it recalculates only the minimal cross-attention needed between the new and cached chunks to keep generation quality close to that of a full recompute.
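The two ideas above can be sketched in a few lines. This is an illustration under stated assumptions, not CacheBlend's actual code: the post does not say which positional scheme is used, so the sketch assumes rotary position embeddings (RoPE) on a single 2-D pair, where moving a cached key to a new position is just an extra rotation. The selection step is one plausible heuristic (recompute the tokens whose cached values deviate most); the function names, the `theta` value, the deviation scores, and the `ratio` parameter are all hypothetical.

```python
import math
from typing import List, Tuple

Pair = Tuple[float, float]


def rope_rotate(pair: Pair, position: int, theta: float = 0.1) -> Pair:
    """Rotate one 2-D pair by position * theta, as RoPE does per frequency."""
    x, y = pair
    angle = position * theta
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)


def reposition(cached_pair: Pair, old_pos: int, new_pos: int, theta: float = 0.1) -> Pair:
    """Move a cached, already-rotated key to a new position.

    Rotations compose, so rotating by the position offset is enough;
    the unrotated key is never needed.
    """
    return rope_rotate(cached_pair, new_pos - old_pos, theta)


def select_for_recompute(deviations: List[float], ratio: float) -> List[int]:
    """Pick the token indices with the largest deviation scores (assumed heuristic)."""
    k = max(1, math.ceil(len(deviations) * ratio))
    ranked = sorted(range(len(deviations)), key=lambda i: deviations[i], reverse=True)
    return sorted(ranked[:k])


# A key cached at position 3 is reused at position 10.
raw_key = (1.0, 0.5)
cached = rope_rotate(raw_key, position=3)
moved = reposition(cached, old_pos=3, new_pos=10)
fresh = rope_rotate(raw_key, position=10)
shift_matches = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(moved, fresh))

# Made-up per-token deviation scores; only the top ~30% get recomputed.
picked = select_for_recompute([0.01, 0.40, 0.02, 0.35, 0.03, 0.01], ratio=0.3)

print("repositioned key matches fresh encoding:", shift_matches)
print("token indices to recompute:", picked)
```

The position fix is cheap because it touches each cached key once, while the recompute budget is what trades speed against output quality.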

An interactive CacheBlend demo is available at: https://github.com/LMCache/LMCache-Examples/tree/main/demo-rag-blending

3
25
submitted 2 hours ago by [email protected] to c/[email protected]
4
37
submitted 3 hours ago by [email protected] to c/[email protected]
5
6
submitted 2 hours ago by [email protected] to c/[email protected]
6
47
submitted 6 hours ago by [email protected] to c/[email protected]
7
-1
submitted 2 hours ago by Amoxtli to c/[email protected]
8
27
submitted 8 hours ago by [email protected] to c/[email protected]

Forget chatbots. Zuckerberg’s vision is much grander. He is betting that within a few years, AI will not just be answering your questions or writing your emails. It will be managing your schedule, anticipating your needs, running your home, helping you make decisions, and maybe even guiding your career. Call it Life-as-a-Service, powered by Meta.

The move is seen as a direct challenge to competitors. “The launch of Meta Superintelligence labs isn’t just an announcement; it’s a statement: Meta won’t settle for second place in AI,” commented Alon Yamin, cofounder and CEO of the AI detection platform Copyleaks. He added, “Meta and Mark clearly see this as a make or break moment for AI leadership.”

9
9
submitted 8 hours ago by [email protected] to c/[email protected]
10
14
submitted 8 hours ago by [email protected] to c/[email protected]
11
4
submitted 6 hours ago by [email protected] to c/[email protected]
12
7
submitted 8 hours ago by [email protected] to c/[email protected]
13
12
submitted 11 hours ago by [email protected] to c/[email protected]
14
37
submitted 14 hours ago by [email protected] to c/[email protected]

Catfishing just got harder in California.

15
80
submitted 16 hours ago by [email protected] to c/[email protected]

A supposed band called The Velvet Sundown has released two albums of AI slop this month.

16
12
submitted 14 hours ago by [email protected] to c/[email protected]

Exclusive: Shabana Mahmood told companies she wanted ‘deeper collaboration’ to tackle prisons crisis

17
-1
submitted 5 hours ago by [email protected] to c/[email protected]
18
10
submitted 14 hours ago by [email protected] to c/[email protected]

The suited line judges have been replaced by robots. Some argue it takes the theatrics out of the court and some players have reported difficulties with the tech.

19
6
submitted 14 hours ago by [email protected] to c/[email protected]

“I still want to do it.”

20
11
submitted 16 hours ago by [email protected] to c/[email protected]
21
25
submitted 1 day ago by [email protected] to c/[email protected]
22
4
submitted 1 day ago by [email protected] to c/[email protected]
23
213
submitted 2 days ago by [email protected] to c/[email protected]
24
38
submitted 2 days ago* (last edited 2 days ago) by [email protected] to c/[email protected]

Let's Encrypt has announced it will no longer notify users about imminent certificate expirations via email due to high costs, privacy concerns, and unnecessary complexities.

25
15
submitted 1 day ago by [email protected] to c/[email protected]

The update will install much faster than 24H2.


Technology

3321 readers
363 users here now

Which posts fit here?

Anything that is at least tangentially connected to technology, social media platforms, information technology, or tech policy.


Post guidelines

[Opinion] prefix: Opinion (op-ed) articles must use the [Opinion] prefix before the title.


Rules

1. English only: Title and associated content must be in English.
2. Use original link: The post URL should be the original link to the article (even if paywalled), with archived copies left in the body. This helps avoid duplicate posts when cross-posting.
3. Respectful communication: All communication has to be respectful of differing opinions, viewpoints, and experiences.
4. Inclusivity: Everyone is welcome here regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.
5. Ad hominem attacks: Personal attacks of any kind are expressly forbidden. If you can't argue your position without attacking a person's character, you have already lost the argument.
6. Off-topic tangents: Stay on topic. Keep it relevant.
7. Instance rules may apply: If something is not covered by the community rules but is against the lemmy.zip instance rules, the instance rules will be enforced.


Companion communities

[email protected]
[email protected]


If someone is interested in moderating this community, message @[email protected].

founded 2 years ago