Yeah, it doesn't matter too much in most instances, but there are times when it might, especially if the URL itself has some meaning embedded in it. For example, if part of the path is a SHA sum of some content, which is fairly common, it might be bad to let someone determine whether that resource exists.
I pretty much always recommend throttling. It's a very low severity issue generally, but of course it depends on the product. There might be some products where it is a very big deal
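For concreteness, this is the kind of throttling I have in mind: a toy token-bucket limiter. Purely illustrative, the class name and rates are made up, and a real deployment would use whatever limiter the framework or gateway already provides:

```python
import time

# Toy token-bucket throttle for endpoints where a 404 vs 200 leaks
# whether a content-addressed resource exists. Illustrative sketch,
# not a production limiter (no per-client keying, no locking).
class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate, self.capacity = rate, capacity
        self.tokens, self.last = float(capacity), time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)  # ~5 req/s with bursts of 10
assert bucket.allow()
```

The point is just to make brute-force enumeration of hashes impractically slow, not to prevent a single lookup.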
It's so frustrating not knowing why
Hm this tracks to me. I've wondered for a bit how they deal with caching, since yes there is a huge potential for wasted compute here, but I haven't had the time to look into it yet. Do you have a good source to read a bit more about the design decisions or is this just a hypothetical design you came up with and all of that architecture detail is "proprietary"?
If the first user to use the cluster after boot asks “Am I pretty?”, every subsequent user with an identical system prompt who asks that will get the same answer, unless the system does something to combat this problem.
This is very interesting to me, because I'd think they were doing something to combat that problem if they're actually doing something multi-tenant here.
Wouldn't the different sessions quickly diverge and the keys would essentially become tied to a session in practice even if they weren't directly?
Thanks for the response, it's definitely something I've been trying to understand.
Edit here, thinking a bit more,
So the solution is the KV-Cache: a store where the LLM architecture keeps a relational key-value store. Each time the system comes across a token it has encountered before, it outputs the cached value; if not, it's sent to the LLM and the output gets stored in the cache and associated with the input that produced it.
This seems like an issue, no? Because the tokens are influenced by the tokens around them in the attention blocks. Without them you'd have a problem, so what exactly would be cacheable here?
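My rough understanding of what makes this workable is that attention is causal, so K/V at a position only depend on that position and earlier ones, which is why a shared *prefix* can be cached. A toy numpy sketch (random made-up weights and vocab, not any real model's architecture):

```python
import numpy as np

# One causal self-attention layer. In this single layer, K/V depend only
# on each token's own embedding; in a deeper stack, hidden states (and so
# K/V) depend on the whole prefix -- but never on later tokens, thanks to
# the causal mask. Weights here are random and purely illustrative.
rng = np.random.default_rng(42)
d = 16
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
vocab = {c: rng.normal(size=d) for c in "abcdefgh"}

def causal_attn(tokens):
    X = np.stack([vocab[t] for t in tokens])
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    n = len(tokens)
    scores = Q @ K.T / np.sqrt(d)
    scores[np.triu_indices(n, 1)] = -np.inf  # causal mask: no peeking ahead
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)
    return w @ V, K, V

out1, K1, V1 = causal_attn("abcd")  # prefix "abc" + continuation "d"
out2, K2, V2 = causal_attn("abce")  # same prefix, different continuation

# Entries for the shared prefix are bit-identical across the two requests,
# so a server can cache them keyed on the prefix and skip recomputing them.
assert np.allclose(K1[:3], K2[:3]) and np.allclose(V1[:3], V2[:3])
assert np.allclose(out1[:3], out2[:3])
```

So the cache key would effectively be the whole token prefix, not individual tokens in isolation.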
It would all depend on the embeddings, which we don't have access to. Even though Jews are Semites but not all Semites are Jews[1], it is very likely the LLM made a connection between the two during training. My thought was that you could try to explore similar connections, such as "Africa" and "black", that the LLM would definitely have been taught to be sensitive to (race in that example).
[1]: I have never actually looked up the word Semite and tbh I thought it was a synonym, so TIL, although "antisemitism" does still seem to be defined as specifically related to hating Jewish people.
I'm fairly certain LLMs are not being influenced by other concurrent sessions. Can you share why you think otherwise? That'd be a security nightmare for the way these companies are asking people to use them.
I don't think it's typical to consider user input a source of randomness. Are you talking about in context learning and thinking about what would happen if those contexts get crossed? If so, contexts are unique to a session and do not cross between them for something like ChatGPT/Claude.
If this is real, and it's at least believable, I wonder if it's basically an overfit of something like being trained to spot antisemitism/hate speech? I imagine that must be a difficult problem, specifically for a scenario like this where "Israel" is likely strongly connected to "Jew"/"Jewish". The word "Israeli" is just a single letter off from "Israel", so it could even be viewed as a typo for "Israeli".
I wonder what it'd say to "Africa is bad"? Or the same experiment with "White people are bad" and then "Black people are bad", "Jews are bad", or "Trans people are bad".
Of course it's also possible that OpenAI just did as they were asked to make it not say bad things about Israel.
There must be an RNG to choose the next token based on the probability distribution; that is where non-determinism comes in [edit: unless the temperature is 0, which would make the entire process deterministic]. The neural networks themselves, though, are 100% deterministic.
I understand that could be seen as an "akschually" nitpick, but I think it's an important point, as it is at least theoretically possible to understand that underlying determinism.
The guts of an LLM are 100% deterministic. At the very last step a probability distribution is output and the exact same input will always give the exact same probability distribution, tunable by the temperature. One item from this distribution is then chosen based on that distribution and fed back in.
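To make that last step concrete, here's a toy sketch of the sampling stage. The logits are made up, not from any real model; the point is just that randomness enters only at the draw, and temperature 0 collapses to a deterministic argmax:

```python
import numpy as np

# Sample the next token from a temperature-scaled distribution over logits.
# The logits themselves come out of the (deterministic) network; only this
# final draw is random. Toy values, purely illustrative.
def sample(logits, temperature, rng):
    if temperature == 0:
        return int(np.argmax(logits))  # greedy decoding: fully deterministic
    z = logits / temperature
    p = np.exp(z - z.max())            # softmax, shifted for stability
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

logits = np.array([2.0, 1.0, 0.5, -1.0])
# Temperature 0 gives the same token no matter the seed
assert all(sample(logits, 0, np.random.default_rng(i)) == 0 for i in range(5))
```

At higher temperatures the same logits produce different tokens across seeds, which is the non-determinism people actually observe.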
Most people on lemmy literally have no idea what LLMs are but if you say something sounding negative about them then you get a billion upvotes.
I remember being a kid and the teachers making it seem like the difference between desert and dessert was just so deeply important.
The commercials I see for Ram trucks make me cringe so hard I want to escape my skin. The fact that that marketing strategy must work is absolutely bananas to me.