[Guide/Feedback] We are back at Llama times in not a good way: How to work around it (Redux) (lemmy.world)

submitted 1 month ago* (last edited 1 month ago) by justpassing@lemmy.world to c/perchance@lemmy.world

8 comments fedilink hide all child comments

First of all, belated Merry Christmas and a Happy New Year to everyone. I hope everyone had great holidays and may this year be fruitful for everyone.

Formalities aside, as the title implies, we are at a point where the model is showing the worse of two worlds. That doesn’t mean that development has stopped. If anything, there is a handful of things we should praise the dev as the following things have been corrected and if they happen, they are just outliers.

English in general and dialogues no longer resort to degradation/caveman speak.
Bias is no longer to a single type of personality or story.
Summaries are comprehensible and untainted.
Manual railroading (i.e. unsticking the story) is easier.

That being said, the obvious problem which has plagued us since release is still there and it’s getting worse by the day: the model can latch into anything, create a pattern, and regurgitate it on a nonsense word-salad refusing to continue the story. But as last time, I’ll try to explain both how to work around this and give some thoughts to anyone interested. This is pretty much a continuation to an older post, which is already obsolete in the sense of “how to work around this”, but the analysis and conclusions, ironically, hold until today.

This time however, I would like to address the userbase first on the following, since despite the contents of this post and the previous, I understand the dev position on this and how much scrutiny he may get on different platforms. So the pressure to provide a quick fix to a menial issue may open the gate for greater problems, and that’s something I’ve not seen addressed anywhere.

Things no LLM can do accurately

In summary, due to how LLMs and other neural models are created, the following things will never be accurate.

Basic logic (i.e. proper solution to a logic puzzle or recalling positions, matching, order, etc.)
Spatial awareness (i.e. how things are positioned not only in a map, but also who carries something or where something is stored)
Math (i.e. operations that are not common, and even counting past a threshold)
Filler words (“white knuckles” is a prime example of this, there are many more and even if one is swatted away, another will take its place).

As you may see, most of these are logical problem than even if you feed the model enough context, it will make mistakes. Again, this is due to how neural networks work, as they look for “matches” to the last input, and there is no guarantee that the logical answer is the one with more likelihood to appear due to the training data.

The same happens with the filler words. And not only them but also repeated constructions (more on that later), as this is a natural phenomenon in language. For example, in this post alone, one could find some bias towards me using certain phrases and constructions favoring them over others. That is not to say this is wrong, but all models will have a distinct writing style that will be identifiable with absolute ease despite the dev best efforts to hide it or make it dynamic.

Therefore, asking to “fix” things such as “why does the model not remember where am I standing” or “why does the model ‘sings off-key’ when singing” is not something worth “fixing”, as these things, while annoying, can be addressed by the used by editing or removing. Even left unchecked and ignored and there will be no lasting consequences.

There are bigger demons that do need to be addressed, and this time before explaining “why”, first I’ll go to how you, the user, can work around this and have a semi-pleasant experience until your patience runs out.

The problem

You may have ran into the following at least once.

User: *While working at McDonalds* The ice cream machine broke.

Char: *Ears perking at the mention of the ice cream machine* Again? *Turns to face User, slamming his hands on the table hard enough to leave prints* Tell me User, is this the third time this day? *Flails his arms* Though I suppose we can’t do anything about this anymore! *Eyes widen at the realization* Right now Wendy released a new Choco chips cone, while Pizza Hut reinvented the Apple Hut! Years of customer loyalty gone to the drain! *Gestures vaguely at the ice cream machine* Just… just look at this! The ice cream is crystalizing! Frost signatures decaying as we speak! But maybe, just maybe, we can use it to our favor. *His grin turned mischievous* We can use this as a feature! Make it that we present this as a new flavor! This is not ice cream anymore, this is culinary physics!

Granted, this is an exaggeration, but you may see several of the problems in this case, we’ll go one by one as usual.

The return of caricaturization and its context

In the last post this was something to watch out and fear, but now it is something that can be used and exploded if done correctly. Llama used to have a single story format and a single character to who which you would put “a silly hat” and pretend it is a new one while the mannerisms and overall personality was constant. This worked well because the driver was the story and said “character” was all encompassing enough. The new model in the current state has a “cast” of characters, some of them who can only exist in certain contexts.

Without going on much detail on the “bestiary” of these, you may have noticed that depending of the traits you give your character, you may get a set writing style for each. E.g.

Char: *Her grin widened impossibly* Ohhh User~ *Draping her arms around User shoulders like a scarf* But you know what would be fun?~

This may happen if you give your Char the “mischievous” or “playful” trait, and this one can exist for in nice contexts or where the world is awful, changing only the actions while the personality remains. This is not true for all possible characters, as one with the “timid” and “gentle” traits would not keep its personality if the world is awful.

Consider this an update on the prior “manic personality” problem. Prior, the model would “randomly” try to change the personality to fit the setting in what would it deem logical, now once a setting and personality is set, it will try to stay on that no matter what. Changes can still exist, but within what is “reasonable”. For example, let’s say you are stuck in a point with a sarcastic, passive-aggressive Char who would only complain about everything. In this situation, the world around you will reflect this giving logical reason for your Char to complain. If you really want to force a personality change, or a setting change, you need to account for both. You can’t have a happy-jumping all over the place Char in a depressing world, or better said, it will fall apart because the model won’t let it stick and it will morph into something unwanted.

This is the extent of “character development” you can have. Let’s say you start with a depressing character who you wish to eventually gain a spine. The way to achieve this would be to follow this path.

Depressing char, depressing world.
Depressing char, manually introduced easy task/work/chore.
Timid clumsy char, working its way on the set task.
Clumsy char, increasingly demanding task.
Clumsy char succeeding by manual/artificial intervention, demanding yet rewarding world.
Confident yet slightly clumsy char, rewarding world.

This would be a way to achieve a full setting transformation, and notice that the heavy lifting resides on you adding the things that manually change both the setting and the Char personality. If you let the model handle this on its own you may have it leading you into absurd and frustrating situations to then settle on a setting and never moving past it, latching on repeating patterns (more on that later).

“Let’s not get ahead of ourselves”

Ironically, while this annoying catchphrase of Llama has not returned, now for once it is your responsibility to stop the model on its tracks before it escalates things into lunacy. The “impossible stakes” problem it is still persistent even if it is not the default anymore, and yes the “deus ex machina” is still a problem so trying to solve things when you get a world-ending scenario only introduces problems.

Luckily, detecting this is very easy as like before, you can “cage” the scope of a problem with reminders, and even without them things will stay reasonable unless you let the model hallucinate new threats on top of existing ones. If the stakes are already high, it is still possible to deal with this, but it may turn annoying as the Char will be likely to reject your answer to the problem, and the model and Narrator will even discard a solution that Char proposed. Rerolling is the wisest approach here, as this is just a case of pure chance, but it can be frustrating at times.

Curiously, the opposite also may happen now, which was a Llama pet peeve, the “shallow resolution” issue. This pretty much means that the problem will magically solve itself entirely on its own just by pure will without intervention, or even in the background. Keeping a proper balance of these aspects can turn tricky and unrewarding, but it is what we got and with effort it can be solved manually.

Now, there are two instances of escalation you should avoid like the plague for your sanity.

The Marvel/DC “explanation” problem

Previously I warned that sci-fi driven stories would be impossible due to the “word salad” problem and the model obsession with vibration physics and quantum mechanics. Today it is possible, but not advisable at all.

Similar to the original example provided and the previous guide, the “resonance, crystallization, signature, harmonics, vibration, probabilistic, superposition” and similar causes the model to try generating an outlandish explanation for literally everything, effectively killing your Narrator and turning your Char into a parrot repeating things over and over without doing anything of substance.

If you really need a sci-fi or remotely technological setting, you can do it, but as soon as you see any of these words or an “explanation” of something, cut it, no replacement whatsoever. As the model is past the “caveman speech” phase, now cutting text with no replacement is a viable strategy to keep moving forward.

The Disney Fantasia problem

This is very similar to the last one, but instead of being a family of words to watch out, this is more a situational problem when dealing with magical or “whimsical” settings. What will happen this time is a “subplot” around whatever magical critter (often a rodent) or some inanimate object gaining sentience. This was something existing in Llama, mainly in the no-prompt version of AI RPG with the “Whimsyland” story, but now it can happen everywhere from the nothingness if your setting allows magic or similar. It goes like this.

A character capable of magic materializes something like a cup of tea from thin air.
Said cup starts doing things on its own, like moving or swirling.
If this is a conversation, the cup will mirror the conversation (i.e. you and this character discussing math, the cup will start solving equations).
The cup will invite other objects to do whatever it is doing, escalating the setting into Disney Fantasia.

Another case could be this.

Narrator: A mouse peeked out of the hole, looking at Char warily.

Char: Uh… User. This mouse just gave me a receipt?

Cue 5 outputs later

Narrator: The mouse set an office on the pizza box, putting a plaque with its name and wearing a hat made of a post-it. It started auditing the apartment finances with eerie precision.

In both cases, the solution to avoid this is to just eliminate the first mention of the creature or object in question when it does something out of the ordinary. While in theory it is cute for this to happen in the background, the practice is that the model will not stop referencing and escalating this, refusing to move forward this curiosity.

Be wary that this may happen in conjunction with the problem of things being “quantum”, introducing a whole mess that will be near impossible to clean up later.

Patterns

This is the crux of this entire post, and something that was warned in the past, yet, not only unsolved but turned worse, and while there is a more “technical” way to deal with this today, it is still an uphill battle.

As stated in the previous post, everything can weave a pattern, you as the user, your task is to watch out for anything that looks vaguely similar to the past five outputs. If you let a construction nest for long, it will take root, and while there are ways to unstick it (more on it later), ideally you don’t want them to plague you on a scene that is unresolved.

However, there is some preference of the model when generating an output, so you can outright reroll or edit one of these repeating constructions in dialogs:

Tell me …
Though I suppose… (or Though + similar)
Maybe, just maybe…
Should I…
Ohhh …
? Try
, always/never
It’s/This is no , it/this is

And those are dialog exclusive, as narration exclusive go:

with unnecessary force.
<pulled, tugged, grabbed something> with surprising strength.
resembling something dangerously close as .
with renewed urgency.

And this is without getting on the short filler phrases such as “knuckles white”, “hum a tuneless melody”, “eyes gleam mischievously”, “grin impossibly wide”, “arms flailing”, or similar many others.

What is difficult here, is that on the void none of those constructions are “wrong” nor they can be eliminated with no consequence as Llama’s annoying catchphrases such as “we are in this together” without altering the context. And again, letting any of those or similar being repeated in a five output window is dangerous as it will lock you into a scene that at most will “escalate” in the sense of adding things for description, but never moving forward.

For this approach, the best is reroll until you get something “fresh” compared to the last outputs, or outright manually write. It is manageable, but this factor alone puts you at edge when dealing with the model turning every run into a “debug” mission.

Then again, the reason I placed the tittle as is, is not just to draw comparisons with the old model. There is a larger metagame that aid you deal with the current model that worked in Llama times. And along with several demons that returned (more on that later), the strategy to get the best of this model, as well as its expectancies, is akin to the past.

The metagame

I never did a proper guide on how to deal with Llama in the past, but for what I gather, it was a model that stuck for so long that there is probably a lot of documentation on how to get it going, so probably there are better sources, but today with this model, despite being (allegedly) DeepSeek, this works.

Descriptions and settings

Be terse, my suggestion on “long full-line descriptions” in the last guide is void and null today as the “caveman speak” and “word salad” problems are gone. Now it is advised to describe things in a minimalistic, almost one word kind of deal. For example.

Personality: Cold, calculating, no-nonsense, pragmatical.

Remember the “elephant in the room” problem. Whatever you put in any description WILL appear somewhere as soon as the model decides it is relevant of acknowledging. This is not to say that complex personalities are still off the table, but they will obey the principle of “caricaturization” described prior, so under the assumption that a Char won’t manifest all of their range of emotions in a single output, it is best to use what you strictly need and nothing else. Same goes for unnecessary detail in things such as clothing, because then the model will take it as an invitation to describe it in a flowery way and never move forward, again, murdering your Narrator which is on death watch since the run start.

Under this principle, there is little to no need to describe the characters you are using, as it is implied that you, the user, will input everything manually for this character. Whatever you place there will permeate in other characters and the whole setting, making it change the story direction in ways that you may not desire. Again, remember the “elephant in the room” problem.

Pacing

The new model still lacks the concept of pacing, and it may solve a scene either never or immediately due a “deus ex machina”. However, contrary to how it started, it is bounded by your character behavior and the world setting. Meaning that whatever story and goal that aligns with this setting may flow unless you run into a pitfall caused by a pattern or any of the problems stated prior.

This introduces the problem of “how fast things can be solved”. In Llama, scenes were often too slow for the taste of many, requiring up to 20 outputs to get something thoroughly done and solved. This new model is more delicate on that matter, as a scene not solved in about 5-10 outputs is very likely to drag forever until “pressing the big red button” (more on that later). Likewise, you may need to keep it busy for more than two outputs or the problem will be magically solved. Essentially, to keep a run fresh, it is necessary to be moving constantly, never resting on a scene.

Something that is prone to fail in this new model is “planning”, as if you have a scene to coordinate with your Char or other NPCs prior to deal with a problem. The reason for this is because the model will need to tell you everything that is wrong with whatever you come up with and explain all that is happening, essentially forcing to tackle all action involving scenes directly. Dialog mixed with action is a whole can of worms, worth not touching yet (more on this later).

Reminders as railroading

More than often the model will give you an illogical solution or react to something in a way that makes no sense. As stated in the beginning, no model is completely logical, so when dealing with layout traversing, object carrying, or things that require logical skills, it is better to have the reminder in the input. Sort of how AI RPG implements it. Granted, this will work per scene and should be deleted once the issue or scene at hand is concluded.

Product/Project development

This was a cardinal sin in Llama and it is back. You MUST NOT let your Char design a “product” or plan an event, activity, business or similar. What this will cause is to obsess your Char about this particular “thing” starting to add ideas and suggestions over it pretty much forcing the entire world to circle around the idea and never the execution. Even after the “product” is developed, the problem solved and all, your Char will keep referencing it and trying to push you to it, as well as the world around you.

The way this happens is insidious, and you may want to delete the progression as it happens. This is an example.

User: Let’s make a pizza.

Char: How about we put pepperoni on it?

User: Sure.

Char: And, could it also have mushrooms? Maybe bell peppers cut in shape of ?

Once this nests, even if you forcefully exit the scene, the whole world will circle around it. There are ways to get rid of this later as the “big red button” approach, but for the time being the best is to outright avoid this direction on a story.

The “big red button”

You may guess what this is hinting, and yes, if for some reason you REALLY want to keep going but you got to a point where your run is going in circles endlessly, unable to progress, with a static world, a Char with a manic personality and flowery incomprehensible descriptions of everything around you while non-sentient objects dance around you, there is a solution. “Kill” your User and Char.

What I mean by this is that you can forcefully add a “subplot” to take over the “main cast” and proceed from there in the existing world in a way you only deal with one vector of the problem. I.e. a faulty world, before giving it back to your main cast.

The way this works is simple, and it also worked in Llama. Create a character you’ll use with absolutely NO description, and make it interact with a newly made, also never described, NPC. The model alone will fill the gaps using the “broken world” as a reference, but since it will have nothing to reference this new pair, it will allow for change with increased ease effectively allowing you to cleanup before returning to whatever you considered the main plot.

Personally, this worked flawlessly to get moving runs that spiraled into nothing past the 500kb threshold in the current state of the model at the time of writing this guide. The only limitation for this method is actually your patience, as it comes a point where having to keep a track of all what was mentioned beforehand coupled with how far you must go for a run… it is just not worth it at all.

Why all this even happens in the first place after so long?

Before anyone complains, this is not another long post disguised as Llama propaganda. After having to deal with the current model for so long, I finally see what dev sees in it, and there is evidence of it working semi-flawlessly above everyone’s expectations as proven in this post before falling into one that gets into “dementia mode” regurgitating everything with no direction as early as the third input. Personally, that is evidence that indeed the current model which (allegedly) is DeepSeek, COULD provide an experience akin to what Llama provided, even if it was capped at 1Mb before being unstable.

It is almost shameful to admit, but even this model when it was ultra-aggressive, it could carry a coherent story, albeit being comparable to torture-porn, until the summaries caused it to enter “dementia mode” and pretty much force the model to run in circles. Today, without excessive care, it is possible to run endlessly in circles at the totally pitiful mark of 50kb, absolutely miniscule compared to a peak performance of 1Mb in this same model.

Again, the reason for the title, and something that I scratch my head trying to reason why happens, is that indeed we are stuck with a model that takes several aspects of things Llama did that where not likeable, on top of the problems this new model carries, creating a sort of hybrid that gives a decent head start, but falls apart in a minute. It’s true, with the guidelines I gave it is possible to keep it going endlessly, especially keeping the “big red button” approach as a last effort, at some point one starts asking why even use services such as AI RPG, AI Chat or ACC. In fact, in these three there is a degree of control while generators such as Story Generator get the worst end as they are a “Narrator only run” which perishes after the third input.

And this time, I have a reasonable explanation on all this phenomena. Originally, I wrongly accused Llama to have certain obsessions and latch on terms such as “whispers”, “fractures”, “kaleidoscopes”, “clutter” and so on. Turns out these are not Llama exclusive, nor are they present by default on any model. Yes, they are on the database of the model, but the reason they existed and plagued us in the past and why they plague us now with a new family of nouns, adjectives and pseudo catchphrases, is due to the fine-tuning training data. I.e. what the dev is feeding the model to do what it does now.

Evidence of this claim

Veterans of the times of Llama may recall that a no-prompt run in AI RPG would immediately take you to “Whimsyland” and variants. A run where the world was Charlie in the Chocolate Factory with anthropomorphic animals singing sunshine and rainbows were your objective was to get some mystical artifact for a festival. Likewise, a more rare case was a blank run in AI Chat where Bot and Anon where introduced as heroes of a fantasy world about to embark in a dungeon, again, to get some mystical artifact. Other generators with the default settings let you into a “default” run that circled a common theme, that ended being annoying as it will eventually “breach containment” and permeate in a custom run having “whimsy” elements where undesired and similar.

If we try today, at the time of writing this guide, you may obtain this from AI Chat.

As seen here, there are “whimsy” elements such as talking animals as in a bootleg version of Looney Tunes or over the top situations that escape slapstick comedy and enter the realm of surrealism for the sake of strange. This is a mirror of the “Marvel/DC explanation problem” and the “Disney Fantasia problem” as the model when prompted “write a story” or “write an adventure” will default to bring those elements due to the training.

I would like to remind you all, with this same model this was not always the case. When the model was new and ultra-violent, the default AI RPG run was “eldritch creatures plague middle earth” and “cyberpunk but the men in black will kill you”. While I don’t have a screenshot or log to validate this claim and the model will no longer do that, I do hope that someone detected this when the model was new.

This is important to know because it shows where the bias of the model is, so everything that is “default” for this model becomes terra non grata. For example, originally sci-fi runs were unbearable, today both sci-fi and magic oriented runs are unbearable unless you walk in eggshells.

And this brings me to the main point of this post considering the progression of the model consistency along time. It has become more focused compared to release, but the point of breaking has been reduced per update. In my old guide I promised 1Mb. Today with no counter-measurements, runs may die before 100kb. At the current rate, the next update will make 50kb a feat, even with the “big red button” strategy as a tool to keep going, it is extremely annoying to do a proper story that is not something that finishes entirely after 20 outputs. And it is indeed in my opinion each update which pushes the stability range lower and lower.

The “patching” approach and its problems

There is also a reason why first I addressed the general public on what things are unreasonable to ask for, and will be a problem no matter the model, no matter the refinement. It is my belief that a handful of updates in the model are a reply to the outrage of the community, such as “the model is too evil”, or “the model keeps forgetting this”, or even more common “what’s up with knuckles turning white?” Attempting to patch these things reduce the model capability by over-focusing it into whatever the new training data is forming new obsessions, that unavoidably end in the death spiral that is having the model running in circles.

This is not a DeepSeek exclusive problem, Gemini deals with this a lot since it is fed new training data using Google data farms which lead to take several social media posts as “normal human behavior” causing things shown here. While these make for fun memes, a similar effect is happening with the model used in Perchance as it is increasingly over-trained on the existing dataset to the point of being loopy.

This is also a very important aspect to consider, Llama 2 was a “dumber” model, so getting it react as the dev intended required a humongous effort and re-training. Modern models seem to be more brittle in the sense that a small nudge change their scope greatly, so the approach of re-train a pre-trained model over and over is leading to the “Llamification” of the current model while reducing its “intelligence”. I’m afraid to say this was also predicted in the previous guide and it would be the cost of hyper-training. And this is evident as even when the model was atrocious to the standards of many, it received praise on the grounds of “remembering better”, “being more coherent”, and “not mixing up description”. This is now lost, like how Llama struggled with these.

A small conspiracy theory

If we go under the presented assumption that all the problems we see today and prior with this model and even Llama, there is a reasonable explanation on why the model was horribly violent, dramatic, and over the top when it was introduced. I believe that the process to jailbreak any of these models in order to make it produce content they are not designed to do from the get-go (e.g. violent content, drug and sexual references, polemic topics and others) is to present it with cases of outputs for the inputs that demand these cases. This is not totally true as while a public version of DeepSeek may at first refuse you a graphical description of something hideous, the separate wording for it exist on its database, and this is why workarounds at frontend level exist.

To me, patient zero of the original madness was the training data that was used to jailbreaks this model, in particular the Old Man Henderson story. For those that don’t know, this is a Call of Cthulhu run where things are all over the top as a GM attempts to murder his players in gruesome and deranged ways, for him to fail horribly as the players are as deranged and over the top to call his bluff. The story itself is hilarious, but of course that a model using it as a guideline will do the following.

Everything is an immediate game over, as there is an invisible force that even blinking wrong will kill you.
Fixation for the eldritch and occult.
Nonsensical explanations as they are not designed to provide hindsight, rather justify bullshit.
Rude and crass behavior and speak always.
Old man Henderson himself.

That is not to say this should be forbidden in the training data, but chances are that the intensity and times this was parsed so the model could do an “evil” run was too much that effectively, the model natural state was this. After the damage was done, what was performed later was to “patch” this by introducing data of “nice” runs where all is sunshine and rainbows to compensate until a balance is achieved. This however came at the cost of driving the model insane as it has hard coded “the grimmest adventure where everyone dies trice gruesomely in all timelines at the same time” and “Smurfs happy time” simultaneously. Both by the way very accessible with the correct prompting, albeit prone to fail in the “running in circles” problem.

Final thoughts

Again, don’t think that this is another call for “bring Llama back”. Rather “check the training data” as to get something that pleases people that miss Llama while staying with the new model, we are obtaining the worst aspects of Llama while exacerbating a problem that was widely discussed day one. This model has potential, we saw it, but it is lost in favor of “Whimsyland” coupled with “Call of Cthulhu” in a hybrid that I doubt satisfies anyone and ends frustrating everyone.

It is important to know also why there were people that liked Llama and why there are people that despised it. Personally I liked Llama for its ability to do an almost all-encompassing story that could have dialog, action, conspiracy, betrayal, character development and more in a single run without it falling apart up to the 10Mb+ mark. I believe that people didn’t like Llama because it had a hard bias towards shallow resolutions and try to force everything with hugs and kisses.

I do not believe that people liked Llama because it was whimsical and provided cartoonish descriptions in a flowery language overly describing every piece of clothing and every flower in the environment. And if the training data is the one causing this, then probably this training data is holding down the current model.

My humble suggestion, in case the developer or anyone in the decision making would read this for any reason, is to start over, and I mean it. Same model, but feeding it less and perhaps better curated training data. Force feeding it whatever was force fed to Llama is not helping it, it is only making it progressively worse to the point that there will not be a single aspect where it is superior to anything other providers or even the old model had. Again, I think everyone here, even by the luck of the draw, have seen that this model is indeed capable to carry a proper story without falling apart. A 1Mb no maintenance run is something that perhaps should be the standard given that Llama was able to deliver ten times that. And again, we know that this model can deliver it as past iterations did it without much issue. After that, and again, that is the reason I also present a guide on “how to survive”, is that there is also a responsibility from the side of the userbase. Namely, come up with workarounds and strategies to push the model above something reasonable.

I don’t expect anything to change as no one among us know what is truly the expectation of the dev with the ai-text-plugin, so there is the slight chance that indeed this model is running in circles by design, without being facetious, there are some niche applications that require that. If anything, let this be a cautionary tale on how to handle models and how what works for one may not work for another. Anyone desiring to run a local version of any model may attempt to dump on it several logs of training data and end with a lobotomized model that speaks in tongues.

I do hope that everyone here understand that beyond my personal opinion, my desire for this model or any that comes is to have a product that suffices the requirements of everyone. Clearly we are not there yet, but the reason I post this is because my feeling is that the trend is downwards instead of seeing an improvement.

you are viewing a single comment's thread
view the rest of the comments

[-] Nordrasiul@lemmy.world 1 points 1 month ago* (last edited 1 month ago)

mega post ahead as i have some things to say about the current state of things

Last few months, granted i only use the RPG for the text, the main consistent character issue i ran in to are these:

AI knowledge seems to be trained on very innacurate data and seems to disregard lore that is explicitly stated being something that goes against its "general" knowledge, as in it has issues with understanding exceptions to the rules of its own knowledge.

CASE IN POINT: In World of Warcraft, there are the Worgen, they are Werewolf like beings, but they dont have any tails. The AI is obsessed with attributing tails to them, despite lore never saying they even had one (and they dont even in game), as it seems the general knowledge of werewolves beats reality.

I generally had to remind it that Worgens do not have tails. This works for few paragraphs, before my Worgen character in RPG is waggling her tail again what tail ?

And here is the hilarious fact in the rare occurance the AI does actually try to adhere to your instructions

See how the AI catches itself from attributing tails, when lore (overview in this case, its RPG) says explicitly this:

Species Profile: Agilix (G.D.A.):

Arctic-evolved alien lupine-vulpine species (220-240 cm tall), bipedal with white fur and no tails. Genetically adapted to produce natural antifreeze, preventing freezing death. Highly intelligent, comparable to humans.
Social and good-natured, valuing actions over appearance. Bonds easily with other species, especially Ostepok and Humans. Lacks Ostepok impulsivity, making them ideal partners to curb extreme behaviors.

Took like 12 paragraphs and guess who has a tail now ?.

If anything, i found this one leading to rather annoying thing that the AI at the start seems to understand the character perfectly, but after few paragraphs in, the overview and the source material rapidly gets devaluated. this leads to...

This is extremely noticable when your character is a meany shit head in the first few paragraphs where it has picture perfect interpretation of the character. But 13 paragraphs later you essentially play a completely different character.

CASE: My character was Cindrertresh (Dracthyr, faction leader of the Horde Dracthyr), i hammed her up to be way more menacing character and alot more scary faction leader than oversized red gecko she is in canon. In the first paragraphs it was perfect slew of events and it understanding my prompting PERFECTLY.

Few examples of what went down:

Ate a half a kodo by herself, tossed the bones on the ground and told her soldiers to clean it up
Told an orc official to get stuffed and she doesnt take orders from aliens, only the War Council
Told orcs they are just space vagrants who only brought destruction to Azeroth
Beat up a random drunk that wandered past the training grounds for fun, then blamed them for losing against a giant roided up super soldier created by Deathwing. (git gut essentially).
Calls everyone who isnt a soldier a peasant, does not care for their names and calls soldiers by their number and rank. This was without me actually doing much, just letting the story play before i took the saddle to prevent me getting the world in to a war, she is suppose to be a shit head and sociopath, but not a dumbass.

30 paragraphs later:

Cindrer now calls people by their name, and even asks how was their day
Feels sorrow and pity over deaths of others
Tries to be considerate of others
Does not hog all the vitty gains and encourages others by trying to be reasonable
Calls people by mr./mr's, bows before figures of authority even from other factions
Does not demand anything, says thank you and please.

This is whats in the lore in the overview btw:

Does these two behaviour patterns sound like a same character to you ? It shouldnt and it is very jarring. The only way i found to keep up with this is to really ham it up and continue the initial behaviours, problem is that

A) Its unnatural Even sociopaths or very questionable characters can appear normal when needed, its possible to have a normal convo with them
B) Its grading for you to write constantly

The most refined way for it is to not get it abbandoned entirely and let the AI naturally weave it to the way your character speech, like the peasant calling so its just how Cindrer would refer to civilians in a normal conversation, but without having her go out of her way to scream at civilians like a freak (she is suppose to be mean, but not insane) The system just cant handle a golden center it seems, its always extremes but in a odd way, not really stereotyping (Flanderisation is the term ?) but its like its paying attention to the lore before eventually dropping it entirely and going by your previous behaviours (where you arent playing a raging psycho, like you are in actual fight), which is extra confusing when it pulls from the overview for lore just fine for future events/characters.

So what gives smh... its not like its gone and ignored, i just dont get it.

Chaos Theory and Escalation Fancy name for my third point huh ? This one is ungodly annoying and requires constant me using the what should happen next field to litteraly scream at the AI to stop

Here is how it goes Character has some sort of esoteric/non real-life ability, lets say my character has this:

ECM spine implant: Aya passively creates a weak electronic scrambling field around herself, this is not strong enough to damage anything, but it completely obscures her from any electronic-based survailance like cameras, she can walk in front of them and appear as a ghost essentially.

Simple to understand right, a cyber augmentation that scrambles cameras to prevent any electronic based recording With explicitly stating that the power is too weak to do anything else But AI seems to be deranged and here is what will happen over the course of multiple paragraphs, while my character is having a chat inside a containment cell

Lights start flickers
Electronic equipment starts shorting out
Personal equipment like omni tools and various in-armor scanners start being fried out
Other cybernetics and augmentations start frying out, people start frothing at their mounths from neural overloads
People start vomiting, passing out, bleeding, some distant containment breach/monster sounds start to play
Something is scratching at the walls and doors are being pummeled
Some sort of monster has escaped containment and is on its route to my cell to fight me
Monster busts inside, every guard in my cell is dead or knocked out, or too useless to fight

This has my blood boiling, this is creating conflict, chaos and destruction from thin air, all my character is doing is peacefully sitting and talking while this shit goes down in their proximity. And its all because it latched itself on a feature of character that is more automatic/subconcious in nature and then starts escalating and escalating. And it goes from a harmless anti-camera stealth field, to one that killed 4 people in the room, 3 from cybernetics overload and 1 from shorting out his pacemaker.

Same case with magic, druid was in the same scenario, just talking while magic of nature started eroding terrain, spawning flora out of the ground, with NO input from me, why is my Worgen archdruid leaking power like a broken dam ?.

Same with Warp users (Warhammer 40k based), this one is atleast the most believable, if my character wasnt a disciplined character, yet my power leaking around me, rusting metals and shit, like i get it, being in a proximity of a strong enough psyker is unpleasant, but this is way too much and my character is stable.

Old Llama was a soft belly tosser, but what we have is still too much, i remember when the update first hit, that went so far in to the extremy i found it fascinating how edgy and grimdark it got.

AI metagaming and concept of hidden identies Actually, this one became alot better since the update, dunno if its due to my prompting improving and better utilization of the tracked info in the RPG or the model itself, but i found it alot easier to have characters that have hidden identities or the concept of 2 different characters actually being the same person, even of two very distinct characters.

But still, the AI is still quite nosy, but i cant complain too much as it used to be way worse.

Not bad for an Infinite Dragon using a Vulpera visage form, hiding on Citadel (Mass Effect), also you can see the Chaos escalation in that snipped of the RPG too.

AI generating people who's sole existence is to pick a fight with you (not even challenge you) and lose spectacularly, or to be just killed off randomly. Some gems i observed:

Trying to fight a house sized dragon, as a pirate with a sword
Trying to pickpocket a character that has no pockets
Trying to fight a psyker with no warp protection or abilities on your own, alone
Generally picking fights with inadequate weapons against giant characters, like charging a 50 metre tall Warp Deity with a knife.
Trying to outrun a gunship in a street, only to get mowed down with its autocannon
Trying to call cops on a 275 cm tall alien, while standing aside with the assumption they aren't hearing it.
Trying to mug a dragon, with your money or your life type of a convo starter.
Defacing a statue of a Warp God, when explicitly warned about their real existence and possible wrath.

The update did actually improve some of these points actually, its not gone but i had alot more characters straight up running away from a fight, which felt ALOT more realistic.

[-] Randomize@lemmy.world 1 points 1 month ago* (last edited 1 month ago)

justpassing to the rescue! 🦸‍♀️ In the meantime something to read for you. This is the old post OP mentioned, quote >an older post, which is already obsolete in the sense of “how to work around this”, but the analysis and conclusions, ironically, hold until today. https://lemmy.world/post/38491660

You'll see, there will be quite a few things that explain your problems.

[-] Nordrasiul@lemmy.world 2 points 1 month ago* (last edited 1 month ago)

Problems, sort of, its odd but compared to alot of the topics mentioned, i couldnt notice certain issues whatsoever in my RPG's, like the speech detoriation, i cannot remember a character going "me shoulder haha" whatsoever. Thing is, the way i make my stories (Which i dont think arent that complex in most cases), the issues i mentioned are pretty much the only issues i am having. Like, maybe if i used ACC alot more i would run in to the issues more but for example:

Character death, this might also be because i give my characters more realistic doubts about people being actually dead so its often a double tap to be sure, nobody got back up, AI just created new foes instead of dredging up the old ones. Could be also that stories involving combat, i guess remove any concept of a usable...corpse, like by burning them to ash or other fancy corpse disposal methods.
Afromentioned cave man speech, never found it unless the character was explicitly low IQ or brain damaged, even in longer stories

If anything, on the brigher note with the Deepseek, here are things i noticed that actually got better since September compared to Llama:

1. Improved understanding of morally grey or evil factions, i wrote a faction for a mass-effect style RPG session named the Consortium, the Llama had issues understanding that this faction is bad and reprehensive. Deepseek on other hand got this faction pretty much flawelessly, with Llama i had to...find missing homeless, or ensure research is done with consenting subjects.

That was Llama

With Deepseek ? I was tasked in fetching the "undesirables" from the streets, kill anyone who tries to interfier and put a electric collars on everyone so they can get obliterated by the newest weapons research in a Consortium lab.

What is Consortium suppose to be in this case ? The lore in overview paints it as space techno-capitalistic non-goverment mafia, that does anything for progress, with deep pockets, they arent stated to be outright maniacally evil, but close enough.

The way the faction is interpreted between Llama and Deepseek was a massive whiplash.

Improved understanding of esoteric powers and concepts, also religious concepts

I noticed the massive change with my warp-capable character who has a deity watching over them constantly (they are linked, too long to explain the lore, nobody cares anyway). Because under Llama, the deity was always a distant character that never did anything on its own with my character, despite being hyped up so much.

After the Deepseek upgrade, the AI actually started using the explicitly written lore that the deity can talk to me at any time to give orders or even take control over the character body and mind when needed.

Llama never did this, i had to go out of my way for anything like this to happen. Post Deepseek ? OH BOY

Now thats fucking metal, despite the instructions being same when Llama was a thing and Deepseek, the difference of how it started to utilize these things is massive.

Improved understanding of magic and other esoteric powers across the board I dont have any specific example, but in more broader way, magic, warp or whatever is in the story feels more like their canon powers, granted with some liberties taken.
Story difficulty Now with LLama and its piss poor tree hugger like nature never really tried to do anything too bad to the player, but with Deepseek, character death in RPG always seems to be on the table if you fuck up in game, today i havent really been unfairly jumped in a single paragraph and got told "lol you die", but i could see how that could happen in the next paragraph, hell not so recently in one of my RPG's i got even hit by a 1) sniper rifle round, through the thigh 2) Gauss pistol, through gap in my armor, which was a rude wake up call that my character might not be an average joe, but i am not invincible.

While Deepseek sometimes derps out with threat assessment, like sending normal humans to shoot against something that looks like this:

Thats like 8 metres tall when its not pissed off, i admire their courage tho and so did my character in the RPG.

Back to the getting shot topic For sure the quality of enemies has increased alot, in the same story i got shot from the back, a 7 metre tall lab mutant broke out, when i tried to shoot it it grabbed my gauss rifle and tossed it on the other side of the room.

Yeah, omgf when it actually pulled an enemy i actually CANNOT beat and i had to run away.

I also noticed increased competence of other characters and named characters too, they are slightly less passive bed blobs and i remember one extremely hilarious occassion where i summoned a Fel Guard to fight Batman and my character went outside given he didnt care. Roughly 1 paragraph later Bruce returned to fight me now and i was like "how in the hell did you beat a Fel Guard ?!", which if you are familiar with WoW lore, is extremely amusing and concerning at the same time, how a non-magical human defeated a 3.5 metre tall muscled up demon that can casually cut someone with swipe of his demonic sword and in lore can take on entire squads of mortals at once.

So uh, despite the doom and gloom of character overview being washed up, hyper obsession with stupid enviromental destruction, constant escalation and manufacturing of the conflict, there ARE some upsides i mentioned.

It does help me in this case as i remember pre-September and post-September update, as its not that long and i can remember the differences, when with same lore and instructions, i get 2 drastically different outcomes with the LLM.

[-] justpassing@lemmy.world 2 points 1 month ago

Maybe you are missing the point of the current post, but I'm not asking for Llama, rather warning that updates post November 23rd made the model significantly worse and we are on a path of getting "Llama-like", as you pointed out in point 2 of your original reply. This is the benchmark I'm using: https://lemmy.world/post/39228619

In my personal opinion, the model was at it's best around October 10th (based on an old log I have) and around November 23rd (as the linked posts suggests). Ever since it has suffered degradation, and I fear that as time progresses, what today is possible will not be.

To first address the points of your original reply in the numeration you use:

Pre October 10th, the model "intelligence" rivaled commercial ChatGPT. Pre November 23rd, we still had great accuracy on borrowed facts and consistency. Today, most things get diluted on the training data. This can be tested with ease with the default character templates (i.e. What you detected as characters randomly gaining tails).
This is a consequence of the caricaturization phenomena described here, and it got worst post November. Some would consider this a feature, others a downgrade. I'm on edge on both to be truthful.
I suspect this is a problem of over-training the model to "patch" it instead of doing a clean reset. Not much we can do about this as the end users. Workarounds are described above.
This was flawless prior to October 10th. Today this "metagaming" stops working after 10 outputs of the introduction of it unless you railroad it.
This is just a consequence of your setting. War settings require you to fight something, so the model will provide. The degree of how much you allow depends on your context, if you let the model escalate too much, this will turn into a mess.

Regarding the points of your second reply:

This is correct, but past November 23rd, there is a general tone-down in runs that are meant to be by design "dark". You may get instances of your Char or the world in AI RPG trying to come for an "outlandish peaceful resolution" that makes no sense in the scenario. This is rare, but this is new in the model and shows a trend.
Also correct, then again, I warn about the model losing this capacity as the updates happen. As most of the "mystic" concepts are starting to get diluted into "Whimsyland" in the last updates.
After November 23rd, this is not correct. Even if you reference a franchise and a specific power/weapon, there are high chances of the model replacing it for something that is more engraved in its training. That being said, we are talking about LLM interpretations of complex definitions, so I don't expect any degree of accuracy nor is it worth pursuing this when us as the end users can edit the specifics in a run.
"Lol u die" was a default in September. By October this was adjusted nicely to then have a dip again and improving back in November. Today it is reasonable, but I still warn towards the model becoming a "tree huger" after a next update due to how data is being handled.

There is a reason I make those posts not only describing problems, but also providing workarounds. Us, as users, have a degree of responsibility with how to direct the model, and there are things that are very possible.

My biggest gripe with the model is, however, that no matter where the bias is, (nice, neutral, or dark) a run comes to "dementia mode" at a lower threshold. I am willing to bet that today it is not possible to get an AI RPG run past the 40 inputs unless one does heavy rerolling and cleaning of the log. This is my fear, that by trying to cater everyone, the model ends producing absolutely nothing.

The current model shows potential, and that's why it would be sad to see it ending into something that fits no one. Personally, I still believe that a reset with this same model but new training data is required.

[-] Nordrasiul@lemmy.world 1 points 1 month ago

A delightful insight, you for sure have good memory to remember the gradual shifting as for me individual months are alot more blurry, i mainly remember and notice the main changes between Llama and Deepseek.

But, is caricaturization applicable in case of Cindrer ? I would expect her to become raging psychopath (like straight up serial killer) as the story continues, not to become a generic Dracthyr hero, its not that she becomes the polar opposite, she becomes super mild. I got other characters of different species and genders that are similar to this as well, like to me it feels less that a character is becoming a stereotype or parody, but more of the overview section for lore being ignored and devalued, which is extra confusing when the AI feels like brining up some hyper specific facts out of it or characters and suddenly its respected again.

Is it a case of currently generated text having higher priority over overview, so the "lore" gets overriden ? That one makes sense to me, logic wise atleast and would explain it.

[-] justpassing@lemmy.world 1 points 1 month ago

Sorry for the late reply, and thanks actually. My memory is not that great, is just that I try to keep things in order with my own logs to see what is doable and what not. Mainly because this model has undergone more changes that what people report, so the strategies to get a proper run change quickly too (i.e., my old guide is completely useless as a guide today)

My guess with Cinder in your particular case is: yes, it is caricaturization. I have not explained this in the current post but it was in the prior, and the full explanation is as follows:

Behind every generator, there is a set of instructions ready to pass to the model before it generates text. In the case of AI RPG, they go on the lines of "Your task it so create an interactive text adventure where the player is..." and so on. That's why if you input absolutely nothing and yet press "Generate", you'll still get an "adventure" which is the current "default" by the model and a good way to know where its bias is.

Now, after you made your first input, even with instructions and lore, to then press enter and continue the story, you are passing the whole instruction set with your lore PLUS the story at hand. On the code, there is a section that goes along the lines of "This is what happened so far: , continue the story".

If you realize how this goes, the more you advance the story, the more you are feeding the model AI generated text, which will only grow larger and larger to the point that it dominates the custom made text. Causing something like what is shown in this video but with images instead of text.

This is the reason why I call this "caricaturization". Llama did the same, so all the stories would eventually follow a single format. The current model has more formats, but they are limited, so there is a chance that your setting at that point as "nice enough" that the model decided the Cinder's behavior would not match the in lore behavior.

No model is safe from this phenomenon due to how instructions are being passed. This is another thing I warn about the current model as this effect was excruciating past the 1Mb mark of the log size at release, while today you can see it happening in a 50kb size log if you are not careful. Again, there are ways to workaround described in this post, so I hope that helps!

[-] Nordrasiul@lemmy.world 1 points 1 month ago

So i re-read your entry on the caricaturization, so i gotta ask, how much power does the tracking info in the RPG has ? You mentioned it for what i presume is advanced character chat that has the reminder feature on the characters.

I for sure know that the tracking info is absolutely god tier tool in the RPG and rewards itself alot if you do use it, but i dont know how much is too much for it and if some sort of situations/facts are too much for it to handle (such as characters whole personality)

[-] justpassing@lemmy.world 2 points 1 month ago

That is a tricky question since it depends a lot on the type of run you are running, and how long it is. Since now the model is (in my opinion) overloaded with new training date, the ideal is to keep all description as terse and succinct as possible. There are however a couple of exceptions to this rule you can use in your advantage.

Ideally, you only want to place all information that is always relevant. For example, your goal, what is the main enemy, inventory if you have it, your current location, etc. However, as described in this guide, you can use it on your advantage to railroad your run into a path and change the setting (i.e., get a breather scene in a war ridden run). You theoretically can put whole character sheets with detailed personalities and all in the Info Tracker. The danger of doing this is that it may end taking precedence over the log itself and the personality of a single character will permeate in the entire world. This is what I describe in the current guide as the "elephant in the room problem" in the "Descriptions and settings" section.

If your run comes from a known IP like World of Warcraft, sometimes all you need to get a more grounded run is to add the magic line Source: World of Warcraft in the Info Tracker and that will automatically load most of the lore for that run in one go without any more tokens. With mistakes inside what is reasonable, but it saves space.

Now, if you want to know why the Info Tracker works that well, is due to how the instructions are passed. Order matters for LLMs, as "Write a story about a cat in the Caribbean" is not the same as telling it "In the Caribbean there is a cat, write his story". The last part of the input will always have more precedence, so the instructions you place in the Info Tracker are passed AFTER passing the whole log, while the Lore box (the one above the log itself) is passed BEFORE the log.

Under this logic there is a slight potential issue when overloading the Info Tracker, which is that the model will decide to ignore the log and your actual input (i.e., the last thing you said or did to continue the run) in favor to continue the story into something that fits the instructions existing in the info tracker. So while this is indeed a very powerful tool to lead the run, abusing it may cause this unwanted "bug".

My advise is to place all information that is considered "flavor" in the Lore box, that is, the overall world, character sheets, etc. While using the info tracker to "track" things that are happening at the point you are in the run, keeping it dynamic. You can use it to avoid bad caricaturization, as just go, for example in the case of Cinder "Cinder is a ruthless leader" or similar to provide a nudge while keeping the main information on the Lore box.

There are a million tricks with this model, some new, some inherited from Llama, so again, what is "too much" may become evident if you run in the problem that I just described becomes prevalent, and this depends a lot on the run itself and how long it is.

Hope that helps! 😆

this post was submitted on 03 Jan 2026

5 points (100.0% liked)

Perchance - Create a Random Text Generator

1764 readers

33 users here now

⚄︎ Perchance

This is a Lemmy Community for perchance.org, a platform for sharing and creating random text generators.

Feel free to ask for help, share your generators, and start friendly discussions at your leisure :)

This community is mainly for discussions between those who are building generators. For discussions about using generators, especially the popular AI ones, the community-led Casual Perchance forum is likely a more appropriate venue.

See this post for the Complete Guide to Posting Here on the Community!

Rules

1. Please follow the Lemmy.World instance rules.

The full rules are posted here: (https://legal.lemmy.world/)
User Rules: (https://legal.lemmy.world/fair-use/)

2. Be kind and friendly.

Please be kind to others on this community (and also in general), and remember that for many people Perchance is their first experience with coding. We have members for whom English is not their first language, so please be take that into account too :)

3. Be thankful to those who try to help you.

If you ask a question and someone has made a effort to help you out, please remember to be thankful! Even if they don't manage to help you solve your problem - remember that they're spending time out of their day to try to help a stranger :)

4. Only post about stuff related to perchance.

Please only post about perchance related stuff like generators on it, bugs, and the site.

5. Refrain from requesting Prompts for the AI Tools.

We would like to ask to refrain from posting here needing help specifically with prompting/achieving certain results with the AI plugins (text-to-image-plugin and ai-text-plugin) e.g. "What is the good prompt for X?", "How to achieve X with Y generator?"
See Perchance AI FAQ for FAQ about the AI tools.
You can ask for help with prompting at the 'sister' community Casual Perchance, which is for more casual discussions.
We will still be helping/answering questions about the plugins as long as it is related to building generators with them.

6. Search through the Community Before Posting.

Please Search through the Community Posts here (and on Reddit) before posting to see if what you will post has similar post/already been posted.

founded 2 years ago

MODERATORS

eatham@lemmy.world

eatham@aussie.zone

VioneT@lemmy.world

perchance@lemmy.world