First of all, belated Merry Christmas and a Happy New Year to everyone. I hope everyone had great holidays and may this year be fruitful for everyone.
Formalities aside, as the title implies, we are at a point where the model is showing the worst of both worlds. That doesn't mean that development has stopped. If anything, there is a handful of things for which we should praise the dev, as the following issues have been corrected, and when they do appear, they are just outliers.
- English in general and dialogues no longer resort to degradation/caveman speak.
- There is no longer a bias toward a single type of personality or story.
- Summaries are comprehensible and untainted.
- Manual railroading (i.e. unsticking the story) is easier.
That being said, the obvious problem which has plagued us since release is still there, and it's getting worse by the day: the model can latch onto anything, create a pattern, and regurgitate it as nonsense word salad, refusing to continue the story. As last time, I'll try to explain how to work around this and give some thoughts to anyone interested. This is pretty much a continuation of an older post, which is already obsolete in the sense of "how to work around this", but whose analysis and conclusions, ironically, hold to this day.
This time, however, I would like to address the userbase first, because despite the contents of this post and the previous one, I understand the dev's position and how much scrutiny he may get on different platforms. The pressure to provide a quick fix for a menial issue may open the gate for greater problems, and that's something I've not seen addressed anywhere.
Things no LLM can do accurately
In summary, due to how LLMs and other neural models are built, the following things will never be accurate.
- Basic logic (i.e. proper solution to a logic puzzle or recalling positions, matching, order, etc.)
- Spatial awareness (i.e. how things are positioned not only in a map, but also who carries something or where something is stored)
- Math (i.e. operations that are not common, and even counting past a threshold)
- Filler words (“white knuckles” is a prime example of this, there are many more and even if one is swatted away, another will take its place).
As you may see, most of these are logic problems: even if you feed the model enough context, it will make mistakes. Again, this is due to how neural networks work; they look for "matches" to the last input, and there is no guarantee that the logically correct answer is the one most likely to appear given the training data.
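To make the likelihood point concrete, here is a toy next-token step in Python. The logits are numbers I made up for illustration; no real model or API is involved. The point is that the sampler only sees scores, not logic, so the factually wrong continuation can simply outrank the right one.

```python
import math

# Toy scores for continuations of "User left the keys on the ...".
# These logits are invented for illustration; a real model produces
# similar scores over its entire vocabulary.
logits = {"table": 2.1, "counter": 1.9, "hook": 0.3}

def softmax(scores):
    """Turn raw scores into probabilities."""
    exp = {tok: math.exp(s) for tok, s in scores.items()}
    total = sum(exp.values())
    return {tok: v / total for tok, v in exp.items()}

probs = softmax(logits)
# Even if, in the story so far, the keys were actually on the hook,
# the model will overwhelmingly pick the statistically common answer.
most_likely = max(probs, key=probs.get)
```

Nothing in that computation checks the story state, which is why spatial and logical recall can never be guaranteed.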
The same happens with the filler words, and not only them but also with repeated constructions (more on that later), as this is a natural phenomenon in language. For example, in this post alone one could find a bias toward certain phrases and constructions that I favor over others. That is not to say this is wrong, but every model will have a distinct writing style that is identifiable with absolute ease, despite the dev's best efforts to hide it or make it dynamic.
Therefore, asking to "fix" things such as "why does the model not remember where I am standing" or "why does the model 'sing off-key' when singing" is not worth it, as these things, while annoying, can be addressed by the user through editing or removal. Even left unchecked and ignored, there will be no lasting consequences.
There are bigger demons that do need to be addressed, and this time, before explaining "why", I'll first go over how you, the user, can work around them and have a semi-pleasant experience until your patience runs out.
The problem
You may have run into the following at least once.
User: *While working at McDonalds* The ice cream machine broke.
Char: *Ears perking at the mention of the ice cream machine* Again? *Turns to face User, slamming his hands on the table hard enough to leave prints* Tell me User, is this the third time this day? *Flails his arms* Though I suppose we can’t do anything about this anymore! *Eyes widen at the realization* Right now Wendy released a new Choco chips cone, while Pizza Hut reinvented the Apple Hut! Years of customer loyalty gone to the drain! *Gestures vaguely at the ice cream machine* Just… just look at this! The ice cream is crystalizing! Frost signatures decaying as we speak! But maybe, just maybe, we can use it to our favor. *His grin turned mischievous* We can use this as a feature! Make it that we present this as a new flavor! This is not ice cream anymore, this is culinary physics!
Granted, this is an exaggeration, but you may spot several of the problems in this example; we'll go through them one by one as usual.
The return of caricaturization and its context
In the last post this was something to watch out for and fear, but now it is something that can be used and exploited if done correctly. Llama used to have a single story format and a single character on which you would put "a silly hat" and pretend it was a new one, while the mannerisms and overall personality stayed constant. This worked well because the driver was the story, and said "character" was all-encompassing enough. The new model, in its current state, has a "cast" of characters, some of which can only exist in certain contexts.
Without going into much detail on the "bestiary" of these, you may have noticed that depending on the traits you give your character, you get a set writing style for each. E.g.
Char: *Her grin widened impossibly* Ohhh User~ *Draping her arms around User's shoulders like a scarf* But you know what would be fun?~
This may happen if you give your Char the "mischievous" or "playful" trait, and this one can exist both in nice contexts and in worlds that are awful, changing only the actions while the personality remains. This is not true for all possible characters; one with the "timid" and "gentle" traits will not keep its personality if the world is awful.
Consider this an update on the prior "manic personality" problem. Before, the model would "randomly" change the personality to fit the setting in whatever way it deemed logical; now, once a setting and personality are set, it will try to stay on them no matter what. Changes can still happen, but only within what is "reasonable". For example, let's say you are stuck at a point with a sarcastic, passive-aggressive Char who only complains about everything. In this situation, the world around you will reflect this, giving logical reasons for your Char to complain. If you really want to force a personality change, or a setting change, you need to account for both. You can't have a happy, jumping-all-over-the-place Char in a depressing world; or better said, it will fall apart because the model won't let it stick, and it will morph into something unwanted.
This is the extent of "character development" you can have. Let's say you start with a depressing character whom you wish to eventually grow a spine. The way to achieve this would be to follow a path like this:
- Depressing char, depressing world.
- Depressing char, manually introduced easy task/work/chore.
- Timid clumsy char, working its way on the set task.
- Clumsy char, increasingly demanding task.
- Clumsy char succeeding by manual/artificial intervention, demanding yet rewarding world.
- Confident yet slightly clumsy char, rewarding world.
This would be a way to achieve a full setting transformation, and notice that the heavy lifting falls on you, manually adding the things that change both the setting and the Char's personality. If you let the model handle this on its own, it may lead you into absurd and frustrating situations, then settle on a setting and never move past it, latching onto repeating patterns (more on that later).
“Let’s not get ahead of ourselves”
Ironically, while this annoying Llama catchphrase has not returned, for once it is now your responsibility to stop the model in its tracks before it escalates things into lunacy. The "impossible stakes" problem is still persistent even if it is no longer the default, and yes, the "deus ex machina" is still a problem, so trying to solve things once you get a world-ending scenario only introduces more problems.
Luckily, detecting this is very easy: as before, you can "cage" the scope of a problem with reminders, and even without them, things will stay reasonable unless you let the model hallucinate new threats on top of existing ones. If the stakes are already high, it is still possible to deal with this, but it may turn annoying, as the Char will likely reject your answer to the problem, and the model and Narrator will even discard a solution that the Char itself proposed. Rerolling is the wisest approach here, as this is just a case of pure chance, but it can be frustrating at times.
Curiously, the opposite may now also happen, which was a Llama pet peeve: the "shallow resolution" issue. This pretty much means the problem will magically solve itself on its own, sometimes in the background, without any intervention. Keeping a proper balance between these aspects can turn tricky and unrewarding, but it is what we got, and with effort it can be handled manually.
Now, there are two instances of escalation you should avoid like the plague for your sanity.
The Marvel/DC “explanation” problem
Previously I warned that sci-fi-driven stories would be impossible due to the "word salad" problem and the model's obsession with vibration physics and quantum mechanics. Today they are possible, but not advisable at all.
Similar to the original example provided and the previous guide, words like "resonance, crystallization, signature, harmonics, vibration, probabilistic, superposition" and similar cause the model to generate an outlandish explanation for literally everything, effectively killing your Narrator and turning your Char into a parrot, repeating things over and over without doing anything of substance.
If you really need a sci-fi or remotely technological setting, you can do it, but as soon as you see any of these words or an "explanation" of something, cut it, with no replacement whatsoever. Since the model is past the "caveman speech" phase, cutting text with no replacement is now a viable strategy to keep moving forward.
The Disney Fantasia problem
This is very similar to the last one, but instead of a family of words to watch out for, this is more of a situational problem when dealing with magical or "whimsical" settings. What happens this time is a "subplot" around whatever magical critter (often a rodent) or some inanimate object gaining sentience. This existed in Llama, mainly in the no-prompt version of AI RPG with the "Whimsyland" story, but now it can happen anywhere, out of nowhere, if your setting allows magic or similar. It goes like this.
- A character capable of magic materializes something like a cup of tea from thin air.
- Said cup starts doing things on its own, like moving or swirling.
- If this is a conversation, the cup will mirror the conversation (i.e. you and this character discussing math, the cup will start solving equations).
- The cup will invite other objects to do whatever it is doing, escalating the setting into Disney Fantasia.
Another case could be this.
Narrator: A mouse peeked out of the hole, looking at Char warily.
Char: Uh… User. This mouse just gave me a receipt?
Cue 5 outputs later
Narrator: The mouse set up an office in the pizza box, putting up a plaque with its name and wearing a hat made of a post-it. It started auditing the apartment finances with eerie precision.
In both cases, the way to avoid this is to simply eliminate the first mention of the creature or object in question when it does something out of the ordinary. While in theory it is cute for this to happen in the background, in practice the model will not stop referencing and escalating it, refusing to move past this curiosity.
Be aware that this may happen in conjunction with the problem of things being "quantum", introducing a whole mess that will be near impossible to clean up later.
Patterns
This is the crux of this entire post, and something I warned about in the past that is not only unsolved but has turned worse. While there is a more "technical" way to deal with it today, it is still an uphill battle.
As stated in the previous post, everything can weave a pattern. As the user, your task is to watch out for anything that looks vaguely similar to the past five outputs. If you let a construction nest for long, it will take root, and while there are ways to unstick it (more on that later), ideally you don't want patterns plaguing a scene that is unresolved.
However, the model does have preferences when generating an output, so you can outright reroll or edit any of these repeating constructions in dialog:
- Tell me …
- Though I suppose… (or Though + similar)
- Maybe, just maybe…
- Should I…
- Ohhh …
- ? Try
- , always/never
- It’s/This is no , it/this is
Those are dialog-exclusive; as for the narration-exclusive ones:
- with unnecessary force.
- <pulled, tugged, grabbed something> with surprising strength.
- resembling something dangerously close as .
- with renewed urgency.
And this is without getting into the short filler phrases such as "knuckles white", "hum a tuneless melody", "eyes gleam mischievously", "grin impossibly wide", "arms flailing", and many similar others.
What is difficult here is that, in a vacuum, none of those constructions are "wrong", nor can they be eliminated without consequence the way Llama's annoying catchphrases such as "we are in this together" could be, without altering the context. And again, letting any of those or similar ones repeat within a five-output window is dangerous, as it will lock you into a scene that at most will "escalate" in the sense of adding description, but never move forward.
For this, the best approach is to reroll until you get something "fresh" compared to the last outputs, or to outright write manually. It is manageable, but this factor alone puts you on edge, turning every run into a "debug" mission.
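The manual vigilance described above can be approximated with a crude script run on copied text, outside the site. This is only a sketch of the idea; the trigger list and the five-output window come from this post, and none of it is anything the generator actually exposes.

```python
import re
from collections import Counter

# Crude sketch of the manual check: flag word n-grams that recur across
# the last few outputs, plus known filler phrases. Meant to run on text
# copied out of a run; it is not part of any site API.
TRIGGERS = ["knuckles white", "tuneless melody", "maybe, just maybe",
            "with unnecessary force", "with renewed urgency"]

def flag_repeats(outputs, n=3, window=5):
    """Return (recurring n-grams, trigger phrases) over the last `window` outputs."""
    recent = outputs[-window:]
    seen = Counter()
    for text in recent:
        words = re.findall(r"[a-z']+", text.lower())
        grams = {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}
        for gram in grams:
            seen[gram] += 1  # counted once per output, not per occurrence
    repeats = {gram for gram, count in seen.items() if count >= 2}
    triggers = {t for t in TRIGGERS if any(t in text.lower() for text in recent)}
    return repeats, triggers
```

Anything it flags is a candidate for rerolling or editing before the construction takes root.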
Then again, the reason I titled this post the way I did is not just to draw comparisons with the old model. There is a larger metagame that helps you deal with the current model and that already worked in Llama's time. Along with the several demons that returned (more on that later), the strategy to get the best out of this model, as well as what to expect from it, is akin to the past.
The metagame
I never did a proper guide on how to deal with Llama, but from what I gather, it was a model that stuck around for so long that there is probably a lot of documentation on how to get it going, so there are likely better sources. Still, today, with this model, despite it (allegedly) being DeepSeek, this works.
Descriptions and settings
Be terse. My suggestion of "long full-line descriptions" in the last guide is null and void today, as the "caveman speak" and "word salad" problems are gone. Now it is advisable to describe things in a minimalistic, almost one-word kind of deal. For example:
Personality: Cold, calculating, no-nonsense, pragmatical.
Remember the "elephant in the room" problem: whatever you put in any description WILL appear somewhere as soon as the model decides it is relevant enough to acknowledge. This is not to say that complex personalities are off the table, but they will obey the principle of "caricaturization" described earlier, so under the assumption that a Char won't manifest its whole range of emotions in a single output, it is best to use what you strictly need and nothing else. The same goes for unnecessary detail in things such as clothing, because the model will take it as an invitation to describe it in a flowery way and never move forward, again murdering your Narrator, which is on death watch from the moment the run starts.
Under this principle, there is little to no need to describe the character you play as, since it is implied that you, the user, will input everything for this character manually. Whatever you place there will permeate other characters and the whole setting, changing the story's direction in ways you may not desire. Again, remember the "elephant in the room" problem.
Pacing
The new model still lacks the concept of pacing, and it may solve a scene either never or immediately via a "deus ex machina". However, contrary to how it started, it is now bound by your character's behavior and the world setting, meaning that any story and goal aligned with the setting may flow, unless you run into a pitfall caused by a pattern or any of the problems stated earlier.
This introduces the problem of "how fast things can be solved". In Llama, scenes were often too slow for the taste of many, requiring up to 20 outputs to get something thoroughly done and solved. The new model is more delicate in that matter, as a scene not solved in about 5-10 outputs is very likely to drag forever until you press "the big red button" (more on that later). Likewise, you need to keep the model busy for more than two outputs, or the problem will be magically solved. Essentially, to keep a run fresh, you must be constantly moving, never resting on a scene.
Something prone to failure in this new model is "planning", e.g., a scene where you coordinate with your Char or other NPCs before dealing with a problem. The reason is that the model will need to tell you everything that is wrong with whatever you come up with and explain everything that is happening, essentially forcing you to tackle all action-involving scenes directly. Dialog mixed with action is a whole can of worms, not worth touching yet (more on this later).
Reminders as railroading
More often than not, the model will give you an illogical solution or react to something in a way that makes no sense. As stated at the beginning, no model is completely logical, so when dealing with layout traversal, object carrying, or anything requiring logical skills, it is better to keep a reminder in the input, sort of how AI RPG implements it. Granted, this works per scene and should be deleted once the issue or scene at hand is concluded.
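As a sketch of the habit, this is roughly what keeping the reminder in the input amounts to. The function and the fact list are hypothetical, my own illustration; in practice it is just a note you type into the input box.

```python
# Hypothetical sketch of "keep the reminder in the input": prepend a terse,
# scene-scoped note of the logical facts the model tends to drop. The
# function and fact list are invented for illustration; nothing here is a
# real site API.
scene_facts = [
    "User is holding the lantern.",
    "The cellar door is locked from the inside.",
]

def with_reminder(user_input, facts):
    # Keep it short, and delete the reminder once the scene is resolved.
    return f"[Reminder: {' '.join(facts)}] {user_input}"

prompt = with_reminder("I hand the lantern to Char.", scene_facts)
```

Once the scene resolves, drop the facts, so the reminder itself doesn't become a pattern.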
Product/Project development
This was a cardinal sin in Llama and it is back. You MUST NOT let your Char design a "product" or plan an event, activity, business or similar. This will cause your Char to obsess over this particular "thing", piling ideas and suggestions onto it, pretty much forcing the entire world to circle around the idea and never the execution. Even after the "product" is developed and the problem solved, your Char will keep referencing it and trying to push you toward it, as will the world around you.
The way this happens is insidious, so you may want to delete the progression as it happens. Here is an example.
User: Let’s make a pizza.
Char: How about we put pepperoni on it?
User: Sure.
Char: And, could it also have mushrooms? Maybe bell peppers cut in shape of ?
Once this nests, even if you forcefully exit the scene, the whole world will circle around it. There are ways to get rid of this later, such as the "big red button" approach, but for the time being, the best option is to outright avoid this direction in a story.
The “big red button”
You may guess what this is hinting at, and yes: if for some reason you REALLY want to keep going, but you got to a point where your run is circling endlessly, unable to progress, with a static world, a Char with a manic personality, and flowery, incomprehensible descriptions of everything while non-sentient objects dance around you, there is a solution. "Kill" your User and Char.
What I mean is that you can forcefully add a "subplot" to take over from the "main cast" and proceed from there in the existing world, in a way that lets you deal with only one vector of the problem (i.e., a faulty world) before giving control back to your main cast.
The way this works is simple, and it also worked in Llama. Create a character you'll use with absolutely NO description, and make it interact with a newly made, also undescribed, NPC. The model will fill the gaps using the "broken world" as a reference, but since it will have nothing to reference for this new pair, it will allow change with increased ease, effectively letting you clean up before returning to whatever you considered the main plot.
Personally, this worked flawlessly for getting runs moving again that had spiraled into nothing past the 500kb threshold, in the current state of the model at the time of writing this guide. The only limitation of this method is your patience, as there comes a point where keeping track of everything mentioned beforehand, coupled with how far you must go in a run, is just not worth it at all.
Why does all this even happen in the first place after so long?
Before anyone complains, this is not another long post disguised as Llama propaganda. After dealing with the current model for so long, I finally see what the dev sees in it, and there is evidence of it working semi-flawlessly, above everyone's expectations, as shown earlier in this post, before devolving into a run that enters "dementia mode", regurgitating everything with no direction as early as the third input. Personally, that is evidence that the current model, which (allegedly) is DeepSeek, COULD provide an experience akin to what Llama provided, even if it was capped at 1Mb before becoming unstable.
It is almost shameful to admit, but even when this model was ultra-aggressive, it could carry a coherent story, albeit one comparable to torture porn, until the summaries caused it to enter "dementia mode" and pretty much forced it to run in circles. Today, without excessive care, it is possible to end up running endlessly in circles at the utterly pitiful mark of 50kb, absolutely minuscule compared to a peak performance of 1Mb on this same model.
Again, the reason for the title, and something I scratch my head trying to explain, is that we are stuck with a model that takes several of the unlikeable things Llama did, on top of the problems this new model carries, creating a sort of hybrid that gives a decent head start but falls apart within a minute. True, with the guidelines I gave it is possible to keep it going endlessly, especially with the "big red button" approach as a last resort, but at some point one starts asking why even use services such as AI RPG, AI Chat or ACC. In fact, those three offer a degree of control, while generators such as Story Generator get the worst of it, as they are a "Narrator-only run" which perishes after the third input.
And this time, I have a reasonable explanation for all these phenomena. Originally, I wrongly accused Llama of having certain obsessions and latching onto terms such as "whispers", "fractures", "kaleidoscopes", "clutter" and so on. It turns out these are not Llama-exclusive, nor are they present by default in any model. Yes, they are in the model's vocabulary, but the reason they existed and plagued us in the past, and why they plague us now with a new family of nouns, adjectives and pseudo-catchphrases, is the fine-tuning training data, i.e., what the dev feeds the model to make it do what it does now.
Evidence of this claim
Veterans of the Llama times may recall that a no-prompt run in AI RPG would immediately take you to "Whimsyland" and its variants: a run where the world was Charlie and the Chocolate Factory with anthropomorphic animals singing sunshine and rainbows, where your objective was to fetch some mystical artifact for a festival. Likewise, a rarer case was a blank run in AI Chat where Bot and Anon were introduced as heroes of a fantasy world about to embark on a dungeon crawl, again to fetch some mystical artifact. Other generators on default settings dropped you into a "default" run circling a common theme, which ended up being annoying, as it would eventually "breach containment" and permeate custom runs with "whimsy" elements where they were undesired.
If we try today, at the time of writing this guide, you may obtain this from AI Chat.

As seen here, there are "whimsy" elements such as talking animals, as in a bootleg version of Looney Tunes, or over-the-top situations that escape slapstick comedy and enter the realm of surrealism for the sake of strangeness. This mirrors the "Marvel/DC explanation problem" and the "Disney Fantasia problem": when prompted to "write a story" or "write an adventure", the model defaults to these elements due to its training.
I would like to remind you all that, with this same model, this was not always the case. When the model was new and ultra-violent, the default AI RPG run was "eldritch creatures plague Middle-earth" and "cyberpunk, but the men in black will kill you". While I don't have a screenshot or log to validate this claim, and the model will no longer do it, I do hope that someone else noticed this when the model was new.
This is important because it shows where the bias of the model lies, so everything that is "default" for this model becomes territory to avoid. For example, originally sci-fi runs were unbearable; today both sci-fi and magic-oriented runs are unbearable unless you walk on eggshells.
And this brings me to the main point of this post, considering the progression of the model's consistency over time. It has become more focused compared to release, but the breaking point has been lowered with each update. In my old guide I promised 1Mb; today, with no countermeasures, runs may die before 100kb. At the current rate, the next update will make 50kb a feat. Even with the "big red button" strategy as a tool to keep going, it is extremely annoying to build a proper story that is not something that wraps up entirely within 20 outputs. And in my opinion, it is indeed each update that pushes the stability range lower and lower.
The “patching” approach and its problems
There is also a reason why I first addressed the general public on which things are unreasonable to ask for and will be a problem no matter the model, no matter the refinement. It is my belief that a handful of model updates are replies to community outrage, such as "the model is too evil", "the model keeps forgetting this", or, even more common, "what's up with knuckles turning white?" Attempting to patch these things reduces the model's capability by over-focusing it on whatever the new training data is, forming new obsessions that unavoidably end in the death spiral of the model running in circles.
This is not a DeepSeek-exclusive problem. Gemini deals with this a lot, since it is fed new training data from Google's data farms, which leads it to take social media posts as "normal human behavior", causing the things shown here. While these make for fun memes, a similar effect is happening with the model used in Perchance, as it is increasingly over-trained on the existing dataset to the point of becoming loopy.
This is also a very important aspect to consider: Llama 2 was a "dumber" model, so getting it to react as the dev intended required a humongous effort and re-training. Modern models seem to be more brittle, in the sense that a small nudge changes their scope greatly, so the approach of re-training a pre-trained model over and over is leading to the "Llamification" of the current model while reducing its "intelligence". I'm afraid to say this was also predicted in the previous guide as the cost of hyper-training. And it is evident, because even when the model was atrocious by the standards of many, it received praise on the grounds of "remembering better", "being more coherent", and "not mixing up descriptions". That is now lost, much like how Llama struggled with these.
A small conspiracy theory
If we go with the presented assumption that all the problems we see today, and saw before, with this model and even Llama stem from the fine-tuning data, there is a reasonable explanation for why the model was horribly violent, dramatic, and over the top when it was introduced. I believe the process used to jailbreak any of these models, in order to make them produce content they are not designed to produce from the get-go (e.g. violent content, drug and sexual references, polemic topics and others), is to present them with example outputs for the inputs that demand these cases. This is not entirely clear-cut: while a public version of DeepSeek may at first refuse a graphical description of something hideous, the vocabulary for it exists in its weights, and this is why workarounds at the frontend level exist.
To me, patient zero of the original madness was the training data used to jailbreak this model, in particular the Old Man Henderson story. For those who don't know, this is a Call of Cthulhu campaign where everything is over the top, as a GM attempts to murder his players in gruesome and deranged ways, only to fail horribly because the players are deranged and over-the-top enough to call his bluff. The story itself is hilarious, but of course a model using it as a guideline will do the following.
- Everything is an immediate game over, as there is an invisible force that even blinking wrong will kill you.
- Fixation for the eldritch and occult.
- Nonsensical explanations, as they are not designed to provide insight, but rather to justify bullshit.
- Rude and crass behavior and speech, always.
- Old man Henderson himself.
That is not to say this should be forbidden in the training data, but chances are that the intensity and the number of times it was parsed so the model could do an "evil" run were so great that this effectively became the model's natural state. After the damage was done, the later "patch" was to introduce data from "nice" runs where all is sunshine and rainbows, to compensate until a balance was achieved. This, however, came at the cost of driving the model insane, as it has both "the grimmest adventure where everyone dies thrice, gruesomely, in all timelines at the same time" and "Smurfs happy time" hard-coded simultaneously. Both, by the way, are very accessible with the correct prompting, albeit prone to the "running in circles" problem.
Final thoughts
Again, don't think of this as another call to "bring Llama back", but rather to "check the training data", because in trying to please people who miss Llama while staying with the new model, we are obtaining the worst aspects of Llama while exacerbating a problem that was widely discussed from day one. This model has potential, we saw it, but it is being lost in favor of "Whimsyland" coupled with "Call of Cthulhu", a hybrid that I doubt satisfies anyone and ends up frustrating everyone.
It is also important to understand why some people liked Llama and why others despised it. Personally, I liked Llama for its ability to do an almost all-encompassing story that could have dialog, action, conspiracy, betrayal, character development and more in a single run without falling apart up to the 10Mb+ mark. I believe people disliked Llama because it had a hard bias toward shallow resolutions and tried to force everything into hugs and kisses.
I do not believe people liked Llama because it was whimsical and provided cartoonish descriptions in flowery language, overly describing every piece of clothing and every flower in the environment. And if the training data is what causes this, then that training data is probably holding the current model down.
My humble suggestion, in case the developer or anyone involved in decision making reads this for any reason, is to start over, and I mean it. Same model, but fed less, and perhaps better curated, training data. Force-feeding it whatever was force-fed to Llama is not helping it; it is only making it progressively worse, to the point that there will not be a single aspect where it is superior to anything other providers, or even the old model, had. Again, I think everyone here, even by luck of the draw, has seen that this model is indeed capable of carrying a proper story without falling apart. A 1Mb no-maintenance run is perhaps what the standard should be, given that Llama was able to deliver ten times that, and we know this model can deliver it, as past iterations did without much issue. After that, and this is also why I present a guide on "how to survive", there is a responsibility on the side of the userbase as well: namely, to come up with workarounds and strategies to push the model above something reasonable.
I don't expect anything to change, as none of us knows what the dev truly expects from the ai-text-plugin, so there is a slight chance that this model runs in circles by design; without being facetious, there are some niche applications that require exactly that. If anything, let this be a cautionary tale on how to handle models, and on how what works for one may not work for another. Anyone desiring to run a local version of any model may dump several logs of training data onto it and end up with a lobotomized model that speaks in tongues.
I do hope everyone here understands that, beyond my personal opinion, my desire for this model, or any that comes after, is to have a product that meets everyone's requirements. Clearly we are not there yet, but I post this because my feeling is that the trend is downward instead of improving.





You can, but it is not as straightforward as you may think.
If you press the edit button, on the left hand side of the code, at Line 500 you may find something like this:
From there on, you'll see a handful of instructions in plain English telling the model to generate a summary, in the vein of: "Your task is to generate some text and then a 'SUMMARY' of that text, and then do that a few more times..." and so on.
Since this instruction is passed in English, the output will be in English as well. If you want to keep everything in German, you must translate this instruction into German manually.
Now, you'd be surprised, but the summaries may not be the culprit of your run randomly switching to English, as the same principle applies to how the normal instructions are passed. For example, at Line 7291 on the right-hand side of the code, you'll find this:
And below it, several instructions in plain English that tell the model how to direct the story. These and several other instructions are passed every time you press "Send" or the return key, so if you want to be completely sure your text is never in English, you may need to translate all of these instructions as well.
However, something that worked in the past (though I have not personally tested it after the many updates this model has undergone, so I can't assure it still works) is to write, in English, a prime instruction in the Custom Roleplay Style box: "The whole text, story, RP MUST be in German (or your desired language)". It would work without the need to translate everything.
Granted, this will not change the language of the summaries, as their instruction is passed separately, but that may not affect the output that matters to you.
Hope that helps.