Hey everybody. I'm just getting into LLMs. Total noob. I started using llama-server's web interface, but I'm experimenting with a frontend called SillyTavern. It looks much more powerful, but there's still a lot I don't understand about it, and some design choices I found confusing.
I'm trying the Harbinger-24B model to act as a D&D-style DM, and to run one party character while I control another. I tried several general purpose models too, but I felt the Harbinger purpose-built adventure model was noticeably superior for this.
I'll write a little about my experience with it, and then some thoughts about LLMs and D&D. (Or D&D-ish. I'm not fussy about the exact thing, I just want that flavour of experience).
General Experience
I've run two scenarios. My first try was a 4/10 for my personal satisfaction, and the 2nd was 8/10. I made no changes to the prompts or anything between, so that's all due to the story the model settled into. I'm trying not to give the model any story details, so it makes everything up, and I won't know about it in advance. The first story the model invented was so-so. The second was surprisingly fun. It had historical intrigue, a tie-in to a dark family secret from ancestors of the AI-controlled char, and the dungeon-diving mattered to the overarching story. Solid marks.
My suggestion for others trying this is, if you don't get a story you like out of the model, try a few more times. You might land something much better.
The Good
Harbinger provided a nice mixture of combat and non-combat. I enjoy combat, but I also like solving mysteries and advancing the plot by talking to NPCs or finding a book in the town library, as long as it feels meaningful.
It writes fairly nice descriptions of areas you encounter, and thoughts for the AI-run character.
It seems to know D&D spells and abilities. It lets you use them in creative but very reasonable ways you could do in a pen and paper game, but can't do in a standard CRPG engine. It might let you get away with too much, so you have to keep yourself honest.
The Bad
You may have to try multiple times until the RNG gives you a nice story. You could also inject a story in the base prompt, but I want the LLM to act as a DM for me, where I'm going in completely blind. Also, in my first 4/10 game, the LLM forced really bad "main character syndrome" on me. The whole thing was about me, me, me, I'm special! I found that off putting, but the 2nd 8/10 attempt wasn't like that at all.
As an LLM, it's loosy-goosy about things like inventory, spells, rules, and character progression.
I had a difficult time giving the model OOC instructions. OOC tended to be "heard" by other characters.
Thoughts about fantasy-adventure RP and LLMs
I feel like the LLM is very good at providing descriptions, situations, and locations. It's also very good at understanding how you're trying to be creative with abilities and items, and it lets you solve problems in creative ways. It's more satisfying than a normal CRPG engine in this way.
As an LLM though, it let you steer things in ways you shouldn't be able to in an RPG with fixed rules. Like disallowing a spell you don't know, or remembering how many feet of rope you're carrying. I enjoy the character leveling and crunchy stats part of pen-and-paper or CRPGs, and I haven't found a good way to get the LLM to do that without just handling everything manually and whacking it into the context.
That leads me to think that using an LLM for creativity inside a non-LLM framework to enforce rules, stats, spells, inventory, and abilities might be phenomenal. Maybe AI-dungeon does that? Never tried, and anyway I want local. A hybrid system like that might be scriptable somehow, but I'm too much of a noob to know.
Thanks for your comments and thoughts! I appreciate hearing from more experienced people.
Yah, probably so. I tried to write a system prompt to steer the model toward what I wanted, but it's going to take a lot more refinement and experimenting to dial it in. I like your idea of asking it to be unforgiving about rules. I hadn't put anything like that in.
That's a great idea about putting a D&D manual, or at least the important parts, into a RAG system. I haven't tried RAG yet but it's on my queue of matters to learn. I know what it is, I just haven't tried it yet.
I've for sure seen that the quality of output starts to decline about 16K context, even on models that claim to support 128K. Also, I feel like the system prompt seems more effective when there are only let's say 4K context tokens so far. After the context grows, the model becomes less and less inclined to follow the system prompt. I've been guessing this is because as the context grows, any given piece of it becomes more dilute, but I don't really know.
For those reasons, I'm trying to use summarization to keep the context size under control, but I haven't found a good approach yet. SillyTavern has an auto summary injecting system, but either I'm misunderstanding it, or I don't like how it works, and I end up doing it manually.
I tried a few CoT models, but not since I moved to ST as a front end. I was using them with the standard llama-server web interface, which is a rather simple affair. My problem was that the thinking output seemed to spam up the context, leaving me much less ctx space for my own use. Each think block was like 500-800 tokens. It looks like ST might have an ability to only keep the most recent think block in the context, so I need to do more experimenting. The other problem I had was that the thinking could just take a lot of time.