this post was submitted on 06 May 2025
84 points (100.0% liked)

technology

23740 readers
103 users here now

On the road to fully automated luxury gay space communism.

Spreading Linux propaganda since 2020

Rules:

founded 4 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 2 points 2 days ago

My experience is that with ollama and deepseek r1 it reprocess the think tags. they get referenced directly.

This does happen (and i fucked with weird prompts for deepseek a lot, with very weird results) and I think it does cause what you described but like... the COT would get reprocessed in models without think tags too just by normal CoT prompting, and I also would just straight up get other command tokens outputted in even on really short prompts with minimal CoT. So I kind of attributed it to issues with local deepseek being as small as it is. I can't find the paper but naive CoT prompting works best with models that are already of a sufficient size, but the errors do compound on smaller models with less generalization. Maybe something you could try would be parsing the think tags to remove the CoT before re-injection? I was contemplating doing this but I would have to set ollama up again.

Its tough to say. I think an ideal experiment in my mind would be to measure hallucination rate in a baseline model, a baseline model with CoT prompting, and the same baseline model tuned by RL to do CoT without prompting. I would also want to measure hallucination rate with conversation length separately for all of those models. And I would also want to measure hallucination rate with/without CoT reinjection into chat history for the tuned CoT model. And also measuring hallucination rate across task domains with task-specific finetuning...

Not only that it hallucinated the characters back story that's not even in the post to give them a genetic developmental disorder

yikes