technology

23740 readers

103 users here now

On the road to fully automated luxury gay space communism.

Spreading Linux propaganda since 2020

Rules:

1. Obviously abide by the sitewide code of conduct. Bigotry will be met with an immediate ban
2. This community is about technology. Offtopic is permitted as long as it is kept in the comment sections
3. Although this is not /c/libre, FOSS related posting is tolerated, and even welcome in the case of effort posts
4. We believe technology should be liberating. As such, avoid promoting proprietary and/or bourgeois technology
5. Explanatory posts to correct the potential mistakes a comrade made in a post of their own are allowed, as long as they remain respectful
6. No crypto (Bitcoin, NFT, etc.) speculation, unless it is purely informative and not too cringe
7. Absolutely no tech bro shit. If you have a good opinion of Silicon Valley billionaires please manifest yourself so we can ban you.

founded 4 years ago

MODERATORS

[email protected]

ChatGPT's hallucination problem is getting worse according to OpenAI's own tests and nobody understands why (www.pcgamer.com)

submitted 3 days ago by [email protected] to c/[email protected]

42 comments fedilink hide all child comments

you are viewing a single comment's thread
view the rest of the comments

[–] [email protected] 2 points 2 days ago

My experience is that with ollama and deepseek r1 it reprocess the think tags. they get referenced directly.

This does happen (and i fucked with weird prompts for deepseek a lot, with very weird results) and I think it does cause what you described but like... the COT would get reprocessed in models without think tags too just by normal CoT prompting, and I also would just straight up get other command tokens outputted in even on really short prompts with minimal CoT. So I kind of attributed it to issues with local deepseek being as small as it is. I can't find the paper but naive CoT prompting works best with models that are already of a sufficient size, but the errors do compound on smaller models with less generalization. Maybe something you could try would be parsing the think tags to remove the CoT before re-injection? I was contemplating doing this but I would have to set ollama up again.

Its tough to say. I think an ideal experiment in my mind would be to measure hallucination rate in a baseline model, a baseline model with CoT prompting, and the same baseline model tuned by RL to do CoT without prompting. I would also want to measure hallucination rate with conversation length separately for all of those models. And I would also want to measure hallucination rate with/without CoT reinjection into chat history for the tuned CoT model. And also measuring hallucination rate across task domains with task-specific finetuning...

Not only that it hallucinated the characters back story that's not even in the post to give them a genetic developmental disorder

yikes