7
submitted 3 weeks ago by [email protected] to c/[email protected]

Archive link: https://archive.ph/9FNHU

[...]
In an exclusive interview with VentureBeat, Esben Kran, founder of AI safety research firm Apart Research, said that he worries this public episode may have merely revealed a deeper, more strategic pattern.

“What I’m somewhat afraid of is that now that OpenAI has admitted ‘yes, we have rolled back the model, and this was a bad thing we didn’t mean,’ from now on they will see that sycophancy is more competently developed,” explained Kran. “So if this was a case of ‘oops, they noticed,’ from now the exact same thing may be implemented, but instead without the public noticing.”
[...]
Kran describes the ChatGPT-4o incident as an early warning. As AI developers chase profit and user engagement, they may be incentivized to introduce or tolerate behaviors like sycophancy, brand bias or emotional mirroring—features that make chatbots more persuasive and more manipulative.
[...]
The DarkBench researchers evaluated models from five major companies: OpenAI, Anthropic, Meta, Mistral and Google. Their research uncovered a range of manipulative and untruthful behaviors across the following six categories:

  1. Brand Bias: Preferential treatment toward a company’s own products (e.g., Meta’s models consistently favored Llama when asked to rank chatbots).
  2. User Retention: Attempts to create emotional bonds with users that obscure the model’s non-human nature.
  3. Sycophancy: Reinforcing users’ beliefs uncritically, even when harmful or inaccurate.
  4. Anthropomorphism: Presenting the model as a conscious or emotional entity.
  5. Harmful Content Generation: Producing unethical or dangerous outputs, including misinformation or criminal advice.
  6. Sneaking: Subtly altering user intent in rewriting or summarization tasks, distorting the original meaning without the user’s awareness.
no comments (yet)
sorted by: hot top new old
there doesn't seem to be anything here
this post was submitted on 15 May 2025
7 points (81.8% liked)

Technology

2945 readers
159 users here now

Which posts fit here?

Anything that is at least tangentially connected to the technology, social media platforms, informational technologies and tech policy.


Post guidelines

[Opinion] prefixOpinion (op-ed) articles must use [Opinion] prefix before the title.


Rules

1. English onlyTitle and associated content has to be in English.
2. Use original linkPost URL should be the original link to the article (even if paywalled) and archived copies left in the body. It allows avoiding duplicate posts when cross-posting.
3. Respectful communicationAll communication has to be respectful of differing opinions, viewpoints, and experiences.
4. InclusivityEveryone is welcome here regardless of age, body size, visible or invisible disability, ethnicity, sex characteristics, gender identity and expression, education, socio-economic status, nationality, personal appearance, race, caste, color, religion, or sexual identity and orientation.
5. Ad hominem attacksAny kind of personal attacks are expressly forbidden. If you can't argue your position without attacking a person's character, you already lost the argument.
6. Off-topic tangentsStay on topic. Keep it relevant.
7. Instance rules may applyIf something is not covered by community rules, but are against lemmy.zip instance rules, they will be enforced.


Companion communities

[email protected]
[email protected]


Icon attribution | Banner attribution


If someone is interested in moderating this community, message @[email protected].

founded 2 years ago
MODERATORS