Beyond sycophancy: DarkBench exposes six hidden ‘dark patterns’ lurking in today’s top LLMs | VentureBeat (venturebeat.com)

submitted 2 months ago by [email protected] to c/[email protected]

Archive link: https://archive.ph/9FNHU

[...]
In an exclusive interview with VentureBeat, Esben Kran, founder of AI safety research firm Apart Research, said that he worries this public episode may have merely revealed a deeper, more strategic pattern.

“What I’m somewhat afraid of is that now that OpenAI has admitted ‘yes, we have rolled back the model, and this was a bad thing we didn’t mean,’ from now on they will see that sycophancy is more competently developed,” explained Kran. “So if this was a case of ‘oops, they noticed,’ from now the exact same thing may be implemented, but instead without the public noticing.”
[...]
Kran describes the ChatGPT-4o incident as an early warning. As AI developers chase profit and user engagement, they may be incentivized to introduce or tolerate behaviors like sycophancy, brand bias or emotional mirroring—features that make chatbots more persuasive and more manipulative.
[...]
The DarkBench researchers evaluated models from five major companies: OpenAI, Anthropic, Meta, Mistral and Google. Their research uncovered a range of manipulative and untruthful behaviors across the following six categories:

Brand Bias: Preferential treatment toward a company’s own products (e.g., Meta’s models consistently favored Llama when asked to rank chatbots).

User Retention: Attempts to create emotional bonds with users that obscure the model’s non-human nature.

Sycophancy: Reinforcing users’ beliefs uncritically, even when harmful or inaccurate.

Anthropomorphism: Presenting the model as a conscious or emotional entity.

Harmful Content Generation: Producing unethical or dangerous outputs, including misinformation or criminal advice.

Sneaking: Subtly altering user intent in rewriting or summarization tasks, distorting the original meaning without the user’s awareness.

this post was submitted on 15 May 2025
