this post was submitted on 03 Feb 2025
709 points (98.6% liked)


Originality.AI looked at 8,885 long Facebook posts made over the past six years.

Key Findings

  • 41.18% of current Facebook long-form posts are Likely AI, as of November 2024.
  • Between 2023 and November 2024, the average percentage of monthly AI posts on Facebook was 24.05%.
  • This reflects a 4.3x increase in monthly AI Facebook content since the launch of ChatGPT. In comparison, the monthly average was 5.34% from 2018 to 2022.
[–] [email protected] 1 points 4 hours ago

I'm not necessarily saying they're conflicting goals, merely that they're not the same goal.

The incentive for the generator becomes "generate propaganda that doesn't have the language characteristics of typical LLMs", so its optimisation is split between two goals. As a simplified example, if the additional incentive were "include the word bamboo in every response", I think we would both agree that it would do a worse job at its original goal, since the constraint means that outputs that would previously have been optimal are now treated as poor responses.
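To make the bamboo example concrete, here is a minimal sketch. The candidate replies and their scores are made up purely for illustration, and `best` is a hypothetical helper, not anything a real training setup uses. It only shows that picking the best option from a restricted set can never beat picking from the full set.

```python
def best(candidates, score, constraint=None):
    """Return the highest-scoring candidate that satisfies the optional constraint."""
    allowed = [c for c in candidates if constraint is None or constraint(c)]
    return max(allowed, key=score)

# Hypothetical candidate outputs with invented "how well it meets the original goal" scores.
scores = {
    "plain persuasive reply": 0.90,
    "persuasive reply mentioning bamboo": 0.70,
    "awkward bamboo reply": 0.40,
}

# Original goal only: every candidate is allowed.
unconstrained = best(scores, scores.get)

# Original goal plus the extra rule "must include the word bamboo".
constrained = best(scores, scores.get, constraint=lambda text: "bamboo" in text)

print(unconstrained, scores[unconstrained])
print(constrained, scores[constrained])
```

The constrained pick can tie with the unconstrained one if the best reply happens to satisfy the rule, but it can never score higher, which is all I'm claiming about the generator.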

Meanwhile, the detector network has a far simpler task - given some input string, give back a value representing the confidence it was output by a system rather than a person.
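The shape of that task can be sketched in a few lines. This is a toy, not a working detector: the marker phrases and the weights are assumptions I picked for illustration, and a real detector would be a trained model. The point is only the interface, one string in and one confidence value out.

```python
import math

# Hypothetical marker phrases, chosen only for illustration.
MARKERS = ("as an ai", "delve into", "in conclusion")

def machine_confidence(text: str) -> float:
    """Map an input string to a score in (0, 1); higher means 'more likely generated'."""
    lowered = text.lower()
    hits = sum(lowered.count(marker) for marker in MARKERS)
    # Logistic squash of a made-up linear score (weights are arbitrary).
    return 1.0 / (1.0 + math.exp(-(2.0 * hits - 1.0)))

print(machine_confidence("hello"))
print(machine_confidence("In conclusion, let us delve into it"))
```

Compare that with the generator, which has to produce a whole passage that serves its original purpose and also stays under whatever threshold a function like this applies.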

I think it's also worth considering that LLMs don't "think" in the same way people do. Where people construct an abstract thought and then find the best combination of words to express it, an LLM generates words that are likely to follow the preceding ones (including the prompt). That difference leaves some room for telling the two apart at better-than-chance rates, even though it's impossible to do so reliably.
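As a very rough sketch of "generate words that are likely to follow the preceding ones", here is a toy bigram model. It is a drastic simplification and an assumption on my part for illustration: real LLMs use neural networks over tokens and a much longer context, and the function names and the tiny corpus here are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    following = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        following[prev][nxt] += 1
    return following

def generate(following, prompt, length):
    """Extend a non-empty prompt by repeatedly appending the most frequent next word."""
    out = prompt.split()
    for _ in range(length):
        options = following.get(out[-1])
        if not options:
            break  # the last word was never followed by anything in training
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

model = train_bigrams("the cat sat on the mat and the cat slept")
print(generate(model, "on", 2))
```

Nothing in this loop starts from an idea and then looks for words to express it. Each word is chosen only from what came before, and that is the kind of statistical fingerprint a detector might try to pick up on.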

But I guess the really important thing is that the people running these bots don't care whether the content can be identified as likely generated, so long as it's not so obvious that it gets removed. This means they're not really incentivised to spend money training models to avoid detection.