submitted 1 month ago by [email protected] to c/[email protected]
[-] [email protected] 24 points 1 month ago* (last edited 1 month ago)

There was a comment yesterday that offered a simpler explanation than the headline’s conclusion.

The papers were published by Iranian researchers, and in Farsi “scanning” (روبشی) and “vegetative” (رويشی) differ by only one character (ب versus یـ); those two keys also happen to be adjacent on the keyboard.
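
For anyone who wants to check the one-character claim, here is a rough sketch that compares the two words position by position. The function name is made up for illustration, and the code points are my assumption about how the words are spelled (Farsi yeh, U+06CC, in both); text typed on other layouts may use Arabic yeh (U+064A) instead, which would not change the position of the difference.

```python
# Words written as escapes so the result does not depend on page encoding.
# Assumption: both spellings use Farsi yeh (U+06CC), not Arabic yeh (U+064A).
SCANNING = "\u0631\u0648\u0628\u0634\u06cc"    # reh, waw, beh, sheen, yeh
VEGETATIVE = "\u0631\u0648\u06cc\u0634\u06cc"  # reh, waw, yeh, sheen, yeh


def differing_positions(a: str, b: str) -> list[int]:
    """Return the indices at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


positions = differing_positions(SCANNING, VEGETATIVE)
print(positions)
for i in positions:
    # Show the code points of the two letters that differ.
    print(f"U+{ord(SCANNING[i]):04X} vs U+{ord(VEGETATIVE[i]):04X}")
```

Under those assumptions the words differ only at the third letter (beh versus yeh), which fits the typo explanation.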

That is, there’s some evidence that this is a typo or mistranslation that has been reused among non-native speakers, as opposed to a hallucination. If so, it could still be an LM replicating the error, but I’ve definitely seen humans do the exact same thing, especially when there’s a strong language barrier.

Edit: brevity

[-] [email protected] 5 points 1 month ago

A couple of decades ago I got really confused because I found a lot of papers referring to "comer" cubes, but could not find an actual definition. Eventually I figured out that these were actually "corner" cubes, but somewhere a transcription error occurred that merged the r and n into an m, and this error kept getting propagated because people were just copying and pasting.

[-] [email protected] 3 points 1 month ago

That’s an apt example from English, especially given the visual similarity of the error.

It’s the kind of error we would expect AI to be especially resilient against, since the phrase “corner cube” probably appears many times in the training dataset.

Likewise, scanning electron microscopes are common instruments in many schools and commercial labs, so an AI writing tool would likely infer that a correction is needed, given how similar the two words are.

Transcription errors by human authors, however, have been dutifully copied into future works since we began writing stuff down.

[-] [email protected] 3 points 1 month ago

Yes. Between that and some bad OCR that fails to recognize text laid out in columns, and so reads words from separate columns as a single phrase, it makes sense that the error would be replicated in machine translations.

this post was submitted on 20 Apr 2025
202 points (98.1% liked)
