this post was submitted on 11 Jun 2024
94 points (100.0% liked)

technology

23303 readers
17 users here now

On the road to fully automated luxury gay space communism.

Spreading Linux propaganda since 2020

Rules:

founded 4 years ago
MODERATORS
 

The big AI models are running out of training data (and it turns out most of the training data was produced by fools and the intentionally obtuse), so this might mark the end of rapid model advancement

you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 9 points 5 months ago* (last edited 5 months ago)

Synthetic data is basically a fancy way of saying 'I'm properly formatting data and reinforcing the ai's good outputs'. Rearranging words, fixing / adding tags, that sort of thing. This is generated with various tools that usually have an LLM or VLM plugged in, though some are as simple as a regex script.