this post was submitted on 03 Jun 2024
1474 points (97.9% liked)
People Twitter
That's precisely what I was thinking, but reflecting on it more, I don't know how well it would handle the webpages, so maybe some other languages would need to be mixed in too (I'm out of date, maybe PHP?). If AI-written code worked reliably it would lower the barrier, but I'm not certain we're at the point yet where we can trust what it creates.
Python web scraping is just fine. With LLMs you have the option of either extracting the HTML and having the LLM read over it, or having a vision AI OCR the page and make its own decision about what to extract.
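For the first option (extract the HTML, let the LLM read it), here's a rough sketch using only the Python standard library. It strips a page down to its visible text and wraps that in a prompt. The names `VisibleTextExtractor`, `html_to_text`, `build_prompt`, and `SAMPLE_HTML` are made up for illustration, and the actual LLM call is left out on purpose since that depends on whichever model or API you use. In practice you'd fetch the page first (e.g. with `urllib.request`) instead of using a hard-coded string.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html):
    """Return the visible text of an HTML document, one chunk per line."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def build_prompt(page_text, question):
    """Hypothetical prompt layout; adapt it to whatever LLM you call."""
    return f"{question}\n\n--- PAGE TEXT ---\n{page_text}"


# Stand-in for a fetched page (assumption: real pages are much messier).
SAMPLE_HTML = """
<html><head><title>Shop</title><style>p { color: red; }</style></head>
<body><h1>Widget</h1><p>Price: $5</p><script>track();</script></body></html>
"""

if __name__ == "__main__":
    print(build_prompt(html_to_text(SAMPLE_HTML), "What is the price?"))
```

Stripping scripts and styles before handing the page over keeps the prompt short, which matters because raw HTML can easily blow past a model's context window.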