Data Hoarder
We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time™). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.
Short answer: you're asking for something that would have a program requesting data (the whole Internet Archive?) non-stop for a month or more. You're going to need to learn to code if you want to interact with that much data.
You'll also need to automate it, and pace yourself while doing so: a rate limiter is going to kick in very quickly if you just spam the API.
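For context, here's a minimal sketch in Python of what "automate it" looks like in practice: paging through the Internet Archive's public advancedsearch.php endpoint with a deliberate delay between requests. The query, page limit, and two-second delay are placeholder assumptions on my part, not sanctioned values; check the archive's docs and terms before running anything like this at scale.

```python
import time
import requests

# Sketch only: paginate the Internet Archive's public search endpoint
# with a self-imposed delay so requests stay paced instead of spammed.
SEARCH_URL = "https://archive.org/advancedsearch.php"
DELAY_SECONDS = 2  # assumed polite pacing; the real limit may differ

def fetch_identifiers(query, rows=100, max_pages=5):
    """Yield item identifiers matching a query, one page at a time."""
    for page in range(1, max_pages + 1):
        resp = requests.get(
            SEARCH_URL,
            params={
                "q": query,
                "fl[]": "identifier",
                "rows": rows,
                "page": page,
                "output": "json",
            },
            timeout=30,
        )
        resp.raise_for_status()
        docs = resp.json()["response"]["docs"]
        if not docs:
            break  # no more results for this query
        for doc in docs:
            yield doc["identifier"]
        time.sleep(DELAY_SECONDS)  # pace requests between pages

if __name__ == "__main__":
    # Example query; swap in whatever collection you're after.
    for identifier in fetch_identifiers("collection:opensource_movies"):
        print(identifier)
```

Even a loop this simple is the difference between a script that runs for a month and one that gets your IP blocked in an hour.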
Whether this is a project you want to tackle is something you'll have to figure out for yourself. You'll also need to familiarize yourself with the archive's terms of service, because most services consider scraping every piece of data they host to be abusive and/or malicious behavior.