
I've found that all the web archiving software I've encountered is either manual (you have to archive everything individually in a separate application) or crawler-based (which can end up putting a lot of extra load on smaller web servers, and could even get your IP blocked).

Are there any solutions that simply automatically archive web pages as you load them in your browser? If not, why aren't there?

I could also see something like that being useful as a self-hosted web indexer, where if you ever go "I think I've seen this name before", you can click on it, and your computer will say something like "this name appeared in a news headline you scrolled past two weeks ago".

OQB @kayzeekayzee@lemmy.blahaj.zone

CIA The World Factbook (1981-2026) (onlinebooks.library.upenn.edu)

IMDB: https://www.imdb.com/title/tt0108756/
Rating: 8.0/10 (8.1K)
Rated: TV-PG

The adventures of an impossibly upright Royal Canadian Mounted Police constable and his American colleagues in the city of Chicago.


Archive of 2160p, 1080p movies and shows.

Biggest Archive (a.111477.xyz)

Total Size: 1.0 PB / Total Files: 470,224

This is the biggest open archive, and it has a lot of HD content.


If you merge the three versions of DataSet 9 that are found so far:

DataSet%209.zip : https://github.com/yung-megafone/Epstein-Files
Data Set 9.tar.xz : https://archive.org/details/data-set-9.tar.xz
dataset9-more-complete.tar.zst : https://github.com/yung-megafone/Epstein-Files

You will end up with 531,282 IMAGES files (PDFs). You might think a lot is still missing; however, the partially corrupted DataSet%209.zip contains a DAT and an OPT file that let us see which files are still missing.

The DAT file reveals that the archive is supposed to contain only 531,307 IMAGES files (PDFs), which means only 25 PDF files are actually missing.
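The comparison described above can be sketched roughly like this. It assumes the expected document IDs have already been pulled out of the DAT file into a plain list (the DAT's field layout isn't described here, so parsing is left out), and that each PDF on disk is named after its ID, e.g. EFTA00709804.pdf. The function name and directory argument are hypothetical.

```python
from pathlib import Path


def missing_ids(expected_ids, images_dir):
    """Return the expected IDs that have no matching PDF under images_dir.

    Assumption: every PDF is named <ID>.pdf (extension case is ignored).
    """
    present = {
        p.stem.upper()
        for p in Path(images_dir).rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf"
    }
    return sorted(set(i.upper() for i in expected_ids) - present)
```

With 531,307 expected IDs and 531,282 files on disk, a check like this should return the 25 IDs that are still outstanding.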

Those 25 PDF files cannot possibly account for the roughly 80 GB still missing from the original DataSet 9, but the DAT file doesn't reveal how many NATIVES there were.

NATIVES are media files such as video and audio. You can see an example if you have a full DataSet 10. DataSet 10 also shows that every NATIVE has a placeholder PDF, which is always 4670 bytes.

So searching for all files of that exact size reveals about 135 NATIVES (media files) that are missing, which would account for the rest of the missing 80 GB.
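A minimal sketch of that size search, assuming the extracted PDFs sit somewhere under one directory and use a lowercase .pdf extension. The 4670-byte value comes from the DataSet 10 observation above; the function name is hypothetical.

```python
from pathlib import Path

PLACEHOLDER_SIZE = 4670  # bytes, per the placeholder PDFs seen in DataSet 10


def find_placeholders(images_dir, size=PLACEHOLDER_SIZE):
    """Return the IDs (file stems) of PDFs whose size equals the placeholder size."""
    return sorted(
        p.stem
        for p in Path(images_dir).rglob("*.pdf")
        if p.is_file() and p.stat().st_size == size
    )
```

Each ID returned is only a candidate: a file of that exact size is very likely a placeholder, but the matching media file still has to be looked for under NATIVES to confirm it is actually missing.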

I have listed below which IMAGES (PDF) and NATIVES (media) files are missing, so that it is easy to coordinate tracking down the remaining files we need for a complete DataSet 9.

(Though the remaining PDFs could be placeholders for up to 25 more natives, which would have to be checked once they are found.)

MISSING_EFTA_IMAGES:
EFTA00709804,EFTA00709805,EFTA00709806,EFTA00709807,EFTA00770595,EFTA00774768,EFTA00823190,EFTA00823191,EFTA00823192,EFTA00823221,EFTA00823319,EFTA00877475,EFTA00892252,EFTA00901740,EFTA00912980,EFTA00919433,EFTA00919434,EFTA00932520,EFTA00932521,EFTA00932522,EFTA00932523,EFTA00984666,EFTA00984668,EFTA01135215,EFTA01135708

MISSING_EFTA_NATIVES:
EFTA00068376,EFTA00072394,EFTA00072395,EFTA00072396,EFTA00072397,EFTA00072398,EFTA00072399,EFTA00072400,EFTA00072401,EFTA00083881,EFTA00089243,EFTA00090492,EFTA00093515,EFTA00093697,EFTA00096469,EFTA00104842,EFTA00135578,EFTA00143411,EFTA00143735,EFTA00151167,EFTA00151168,EFTA00151169,EFTA00152684,EFTA00152685,EFTA00152686,EFTA00152687,EFTA00152688,EFTA00152689,EFTA00152690,EFTA00152691,EFTA00152692,EFTA00155484,EFTA00155485,EFTA00155486,EFTA00155488,EFTA00155489,EFTA00155490,EFTA00155551,EFTA00157542,EFTA00159164,EFTA00165150,EFTA00179442,EFTA00179443,EFTA00179444,EFTA00179445,EFTA00179446,EFTA00182656,EFTA00182657,EFTA00184097,EFTA00184098,EFTA00221035,EFTA00221036,EFTA00221037,EFTA00221038,EFTA00221039,EFTA00221040,EFTA00221041,EFTA00221042,EFTA00221043,EFTA00221044,EFTA00221045,EFTA00221046,EFTA00221047,EFTA00221048,EFTA00221049,EFTA00221050,EFTA00221051,EFTA00221052,EFTA00221053,EFTA00221054,EFTA00221055,EFTA00221056,EFTA00221058,EFTA00221059,EFTA00239786,EFTA00239787,EFTA00241270,EFTA00276490,EFTA00277088,EFTA00277091,EFTA00277094,EFTA00277095,EFTA00277096,EFTA00277098,EFTA00279451,EFTA00279453,EFTA00759424,EFTA00776196,EFTA01140431,EFTA01140602,EFTA01141209,EFTA01141213,EFTA01144362,EFTA01144363,EFTA01144697,EFTA01145825,EFTA01147043,EFTA01149290,EFTA01149291,EFTA01173979,EFTA01177273,EFTA01177560,EFTA01177632,EFTA01181146,EFTA01182315,EFTA01184143,EFTA01190710,EFTA01192998,EFTA01193063,EFTA01194887,EFTA01195505,EFTA01196058,EFTA01196418,EFTA01196421,EFTA01196518,EFTA01196747,EFTA01196752,EFTA01196754,EFTA01196756,EFTA01196936,EFTA01197105,EFTA01197126,EFTA01197787,EFTA01197931,EFTA01198064,EFTA01198505,EFTA01204371,EFTA01205883,EFTA01206089,EFTA01250813,EFTA01250814,EFTA01250815,EFTA01250886,EFTA01250917,EFTA01250922

OC write-up by @ermstein@lemmy.world


Soft 98 is an Iranian software distribution site that was set up after sanctions crippled the ability of ordinary people and businesses in Iran to get access to important software from the outside world.

As the Iranian government threatens to cut the country off from the world, this rich archive of software is vulnerable to being wiped from the internet. It is one of the most diverse trusted software collections I have ever seen.

Is there any way to pool resources to save the software on this site, which to me is like the Software Library of Alexandria, from permanent loss?


Sorry if this is not the place to ask. I also tried on a different instance.

I bought an adapter to retrieve old files from ancient hard drives, and I didn't save the stuff from one I had looked at. Now, though, when I plug it in, it only reads as an Android file system? It has 2 disk images now. One is labeled Presario D:, which shows up as an Android backup or something, but all its folders are empty. The other is Local Disk E:, and if I click it, it just locks up my file explorer to the point that I have to restart the PC.

Any thoughts or ideas?

I may have plugged it into an Android phone at some point? Not sure though.

OQB @WhyIHateTheInternet@lemmy.world


Epstein Files Jan 30, 2026

Data hoarders on reddit have been hard at work archiving the latest Epstein Files release from the U.S. Department of Justice. Below is a compilation of their work with download links.

Please seed all torrent files to distribute and preserve this data.

Ref: https://old.reddit.com/r/DataHoarder/comments/1qrk3qk/epstein_files_datasets_9_10_11_300_gb_lets_keep/


Epstein Files Data Set 9 (Incomplete). Only contains 49 GB of 180 GB. Multiple reports of cutoff from DOJ server at offset 48995762176.

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 6ae129b76fddbba0776d4a5430e71494245b04c4

Unverified version incomplete at ~101 GB.


Epstein Files Data Set 10. Unzipped it's about 82 GB.

ORIGINAL JUSTICE DEPARTMENT LINK

SHA256: 7D6935B1C63FF2F6BCABDD024EBC2A770F90C43B0D57B646FA7CBD4C0ABCF846
MD5: B8A72424AE812FD21D225195812B2502


Epstein Files Data Set 11. Unzipped it's about 27.5 GB.

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 574950c0f86765e897268834ac6ef38b370cad2a


Epstein Files Data Set 12. Zipped it's about 114.1 MB.

ORIGINAL JUSTICE DEPARTMENT LINK

SHA1: 20f804ab55687c957fd249cd0d417d5fe7438281
MD5: b1206186332bb1af021e86d68468f9fe
SHA256: b5314b7efca98e25d8b35e4b7fac3ebb3ca2e6cfd0937aa2300ca8b71543bbe2


This list will be edited as more data becomes available, particularly with regard to Data Set 9.
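The checksums listed above can be checked against a downloaded archive with a few lines of Python. This is a minimal sketch using only the standard library; the function names are hypothetical, and the file is read in chunks because the archives are far too large to load into memory at once.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Compute the hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest to a published one, ignoring case and whitespace."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `matches(path, "574950c0f86765e897268834ac6ef38b370cad2a", "sha1")` would check a local copy of Data Set 11, where `path` is wherever you saved the archive.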

Write up by @xodoh74984@lemmy.world

submitted 3 weeks ago* (last edited 3 weeks ago) by partner_boat_slug@mander.xyz to c/datahoarder@selfhosted.forum

I heard the AVIF format is a more efficient, modern alternative that consumes less storage space. How can I transcode my old videos to it with acceptable quality? With ffmpeg?
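One caveat: AVIF is a still-image format built on the AV1 codec, so for video the usual target is AV1 itself in a container such as MKV or MP4. Below is a rough sketch of driving ffmpeg from Python, assuming an ffmpeg build that includes the libsvtav1 encoder. The CRF and preset values are only starting-point assumptions to tune by eye, and the function names are hypothetical.

```python
import subprocess


def av1_command(src, dst, crf=30, preset=6):
    """Build an ffmpeg command that re-encodes video to AV1 and copies audio as-is.

    Assumptions: ffmpeg is on PATH and was built with libsvtav1.
    Lower crf means higher quality and larger files; lower preset means slower encodes.
    """
    return [
        "ffmpeg", "-i", src,
        "-c:v", "libsvtav1", "-crf", str(crf), "-preset", str(preset),
        "-c:a", "copy",
        dst,
    ]


def transcode(src, dst, **kwargs):
    """Run the encode and raise CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(av1_command(src, dst, **kwargs), check=True)
```

It is probably worth encoding a short clip first and comparing it against the original before converting a whole library, and keeping the originals until you are happy with the results.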


Mario Builder 64 is a level editor implemented entirely within Super Mario 64 itself, which I think should run on real hardware. It is intuitive to use, and the community has created a ton of custom levels. I think custom software is needed to handle the community content, but the ROM hack itself is playable on an emulator if you want to test building your own levels.

The download page for the patch file (remember, it's not a ROM, it's just a patch) got hit by a DMCA takedown. Usually Nintendo does not do that with ROM hacks. Sure, the patch files themselves are not ROM files, but they might contain copyrighted data. That's why Nintendo might be annoyed by this.

Get your copies of the patch file (.bps format) and archive them if you care.


Our project to preserve the history of Sega Channel — including over 100 new Sega Channel ROMs.

By Phil Salvador

December 15, 2025

Sega broke ground in the late 90s with one of the first digital game distribution systems for consoles. Sega Channel offered access to a rotating library of Sega Genesis titles, along with game tips, demos, and even a few exclusive games that never came out in the United States in any other format. In an era of dial-up internet, Sega Channel delivered game data over television cable — a novel approach that gave the service its name.

...

https://gamehistory.org/segachannel/


The Linear Tape Open (LTO) consortium has reacted to higher capacity disk drives and SSDs with a 33 percent LTO-10 raw capacity upgrade to 40 TB and a target downgrade for the LTO-14 generation of 365 TB from the prior 576 TB.

LTO tape is used for archiving and the consortium of HPE, IBM, and Quantum is responsible for the LTO generational roadmap. The current generation is LTO-10 and it was originally specified to have a 36 TB raw capacity. In the event, sole LTO tape drive supplier IBM and tape-ribbon manufacturers Fujifilm and Sony were only able to achieve 30 TB. As disk drive capacity has passed 32 TB and SSD capacity is now at the 120 TB-plus level, tape cartridge capacity has been lagging behind.
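The percentages behind those figures are easy to check. A quick sketch, using only the capacities quoted above (30 TB achieved vs. the new 40 TB for LTO-10, and 576 TB vs. 365 TB for the LTO-14 target); the helper name is hypothetical.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent (negative means a reduction)."""
    return (new - old) / old * 100


lto10_upgrade = percent_change(30, 40)      # LTO-10 raw capacity, TB
lto14_revision = percent_change(576, 365)   # LTO-14 target, TB

print(f"LTO-10: {lto10_upgrade:+.1f}%")
print(f"LTO-14: {lto14_revision:+.1f}%")
```

So the "33 percent" upgrade is measured against the 30 TB that was actually achieved, not the originally specified 36 TB, and the LTO-14 target drops by a little over a third.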


Data Hoarder


We are digital librarians. Among us are represented the various reasons to keep data -- legal requirements, competitive requirements, uncertainty of permanence of cloud services, distaste for transmitting your data externally (e.g. government or corporate espionage), cultural and familial archivists, internet collapse preppers, and people who do it themselves so they're sure it's done right. Everyone has their reasons for curating the data they have decided to keep (either forever or For A Damn Long Time (tm) ). Along the way we have sought out like-minded individuals to exchange strategies, war stories, and cautionary tales of failures.

founded 2 years ago