this post was submitted on 30 Nov 2023


I have Googled and YouTubed and read all through this subreddit and cannot for the life of me figure out the right direction to go in for mass data management. I'm completely new to all the systems of NAS/DAS, RAID, etc., so please bear with me.

I am a freelance photographer and videographer with 30TB of files stored on regular Seagate portable HDDs (with duplicates). I edit on portable SSDs, and once the project is done, I transfer to the HDDs for long term archiving.

The assortment of HDDs is clunky and chaotic and they are slow for transferring data. I'm looking for a more sophisticated workflow for access, archive, and backup.

Things to note and my questions:

  • I am accumulating 5-10TB of new files to store annually.
  • I work alone, and don't have a strong need to access my files from everywhere. A NAS feels like overkill, but it seems to be what everyone online is talking about. Is there something I am missing here?
  • A DAS RAID setup (like Thunderbay?) seems like a better fit for me if I don't need network access. But from what I've learned, a RAID is not a backup. I would still need another physical copy of all files elsewhere. What do people with Thunderbays back up to? Another Thunderbay that they put somewhere else?
  • Something I hate about my current HDDs is it takes many hours to transfer large amounts of data. I would like faster read/write capabilities. Anyone have thoughts on using a RAID with SSDs vs HDDs?
  • Once RAID storage is full, can I completely remove and store all drives, and start a new RAID with fresh drives for new incoming files? I'm really looking for easy access to my last 1-2 years of files, so once something is old enough, I don't care about it sitting in a Thunderbay vs. a box, if that makes sense. If I wanted to access older RAID storage, can I remove all current drives and put the old ones back in? Is this chaotic??
  • Instead of a RAID setup, would it just be easier to get some giant capacity external HDDs (like a 20TB G Drive) and copy things twice? I don't really understand the difference between RAID 1 in a Thunderbay (or similar) and what I'm doing now: buying two hard drives and copying the same thing to both (except that the RAID mirrors automatically).
  • I am planning to use Backblaze for a cloud solution once I figure out my new physical solution.

Would love to keep the budget to $2000 for physical storage, but I understand it gets expensive quickly, and it's important for me to invest in the right system once rather than the wrong one multiple times. Open to all ideas.

Thanks for any insight!!

[email protected] 11 months ago

One way to solve your storage needs might be a three-tier system: PC, DAS, and cloud/external.

On the PC you have two SSDs, say 4-8TB. You use SSD1 (4TB?) as usual for the OS and for current projects. You use SSD2 (8TB?) for backups of the project files and documents on SSD1. Perhaps run an incremental backup or snapshot sync automatically at every boot.
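
To make the SSD1 to SSD2 step concrete, here is a minimal Python sketch of an incremental backup. It is only an illustration, not a replacement for a proper backup tool: the function name `incremental_backup` and the "same size and backup not older" test are my own assumptions, and it never deletes anything from the backup side.

```python
import shutil
from pathlib import Path


def incremental_backup(source: Path, dest: Path) -> list[Path]:
    """Copy files that are missing from dest or differ from the backup copy.

    Returns the relative paths that were copied. Nothing is ever deleted
    from dest, so files removed on SSD1 stay available on SSD2.
    """
    copied = []
    for src_file in sorted(source.rglob("*")):
        if not src_file.is_file():
            continue
        rel = src_file.relative_to(source)
        dst_file = dest / rel
        src_stat = src_file.stat()
        if dst_file.exists():
            dst_stat = dst_file.stat()
            # Assumption: same size and a backup that is not older means "unchanged".
            if (dst_stat.st_size == src_stat.st_size
                    and dst_stat.st_mtime >= src_stat.st_mtime):
                continue
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copy2 also copies the modification time
        copied.append(rel)
    return copied
```

You would call it with the project folder on SSD1 and a backup folder on SSD2, and schedule it with whatever your OS offers for tasks at boot or login.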

In a multibay DAS, "DAS1", you install three big 20TB HDDs: "HDD1P1", "HDD1S1" and "HDD1S2". P=Parity, S=Storage.

Using separate timestamped project folders, with project start/end dates, you first fill HDD1S1 with the oldest projects and then continue on HDD1S2 with the rest. You write-protect each closed project to avoid accidental deletions or modifications.
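
Write-protecting a closed project can be scripted. The sketch below is one way to do it in Python, assuming a filesystem that honours POSIX permission bits (on Windows, or on exFAT/NTFS volumes mounted elsewhere, the effect may be limited to a read-only flag or be ignored). The function name `write_protect` is made up for this example.

```python
import os
import stat
from pathlib import Path

# All three write permission bits: owner, group and others.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def write_protect(project_dir: Path) -> int:
    """Clear the write bits on project_dir and everything below it.

    Returns the number of files and folders whose mode was updated.
    """
    count = 0
    # Bottom-up walk: children are handled before their parent folder.
    for root, dirs, files in os.walk(project_dir, topdown=False):
        for name in files + dirs:
            path = Path(root) / name
            path.chmod(path.stat().st_mode & ~WRITE_BITS)
            count += 1
    project_dir.chmod(project_dir.stat().st_mode & ~WRITE_BITS)
    return count + 1
```

This only guards against slips of the hand. Anyone with ownership of the files (or admin rights) can switch the write bits back on, so it is not a substitute for the parity and backup levels.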

Then, using Snapraid, you record parity data for HDD1S1 and HDD1S2 on HDD1P1. This means that if either HDD1S1 or HDD1S2 fails, or files are corrupted or deleted by mistake, you can use Snapraid with HDD1P1 and the still-working HDD to recreate the contents of the failed HDD or the corrupt/deleted files, as they were at the last parity update.
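
If parity sounds like magic, the principle behind single parity fits in a few lines of Python. This is a toy illustration of XOR parity in general, not Snapraid's actual code, which also has to deal with file hashes, drives of different sizes and multiple parity levels. The "drives" here are just short byte strings.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy "drives": real ones hold terabytes, these hold 12 bytes each.
hdd1s1 = b"wedding-2021"
hdd1s2 = b"concert-2022"

# The parity drive stores the XOR of all storage drives.
hdd1p1 = xor_blocks([hdd1s1, hdd1s2])

# If HDD1S1 dies, XOR-ing the parity with the surviving drive rebuilds it.
rebuilt = xor_blocks([hdd1p1, hdd1s2])
print(rebuilt == hdd1s1)  # prints True
```

The same trick works with more storage drives, as long as only one of them is lost at a time. That is why a second parity drive is needed to survive two simultaneous failures.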

Every time you finish a project, you archive it on the DAS, write-protect the project folder, and update the stored parity on HDD1P1 using Snapraid.
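
The archive step is where a silent copy error would hurt most, so it can be worth verifying the copy before you delete anything from the SSD. Below is a rough Python sketch that copies a project folder and compares SHA-256 checksums afterwards. The names `file_sha256` and `archive_project` are invented for this example, and it assumes the project does not already exist on the archive drive.

```python
import hashlib
import shutil
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large video files do not fill up RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def archive_project(project_dir: Path, archive_root: Path) -> Path:
    """Copy project_dir into archive_root and verify every file by checksum."""
    target = archive_root / project_dir.name
    shutil.copytree(project_dir, target)  # raises FileExistsError if target exists
    for src_file in project_dir.rglob("*"):
        if src_file.is_file():
            dst_file = target / src_file.relative_to(project_dir)
            if file_sha256(src_file) != file_sha256(dst_file):
                raise OSError(f"Checksum mismatch after copy: {dst_file}")
    return target
```

After a successful run you would write-protect the archived folder and then run `snapraid sync` to bring the parity on HDD1P1 up to date.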

When HDD1S2 starts to fill up, you add HDD1S3 to the Snapraid and simply carry on. Assuming a 5 bay DAS, when HDD1S4 has filled up you add DAS2 and add HDD2P1 and HDD2S1 and carry on.
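
To get a feel for how long DAS1 would last, here is the arithmetic as a small Python sketch. It uses the numbers from your post (30TB today, 5-10TB of new files per year) and assumes a 5 bay DAS with one parity drive and four 20TB storage drives. It ignores that formatted capacity is a bit below the label, so treat the result as an upper bound.

```python
def years_until_full(used_tb: float, capacity_tb: float, growth_tb_per_year: float) -> float:
    """Years of growth that still fit into the remaining capacity."""
    return max(capacity_tb - used_tb, 0) / growth_tb_per_year


DRIVE_TB = 20      # size of each HDD
STORAGE_BAYS = 4   # 5 bay DAS minus one bay for the parity drive
EXISTING_TB = 30   # what is already on the portable HDDs

capacity_tb = DRIVE_TB * STORAGE_BAYS  # 80 TB of storage, parity not counted

for growth in (5, 10):
    years = years_until_full(EXISTING_TB, capacity_tb, growth)
    print(f"{growth} TB/year: DAS1 is full after about {years:.0f} years")
```

So under these assumptions DAS1 would last roughly 5 to 10 years before DAS2 is needed, and you would only buy the third and fourth storage drives when you actually need them.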

By the time you have filled DAS1, you may well choose some other method, or at least start using 40TB SSDs in DAS2. You may need them, because by then you may be shooting high-resolution 360 autonomous multicam AI-augmented 3D photogrammetry, or perhaps even videogrammetry.

In addition to using SSD2 for backups and Snapraid for parity, you need at least one more level. Perhaps sync current project folders on SSD1 with cloud storage as well as with SSD2. Also store archived projects not only on the DAS, with Snapraid, but also on loose external cold storage drives.

You might also consider using more than one parity drive per DAS, especially if you use a DAS with many drives. One parity drive for three storage drives is common. For more storage, or if you are paranoid, you might want more parity (HDD1P2), or even two archive DAS.

Conveniently, Snapraid can create a read-only pool of all the drives in the DAS, which makes it much easier to navigate, browse and search than looking through separate drives.

I have very good experience with this 5 bay DAS: ICY BOX IB-3805-C31. I think it is also sold under the brand "Sabrent".

I use the SSD1/SSD2 automatic backup/snapshot technique on both my PC and my laptop. I have two DAS, one for storage and one for archiving. Currently I use a mix of 12TB-18TB Exos and Ironwolf drives. I am slowly replacing the 12TB drives that are out of warranty and reusing them for extra cold archive. I use Snapraid with dual parity drives, and also mergerfs. I also use an old remote NAS for some additional backup.