this post was submitted on 21 Oct 2023
[–] [email protected] 3 points 1 year ago (2 children)

OH MY GOODNESS WE HAVE THE SAME INTERESTS. I tried the exact same InfiniBand thing a while back only to realize the cards I bought were either duds or needed some weird server something to make them work -- neither would show up in an lspci, and I wasn't sure how to even begin to diagnose that. I also read online that the throughput of IPoIB was like 7Gbps -- sounds like that's not true?

Also, holy cats! 108 TERABYTES of spinning rust in RAID10? How many hard drives is that? Do you actually have a Storinator in your living room? What do you DO with it?

Also, do you have any cool tips for working with zfs? I've been screwing around a bit with TrueNAS lately and it's been a real pain in the rear. Apparently ZFS remembers the last machine it was mounted from and gets mad if you try to mount it from a different one, plus there's the problem of it being impossible to change the number of drives in an array without creating an entirely new array (have they fixed that yet?). I've been wanting to use btrfs instead but 1) slow and 2) the internet is filled with horror stories about btrfs raid

[–] [email protected] 3 points 1 year ago* (last edited 1 year ago)

neither would show up in an lspci, and I wasn’t sure how to even begin to diagnose that. I also read online that the throughput of IPoIB was like 7Gbps – sounds like that’s not true?

I believe IPoIB on my specific Mellanox ConnectX-3 is limited to 10 Gbps, while native InfiniBand connections can go to 40 Gbps. It probably depends on the network card itself. Any of my high-bandwidth use cases are going to come from NFS, and anything else doesn't need more than 10 Gbps, e.g. media streaming. I run Proxmox and Debian stuff, and I was able to get everything working by installing the rdma-core and opensm packages. I have the following in my root crontab to switch the IB ports over to "connected" mode, which allows a higher MTU and is faster:

# switch both IB ports to connected mode (required for the larger MTU below)
@reboot echo connected > /sys/class/net/ibp10s0/mode
@reboot echo connected > /sys/class/net/ibp10s0d1/mode
# raise the MTU to the connected-mode maximum
@reboot /usr/sbin/ifconfig ibp10s0 mtu 65520
@reboot /usr/sbin/ifconfig ibp10s0d1 mtu 65520

I also use @reboot echo rdma 20049 > /proc/fs/nfsd/portlist to enable NFS to operate in RDMA mode for InfiniBand communication. It was really tough to figure out how to do a lot of InfiniBand stuff until I found this manual, after which everything just worked like it should. Overall, I would prefer equivalent Ethernet hardware if I was given a choice, but InfiniBand stuff is dirt cheap and it's hard to argue with $20 for 40 Gbps.
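
For reference, the client side of that RDMA setup is just a mount option. This is a rough sketch with hypothetical paths and addresses (the export path, subnet, and server IP are placeholders, not mine):

```shell
# Server: export the share as usual (hypothetical path/subnet)
# /etc/exports
#   /tank/media  192.168.0.0/24(rw,async,no_subtree_check)

# Server: enable the RDMA listener on the standard NFS/RDMA port
echo "rdma 20049" > /proc/fs/nfsd/portlist

# Client: mount over RDMA instead of TCP (hypothetical server IP)
mount -t nfs -o rdma,port=20049 192.168.0.10:/tank/media /mnt/media
```

If the `rdma` option errors out, the client kernel is usually missing the xprtrdma module or rdma-core isn't installed.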

Also, holy cats! 108 TERABYTES of spinning rust in RAID10? How many hard drives is that? Do you actually have a Storinator in your living room? What do you DO with it?

6x18TB; they fit easily in a Node 804. There's plenty of room for growth in that case still, and I just run cheap consumer hardware in it. I store "Linux ISOs" on it as well as any and all data from my life. I'm pretty loaded IRL, so I figure if it's worth archiving it's worth archiving right, and I don't mind keeping the highest quality versions of anything I like.

Also, do you have any cool tips for working with zfs?

Yeah, ZFS is quite easy to work with once you get a compatible kernel. TrueNAS is a dead simple way to interface with ZFS, though I wouldn't recommend it as the only thing on your NAS because it's inflexible about running anything non-TrueNAS-approved (Docker, etc.). Personally I would recommend running Proxmox with a minimal TrueNAS VM on top, or skipping TrueNAS entirely and letting Proxmox manage your ZFS pool. You can install Cockpit + this plugin for a simple GUI ZFS manager that covers most of the important stuff without needing a full TrueNAS VM. If you're still new to ZFS I would stick with TrueNAS though, since it will hold your hand while you learn. Once you understand ZFS better you can drop it if it's getting in your way.
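
If you do let the host manage the pool yourself, it's only a couple of commands. A minimal sketch, assuming a pool named "tank" and made-up disk IDs (substitute your own from /dev/disk/by-id/):

```shell
# list stable disk identifiers first
ls -l /dev/disk/by-id/

# create a two-disk mirror pool; ashift=12 for 4K-sector drives
zpool create -o ashift=12 tank \
  mirror ata-WDC_WD180EDGZ_AAAA ata-WDC_WD180EDGZ_BBBB

# a dataset with sane defaults for bulk storage
zfs create -o compression=lz4 -o atime=off tank/media

zpool status tank
```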

Apparently ZFS remembers the last machine it was mounted from and gets mad if you try to mount it from a different one

This shouldn't be the case - ZFS stamps a pool with the host that last imported it, so if you move disks without running zpool export first, the new machine will refuse the import until you force it with zpool import -f. You may also want to configure your disk identifiers to use IDs (portable between machines) instead of e.g. generic block labels (not necessarily stable even between reboots). A ZFS pool should be able to move between machines with no fuss.
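
A sketch of the clean way to move a pool (pool name "tank" assumed):

```shell
# on the old machine: cleanly release the pool
zpool export tank

# on the new machine: import using stable by-id device paths
zpool import -d /dev/disk/by-id tank

# if the old machine died and the pool was never exported, force it:
# zpool import -d /dev/disk/by-id -f tank
```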

the problem of it being impossible to change the number of drives in an array without creating an entirely new array (have they fixed that yet?)

Yes, this is a very annoying problem, and it's the main reason I use ZFS's equivalent of RAID10: mirror vdevs. Mirrors in ZFS are much more flexible than RAIDZ, and that flexibility extends to very random things. For example, a SLOG device can be freely detached from a zpool consisting of mirrors, but not one consisting of RAIDZ. Mirror vdevs can be added in pairs at any time, which means I can add a couple drives of any size whenever I feel like it - this makes sense for larger disk sizes in the future and random drives that could be on sale. RAIDZ's mythical future is RAIDZ expansion, which would allow you to grow your RAIDZ array from e.g. 4 disks to 5 disks without recreating or destroying it. This future is a reality in that the code has already been merged; it's just waiting to get baked into an OpenZFS release.
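
Growing a mirror pool pair-at-a-time is a one-liner. Disk IDs below are placeholders:

```shell
# attach another two-disk mirror vdev to the existing pool;
# capacity grows immediately, no rebuild of existing vdevs needed
zpool add tank mirror ata-ST18000NM_CCCC ata-ST18000NM_DDDD
```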

I’ve been wanting to use btrfs instead but 1) slow and 2) the internet is filled with horror stories about btrfs raid

BTRFS RAID5/6 is a non-starter: it's marked as unstable by the developers and can cause data loss. However, you can use MergerFS+SnapRAID for the RAID logic and back that setup with individual BTRFS drives that don't know they're in a RAID. MergerFS is an overlay filesystem that combines any number of drives into what appears to be a single drive, no matter what filesystem the drives are using. When you write data to a MergerFS array, whole files transparently land on individual disks instead of being striped, and the strategy it uses to distribute files can be changed. SnapRAID is a data redundancy system that calculates parity across any number of data drives onto 1-6 parity drives, which can restore that data in the event of a failure. MergerFS and SnapRAID are almost always used together to give a traditional RAID experience.

This solution is not quite as fancy as ZFS but it would still be my recommendation for ad-hoc budget setups where you have mismatched drives and random hardware, because SnapRAID does not care about drive size uniformity (unlike ZFS). You need to dedicate your largest 1-2 disk(s) to being the parity drive(s), and then you just throw as many data drives as you can at it. The drives will not work in tandem so speeds will just be at the speed of whatever disk has the file. BTRFS is a great filesystem if you don't use its RAID features, and its speed is probably equivalent to ZFS or maybe even faster. However, in normal usage ZFS cheats a lot by using its smart ARC cache and other tricks in order to make common disk activity much faster than the disks themselves.
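
For a concrete picture, a minimal MergerFS+SnapRAID setup looks roughly like this (mount points and disk layout are hypothetical):

```shell
# /etc/fstab - pool three data disks into one mount via mergerfs
# /mnt/disk1:/mnt/disk2:/mnt/disk3  /mnt/storage  fuse.mergerfs  category.create=mfs,moveonenospc=true,cache.files=off  0 0

# /etc/snapraid.conf - one parity disk protecting the three data disks
#   parity /mnt/parity1/snapraid.parity
#   data d1 /mnt/disk1
#   data d2 /mnt/disk2
#   data d3 /mnt/disk3

# run periodically (e.g. nightly cron): update parity, then verify
snapraid sync
snapraid scrub
```

`category.create=mfs` sends new files to the disk with the most free space; parity only protects data as of the last `snapraid sync`, which is the main trade-off versus real-time RAID.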

[–] [email protected] 1 points 1 year ago* (last edited 1 year ago) (1 children)

FYI: RAIDZ expansion just got merged: https://github.com/openzfs/zfs/pull/15022

Estimated timeline is about a year from now for OpenZFS 2.3 which will include it.
