I had three hard drives that weren't in use and decided to make my backup a bit more resilient. Mind you, I follow a 3-2-1 backup strategy, so losing the local disks would be just an inconvenience.
Disk | Write speed | Read speed | Benchmark |
---|---|---|---|
2 x TOSHIBA-HDWD240 | 190 MB/s | 187 MB/s | UserBenchmark |
1 x WD WD40EZRZ-00GXCB0 | 169 MB/s | 172 MB/s | UserBenchmark |
I wanted to set the drives up as RAID 5 (4 TB data + 4 TB data + 4 TB parity); that way, if one of them failed, everything would keep working. I already had some experience with RAID, both on Linux and on Intel hardware.
My issue with hardware RAID is that if the RAID controller fails, you're done. Software RAID, on the other hand, comes with a performance penalty.
Storage Spaces comes with Windows, which is very welcome. I also like that you can connect the drives to another system and things will just work, with nothing to configure. What I wasn't expecting, though, was writes of 40 MB/s, sometimes lower!
R.A.I.D. - Redundant Array of Inexpensive Disks
Creating a "Storage Space" is as simple as searching for "Manage Storage Spaces", adding empty, unformatted drives to a Storage Pool, and creating a Storage Space from that pool.
If all goes well, you'll have a drive letter ready in no time. Couldn't be simpler. Unfortunately, simple doesn't mean fast: if you're reading this page, you've probably already experienced some crawling writes.
Columns, Interleave and Allocation
Where do columns and interleave sit within the Storage Space?
Column Count
Column Count is the number of physical disks that Storage Spaces stripes data across. In my scenario, it's easy to see that this number will be 3: I've got 3 disks, one column per disk.
Interleave Size
Interleave Size is the actual size of data that will be stored on each disk per stripe.
With the default settings, we got "auto" for the number of columns and 256 KB for the interleave.
Per the image above, writing 512 KB of data means writing 256 KB to HDD1 (one interleave), 256 KB to HDD2, and another 256 KB to HDD3, the parity disk, whose contents have to be calculated on the fly.
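To make the parity idea concrete, here is a minimal Python sketch of one three-column, single-parity stripe. It assumes plain XOR parity as in classic RAID 5; Storage Spaces' on-disk format isn't covered here, so treat this as an illustration of the principle rather than of its implementation. `xor_bytes` and `split_stripe` are hypothetical names.

```python
import os

INTERLEAVE = 256 * 1024  # 256 KB per column, the default interleave


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equally sized byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_stripe(data: bytes, interleave: int = INTERLEAVE) -> list[bytes]:
    """Split one full stripe (2 data columns) and append an XOR parity chunk."""
    assert len(data) == 2 * interleave, "expects exactly one full stripe of data"
    hdd1 = data[:interleave]
    hdd2 = data[interleave:]
    parity = xor_bytes(hdd1, hdd2)  # calculated on the fly, written to HDD3
    return [hdd1, hdd2, parity]


# Write 512 KB of random data across the three "disks".
payload = os.urandom(512 * 1024)
hdd1, hdd2, hdd3 = split_stripe(payload)

# If HDD2 dies, its contents can be rebuilt from HDD1 and the parity on HDD3.
rebuilt_hdd2 = xor_bytes(hdd1, hdd3)
assert rebuilt_hdd2 == hdd2
print(len(hdd1) // 1024, len(hdd2) // 1024, len(hdd3) // 1024)  # 256 256 256
```

Losing any single chunk is recoverable because XOR-ing the two survivors gives back the missing one.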
Allocation Unit Size
Allocation Unit Size (also known as cluster size) is the smallest amount of space NTFS allocates on your partition. However much data is written, space is allocated in whole chunks of the Allocation Unit Size. Let us check what Windows did to the NTFS partition with the default settings:
Defaults for NTFS partitions are:
- 4 KB for drives up to 16 TB
- 8 KB for drives from 16 TB up to 32 TB (larger drives get larger clusters).
Slow writes?
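Having an interleave size of 256 KB and an Allocation Unit Size of 4 KB is the bottleneck. Each write request allocates space on the partition in 4 KB chunks, which then have to be translated to the 256 KB interleave chunks below! Any write smaller than a full stripe also forces parity to be recalculated from data already on disk, adding reads to every write.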
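Why would a small write hurt so much? In classic parity RAID, updating only part of a stripe means the old data and the old parity must be read back before the new parity can be written (read-modify-write). The sketch below shows that XOR identity in Python; it assumes Storage Spaces behaves like classic single-parity RAID here, and the variable names are hypothetical.

```python
import os


def xor_bytes(*chunks: bytes) -> bytes:
    """XOR any number of equally sized byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, b in enumerate(chunk):
            out[i] ^= b
    return bytes(out)


CHUNK = 4 * 1024  # one 4 KB cluster; we only model the 4 KB range being changed

# Existing data on disk: the same 4 KB range in two data columns, plus parity.
old_d1, d2 = os.urandom(CHUNK), os.urandom(CHUNK)
old_parity = xor_bytes(old_d1, d2)

# NTFS overwrites only d1. To keep parity valid without touching d2, the
# old d1 and old parity must be READ first, then both new blocks written.
new_d1 = os.urandom(CHUNK)
new_parity = xor_bytes(old_parity, old_d1, new_d1)  # 2 reads + 2 writes

# Same result as recomputing parity from the whole stripe.
assert new_parity == xor_bytes(new_d1, d2)
```

A full-stripe write, by contrast, needs no reads at all: parity is simply the XOR of the new data chunks.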
How to fix this?
In this three-disk scenario, we know that disk 1 will get half of the data, disk 2 will get the other half and disk 3 will hold data calculated on the fly. What if the default interleave size (256 KB) were in sync with the NTFS partition? Say we formatted the disk with an Allocation Unit Size (AUS) of 512 KB:
Layer | Unit size | Contents |
---|---|---|
NTFS (allocation unit) | 512 KB | user writes 512 KB |
DISK 1 (interleave) | 256 KB | half of the write request |
DISK 2 (interleave) | 256 KB | the other half of the write request |
DISK 3 (interleave) | 256 KB | generated (parity) data |
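A setup like this would massively simplify the calculations needed for writes: every cluster NTFS writes fills exactly one stripe, so parity can be computed from the new data alone. We can easily format the drive with that AUS.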
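One way to see the difference is to count, for a given write, how many stripes it covers completely and how many it only touches partially (the partial ones are the ones that need read-modify-write). This Python sketch assumes the partition starts on a stripe boundary and ignores where parity lands; `stripe_usage` is a hypothetical helper.

```python
KB = 1024


def stripe_usage(offset: int, length: int, data_columns: int, interleave: int) -> tuple[int, int]:
    """Return (full_stripes, partial_stripes) touched by a write of `length` bytes at `offset`."""
    stripe = data_columns * interleave  # bytes of user data per stripe
    first = offset // stripe
    last = (offset + length - 1) // stripe
    full = partial = 0
    for s in range(first, last + 1):
        start, end = s * stripe, (s + 1) * stripe
        if offset <= start and offset + length >= end:
            full += 1  # whole stripe rewritten: parity from new data only
        else:
            partial += 1  # part of the stripe: old data/parity must be read
    return full, partial


# Three disks, single parity: 2 data columns x 256 KB interleave = 512 KB stripes.
print(stripe_usage(0, 4 * KB, 2, 256 * KB))           # (0, 1): one 4 KB cluster
print(stripe_usage(0, 512 * KB, 2, 256 * KB))         # (1, 0): one 512 KB cluster
print(stripe_usage(512 * KB, 512 * KB, 2, 256 * KB))  # (1, 0): the next 512 KB cluster
```

With 4 KB clusters, a small write lands inside a 512 KB stripe; with 512 KB clusters aligned to the stripes, each cluster write covers a whole stripe.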
Why are disks formatted with Allocation Unit Sizes (AUS) of 4 KB instead of 512 KB? That has mainly to do with a trade-off between speed and storage efficiency.
Remember that the AUS becomes the minimum space a file occupies on disk. Say you have 1000 files of 1 KB each; you'd expect them to occupy 1000 KB (about 1 MB) on disk. With an AUS of 512 KB, however, each file takes a whole 512 KB cluster, so together they take 512,000 KB (around 500 MB) of disk!
I chose 1 KB files because even smaller files may be saved as "resident":
If the file is very small, to save disk space, NTFS (...) stores their contents right in the file record, so no cluster has to be allocated for it. Therefore, the size on disk is zero because there's nothing beyond the file record.
https://superuser.com/a/1030802/14529
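The slack-space arithmetic from the 1000-file example is easy to reproduce. A minimal Python sketch, assuming every file is non-resident (so it occupies whole clusters) and ignoring MFT and other metadata overhead; `size_on_disk` is a hypothetical helper.

```python
import math

KB = 1024


def size_on_disk(file_sizes: list[int], aus: int) -> int:
    """Total bytes allocated when each file is rounded up to whole clusters."""
    return sum(math.ceil(size / aus) * aus for size in file_sizes)


files = [1 * KB] * 1000  # 1000 files of 1 KB each

for aus_kb in (4, 64, 512):
    used = size_on_disk(files, aus_kb * KB)
    print(f"AUS {aus_kb:>3} KB -> {used // KB:,} KB on disk")
# AUS   4 KB -> 4,000 KB on disk
# AUS  64 KB -> 64,000 KB on disk
# AUS 512 KB -> 512,000 KB on disk
```

Even the default 4 KB AUS quadruples the footprint of 1 KB files; at 512 KB the waste dominates.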
Can we control interleave size?
We most definitely can! The only hiccup is that you'll have to create the Storage Space in PowerShell. Create the Storage Pool as you normally would:
Open up PowerShell and check your pool name (the Get-StoragePool cmdlet lists your pools):
So my pool name is "Storage pool". We'll use that to create the "Storage Space", which in PowerShell goes by VirtualDisk (created with the New-VirtualDisk cmdlet).
Open "Computer Management", go to "Disk Management", initialize the new Virtual Disk and format it with 128 KB as the Allocation Unit Size.
What should I set my values to?
It depends. The larger the AUS, the more space you'll waste on small files.
If you'll only use the disk for media files like photographs and videos, NTFS allocation unit size will impact you much less than if you have a multitude of small files.
The one thing to remember is to keep columns, interleave size and allocation unit size in sync: the AUS should equal the number of data columns times the interleave.
I had some fun reading storagespaceswarstories.com; you may also want to give it a visit!
What drives to use?
As I've got a couple of decades of hobbyist data recovery behind me, I often get this question. Samsung HDDs were brilliant. They would often die slowly, allowing even those without backups to almost certainly get their data back. They were my default choice for HDDs (still are for SSDs).
Unfortunately, Samsung's HDD division was bought by Seagate back in 2011, and I've had my share of Seagate disappointments. One place I'd definitely recommend a stop at is Backblaze's Hard Drive Reliability reports.
For 2023, these are on my price/quality list for a home Storage Spaces setup:
size | brand | reference | price | $/TB |
---|---|---|---|---|
12 TB | Western Digital | WD121KRYZ | ~ 300 USD | 25 $/TB |
8 TB | Western Digital | WD8003FRYZ | ~ 240 USD | 30 $/TB |
4 TB | HGST | 0S03664 | ~ 180 USD | 45 $/TB |
2 TB | Hitachi | HUA723020ALA641 | ~ 70 USD | 35 $/TB |
There's always a bit of a silicon lottery, but all of these are solid models. They're not the fastest nor the most recent, but should last you a long time for a reasonable price.
How does this work with n disks?
TynMahn made a very pertinent comment that may help other readers.
My question is, what do you do if you have 6 SIX disks in parity?
NTFS AUS = (NUM_COLUMNS - NUM_PARITY_DISKS) * INTERLEAVE
If you're running single parity, NUM_PARITY_DISKS is 1; with dual parity, it is 2.
Let's imagine you have 5 disks, and want to run the cluster with single parity. You also want to keep NTFS AUS as 4kB as that is a good balance for the size of files you'll be storing.
Remember that NUM_COLUMNS will match the number of disks.
4kB = (5 - 1) * INTERLEAVE
INTERLEAVE = 4kB / 4
INTERLEAVE = 1kB
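The same formula can be written as a tiny Python helper to sanity-check a layout before creating the Virtual Disk. This is a sketch: `ntfs_aus` and `interleave_for` are hypothetical names, and the helper knows nothing about the limits Storage Spaces or NTFS place on valid interleave or cluster sizes.

```python
KB = 1024


def ntfs_aus(columns: int, parity_disks: int, interleave: int) -> int:
    """NTFS AUS = (NUM_COLUMNS - NUM_PARITY_DISKS) * INTERLEAVE."""
    data_columns = columns - parity_disks
    if data_columns < 1:
        raise ValueError("need at least one data column")
    return data_columns * interleave


def interleave_for(aus: int, columns: int, parity_disks: int) -> int:
    """Solve the same formula for INTERLEAVE, given the AUS you want."""
    data_columns = columns - parity_disks
    if data_columns < 1 or aus % data_columns:
        raise ValueError("AUS must split evenly across the data columns")
    return aus // data_columns


print(ntfs_aus(3, 1, 256 * KB) // KB)        # 512: three disks, 256 KB interleave
print(interleave_for(128 * KB, 3, 1) // KB)  # 64: three disks, 128 KB AUS
print(interleave_for(4 * KB, 5, 1) // KB)    # 1: five disks, single parity, 4 KB AUS
print(interleave_for(4 * KB, 6, 2) // KB)    # 1: six disks, dual parity, 4 KB AUS
```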
Now imagine you'll be writing 12kB of data to the Storage Space.
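As your NTFS AUS is 4kB, the write occupies three clusters, and each cluster fills exactly one stripe (4 data columns x 1kB).
Also notice that Storage Spaces doesn't use a fixed drive for parity data; the parity chunk lands on a different drive from stripe to stripe.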
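To picture the 12kB example, the Python sketch below lays the three 4kB clusters out over five columns with a 1kB interleave. It assumes a simple round-robin rotation for the parity chunk purely for illustration; Storage Spaces' actual placement may differ. `layout` is a hypothetical helper, and each printed row is one stripe, each column one disk.

```python
KB = 1024


def layout(write_size: int, columns: int, parity_disks: int, interleave: int) -> list[list[str]]:
    """Return one row per stripe (rounded up to whole stripes); each cell names what lands on that disk."""
    data_columns = columns - parity_disks
    stripe_bytes = data_columns * interleave
    stripes = -(-write_size // stripe_bytes)  # ceiling division
    rows = []
    chunk = 0
    for s in range(stripes):
        # Assumed rotation: parity moves one column to the left on each stripe.
        parity_cols = {(columns - 1 - s - p) % columns for p in range(parity_disks)}
        row = []
        for col in range(columns):
            if col in parity_cols:
                row.append(f"P{s}")
            else:
                row.append(f"D{chunk}")
                chunk += 1
        rows.append(row)
    return rows


for stripe_row in layout(12 * KB, columns=5, parity_disks=1, interleave=1 * KB):
    print(" ".join(f"{cell:>3}" for cell in stripe_row))
#  D0  D1  D2  D3  P0
#  D4  D5  D6  P1  D7
#  D8  D9  P2 D10 D11
```

Twelve 1kB data chunks plus three parity chunks: one parity chunk per stripe, each on a different disk.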
And that's really all there is to it.
I've been asked what I use to connect the drives. Currently, two of these. Together they can connect up to 16 hard drives. Note that while the board ships with RAID support, you can re-flash it to work as a JBOD (just a bunch of disks) controller.
If there's interest, I'll write about how to re-flash the card's BIOS.
The SATA drives are connected to the SAS PCI adapters with four of these SFF-8643 cables by Cable Matters.
As an Amazon Associate I may earn from qualifying purchases on some links.
If you found this page helpful, please share.