I had three hard drives that weren't in use and decided to make my backups a bit more resilient. Mind you, I do my backups in a 3-2-1 fashion, so the loss of a local disk would be just an inconvenience.
- 2 x TOSHIBA-HDWD240
- 1 x WD WD40EZRZ-00GXCB0
I wanted to set the drives up as RAID 5 (4 TB data + 4 TB data + 4 TB parity); that way, if one of them failed, everything would keep working. I already had some experience with RAID, both on Linux and on Intel hardware.
My issue with hardware RAID is that if the RAID controller fails, you're done. Software RAID, on the other hand, carries a performance penalty.
Storage Spaces comes with Windows, which is very welcome. I also like that you can connect the drives to another system and things will just work; nothing to configure. What I wasn't expecting, though, were writes of 40 MB/s, sometimes lower!
R.A.I.D. - Redundant Array of Inexpensive Disks
Creating a "Storage Space" is as simple as searching for "Manage Storage Spaces", adding empty unformatted drives to a Storage Pool and from there creating a Storage Space.
If all goes well, you'll have a drive letter ready in no time. Couldn't be simpler. Unfortunately, simple doesn't mean fast. If you're reading this page, you've probably already experienced the crawl.
Columns, Interleave and Allocation
Where do columns and interleave sit within the Storage Space?
Column Count is the number of physical disks that Storage Spaces stripes data across. In my scenario it's pretty easy to pin down: I've got 3 disks, one column per disk, so the number is 3.
Interleave Size is the amount of data written to each disk per stripe.
We got "auto" for the number of columns and 256 KB for the interleave.
Per the image above: to write 512 KB of data, we write 256 KB to HDD1 (one interleave), 256 KB to HDD2, and another 256 KB to HDD3, the parity disk (that parity has to be calculated on the fly).
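To make the stripe concrete, here's a toy sketch in Python (my own illustration, not how Storage Spaces is implemented; real parity spaces also rotate the parity column across disks):

```python
# Toy model of one single-parity stripe: 3 columns, 256 KB interleave.
# Parity is fixed on "HDD3" here purely for illustration.
INTERLEAVE = 256 * 1024  # bytes written to each disk per stripe

def stripe(data: bytes):
    """Split one 512 KB write into two data chunks plus XOR parity."""
    a = data[:INTERLEAVE]                        # goes to HDD1
    b = data[INTERLEAVE:2 * INTERLEAVE]          # goes to HDD2
    parity = bytes(x ^ y for x, y in zip(a, b))  # computed on the fly, HDD3
    return a, b, parity

write = bytes(range(256)) * (2 * 1024)  # 512 KB of sample data
a, b, p = stripe(write)
print(len(a), len(b), len(p))  # 262144 262144 262144
```

Note that each disk receives exactly one interleave's worth of data, and the parity chunk is the same size as a data chunk.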
Allocation Unit Size
Allocation Unit Size is the minimum size of each write to your NTFS partition. Regardless of how much data is being written, allocations happen in chunks of the Allocation Unit Size. Let's check what Windows did to the NTFS partition with the default settings:
Defaults for NTFS partitions are:
- 4 KB for drives below 16 TB
- 8 KB for drives of 16 TB and above.
Having an interleave size of 256 KB and an Allocation Unit Size of 4 KB is the bottleneck. Each write request has to allocate space on the partition in chunks of 4 KB, and each of those then has to be mapped onto the 256 KB interleave chunks below!
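A hypothetical back-of-envelope of that mismatch (an illustration, not a benchmark): under the default AUS, a large payload is handed to the array as many tiny allocations, each of which can land mid-stripe and force parity to be recalculated; with a stripe-sized AUS it becomes a single full-stripe write.

```python
# Mismatch arithmetic: 4 KB allocations vs. one stripe-aligned allocation.
INTERLEAVE_KB = 256
AUS_DEFAULT_KB = 4    # NTFS default below 16 TB
AUS_ALIGNED_KB = 512  # 2 data columns x 256 KB interleave

payload_kb = 512  # what the user asks to write

small_chunks = payload_kb // AUS_DEFAULT_KB    # separate 4 KB allocations
aligned_chunks = payload_kb // AUS_ALIGNED_KB  # full-stripe allocations
print(small_chunks, aligned_chunks)  # 128 1
```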
How to fix this?
In this three-disk scenario we know that disk 1 will get half the data, disk 2 the other half, and disk 3 will get parity calculated on the fly. What if the default interleave size (256 KB) were in sync with the NTFS partition? Say we were to format the disk with an Allocation Unit Size (AUS) of 512 KB:
NTFS (allocation 512 KB): user writes 512 KB
- HDD1, interleave 256 KB: first half of the write request
- HDD2, interleave 256 KB: second half of the write request
- HDD3, interleave 256 KB: parity, calculated on the fly
A setup like this would massively simplify the calculations needed for writes. We can easily format the drive with that AUS.
Why are disks formatted with an Allocation Unit Size (AUS) of 4 KB instead of 512 KB? That mainly comes down to a trade-off between speed and storage efficiency.
Remember that the AUS becomes the minimum possible file size on disk. Say you have 1000 files of 1 KB each; you'd expect them to occupy 1000 KB (1 MB) on disk. However, with an AUS of 512 KB they will take up 512 MB of disk!
I chose files of 1 KB because even tinier files will be saved as "resident".
If the file is very small, to save disk space, NTFS (...) stores their contents right in the file record, so no cluster has to be allocated for it. Therefore, the size on disk is zero because there's nothing beyond the file record.
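A quick sketch of that slack-space arithmetic (hypothetical Python; it ignores resident files and filesystem metadata):

```python
# Slack-space estimate: a non-resident file occupies whole allocation units.
def on_disk_kb(file_kb: int, aus_kb: int) -> int:
    """Round a file's size up to whole allocation units (result in KB)."""
    clusters = -(-file_kb // aus_kb)  # ceiling division
    return clusters * aus_kb

files_kb = [1] * 1000  # 1000 files of 1 KB each
print(sum(on_disk_kb(f, 4) for f in files_kb))    # 4000 KB with the 4 KB default
print(sum(on_disk_kb(f, 512) for f in files_kb))  # 512000 KB (512 MB!) at 512 KB
```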
Can we control interleave size?
We most definitely can! The only hiccup is that you'll have to create the Storage Space in PowerShell. Create the Storage Pool as you normally would.
Open up PowerShell and check your pool name with `Get-StoragePool`.
So my pool name is "Storage pool". We'll use that to create the "Storage Space", which in PowerShell goes by Virtual Disk: something along the lines of `New-VirtualDisk -StoragePoolFriendlyName "Storage pool" -FriendlyName "Fast space" -ResiliencySettingName Parity -NumberOfColumns 3 -Interleave 64KB -UseMaximumSize` ("Fast space" is just an example name; a 64 KB interleave pairs with the 128 KB AUS we format with next).
Open "Computer Management", go to "Disk Management", initialize the new Virtual Disk and format it with 128 KB as the Allocation Unit Size.
What should I set my values to?
It depends. The larger the AUS, the more space you'll waste on small files.
If you'll only use the disk for media files like photographs and videos, the NTFS allocation unit size will impact you much less than if you have a multitude of small files.
The one thing to remember is to keep the symmetry between column count, interleave size and allocation unit size.
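That symmetry can be written down as a tiny helper (my own formulation; in the parity layouts discussed here, each stripe's data is split across two disks):

```python
# Symmetry rule: aligned AUS = interleave size x number of data columns.
def aligned_aus_kb(interleave_kb: int, data_columns: int = 2) -> int:
    """Allocation Unit Size that lines up with one full data stripe."""
    return interleave_kb * data_columns

print(aligned_aus_kb(256))  # 512 -> the 256 KB interleave example above
print(aligned_aus_kb(64))   # 128 -> a 64 KB interleave pairs with a 128 KB AUS
```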
I had some fun reading storagespaceswarstories.com, you may also want to give them a visit!
What drives to use?
As I've got a couple of decades of hobbyist data recovery behind me, I get this question often. Samsung HDDs were brilliant. They would often die slowly, allowing even those without backups to almost certainly get their data back. They were my default choice for HDDs (still are for SSDs).
Unfortunately, Samsung's HDD division was bought by Seagate, and I've had my share of Seagate disappointments. One place I would definitely recommend stopping by is Backblaze's Hard Drive Reliability stats.
For 2023 these are in my price/quality list for a home Storage Spaces setup:
- ~ 300 USD
- 25 $ / TB
- ~ 240 USD
- ~ 180 USD
- ~ 70 USD
There's always a bit of a silicon lottery, but all of these are solid models. They're not the fastest nor the most recent, but should last you a long time for a reasonable price.
How does this work with more disks?
TynMahn made a very pertinent comment that may help other readers.
My question is, what do you do if you have 6 SIX disks in parity?
The answer is straightforward: for the above to work, the rule is to keep the interleave at exactly half the NTFS allocation unit size.
This works for both single and dual parity; remember that Storage Spaces, like RAID, will not split a chunk of data across every single disk.
With single parity, it splits data chunks across two disks and uses a third as the parity disk. Dual parity is exactly the same, but the parity is written to two disks.
How many disks this occupies within your pool depends on how much data you'll be writing down.
Imagine the NTFS allocation is 2 bits and the interleave is 1 bit, and you want to store the bits 11 on a cluster spanning 3 disks. Notice that the user data will always be 1 and the parity will always be 1 ^ 1 = 0.
The above would write | 1 | 1 | 0 | to the disks.
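The same bit example in code (a toy sketch; it also shows why parity lets you survive a lost disk, since XOR-ing the survivors brings the missing bit back):

```python
# Parity is just XOR: 1-bit interleave, 2-bit allocation ("11"), 3 disks.
d1, d2 = 1, 1          # user data: one interleave per data disk
parity = d1 ^ d2       # 1 ^ 1 = 0, written to the parity disk
print(d1, d2, parity)  # 1 1 0

# XOR also rebuilds a lost disk from the survivors:
assert d2 ^ parity == d1  # disk 1 lost
assert d1 ^ parity == d2  # disk 2 lost
```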
Now say you want to write 11111111 to 6 disks with dual parity. The pattern repeats: each 2-bit allocation unit is split into two 1-bit interleaves on two data disks, with the parity written to two more disks.
And that's really all there is to it. Just keep your interleave as 1/2 of your NTFS AUS.