Your capacity will vary

By an increasing amount. Why is your storage capacity always less, sometimes a lot less, than what you see advertised on the box? There is only one rule: you will never get the capacity the vendor advertises.
Written by Robin Harris, Contributor

Storage vendors don't mean to lie. They just have a worldview that you and your OS don't happen to share. In their minds, their numbers are justifiable.

The disk problem

The major cause of disk drive capacity shrinkage is the difference between how disk drives measure capacity and how your computer measures capacity.

Memory, such as RAM, is measured in powers of two. A gigabyte of RAM is really 1,073,741,824 bytes of capacity.

Disk capacity is measured in powers of 10. Thus a gigabyte of disk capacity is one billion bytes.

Your computer measures hard disk capacity in powers of two. Thus 1 million bytes of disk becomes 977 kilobytes, and you have just lost 2.3% of your apparent capacity.
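To make that arithmetic concrete, here is a minimal Python sketch of the conversion. The function name `reported_capacity` and the unit labels are illustrative choices, not anything a particular OS exposes. It simply divides a decimal byte count by 1,024 until the number fits the largest binary unit.

```python
def reported_capacity(advertised_bytes: int) -> tuple:
    """Express a byte count in binary (power-of-two) units, as an OS would."""
    # Unit labels follow the informal usage in the text, where "KB" means 1,024 bytes.
    units = ["bytes", "KB", "MB", "GB", "TB", "PB"]
    value = float(advertised_bytes)
    index = 0
    # Step up one unit each time the value reaches 1,024 of the current unit.
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    return value, units[index]


value, unit = reported_capacity(1_000_000)
print(f"{value:.0f} {unit}")   # prints "977 KB"

value, unit = reported_capacity(10**12)   # a drive sold as "1 TB"
print(f"{value:.1f} {unit}")   # prints "931.3 GB"
```

So a drive sold as 1 TB shows up as roughly 931 GB once the operating system does the counting.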

As disk drives get bigger the problem gets worse. Here's a table comparing binary powers to decimal powers:
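The figures behind that comparison are easy to regenerate. The sketch below is only an illustration: `PREFIXES` and `shortfall_percent` are names made up for this example. For each prefix it prints the decimal value, the binary value, and how much smaller the decimal one is.

```python
# Prefix names paired with their power: kilo = 1, mega = 2, and so on.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta"]


def shortfall_percent(power: int) -> float:
    """Percent by which 1000**power falls short of 1024**power."""
    decimal = 1000 ** power
    binary = 1024 ** power
    return (1 - decimal / binary) * 100


for power, name in enumerate(PREFIXES, start=1):
    print(
        f"{name:<5}"
        f"{1000 ** power:>24,}"
        f"{1024 ** power:>28,}"
        f"{shortfall_percent(power):>7.1f}%"
    )
```

The shortfall grows with every step up the prefix ladder: about 2.3% at kilo, about 6.9% at giga, and about 9.1% at tera.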

Officially, disk vendors have the standards bodies on their side: a MB is officially defined as 1,000,000 bytes. What the memory vendors should use are the binary prefixes kibi, mebi, gibi and the like. The bi stands for binary. Who knows, someday it might catch on.

But even most computer publications stick to the old, unofficial definitions that we all use. Disk drive vendors should switch from decimal to binary units, because that is how operating systems measure drive capacity.

And as the table above shows, the problem is only getting worse as disk capacities grow.

The array problem

Disk arrays have a different problem: raw capacity vs protected capacity. Raw capacity is simply the sum - in decimal - of the capacity of the disk drives in the array. A 4 drive array with 1 TB drives has a 4 TB raw capacity.

But unless you use RAID 0 striping, which doesn't protect your data - lose 1 drive and all your data goes away - your usable capacity will be less. Far less.

With a 4 drive RAID array - like one I recently tried to test - RAID 5 will give you 3 drives' worth of capacity, saving 1 drive's worth for parity data. BTW, I wouldn't use such a configuration with 1 TB SATA drives: you have a 25% chance of losing data during a rebuild.
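The article does not say how the 25% figure was derived. One common back-of-envelope model, offered here purely as an assumption, supposes an unrecoverable read error rate of 1 per 10^14 bits read (a typical spec-sheet value for consumer SATA drives) and a rebuild that must read all three surviving drives end to end. The function name and the default rate below are hypothetical.

```python
import math


def rebuild_error_probability(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error while reading bytes_read.

    ure_per_bit is an ASSUMED error rate, not a number taken from the article.
    """
    bits_read = bytes_read * 8
    # 1 - (1 - p)**n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


surviving_drives = 3       # a 4 drive RAID 5 set after one drive fails
drive_bytes = 10**12       # 1 TB, decimal, as sold
risk = rebuild_error_probability(surviving_drives * drive_bytes)
print(f"{risk:.0%}")
```

Under these assumptions the result comes out a little over 20%, and the cruder linear estimate (bits read times error rate) gives 24%. Both are in the same ballpark as the figure quoted above.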

Much more reliable is a mirrored configuration. With a 4 drive array mirroring would give you 2 TB of protected capacity - only 50% of your raw capacity. But your data is much safer mirrored.
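The raw vs protected arithmetic for the three layouts discussed can be sketched in a few lines of Python. The function `usable_tb` and its layout labels are invented for this illustration, and it ignores real-world overheads such as file system metadata and hot spares.

```python
def usable_tb(drive_count: int, drive_tb: float, layout: str) -> float:
    """Protected capacity in TB for a simple array of identical drives."""
    raw = drive_count * drive_tb
    if layout == "raid0":
        # Striping only: all of the raw capacity, none of the protection.
        return raw
    if layout == "raid5":
        if drive_count < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        # One drive's worth of space goes to parity.
        return (drive_count - 1) * drive_tb
    if layout == "mirror":
        if drive_count % 2:
            raise ValueError("mirroring needs an even number of drives")
        # Every block is stored twice.
        return raw / 2
    raise ValueError(f"unknown layout: {layout}")


for layout in ("raid0", "raid5", "mirror"):
    print(layout, usable_tb(4, 1.0, layout), "TB")
```

For the 4 drive, 1 TB example this gives 4 TB striped, 3 TB with RAID 5, and 2 TB mirrored.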

Array capacity arguments are common among enterprise array vendors - and if you were paying $5/GB raw you might be more interested in the usable capacity too. With hundreds of TB in a single array, even small percentage differences start looking big.
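To see why the raw vs usable distinction matters at $5/GB, here is a rough sketch of the effective price. The function name `cost_per_usable_gb` is hypothetical, and the usable fractions are the ones from the 4 drive examples above, not vendor figures.

```python
def cost_per_usable_gb(raw_cost_per_gb: float, usable_fraction: float) -> float:
    """Effective price per protected GB, given the price per raw GB."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return raw_cost_per_gb / usable_fraction


for label, fraction in [("RAID 5, 4 drives", 3 / 4), ("mirrored", 1 / 2)]:
    print(f"{label}: ${cost_per_usable_gb(5.0, fraction):.2f} per usable GB")
# prints:
# RAID 5, 4 drives: $6.67 per usable GB
# mirrored: $10.00 per usable GB
```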

The Storage Bits take

There's only 1 good strategy for dealing with storage capacity: have more storage than you need. Most enterprises run with 2-3x the capacity they need - mostly for performance reasons - but the extra comes in handy for end-of-quarter capacity spikes or slower than expected capital approval cycles.

Home users should keep 10-20% of their disk unfilled. Windows and Mac OS X are virtual memory operating systems, which means they use disk space to substitute for DRAM when main memory fills up. Without enough spare capacity the virtual memory system can't do its job efficiently and your system slows down.
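A quick way to check that headroom is Python's standard `shutil.disk_usage`, which reports total, used and free bytes for the volume holding a given path. The helper names and the 10% default below are assumptions taken from the low end of the range above, not a rule any OS enforces.

```python
import shutil


def free_fraction(total_bytes: int, free_bytes: int) -> float:
    """Share of the volume that is still unused, from 0.0 to 1.0."""
    return free_bytes / total_bytes


def has_headroom(path: str = ".", minimum_free: float = 0.10) -> bool:
    """True if the volume holding `path` has at least minimum_free unused."""
    usage = shutil.disk_usage(path)
    return free_fraction(usage.total, usage.free) >= minimum_free


usage = shutil.disk_usage(".")
print(f"{free_fraction(usage.total, usage.free):.0%} free")
print("enough headroom" if has_headroom(".") else "time to clean up")
```

Raise `minimum_free` to 0.20 for the more cautious end of the range.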

The good news: disk capacity is cheap and rapidly getting cheaper. 25 years ago disk cost $25,000 per gigabyte. Today it is less than $0.25 per gig. Fill 'er up!

Comments welcome, of course.
