In a previous post (see Five things you never knew about flash drives), I wrote:
Flash drives only look like disks. In fact, nothing works the way you’d think. Flash is really different from magnetic recording, and those differences have a big impact on flash drive performance. How well vendors manage flash oddities has a huge impact on performance and even drive lifespan.
Honestly, I had no idea how right I was.
How long would it take to load an OS on a thumb drive?
I help friends in my small town (pop. ~10k) with computer problems. I thought it'd be handy to have a thumb drive with my favorite utilities loaded, so I started loading some on a generic 2 GB USB thumb drive. Most loaded as quickly as I expected, until I got to a 16 MB utility.
I dragged it to the thumb drive icon on the desktop, and the progress bar popped up with a 16-minute time estimate. 16 minutes! Less than a megabyte per minute on a flash drive that is capable of 15 MB a minute.
The progress bar was moving so slowly that I thought the machine had hung, but no, it was just running at 1/15th the expected speed. Whoa!
I couldn't believe it
I thought something was wrong, so I tested it with a single MP3 file of the same size. No problem, loaded in less than a minute.
What the heck was happening?
First surprise: hundreds of sub-2 KB files
The utility is in the form of a file folder or package. The icon makes it look like a single piece of code, but it contains a couple of thousand files, many of them HTML help pages. These small files, and the write overhead they incur, must be the source of the slow loading.
When I first ran this test, it turned out the thumb drive was formatted with the old Microsoft FAT 16 file system, which on a 2 GB drive uses a 32 KB cluster size. So not only was the load slow, the resulting file on the thumb drive was huge: about 4x bloat. After I published the first version of this post, a couple of alert readers pointed out the FAT 16 problem.
I reformatted the thumb drive with FAT 32, which uses 4 KB clusters on a 2 GB disk. The file bloat shrank from 4x to about 15 percent, but the load time stayed the same or even got longer.
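The bloat comes from the way FAT allocates space: every file occupies whole clusters, so even a tiny help page consumes at least one full cluster. Here is a minimal Python sketch of that arithmetic. The file sizes are hypothetical, since the actual size mix inside the utility's package wasn't measured.

```python
KB = 1024


def allocated_bytes(file_size: int, cluster_size: int) -> int:
    """Space a file occupies on a FAT volume: whole clusters, rounded up."""
    clusters = -(-file_size // cluster_size)  # ceiling division
    return clusters * cluster_size


def bloat_ratio(file_sizes: list[int], cluster_size: int) -> float:
    """Allocated space divided by the files' actual contents."""
    logical = sum(file_sizes)
    physical = sum(allocated_bytes(size, cluster_size) for size in file_sizes)
    return physical / logical


# A hypothetical 1.5 KB help page under each format's cluster size.
print(allocated_bytes(1536, 32 * KB))  # FAT 16, 32 KB clusters -> 32768
print(allocated_bytes(1536, 4 * KB))   # FAT 32, 4 KB clusters  -> 4096

# Hypothetical package: 2,000 small pages plus one 10 MB binary.
package = [1536] * 2000 + [10 * KB * KB]
print(round(bloat_ratio(package, 32 * KB), 2))
print(round(bloat_ratio(package, 4 * KB), 2))
```

The exact ratios depend entirely on the real file-size mix. The point is that dropping the cluster from 32 KB to 4 KB cuts the per-file waste for small files by 8x, which is the direction of the change described above.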
What I think is happening
Because this is flash, data can't simply be overwritten in place: a page must be erased before it is rewritten, and erasure works only on an entire block. I can't tell how big this drive's blocks are, but a block is likely to be at least 64 pages (128 KB with 2 KB pages) and probably more. It appears that every write of a 2 KB file requires a 128 KB read to preserve the existing data in the block, a 128 KB erase, and then a write of the 2 KB file along with the rest of the 128 KB of data already in the block.
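The cost of that guess can be written down as a small Python sketch. It assumes the geometry in the paragraph (2 KB pages, 64 pages per 128 KB block) and a naive controller that updates a block in place with a full read, erase, and program cycle. Real translation layers are proprietary and may avoid some of this work; the function names here are hypothetical.

```python
from dataclasses import dataclass

# Assumed geometry, taken from the guess above; the drive's real values are unknown.
PAGE_KB = 2
PAGES_PER_BLOCK = 64
BLOCK_KB = PAGE_KB * PAGES_PER_BLOCK  # 128 KB


@dataclass
class FlashTraffic:
    read_kb: int     # existing data copied out so it survives the erase
    erase_kb: int    # flash erased before reprogramming
    program_kb: int  # flash actually written


def in_place_update(write_kb: int, block_kb: int = BLOCK_KB) -> FlashTraffic:
    """Flash work for a host write landing in an already-used block, assuming
    the controller saves the rest of the block, erases the whole block, and
    programs it back with the new data merged in."""
    if not 0 < write_kb <= block_kb:
        raise ValueError("write must fit inside one block")
    return FlashTraffic(
        read_kb=block_kb - write_kb,  # the post rounds this to 128 KB
        erase_kb=block_kb,
        program_kb=block_kb,
    )


def write_amplification(write_kb: int, block_kb: int = BLOCK_KB) -> float:
    """KB programmed into flash per KB the host asked to write."""
    return in_place_update(write_kb, block_kb).program_kb / write_kb


print(in_place_update(2))        # FlashTraffic(read_kb=126, erase_kb=128, program_kb=128)
print(write_amplification(2))    # 64.0
print(write_amplification(128))  # 1.0
```

Under these assumptions a 2 KB file costs 64 times its size in flash programming, while a write that fills a whole block costs no more than its own size, which would be consistent with the single large MP3 file loading quickly.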
Just eyeballing the numbers, it looks like about 10 small writes per second, way worse than even a 1.8″ drive would do.
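Those numbers hang together. This back-of-envelope Python check assumes every small write is a single 2 KB page and the block is 128 KB, as guessed above, and treats the whole 16 MB load as 2 KB writes, which is a simplification. At 10 writes per second the user sees roughly a megabyte per minute and a load time in the same range as the 16-minute estimate, while the flash behind the scenes is programming over a megabyte per second.

```python
# Back-of-envelope check. Assumptions: each small write is one 2 KB page,
# blocks are 128 KB, and the eyeballed rate is 10 writes per second.
WRITES_PER_SEC = 10
WRITE_KB = 2
BLOCK_KB = 128

host_kb_per_min = WRITES_PER_SEC * WRITE_KB * 60      # what the user sees
flash_kb_per_sec = WRITES_PER_SEC * BLOCK_KB          # what the flash chips do
writes_for_16_mb = 16 * 1024 // WRITE_KB              # 2 KB writes in 16 MB
minutes_for_16_mb = writes_for_16_mb / WRITES_PER_SEC / 60

print(host_kb_per_min)              # 1200 (about 1.2 MB per minute)
print(flash_kb_per_sec)             # 1280
print(writes_for_16_mb)             # 8192
print(round(minutes_for_16_mb, 1))  # 13.7
```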
Update: The first two commenters quickly, and correctly, pointed out that I was seeing a FAT 16 problem, not a flash drive problem. I checked the file system on the thumb drive and, sure enough, it was FAT 16. It must have come that way from the factory, which is something else to be aware of. I'm in the process of reformatting the drive with the FAT 32 file system. As soon as that completes (it's taken about five minutes so far), I'll retest and update again.
Update II: I've reformatted the flash drive to FAT 32 and am loading the same utility. The load takes just as long, and if anything longer under FAT 32 than it did under FAT 16. It appears that while FAT 16 may explain the file bloat, thanks to its 32 KB cluster size, it doesn't explain the slow load speed.
Next, I plan to reformat the flash drive with NTFS and see how that works. Stay tuned for more updates.
Update III: Windows XP wouldn't allow me to reformat the thumb drive with NTFS. It looks like FAT 32 is the best I can do. Next I'll try the OS X HFS+ and see what, if any, difference that makes.
I also changed the original text to focus on the load performance rather than the FAT 16 bloat. I'm no Windows aficionado, and I suspect lots of non-technical users would get caught by this just as I was. I wonder how many thumb drives ship with FAT 16?
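If you want to check a drive yourself, the FAT variant and cluster size can be worked out from the volume's boot sector. This Python sketch follows the rule in Microsoft's FAT specification, which decides FAT 12/16/32 by the count of data clusters rather than by the label string in the sector. The helper `make_boot_sector` and its values are hypothetical, chosen to resemble a roughly 2 GB volume. On a real system you would read the first 512 bytes of the raw partition (not the whole disk), which usually requires administrator rights.

```python
import struct


def fat_info(boot_sector: bytes) -> tuple[str, int]:
    """Return (FAT type, cluster size in bytes) from a FAT volume's first sector."""
    # BIOS Parameter Block fields starting at offset 11 (little-endian, packed).
    (bytes_per_sec, sec_per_clus, rsvd_sec_cnt, num_fats,
     root_ent_cnt, tot_sec16, _media, fat_sz16) = struct.unpack_from(
        "<HBHBHHBH", boot_sector, 11)
    # 32-bit total sector count, and FAT32's FAT size (unused when fat_sz16 != 0).
    tot_sec32, fat_sz32 = struct.unpack_from("<II", boot_sector, 32)

    root_dir_sectors = (root_ent_cnt * 32 + bytes_per_sec - 1) // bytes_per_sec
    fat_size = fat_sz16 if fat_sz16 else fat_sz32
    total_sectors = tot_sec16 if tot_sec16 else tot_sec32
    data_sectors = total_sectors - (rsvd_sec_cnt + num_fats * fat_size + root_dir_sectors)
    clusters = data_sectors // sec_per_clus

    # The specification's thresholds: the cluster count alone decides the type.
    if clusters < 4085:
        fat_type = "FAT12"
    elif clusters < 65525:
        fat_type = "FAT16"
    else:
        fat_type = "FAT32"
    return fat_type, bytes_per_sec * sec_per_clus


def make_boot_sector(bytes_per_sec: int, sec_per_clus: int, rsvd: int, num_fats: int,
                     root_ents: int, total_sectors: int, fat_sz16: int,
                     fat_sz32: int = 0) -> bytes:
    """Hypothetical helper: build a minimal synthetic boot sector for testing."""
    sector = bytearray(512)
    struct.pack_into("<HBHBHHBH", sector, 11, bytes_per_sec, sec_per_clus, rsvd,
                     num_fats, root_ents, 0, 0xF8, fat_sz16)
    struct.pack_into("<II", sector, 32, total_sectors, fat_sz32)
    return bytes(sector)


# Hypothetical ~1.9 GB volumes formatted each way.
fat16_sector = make_boot_sector(512, 64, 1, 2, 512, 3_900_000, fat_sz16=239)
fat32_sector = make_boot_sector(512, 8, 32, 2, 0, 3_900_000, fat_sz16=0, fat_sz32=3809)

print(fat_info(fat16_sector))  # ('FAT16', 32768)
print(fat_info(fat32_sector))  # ('FAT32', 4096)
```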
This is just one data point
I can't generalize to all thumb drives or to flash-based solid state disk (SSD) replacements from this one experiment. Why? Because each flash drive has its own way of converting from how flash works to how disks work: the translation layer. As I noted in the earlier post:
The most important piece of a flash drive is the translation layer. This software takes the underlying weirdness of flash and makes it look like a disk. The translation layer is unique to each vendor and none of them are public. Each makes assumptions that can throttle or help performance under certain workloads.
What workloads? Sorry, you’ll have to figure that out for yourself. The bottom line is that flash drive write performance will be all over the map as engineers try to optimize for a wide range of workloads.
This is clearly a case where lots of small files choke this particular translation layer.
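To see how one translation layer can be fine for one workload and terrible for another, here is a deliberately naive toy in Python. The class name, the fixed block mapping, and both workloads are hypothetical. Real translation layers are unpublished and usually remap writes to spare blocks and buffer them, so this shows only the shape of the problem, not this drive's actual behavior.

```python
PAGES_PER_BLOCK = 64  # assumed, matching the 128 KB block guess above


class BlockMappedFTL:
    """Toy translation layer (hypothetical). Logical page n always lives in
    physical block n // PAGES_PER_BLOCK, so rewriting a page that already
    holds data forces an erase of its whole block and a reprogram of every
    page in that block that holds data."""

    def __init__(self, num_blocks: int) -> None:
        self.has_data = [[False] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.pages_programmed = 0
        self.erases = 0

    def write_page(self, logical_page: int) -> None:
        block, offset = divmod(logical_page, PAGES_PER_BLOCK)
        pages = self.has_data[block]
        if not pages[offset]:
            # Target page is still erased: program it directly.
            pages[offset] = True
            self.pages_programmed += 1
            return
        # Flash can't overwrite a page in place: erase the block, then write
        # back its old pages merged with the new one.
        self.erases += 1
        self.pages_programmed += sum(pages)


def write_amplification(ftl: BlockMappedFTL, workload: list[int]) -> float:
    """Pages programmed into flash per page the host wrote."""
    before = ftl.pages_programmed
    for page in workload:
        ftl.write_page(page)
    return (ftl.pages_programmed - before) / len(workload)


# Workload 1: one large file written in order into four fresh blocks.
big_ftl = BlockMappedFTL(num_blocks=8)
big_file = list(range(PAGES_PER_BLOCK, 5 * PAGES_PER_BLOCK))
wa_big = write_amplification(big_ftl, big_file)

# Workload 2: 64 small files. Block 0 stands in for file system metadata
# (FAT and directory entries) already written at format time; each file
# writes one fresh data page and then updates one metadata page.
small_ftl = BlockMappedFTL(num_blocks=8)
for page in range(PAGES_PER_BLOCK):
    small_ftl.write_page(page)  # "format" fills the metadata block
small_files = []
for i in range(PAGES_PER_BLOCK):
    small_files += [PAGES_PER_BLOCK + i, i]  # data page, then metadata page
wa_small = write_amplification(small_ftl, small_files)

print(wa_big, big_ftl.erases)      # 1.0 0
print(wa_small, small_ftl.erases)  # 32.5 64
```

In this toy the large sequential file costs exactly what was written, while the small-file workload programs 32.5 pages for every page written, almost all of it spent rewriting the metadata block. A log-structured design that appends each write to a fresh page would avoid the in-place erase, but pays for it later in garbage collection.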
The good news is that once the utility finally loaded, running it from the thumb drive was almost as fast as running it from a disk. Getting it ON the thumb drive was the problem, not getting it off.
The Storage Bits take
This experience made it clear to me that flash performance and capacity cannot be assumed from the vendor specs. Perhaps my thumb drive is poorly engineered, or optimized for capacity over everything else. In any case, this example made me realize just how different flash storage can be and how little we actually know about the performance of specific implementations.
With flash the only thing you can be certain of is that your mileage *will* vary.