Flash drives: your mileage WILL vary
Flash is an alien technology for disk users. I've noted before that flash drives can have really terrible write performance, but until I ran into it myself I had no idea how bad flash write performance could be.
Last month I wrote (see Five things you never knew about flash drives):
Flash drives only look like disks. In fact, nothing works the way you’d think. Flash is really different from magnetic recording, and those differences have a big impact on flash drive performance. How well vendors manage flash oddities has a huge impact on performance and even drive lifespan.
Honestly, I had no idea how right I was.
How long would it take to load an OS on a thumb drive?
I help friends in my small town (pop. ~10k) with computer problems. I thought it'd be handy to have a thumb drive with my favorite utilities loaded, so I started loading some on a generic 2 GB USB thumb drive. Most loaded as quickly as I expected, until I got to a 16 MB utility.
I dragged it to the thumb drive icon on the desktop, and the progress bar popped up with a 16 minute time estimate. 16 minutes! Less than a megabyte per minute on a flash drive that is capable of 15 MB a minute.
The progress bar was moving so slowly that I thought the machine had hung, but no, it was just going 1/15th the speed. Whoa!
I couldn't believe it. I thought something was wrong, so I tested it with a single MP3 file of the same size. No problem: it loaded in less than a minute.
What the heck was happening?
First surprise: hundreds of sub-2 KB files
The utility is in the form of a file folder or package. The icon makes it look like a single piece of code, but it contains a couple of thousand files, many of them HTML help pages. These small files, and the write overhead they incur, must be the source of the slow loading.
When I first ran this test, it turned out the thumb drive was formatted with the old Microsoft FAT 16 file system, which in a 2 GB drive gives a cluster size of 32 KB. So not only was the load slow, the resulting file on the thumb drive was huge - about 4x bloat. After I published the first version of this post, a couple of alert readers pointed out the FAT 16 problem.
I reformatted the thumb drive with FAT 32, with 4 KB clusters on a 2 GB disk, and the file bloat shrank to about 15% from 4x, but the load time stayed the same, or even longer.
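To see why cluster size drives the bloat, here is a minimal sketch of the slack-space arithmetic. The file mix is hypothetical (the post gives no per-file sizes) and was picked only to total roughly 16 MB of mostly small files; the 32 KB and 4 KB cluster sizes come from the text.

```python
KB = 1024

def allocated_bytes(file_sizes, cluster_size):
    """Space consumed when every file occupies a whole number of clusters."""
    # -(-a // b) is integer ceiling division
    return sum(-(-size // cluster_size) * cluster_size for size in file_sizes)

# Hypothetical file mix, NOT measured from the actual utility
file_sizes = [2 * KB] * 1500 + [20 * KB] * 400 + [200 * KB] * 25

logical = sum(file_sizes)                      # bytes of real data
fat16 = allocated_bytes(file_sizes, 32 * KB)   # 32 KB clusters
fat32 = allocated_bytes(file_sizes, 4 * KB)    # 4 KB clusters

print(f"logical size: {logical / KB:.0f} KB")
print(f"32 KB clusters: {fat16 / KB:.0f} KB ({fat16 / logical:.2f}x)")
print(f"4 KB clusters:  {fat32 / KB:.0f} KB ({fat32 / logical:.2f}x)")
```

With this made-up mix the 32 KB clusters cost about 4x the logical size and the 4 KB clusters under 20% extra, the same general shape as the numbers reported above.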
What I think is happening
Because this is flash, every write has to be preceded by an erase cycle, and an erase wipes an entire block. I can't tell how big the block size is, but it is likely to be at least 64 pages (128 KB, assuming 2 KB pages) and probably more. It appears that every write of a 2 KB file requires a 128 KB read to preserve existing data in the block, a 128 KB erase, and then the 2 KB file gets written along with the rest of the 128 KB of data already in the block.
Just eyeballing the numbers, it looks like about 10 small writes per second - way worse than even a 1.8" drive would do.
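The read-erase-write guess above can be put into numbers. This is a rough sketch under the post's own assumptions (a 128 KB erase block, 2 KB files, about 10 small writes per second); the real block size and the translation layer's behavior are unknown, and the function name is made up for illustration.

```python
KB = 1024

def read_modify_write_cost(payload_bytes, erase_block_bytes):
    """Bytes read and rewritten if a small write forces whole erase blocks
    to be read, erased, and written back (the behavior guessed at above)."""
    blocks = -(-payload_bytes // erase_block_bytes)  # ceiling division
    read = blocks * erase_block_bytes
    written = blocks * erase_block_bytes
    amplification = written / payload_bytes
    return read, written, amplification

read, written, amplification = read_modify_write_cost(2 * KB, 128 * KB)

writes_per_second = 10                        # eyeballed figure from the post
useful_rate = writes_per_second * 2 * KB      # bytes/s of actual file data
flash_rate = writes_per_second * written      # bytes/s the flash has to rewrite

print(f"write amplification: {amplification:.0f}x")
print(f"useful data: {useful_rate / KB:.0f} KB/s")
print(f"flash rewritten: {flash_rate / KB:.0f} KB/s")
```

Ten 2 KB writes a second is only about 20 KB/s of useful data, roughly 1.2 MB a minute, which is in line with the 16 minute estimate for a 16 MB utility.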
Update: The first two commenters quickly - and correctly - pointed out that I was seeing a FAT 16 problem, not a flash drive problem. I checked the file system on the thumb drive and sure enough, it was FAT 16. It must have come that way from the factory - something else to be aware of. I'm in the process of reformatting the drive with the FAT 32 file system. As soon as that completes, and it's taken about five minutes so far, I'll retest and update again.
Update II: I've reformatted the flash drive to FAT 32 and am loading the same utility. Load time is just as long, maybe even longer. It appears to me that while FAT 16 may explain the file bloat due to the 32 KB cluster size, it is not explaining the slow load speed. In fact, it appears that loading the files under FAT 32 is taking longer than it did for FAT 16.
Next, I plan to reformat the flash drive with NTFS and see how that works. Stay tuned for more updates.
Update III: Windows XP wouldn't allow me to reformat the thumb drive with NTFS. It looks like FAT 32 is the best I can do. Next I'll try the OS X HFS+ and see what, if any, difference that makes.
I also changed the original text to focus on the load performance rather than the FAT 16 bloat. I'm not a Windows aficionado, so I suspect lots of non-technical users would get caught by this as I was. I wonder how many thumb drives come with FAT 16?
This is just one data point
I can't generalize to all thumb drives or to flash-based solid state disk (SSD) replacements from this one experiment. Why? Because each flash drive has its own way of converting from how flash works to how disks work: the translation layer. As I noted in the earlier post:
The most important piece of a flash drive is the translation layer. This software takes the underlying weirdness of flash and makes it look like a disk. The translation layer is unique to each vendor and none of them are public. Each makes assumptions that can throttle or help performance under certain workloads.
What workloads? Sorry, you’ll have to figure that out for yourself. The bottom line is that flash drive write performance will be all over the map as engineers try to optimize for a wide range of workloads.
This is clearly a case where lots of small files choke this particular translation layer.
The good news is that after the utility finally loaded, using it was almost as fast as using it from a disk. Getting it ON the thumb drive was the problem, not getting it off.
The Storage Bits take
This experience made it clear to me that flash performance and capacity cannot be assumed from the vendor specs. Perhaps my thumb drive is poorly engineered, or optimized for capacity over everything else. In any case, this example made me realize just how different flash storage can be and how little we actually know about performance of specific implementations.
With flash the only thing you can be certain of is that your mileage *will* vary.
Comments welcome, of course. Anyone with similar experiences? Can anyone better explain the behavior I saw?
Talkback
This isn't a Flash problem, it's a FAT problem
this isn't a problem with flash, if you had a 100 gig hard drive and you formatted it with FAT instead of FAT32 or NTFS you'd find yourself with the exact same problem.
solution would be to format the stick with either fat32 or ntfs, but i would be careful as some os's can't handle ntfs (win98) so if you're going to fix a win98 machine it wouldn't read the stick if it were formatted in ntfs.
Valis
CEO
Valis Enterprises
http://www.valissoft.com
sorry, this is a flash problem
What he is experiencing is a well-known issue with flash memory.
I see the exact same thing on flash drives formatted with ntfs.
You can format a flash drive as fat, then use convert at the command prompt to convert it to ntfs.
No, sorry, it's a Windows problem...
some more info:
"Flash and SSD devices are good at reading data, but are not as good at writing data. The reason for the poor write performance is that these (NAND based) devices must erase the space used for new file writes, immediately prior to writing the new data. This is known as erase-on-write or erase/write. Improvements in this area are coming (phase-change memory)."
RE: Flash drives: your mileage WILL vary
Might Sound Obvious, But . . .
You are both right
Thanks!
Robin
Unfortunately . . .
So, on the bright side, with that utility comprised of 2KB files, you're now wasting only 2KB per file instead of 30KB. ;)
As far as the base argument of performance varying from flash drive to flash drive, you're absolutely correct. An interesting case in point can be found here, in an article testing a variety of USB flash drives for Vista ReadyBoost:
http://www.extremetech.com/article2/0,1697,2017818,00.asp
Of the nine USB flash drives tested, six passed the basic criteria to be able to run ReadyBoost. Not that I consider testing a peripheral against Vista to be the be-all, end-all of performance testing, but it does highlight the fact that there really doesn't seem to be much of a standard as to just how slow or fast USB flash memory is from one brand to the next, or even within brands.
Anyway . . . I am curious to hear more real-world performance testing here, such as what you're doing with actually loading up a flash drive with a USB-boot toolkit (which does well to test things like format times, copy times, etc.), if you manage to pick up a few different brands of memory keys. I've put the same sort of thing together, but never really timed it out, other than noticing that it is at least faster than a CD-boot toolkit, most of the time. ;)
Different companies
Gizmo Richard (http://www.techsupportalert.com/) had a review of several dongles, and recommended a few for use with portable apps. Might want to check it out as well.
- Kc
antivirus or chipset?
I have also noticed with windows in general that the larger the number of small files, the longer it takes to transfer them.
And thirdly, your chipset/chipset drivers may be a factor as well. I have a dell at work that reads my u3 capable sandisk thumbdrive and mp3 player okay, but the thumbdrive isn't readable by one of my older systems at home. My lexar 1GB is read just fine by this older system, but not by my dell at work. Go figure.
BTW...CA sucks (or maybe it's just how the IT folks at my company configured it). When it is running, it takes half my 3 GHz processor and, as I mentioned before, corrupts my data sometimes when I put it on a usb drive.
You have to enable write caching....
But...if you enable "write caching" on the drive it will complete in seconds. The only thing is you have to remember to "safely remove" the drive via the little green icon in your system tray. This ensures that all disk transactions are complete.
Windows usually does its utmost to disable write caching on all removable drives so this "safe removal" thing is a bit of a joke (does anybody you know actually use it?)
PS: If you disable write caching on your hard disk you'll think your machine is broken.
Write caching: More....
The only way I can find to turn on write-caching is via "Administrative tools->Computer management->Disk manager". This pops up exactly the same "policies" dialog but this one seems to work.
E.g., to delete 3000 files from the disk takes *45 minutes* without caching but only a couple of seconds with it enabled.
Final benchmarks
a) Copy 2900 files (600 MB) from my hard disk to a pen drive
b) Delete the folder from the drive
Without write caching it took about an hour to copy the files and 45 minutes to delete them.
With write caching enabled it took 16 minutes for the copy and 25 seconds for the delete.
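For a sense of scale, here is a quick sketch that turns those timings into rates. It assumes "about an hour" means 60 minutes and that the 600 MB and 2900 file figures apply to both runs; nothing here was measured independently.

```python
copy_mb = 600          # size of the copied folder, from the benchmark above
file_count = 2900      # number of files, from the benchmark above

uncached_copy_s = 60 * 60      # "about an hour", taken as 60 minutes
cached_copy_s = 16 * 60        # 16 minutes
uncached_delete_s = 45 * 60    # 45 minutes
cached_delete_s = 25           # 25 seconds

copy_speedup = uncached_copy_s / cached_copy_s
delete_speedup = uncached_delete_s / cached_delete_s

print(f"copy, no cache:   {copy_mb / uncached_copy_s:.2f} MB/s")
print(f"copy, cached:     {copy_mb / cached_copy_s:.2f} MB/s")
print(f"delete, no cache: {file_count / uncached_delete_s:.1f} files/s")
print(f"delete, cached:   {file_count / cached_delete_s:.0f} files/s")
print(f"speedup: copy {copy_speedup:.2f}x, delete {delete_speedup:.0f}x")
```

On these figures caching speeds the copy up by a bit under 4x and the delete by roughly 100x.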
Sounds about right
But yeah, enabling it can really give you a performance boost. As long as you remember to safely remove it.
But - I know a lot of people who just yank out their flash drive without a second thought - and for them it's good to have caching turned off.
I also know some people who will take care to "safely remove" their drive even with caching turned off, lol.
"PS: If you disable write caching on your hard disk you'll think your machine is broken."
You certainly will - harddrives are very slow devices! Don't tell me you actually tried it . . .
Write caching
About corruption
Sorry dude.
Maybe you're thinking of a transactional file system. Even that isn't perfect.
You can't write data to the hard drive, or any drive for that matter, without taking the time to write the data to the drive. TANSTAAFL.
You misunderstand journalling.
What happens is a journal of changes is kept (hence "journaling"), and they are applied to a copy of the file before the old file is overwritten.
Here's how it ensures atomicity:
-If the power is turned off while writing the journal, the journal is wiped and the old file is left intact. It's as if the write never took place. You're left with an old, but uncorrupted, copy of the file.
-Once the journal is completely written, a flag is set to tell the computer that it's 100% written. Now you have an intact old file and an intact journal. The computer then starts creating a backup copy of the file.
-If the power is turned off while creating a copy of the old file, then the copying process is restarted. The old file and journal are intact, so the process can be restarted to create the new file.
-Once a copy is made of the old file, a flag is set to tell the computer that the old file is done copying. Now you have two copies of the old file, so you can start writing to one while keeping the other intact in case something happens.
-If the power is turned off while writing the journaled changes to the file, then the copy is still intact. The unchanged file can then be copied and the process of writing the new file can be restarted.
In all cases, only one of the files is ever half-written, so you can always use the other one if the power is disrupted.
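The keep-one-intact-copy idea in the steps above can be approximated in user space. This is only an analogy, assuming a file system where a rename within one directory is atomic; it is not a description of how NTFS or any other file system implements journaling, and atomic_update is a hypothetical helper name.

```python
import os
import tempfile

def atomic_update(path, new_data: bytes):
    """Replace the file at `path` so a reader sees either the old contents
    or the new contents, never a half-written mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the new version to a temporary file in the same directory
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_data)
            tmp.flush()
            os.fsync(tmp.fileno())   # push the data to the device first
        # Swap the finished copy in; the old file stays intact until here
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, discard the partial copy and keep the old file
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the power goes out before the os.replace call, the old file is untouched and only the temporary copy is incomplete.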
"You can't write data to the hard drive, or any drive for that matter, without taking the time to write the data to the drive."
Correct, but you *can* make a copy of it before changing it!! That way you have a backup to revert to if something happens! Which is essentially what journaling is in a nutshell: It's creating a backup of the old data before writing the new data.
"It won't recover your unwritten"
Unwritten data can never be recovered, since the computer's state is completely wiped in such an event. In this case, you simply revert to the copy of the file you made before you started writing to it. An old but intact copy is better than a new but corrupted copy.
"or overwritten data."
If you have a backup copy of the file, this is false. You only write to one copy, leaving the other copy intact until the process is finished.
Good explanation. Thanks!
That's not journaling
Journaling only rolls back changes to the MFT, making it more robust and making running a disk check after every crash unnecessary. It will not, however, prevent data from being directly overwritten, and if a crash occurs mid-write during such an overwrite you will lose data.
IIRC, Vista does introduce transactional file operations, but it's an optional thing that applications have to be programmed to use.