Warp speed file serving with pNFS

Files: quickly getting bigger. Networks: slowly getting faster. Something's got to give. Here's the scoop.

Parallel NFS: standards-based parallel file serving

The Network File System (NFS) is the oldest NAS (Network Attached Storage) protocol. Developed by Sun in the '80s and made an open standard, NFS makes files on the network available anywhere.

Small files: great. Big files: lo-o-o-ng time coming

NAS is popular because it uses cheap, reliable and reasonably fast Ethernet instead of cranky, expensive and very fast Fibre Channel. NFS is very popular as the storage protocol for compute clusters. Yet as data sets and file sizes have grown, the relative speed of Ethernet just hasn't kept up.

I worked with some oil companies doing reservoir modeling about six years ago. Even then it was taking them 6-10 hours just to move data from one stage of their workflow to the next. It was killing them.

With 10 gigabit Ethernet coming up, you'd think our problems would be solved. But no: NFS had a tough time scaling even to gigabit Ethernet. That's why you see TCP Offload Engines (TOEs), custom hardware pipelines and other costly go-fast goodies on gigE storage.

Enter the dragon

The Internet Engineering Task Force is the NFS standards body. About four years ago they started developing a parallel version of NFS to enable much higher speeds. The new standard, NFS v4.1, should reach final draft status later this year. Some early birds may be out with products late this year as well.

How NFS works

Standard NFS file servers work like your PC does: the files are on local disks, and the computer keeps track of their location, name, creation and modification dates, size and so on. That bookkeeping information is called metadata, which means data about your data.

When you request a file, the file server receives the request, looks up the metadata, converts it to disk I/O requests, collects the data and then ships it over the network to you. With small files most of the time is spent collecting the data.

With big files the data transmission time becomes the limiting factor. What if you could break a big file into pieces and ship it in parallel to a compute server? That would be faster, especially with several parallel connections.

That's exactly what parallel NFS (pNFS) does.
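To put rough numbers on the idea, here is a back-of-the-envelope sketch. It assumes the file splits evenly across identical links and ignores protocol overhead, disk speed and everything else that gets in the way in real life. The 500 GB file size and the function name are made up for illustration.

```python
def transfer_seconds(size_bytes: int, link_bits_per_sec: float, connections: int = 1) -> float:
    """Idealized time to move a file split evenly across parallel connections."""
    if connections < 1:
        raise ValueError("connections must be at least 1")
    total_bits = size_bytes * 8
    # Each connection carries an equal share, so all shares finish together.
    return total_bits / (link_bits_per_sec * connections)


GIGABIT = 1e9            # 1 Gb/s, assuming zero protocol overhead
FILE_SIZE = 500 * 10**9  # hypothetical 500 GB data set

single = transfer_seconds(FILE_SIZE, GIGABIT)
parallel = transfer_seconds(FILE_SIZE, GIGABIT, connections=4)
print(f"1 connection:  {single / 3600:.2f} hours")
print(f"4 connections: {parallel / 3600:.2f} hours")
```

Real transfers will come in slower than this, but the shape of the result holds: the wire time drops in proportion to the number of connections you can keep busy.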

How pNFS works

pNFS splits the NFS file server into two types of servers: a metadata and control server, and as many storage servers as you can afford. Together the control server and the storage servers form a single logical NFS server with a slew of network connections. The compute server, which is likely to be a Beowulf cluster, has plenty of Ethernet ports as well.

So the compute server requests a file using the new v4.1 NFS client. The NFS control server receives the request and looks up where the file chunks reside on the various storage servers. It sends this information, called a layout, back to the NFS v4.1 client, which then tells its cluster members where to get the data. The cluster members then use the layout to request the data directly from the storage servers.
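The request flow above can be sketched as a toy model. This is not the NFS v4.1 wire protocol or any real client API: the servers are plain Python dictionaries, the round-robin striping is just one possible placement, and every name here (`storage_servers`, `get_layout`, `read_chunk` and so on) is invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4  # bytes per chunk; tiny so the example is easy to follow

# Hypothetical storage servers: each maps (file name, chunk index) to bytes.
storage_servers = {"ds0": {}, "ds1": {}, "ds2": {}}

# The control (metadata) server keeps only layouts, never file contents.
control_server = {}


def store_striped(name: str, data: bytes) -> list:
    """Stripe data round-robin across the storage servers and return the layout."""
    server_names = sorted(storage_servers)
    layout = []
    for index, offset in enumerate(range(0, len(data), CHUNK_SIZE)):
        server = server_names[index % len(server_names)]
        storage_servers[server][(name, index)] = data[offset:offset + CHUNK_SIZE]
        layout.append((index, server))
    return layout


def get_layout(name: str) -> list:
    """Step 1: the client asks the control server where the chunks live."""
    return control_server[name]


def read_chunk(name: str, index: int, server: str) -> bytes:
    """Step 2: the client reads one chunk directly from a storage server."""
    return storage_servers[server][(name, index)]


def parallel_read(name: str) -> bytes:
    """Fetch every chunk named in the layout in parallel and reassemble the file."""
    layout = get_layout(name)
    with ThreadPoolExecutor(max_workers=len(storage_servers)) as pool:
        # map() yields results in layout order, so the chunks join up correctly.
        chunks = pool.map(lambda entry: read_chunk(name, entry[0], entry[1]), layout)
        return b"".join(chunks)


payload = b"seismic survey data, block 42"
control_server["survey.dat"] = store_striped("survey.dat", payload)
print(parallel_read("survey.dat") == payload)
```

The point to notice is that the control server never touches the file data on the read path. It hands out the map, and the heavy lifting happens between the clients and the storage servers.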

If you've got 10 storage servers for a 10 node cluster, you will see something close to a 10x increase in speed. With 100 of each you'll see close to a 100x increase. It is almost magic.
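Why "close to" and why "of each"? A minimal sketch, assuming one equal-speed network link per machine and no other bottleneck: the best you can hope for is set by whichever side has fewer links. The function name is hypothetical and the model is deliberately crude.

```python
def ideal_speedup(storage_servers: int, client_nodes: int) -> int:
    """Upper bound on speedup versus one server feeding one client link.

    Assumes one equal-speed link per machine and perfectly even striping.
    """
    if storage_servers < 1 or client_nodes < 1:
        raise ValueError("need at least one storage server and one client node")
    # Data can flow no faster than the side with fewer links allows.
    return min(storage_servers, client_nodes)


for servers, nodes in [(10, 10), (100, 100), (10, 100)]:
    print(f"{servers} servers, {nodes} nodes: up to {ideal_speedup(servers, nodes)}x")
```

Add 90 more cluster nodes without adding storage servers and, in this simple model, the ceiling stays at 10x.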

AND it's backward compatible

You'll still be able to access the data even with a lowly PC. Your NFS client makes the request, the control server gathers the data itself, and sends it on to you. Except for the fact that it is slower than pNFS, you'll never know the difference.

No changes to applications either. The IETF team did a good job on this one.

The Storage Bits take

pNFS is going to be very popular in the large-scale high performance computing cluster space. These clusters are so big that adding just a few hundred bucks per node for some tweak quickly adds up.

I fantasize about a home pNFS array for video editing: stick four gigE ports on my local machine and editing large files wouldn't be nearly as painful. But that is a ways off. For the big clusters though, a new day is starting to dawn.

Comments welcome, of course. Like reading specs? The IETF NFS v4.1 specs page will make your day.