TidyFS: Microsoft's simpler distributed file system

Summary: Just about a year ago, I first mentioned TidyFS, a new, small distributed file system under development by Microsoft Research. Later this week at the Usenix '11 conference, the Microsoft researchers behind TidyFS will be sharing more about their work publicly.

TidyFS is a distributed file system for parallel computations on clusters. On commodity, "shared-nothing" clusters, the primary workloads tend to be generated by distributed execution engines like MapReduce, Hadoop or Microsoft's Dryad, the Microsoft researchers note in the abstract of their presentation. Other vendors have created distributed file systems for these workloads, such as the Google File System (GFS) and the Hadoop Distributed File System (HDFS). Microsoft has one in development, too: TidyFS.

Here's an architectural diagram from Microsoft from a year ago showing how researchers were envisioning that TidyFS and other experimental components would fit together:

Microsoft researchers are emphasizing the simplicity and small size of TidyFS as differentiators from the other parallel file systems out there. And they're sharing some of their experiences using the file system in a limited way inside Microsoft Research in their white paper detailing their TidyFS work.

From the TidyFS white paper:

"The TidyFS storage system is composed of three components: a metadata server; a node service that performs housekeeping tasks running on each cluster computer that stores data; and the TidyFS Explorer, a graphical user interface which allows users to view the state of the system."
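To make that three-part split concrete, here is a toy Python sketch. It is not TidyFS code and does not reflect its real interfaces: every class, method and node name below is hypothetical, and the "housekeeping" shown (a node deleting local data the metadata server no longer lists) is only an assumed example of such a task, since the quote does not say what the tasks are.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    """Hypothetical metadata server: maps each item name to the nodes holding a copy."""
    locations: dict = field(default_factory=dict)

    def record_copy(self, item, node):
        self.locations.setdefault(item, set()).add(node)

    def drop_item(self, item):
        # Forget the item entirely; nodes clean up their copies later.
        self.locations.pop(item, None)

    def holders(self, item):
        return self.locations.get(item, set())


@dataclass
class NodeService:
    """Hypothetical per-machine service that stores data and does housekeeping."""
    name: str
    metadata: MetadataServer
    local_items: set = field(default_factory=set)

    def store(self, item):
        self.local_items.add(item)
        self.metadata.record_copy(item, self.name)

    def housekeep(self):
        # Assumed housekeeping task: delete local items that the metadata
        # server no longer lists for this node, and report what was removed.
        stale = {i for i in self.local_items
                 if self.name not in self.metadata.holders(i)}
        self.local_items -= stale
        return stale


def explorer_view(metadata):
    """Stand-in for the Explorer GUI: a read-only text view of system state."""
    return [f"{item}: {', '.join(sorted(nodes))}"
            for item, nodes in sorted(metadata.locations.items())]


meta = MetadataServer()
node_a = NodeService("node-a", meta)
node_b = NodeService("node-b", meta)
node_a.store("logs")
node_b.store("logs")
node_a.store("scratch")

meta.drop_item("scratch")      # the metadata server forgets "scratch"
removed = node_a.housekeep()   # node-a then deletes its stale local copy
for line in explorer_view(meta):
    print(line)
```

The point of the sketch is only the division of labor: one place that knows where data lives, a worker on each storage machine that keeps local state in line with it, and a viewer that reads state without changing it.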

Microsoft Research has been actively deploying and using TidyFS for the past year on a research cluster with 256 servers running large-scale, data-intensive computations, according to the white paper. The research cluster is used only for programs run using DryadLINQ, which is a parallelizing compiler for .NET programs using Dryad. (I've written before about Dryad, a first commercial version of which Microsoft is planning to deliver later this year as part of a service pack for Windows Server 2008 R2 HPC.)

"On a typical day, several terabytes of data are read and written to TidyFS through the execution of DryadLINQ programs," the white paper notes.

The experimental TidyFS cluster also is making use of a cluster-wide scheduler, codenamed "Quincy," and a computational cache-manager, codenamed "Nectar." Even though TidyFS was designed in conjunction with these various other distributed-clustering research projects, the Dryad and DryadLINQ pieces seem to be further along the path to commercialization. (When I asked Microsoft officials earlier this year if Quincy and Nectar would be commercialized later this year along with Dryad, I was told they were not on the same delivery trajectory.)

Nonetheless, the white paper says that "rather than making TidyFS more general, one direction we are considering is integrating it more tightly with our other cluster services."

As with all Microsoft research projects, there is no guarantee as to whether or when TidyFS will evolve into a commercial product or part of a commercial product. However, given that Dryad is on its way to being released as "LINQ to HPC" later this year, I'm thinking TidyFS may not be that far behind, and may someday find its place in the Microsoft "cloud as supercomputer" strategy, alongside Dryad.

Mary Jo has covered the tech industry for 30 years for a variety of publications and Web sites, and is a frequent guest on radio, TV and podcasts, speaking about all things Microsoft-related. She is the author of Microsoft 2.0: How Microsoft plans to stay relevant in the post-Gates era (John Wiley & Sons, 2008).

  • RE: TidyFS: Microsoft's simpler distributed file system

    Sounds great! Now let's hope we also get to see some of what has been promised with Cairo and WinFS for these many years.
    • RE: TidyFS: Microsoft's simpler distributed file system

      @windowseat - if it is based on LINQ, then we should start seeing the queryability of WinFS. I wonder if this will show up in Windows 8 in some form. Maybe being used to combine cloud storage and local storage into one distributed cluster.
  • Sounds good .... but will they deliver?

    For years, MS promised and hyped the virtues of WinFS ..... and it NEVER delivered.

    If it weren't for Duke Nukem Forever .... WinFS would probably own the record of perpetual vaporware.

    (Yes, DNF was finally released, canceling its vaporware status ... but it still holds a record 14+ years on the top vaporware list).