Microsoft takes a step toward commercializing its 'Dryad' distributed computing technologies

Microsoft has started external developer testing of a number of interrelated parallel/distributed technologies for Windows Server that are part of the codename "Dryad" family.

According to a December 17 blog post on the Windows HPC (High Performance Computing) Team Blog, Microsoft is making available to testers via its Connect test site the first Community Technology Preview (CTP) test builds of its Dryad, DSC and DryadLINQ technologies.

Dryad is Microsoft's competitor to Google MapReduce and Apache Hadoop. It began as a Microsoft Research project dedicated to developing ways to write parallel and distributed programs that can scale from small clusters to large datacenters. A related DryadLINQ compiler and runtime is part of the project. Microsoft released builds of the Dryad and DryadLINQ code to academics for noncommercial use in the summer of 2009. Microsoft moved Dryad from its research arm to its Technical Computing Group this year.

According to a presentation from August, the team's plan was to deliver a first CTP build of the stack in November 2010 and to release a final version of it running on Windows Server High Performance Computing servers by 2011.

This initial preview is intended for "developers who are exploring data-intensive computing," according to the Softies. The prerequisite for the CTP is an HPC Pack 2008 R2 Enterprise-based cluster with Service Pack 1 installed.

As I noted in a previous blog post, a number of interesting components make up Dryad, including a new distributed filesystem (codenamed "TidyFS"), a set of related data-management tools (codenamed "Nectar") and a scheduler for distributed clusters (codenamed "Quincy").

Topics: Cloud, Data Centers, Microsoft, Operating Systems, Software, Windows

About

Mary Jo has covered the tech industry for 30 years for a variety of publications and Web sites, and is a frequent guest on radio, TV and podcasts, speaking about all things Microsoft-related. She is the author of Microsoft 2.0: How Microsoft plans to stay relevant in the post-Gates era (John Wiley & Sons, 2008).

Talkback

6 comments
  • RE: Microsoft takes a step toward commercializing its 'Dryad' distributed computing technologies

    You know, I once had this alternate universe moment where I briefly hoped that MS would choose to join in pushing open standards and interfaces in this area, competing on the basis of implementation, rather than tilting at the vertical-stack lock-in integration approach ...

    Oh well, so much for that ...
    daboochmeister
    • RE: Microsoft takes a step toward commercializing its 'Dryad' distributed computing technologies

      @daboochmeister yeah shame on a commercial company creating their own solution to a problem!
      jessiethe3rd
  • What does DSC stand for?

    Do you have any idea, Mary? I've found something in the sample code that suggests it's related to a distributed fs, data services...
    phankhanhhung@...
    • I've found something interesting

      "DSC is the Distributed Storage Catalog that provides data management functionality, including replication and load balancing. DSC and NTFS together provide the storage capability underlying Dryad and DryadLINQ"
      ...

      "Dryad uses this information to determine where to run jobs; this system preferentially moves the computation to the data, rather than moving the data to the computation. DSC also provides file replication and load balancing functionality. Dryad understands data locality for data registered with DSC, and it preferentially schedules computations to the servers holding data that the computation needs."

      "A Dryad job typically starts with a collection of persistent input data, such as a set of log files, and returns a processed version of that data to the application. The input data is typically partitioned into reasonable-sized pieces, which have been distributed across the cluster before the job starts. The partitioned input data can be created and distributed in any of several ways, including:
      - By using a DryadLINQ application to read data from a share, and then use hash or range partitioning to put data on compute nodes and write the metadata to the DSC database.
      - By manually partitioning the input data into a set of files, copying each file to a compute node, and then creating a DSC stream to contain the names and locations of the partitions."
      phankhanhhung@...
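
The hash and range partitioning mentioned in the quoted documentation can be illustrated with a minimal Python sketch. This is not DryadLINQ or DSC code: the function names `hash_partition` and `range_partition`, the sample records, and the boundary values are all hypothetical, chosen only to show how the two schemes decide which partition a record lands in.

```python
import zlib
from bisect import bisect_right


def hash_partition(records, key, num_partitions):
    """Place each record in a partition chosen by a stable hash of its key."""
    partitions = [[] for _ in range(num_partitions)]
    for record in records:
        # crc32 is stable across runs, unlike Python's built-in hash() for str.
        digest = zlib.crc32(str(key(record)).encode("utf-8"))
        partitions[digest % num_partitions].append(record)
    return partitions


def range_partition(records, key, boundaries):
    """Place each record in a partition chosen by where its key falls
    among the sorted boundary values (len(boundaries) + 1 partitions)."""
    partitions = [[] for _ in range(len(boundaries) + 1)]
    for record in records:
        partitions[bisect_right(boundaries, key(record))].append(record)
    return partitions


# Hypothetical log records: (user, value).
log_lines = [("alice", 3), ("bob", 17), ("carol", 42), ("alice", 8)]

# Records with the same user always end up in the same partition.
by_user = hash_partition(log_lines, key=lambda r: r[0], num_partitions=3)

# Values below 10, from 10 up to (but not including) 40, and 40 or above.
by_value = range_partition(log_lines, key=lambda r: r[1], boundaries=[10, 40])
```

In a real cluster each partition would then be written to a different compute node and its location recorded in a catalog, so the scheduler can run work where the data already sits. The sketch only covers the assignment step.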