{"id":305990,"date":"2011-01-26T10:00:24","date_gmt":"2011-01-26T18:00:24","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=305990"},"modified":"2016-10-15T15:27:27","modified_gmt":"2016-10-15T22:27:27","slug":"customers-get-dryad-dryadlinq","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/customers-get-dryad-dryadlinq\/","title":{"rendered":"Customers Get Dryad, DryadLINQ"},"content":{"rendered":"
By Douglas Gantenbein, Senior Writer, Microsoft News Center<\/em><\/p>\n Researchers and businesspeople around the world now have a new way to perform massive computations over large quantities of unstructured data more quickly and easily than ever before.<\/p>\n The reason: a Microsoft Research-developed computing tool called Dryad<\/span><\/a>, named after the shy tree nymphs of Greek mythology. Dryad and a related programming model called DryadLINQ<\/span><\/a> simplify running complex data-analysis applications across hundreds or even thousands of servers using familiar, widely used Windows software.<\/p>\n After nearly six years of research, along with in-house use on Microsoft projects such as Kinect<\/span><\/a> and Bing<\/span><\/a>, Dryad and DryadLINQ are entering commercial use. Starting Jan. 26, a technology preview of Dryad and DryadLINQ will be built into the Windows HPC Server 2008 R2<\/span><\/a> high-performance computing line, and the technology eventually will be integrated with Microsoft SQL Server<\/span><\/a> and Windows Azure<\/span><\/a>. HPC Server is designed to give customers tremendous computing power and easy management, all on off-the-shelf hardware.<\/p>\n Michael Isard, a principal researcher at Microsoft Research Silicon Valley who was instrumental in launching the Dryad project, says the new technology is an excellent example of how Microsoft views computing.<\/p>\n \u201cThis is an opportunity to democratize large-scale, data-intensive computing,\u201d he says.
\u201cIn areas such as customer-relationship management, business intelligence, planning, and infrastructure\u2014all those tasks where companies now have access to a vast amount of data\u2014Dryad and DryadLINQ can make sense of that data.\u201d<\/p>\n The Dryad project consists of two key components. Dryad itself provides reliable computing across thousands of servers. DryadLINQ, built on Microsoft\u2019s .NET Language Integrated Query<\/span><\/a> (LINQ), enables developers to write their applications in a SQL-like query language using familiar programming tools such as Microsoft Visual Studio<\/span><\/a>. Most programmers will work only with DryadLINQ; once they have launched their application into the cloud, Dryad invisibly does the rest.<\/p>\n A third piece, the Distributed Storage Catalog (DSC), is a distributed file system built for Dryad. It manages the data that Dryad processes, storing it reliably with user-configurable redundancy. The DSC also keeps data close to the servers that process it, so little time is wasted transmitting data across the network.<\/p>\n Dryad and DryadLINQ make it easier for programmers to take advantage of parallel computing, in which many servers, or the multiple processor cores within a single machine, work together on a single computing problem. Such computing is extremely powerful for so-called \u201cunstructured\u201d data: information that has not been tagged or annotated, such as the buying habits a retailer might collect from tens of thousands of customers. Structured data, by contrast, is the kind found in a SQL database.<\/p>\n Harnessing the power of parallel computing is difficult, though. Most programmers are more familiar with writing sequential programs, in which Action A is followed by Action B, then Action C.
It is challenging to think and program in parallel.<\/p>\n DryadLINQ lets developers write their applications in a query language using Visual Studio; Dryad then breaks up the program and distributes the pieces across clusters of servers or processors. In effect, Dryad acts as a computing traffic cop, sending data down potentially millions of computing pathways. It helps make sure that when one piece of data is modified, other servers don\u2019t also change it. It balances the computing load among many computers, and it reroutes computing work if an error or communications problem temporarily takes one or even several servers offline.<\/p>\n That removes a huge burden from programmers and lets them focus on the problem they are trying to solve rather than on how the computers will act in parallel.<\/p>\n \u201cWe want programmers to be able to write their programs without having to think about things like fault tolerance [a byproduct of parallel computing\u2019s complexity],\u201d says Yuan Yu, a principal researcher at Microsoft Research Silicon Valley who led the creation of the DryadLINQ component.<\/p>\nHow Dryad Works<\/h2>\n