{"id":306524,"date":"2009-07-13T09:00:34","date_gmt":"2009-07-13T16:00:34","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=306524"},"modified":"2016-10-17T14:14:17","modified_gmt":"2016-10-17T21:14:17","slug":"project-trident-navigating-sea-data","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/project-trident-navigating-sea-data\/","title":{"rendered":"Project Trident: Navigating a Sea of Data"},"content":{"rendered":"
By Rob Knies, Managing Editor, Microsoft Research

How deep is the ocean? Geologically, the answer is straightforward: almost seven miles. This we know from a series of surveys, beginning in the 19th century, of the depth of the Mariana Trench, near Guam in the North Pacific, a boundary between two tectonic plates that is understood to be the deepest point in the world’s oceans.

When it comes to understanding what transpires in the ocean, however, the question becomes immensely more challenging. The complexities of ocean dynamics remain a profound mystery. Water is effectively opaque to electromagnetic radiation, meaning that the floor of the oceans, which drive biological and climatic systems with fundamental implications for terrestrial life, has not been mapped as thoroughly as the surfaces of some of our fellow planets in the solar system. The oceans, covering 70 percent of the globe, represent Earth’s vast, last physical frontier.

Roger Barga is helping to unlock those secrets.

Barga, principal architect for the External Research division of Microsoft Research, heads Project Trident: A Scientific Workflow Workbench, an effort to make complex data visually manageable, enabling science to be conducted at a large scale.

Working with researchers at the University of Washington, the Monterey Bay Aquarium Research Institute, and others, Barga and his colleagues in External Research’s Advanced Research Tools and Services group have developed a mechanism for extending the Windows Workflow Foundation, based on the Microsoft .NET Framework, to combine visualization and workflow services for better management, evaluation, and interaction with complex data sets.

Project Trident was presented on July 13 during the 10th annual Microsoft Research Faculty Summit. The workbench is available as a research development kit on DVD; future releases will be available on CodePlex.

“Scientific workflow has become an integral part of most e-research projects,” Barga says. “It allows researchers to capture the process by which they go from raw data to actual final results. They are able to articulate these in workflow schedules. They can share them, they can annotate them, they can edit them very easily.

“A repertoire of these workflows becomes a workbench, by which scientists can author new experiments and run old ones. It also is a platform to which you can attach services like provenance [in this case, the origin of a specific set of information or data]. It becomes this wonderful environment in which researchers can do their research, capture the results, and share their knowledge. That’s what scientific workflow is all about.”

Project Trident, which includes fault tolerance and the ability to recover from failures, has the potential to make research more efficient. Scientists spend a lot of time validating and replicating their experiments, and the workbench can capture every step of an experiment and enable others to check or rerun it by setting different parameters.
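Because the workbench is built on Windows Workflow Foundation, each captured step can be expressed as a reusable activity. The following is a minimal, hypothetical sketch of such a custom activity written against the .NET 3.5 workflow API (System.Workflow.ComponentModel); the activity name and the simple cleaning it performs are illustrative assumptions, not part of Trident’s actual activity library.

```csharp
using System;
using System.Workflow.ComponentModel;

// Hypothetical Trident-style activity: one reusable step in a scientific workflow.
// The "cleaning" it performs is a stand-in for whatever transformation a scientist needs.
public class CleanSensorDataActivity : Activity
{
    // Simple CLR properties let the workflow author bind inputs and read outputs.
    public double[] RawReadings { get; set; }
    public double[] CleanedReadings { get; set; }

    protected override ActivityExecutionStatus Execute(ActivityExecutionContext context)
    {
        // Illustrative step: drop obviously bad sensor readings (e.g., dropouts reported as NaN).
        CleanedReadings = Array.FindAll(RawReadings ?? new double[0],
                                        r => !double.IsNaN(r));

        // Returning Closed tells the workflow runtime this step finished successfully.
        return ActivityExecutionStatus.Closed;
    }
}
```

In a workbench like Trident, activities of this kind would be registered in a library so they can be dragged into a visual workflow and shared with other researchers; that registration step is omitted from the sketch.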
True to its namesake in classical mythology, Project Trident’s first implementation is to assist in data management for a seafloor-based research network called the Ocean Observatories Initiative (OOI), formerly known as NEPTUNE.

The OOI, a $400 million effort sponsored by the National Science Foundation, will produce a massive amount of data from thousands of ocean-based sensors off the coast of the Pacific Northwest. The first Regional Cabled Observatory will consist of more than 1,500 kilometers of fiber-optic cable on the seafloor of the Juan de Fuca plate. Affixed to the cable will be thousands of chemical, geological, and biological sensors transmitting continuous streaming data for oceanographic analysis.

[Figure: Plans for a Regional Cabled Observatory on the Juan de Fuca plate, enabled in part by Project Trident.]

The expectation is that this audacious undertaking will transform oceanography from a data-poor discipline to one overflowing with data. Armed with such heretofore inaccessible information, scientists will be able to examine issues such as the ocean’s ability to absorb greenhouse gases and to detect seafloor stresses that could spawn earthquakes and tsunamis.

“It will carry power and bandwidth to the ocean,” Barga says, “and will allow scientists to study long-term ocean processes. I think it’s going to be a rich area for researchers to invest in and Microsoft to be a part of. It’s very compelling.”

Barga, who has been interested in custom scientific workflow solutions throughout his career, got involved with Project Trident in 2006. It should come as little surprise that his initial nudge in the direction that became Project Trident came from computer-science visionary Jim Gray.

“I had been with the group for only six weeks,” Barga recalls. “I wanted to engage in a project with external collaborators, and I reached out to Jim Gray, who consulted with Tony [Hey, corporate vice president of External Research].

“I asked Jim what he thought would be a good opportunity to engage the scientific community. He introduced me to the oceanographers and computer scientists working on a project called NEPTUNE. He introduced me to a graduate student named Keith Grochow.”

Grochow was a doctoral student at the University of Washington studying visualization techniques to help oceanographers. He was being supervised by Ed Lazowska and Mark Stoermer of the university faculty. Barga met them, too. But it was Gray who put Barga on the Project Trident path.

“Jim described, during the course of an hour-long phone conversation, his idea behind an oceanographer’s workbench that would consist of sensors, data streaming in off the NEPTUNE array, and these beautiful visualizations of what was going on in the ocean appearing on the oceanographer’s desktop, wherever they were in the world,” Barga says. “He noted that we needed to be able to transform raw data coming in off the sensors in the ocean, invoking computational models and producing visualizations. He noted that workflow was exactly what was needed, and he knew my passion in the area.
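Gray’s pipeline of sensors, streaming data, computational models, and visualizations maps naturally onto a sequential workflow. Below is a hypothetical sketch of composing and running such a pipeline with the .NET 3.5 Windows Workflow Foundation runtime; the step names and console output are placeholders standing in for real Trident activities.

```csharp
using System;
using System.Threading;
using System.Workflow.Activities;
using System.Workflow.Runtime;

// Hypothetical oceanographer's pipeline: each CodeActivity stands in for a real
// workflow step (ingest sensor data, clean it, run a model, render a visualization).
public class SensorPipeline : SequentialWorkflowActivity
{
    public SensorPipeline()
    {
        Activities.Add(MakeStep("ReadSensorStream", "read raw sensor stream"));
        Activities.Add(MakeStep("CleanReadings", "clean and align readings"));
        Activities.Add(MakeStep("RunOceanModel", "run computational ocean model"));
        Activities.Add(MakeStep("RenderVisualization", "render visualization"));
    }

    private static CodeActivity MakeStep(string name, string description)
    {
        CodeActivity step = new CodeActivity();
        step.Name = name;  // activity names must be unique, valid identifiers
        step.ExecuteCode += delegate { Console.WriteLine("step: " + description); };
        return step;
    }
}

class Program
{
    static void Main()
    {
        using (WorkflowRuntime runtime = new WorkflowRuntime())
        {
            // The runtime executes workflows asynchronously, so wait for completion.
            AutoResetEvent done = new AutoResetEvent(false);
            runtime.WorkflowCompleted += (sender, e) => done.Set();
            runtime.WorkflowTerminated += (sender, e) => done.Set();

            WorkflowInstance instance = runtime.CreateWorkflow(typeof(SensorPipeline));
            instance.Start();
            done.WaitOne();
        }
    }
}
```

In Trident itself, a schedule like this would be authored visually and stored so it can be shared, annotated, and rerun, as Barga describes above; constructing it in code here simply makes the sequence of steps explicit.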
“Hence, we started off building a specific scientific workflow solution for the oceanographers, for NEPTUNE. That project delivered its first prototype in three months, and we validated that we could support scientific workflow on Windows Workflow.”

Along the way, Barga and associates became aware that their work on Project Trident was extensible to other scientific endeavors.

“We realized we had an incredible amount to offer other groups,” Barga says. “Several groups acknowledged they were spending too much time supporting their platform.”

Before long, Barga found himself collaborating with astronomers from Johns Hopkins University to develop an astronomer’s workbench to support the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS), an effort to combine relatively small mirrors with large digital cameras to produce an economical system that can observe the entire available sky several times each month. The goal of Pan-STARRS, which is being developed at the University of Hawaii’s Institute for Astronomy, is to discover and characterize Earth-approaching objects, such as asteroids and comets, that could pose a danger to the planet.

Such work was made possible by ensuring that the work on Project Trident could be generalized to other scientific domains.

“We were able to look back on all the existing workflow systems and build upon the best design ideas,” Barga says. “That allowed us to move forward very fast. In addition, we chose two or three different problems to work on. Not only were we working on the oceanographic one, we looked at how we could support astronomy with Pan-STARRS, a very different domain, a very different set of requirements.

“If you design a system with two or three different customers in mind, you generalize very well. You come up with a very general architecture. One of the challenges we had to overcome was to not specialize on just one domain, or it would be too specialized a solution. Pick two or three, and balance the requirements so you build a general, extensible framework. We think we’ve done that.”

Project Trident also exploits the powerful graphics capabilities of modern computers.

“The gaming industry has created this amazing graphics engine available on every PC, yet the resource has been largely ignored by the scientific community,” says Grochow, whose doctoral thesis will be based on the NEPTUNE project. He adds that the same graphical tools that enable gamers to battle monsters or fly virtual aircraft can be used, instead of cumbersome text and formula entries, to accomplish many scientific tasks.

[Figure: Today’s computers offer the graphics capabilities to provide stunning undersea visualizations such as this one from the University of Washington’s Collaborative Observatory Visualization Environment project.]

The University of Washington’s Collaborative Observatory Visualization Environment (COVE) was running out of funding when Microsoft Research got involved. Microsoft supplied financial and technical support to enable COVE to thrive, says Stoermer, director of the university’s Center for Environmental Visualization.

“COVE really is about taking a gaming perspective to research,” he says. “And in the long run, we see this as applicable well beyond oceanography.”
John Delaney, professor of oceanography at the University of Washington, and Deb Kelley, an associate professor of marine geology and geophysics at the university, also have been key collaborators on the project, as have Jim Bellingham and his team at the Monterey Bay Aquarium Research Institute.

“They have given us very valuable feedback,” Barga says, “on the role workflow will play in their environment.”

In computer science, the concept of workflow refers to detailed code specifications for running and coordinating a sequence of actions. A workflow can be simple and linear, or it can be a conditional, many-branched series with complex feedback loops. Project Trident enables sophisticated analysis in which scientists can write a desired sequence of computational steps and data flow, ranging from data capture from sensors or computer simulations, to data cleaning and alignment, to the final visualization of the analysis. Scientists can explore data in real time; compose, run, and catalog experiments; and add custom workflows and data transformations for others. But the concept required some convincing.

“It’s been an interesting journey,” Barga says with a smile. “When we started this a year and a half ago, the response in the oceanographic community was, ‘What’s workflow?’ It took a long dialogue and a series of demonstrations.

“Fast-forward 16 months, and people are keen to embrace a workflow system. They’re actually thinking about their problems as workflows and repeating them back to us: ‘I have a workflow. Let me explain it to you.’ Their awareness has been raised significantly in the oceanographic community.”

The deluge of scientific data requires tools not only for data management, but also for tapping the vast computing resources of data centers. Another Microsoft Research technology, DryadLINQ, can help in that regard.

“Researchers need to have automated pipelines to convert that data into useful research objects,” Barga explains. “That’s where tools like workflow and Trident come into play. Then researchers have a very large cluster, but no means by which to efficiently program against it. That’s where DryadLINQ comes into play. They can take a sequential program and schedule that thing over 3,000 nodes in a cluster and get very high distributed throughput.

“We envision a world where the two actually work together. All that data may invoke a very large computation, may require very detailed analysis or cleaning. If we use DryadLINQ over a cluster, we may be able to do data-parallel programming and bring the result back into the workflow.”
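Barga’s sketch of the division of labor, with Trident orchestrating the pipeline and DryadLINQ fanning a query out across a cluster, rests on the fact that DryadLINQ programs are written as ordinary LINQ queries. The snippet below is a hypothetical illustration using only standard in-memory LINQ operators; the reading type, the quality threshold, and the grouping key are invented for the example, and the cluster-specific DryadLINQ table types that would replace the in-memory source are deliberately omitted.

```csharp
using System;
using System.Linq;

// A hypothetical sensor reading; field names are illustrative only.
public class Reading
{
    public string SensorId { get; set; }
    public double Temperature { get; set; }
    public double Quality { get; set; }
}

class Program
{
    static void Main()
    {
        // In-memory stand-in for the data source. Under DryadLINQ, the same query
        // shape would run against a partitioned data set spread across cluster nodes,
        // with the LINQ operators compiled into a distributed execution plan.
        Reading[] readings =
        {
            new Reading { SensorId = "axial-1",   Temperature = 2.1, Quality = 0.97 },
            new Reading { SensorId = "axial-1",   Temperature = 2.3, Quality = 0.42 },
            new Reading { SensorId = "hydrate-7", Temperature = 3.9, Quality = 0.88 },
        };

        // Clean, group, and aggregate: the kind of step a Trident workflow might
        // hand off to a cluster before pulling the summary back into the pipeline.
        var summary = readings
            .Where(r => r.Quality > 0.5)        // drop low-quality samples
            .GroupBy(r => r.SensorId)           // one group per sensor
            .Select(g => new
            {
                Sensor = g.Key,
                MeanTemperature = g.Average(r => r.Temperature),
                Samples = g.Count()
            });

        foreach (var row in summary)
        {
            Console.WriteLine("{0}: {1:F2} C over {2} samples",
                              row.Sensor, row.MeanTemperature, row.Samples);
        }
    }
}
```

The point of the sketch is the query shape: because the operators are declarative, the same analysis logic can in principle be retargeted from a single machine to thousands of cluster nodes without being rewritten, which is the scenario Barga describes above.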
