{"id":306524,"date":"2009-07-13T09:00:34","date_gmt":"2009-07-13T16:00:34","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=306524"},"modified":"2016-10-17T14:14:17","modified_gmt":"2016-10-17T21:14:17","slug":"project-trident-navigating-sea-data","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/project-trident-navigating-sea-data\/","title":{"rendered":"Project Trident: Navigating a Sea of Data"},"content":{"rendered":"

By Rob Knies, Managing Editor, Microsoft Research

How deep is the ocean? Geologically, the answer is straightforward: almost seven miles. This we know from a series of surveys, beginning in the 19th century, of the depth of the Mariana Trench, near Guam in the North Pacific, a boundary between two tectonic plates that is understood to be the deepest point in the world’s oceans.

When it comes to understanding what transpires in the ocean, however, the question becomes immensely more challenging. The complexities of ocean dynamics remain a profound mystery. Water is effectively opaque to electromagnetic radiation, which means that the ocean floor, even though the oceans drive biological and climatic systems with fundamental implications for terrestrial life, has not been mapped as thoroughly as the surfaces of some of our fellow planets in the solar system. The oceans, covering 70 percent of the globe, represent Earth’s vast, last physical frontier.

Roger Barga is helping to unlock those secrets.

Barga, principal architect for the External Research division of Microsoft Research, heads Project Trident: A Scientific Workflow Workbench, an effort to make complex data visually manageable, enabling science to be conducted at a large scale.

Working with researchers at the University of Washington, the Monterey Bay Aquarium Research Institute, and others, Barga and his colleagues in External Research’s Advanced Research Tools and Services group have developed a mechanism for extending the Windows Workflow Foundation, based on the Microsoft .NET Framework, to combine visualization and workflow services so that researchers can manage, evaluate, and interact with complex data sets more effectively.

Project Trident was presented on July 13 during the 10th annual Microsoft Research Faculty Summit. The workbench is available as a research development kit on DVD; future releases will be available on CodePlex.

“Scientific workflow has become an integral part of most e-research projects,” Barga says. “It allows researchers to capture the process by which they go from raw data to actual final results. They are able to articulate these in workflow schedules. They can share them, they can annotate them, they can edit them very easily.

“A repertoire of these workflows becomes a workbench, by which scientists can author new experiments and run old ones. It also is a platform to which you can attach services like provenance [in this case, the origin of a specific set of information or data]. It becomes this wonderful environment in which researchers can do their research, capture the results, and share their knowledge. That’s what scientific workflow is all about.”

Project Trident, which includes fault tolerance and the ability to recover from failures, has the potential to make research more efficient. Scientists spend a lot of time validating and replicating their experiments, and the workbench can capture every step of an experiment and enable others to check or rerun it by setting different parameters.

True to its namesake in classical mythology, Project Trident’s first implementation is to assist in data management for a seafloor-based research network called the Ocean Observatories Initiative (OOI), formerly known as NEPTUNE.

The OOI, a $400 million effort sponsored by the National Science Foundation, will produce a massive amount of data from thousands of ocean-based sensors off the coast of the Pacific Northwest. The first Regional Cabled Observatory will consist of more than 1,500 kilometers of fiber-optic cable on the seafloor of the Juan de Fuca plate. Affixed to the cable will be thousands of chemical, geological, and biological sensors transmitting continuous streaming data for oceanographic analysis.

\"Regional

Plans for a Regional Cabled Observatory on the Juan de Fuca plate enabled in part by Project Trident.<\/p><\/div>\n

The expectation is that this audacious undertaking will transform oceanography from a data-poor discipline to one overflowing with data. Armed with such heretofore inaccessible information, scientists will be able to examine issues such as the ocean’s ability to absorb greenhouse gases and to detect seafloor stresses that could spawn earthquakes and tsunamis.

“It will carry power and bandwidth to the ocean,” Barga says, “and will allow scientists to study long-term ocean processes. I think it’s going to be a rich area for researchers to invest in and Microsoft to be a part of. It’s very compelling.”

Barga, who has been interested in custom scientific workflow solutions throughout his career, got involved with Project Trident in 2006. It should come as little surprise that his initial nudge in the direction that became Project Trident came from computer-science visionary Jim Gray.

“I had been with the group for only six weeks,” Barga recalls. “I wanted to engage in a project with external collaborators, and I reached out to Jim Gray, who consulted with Tony [Hey, corporate vice president of External Research].

“I asked Jim about what he thought would be a good opportunity to engage the scientific community. He introduced me to the oceanographers and computer scientists working on a project called NEPTUNE. He introduced me to a graduate student named Keith Grochow.”

Grochow was a doctoral student at the University of Washington studying visualization techniques to help oceanographers. He was being supervised by Ed Lazowska and Mark Stoermer of the university faculty. Barga met them, too. But it was Gray who put Barga on the Project Trident path.

“Jim described, during the course of an hour-long phone conversation, his idea behind an oceanographer’s workbench that would consist of sensors, data streaming in off the NEPTUNE array, and these beautiful visualizations of what was going on in the ocean appearing on the oceanographer’s desktop, wherever they were in the world,” Barga says. “He noted that we needed to be able to transform raw data coming in off the sensors in the ocean, invoking computational models and producing visualizations. He noted that workflow was exactly what was needed, and he knew my passion in the area.

“Hence, we started off building a specific scientific workflow solution for the oceanographers, for NEPTUNE. That project delivered its first prototype in three months, and we validated that we can support scientific workflow on Windows Workflow.”

Along the way, Barga and associates became aware that their work on Project Trident was extensible to other scientific endeavors.

“We realized we had an incredible amount to offer other groups,” Barga says. “Several groups acknowledged they were spending too much time supporting their platform.”

Before long, Barga found himself collaborating with astronomers from Johns Hopkins University to develop an astronomer’s workbench to support the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS), an effort to combine relatively small mirrors with large digital cameras to produce an economical system that can observe the entire available sky several times each month. The goal of Pan-STARRS, which is being developed at the University of Hawaii’s Institute for Astronomy, is to discover and characterize Earth-approaching objects, such as asteroids and comets, that could pose a danger to Earth.

Such collaborations were made possible by ensuring that Project Trident could be generalized to other scientific domains.

“We were able to look back on all the existing workflow systems and build upon the best design ideas,” Barga says. “That allowed us to move forward very fast. In addition, we chose two or three different problems to work on. Not only were we working on the oceanographic one, we looked at how we could support astronomy with Pan-STARRS, a very different domain, a very different set of requirements.

“If you design a system with two or three different customers in mind, you generalize very well. You come up with a very general architecture. One of the challenges we had to overcome was to not specialize on just one domain, or it would be too specialized a solution. Pick two or three, and balance the requirements so you build a general, extensible framework. We think we’ve done that.”

Project Trident also exploits the powerful graphics capabilities of modern computers.

“The gaming industry has created this amazing graphics engine available on every PC, yet the resource has been largely ignored by the scientific community,” says Grochow, whose doctoral thesis will be based on the NEPTUNE project. He adds that the same graphical tools that enable gamers to battle monsters or to fly virtual aircraft can be used instead of cumbersome text and formula entries to achieve many scientific tasks.

\"Today's

Today’s computers offer the graphics capabilities to provide stunning undersea visualizations such as this one from the University of Washington’s Collaborative Observatory Visualization Environment project.<\/p><\/div>\n

The University of Washington’s Collaborative Observatory Visualization Environment (COVE) was running out of funding when Microsoft Research got involved. Microsoft supplied financial and technical support to enable COVE to thrive, says Stoermer, director of the university’s Center for Environmental Visualization.

“COVE really is about taking a gaming perspective to research,” he says. “And in the long run, we see this as applicable well beyond oceanography.”

John Delaney, professor of oceanography at the University of Washington, and Deb Kelley, an associate professor of marine geology and geophysics at the university, also have been key collaborators on the project, as have Jim Bellingham and his team at the Monterey Bay Aquarium Research Institute.

“They have given us very valuable feedback,” Barga says, “on the role workflow will play in their environment.”

In computer science, the concept of workflow refers to detailed code specifications for running and coordinating a sequence of actions. The workflow can be simple and linear, or it can be a conditional, many-branched series with complex feedback loops. Project Trident enables sophisticated analysis in which scientists can specify a desired sequence of computational steps and data flows, ranging from data capture from sensors or computer simulations, through data cleaning and alignment, to the final visualization of the analysis. Scientists can explore data in real time; compose, run, and catalog experiments; and add custom workflows and data transformations for others to use. But the concept required some convincing.
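To make the pattern concrete, here is a minimal, hypothetical sketch of a linear workflow as a chain of Python functions, going from raw sensor readings to a cleaned, summarized result. It is not the Trident or Windows Workflow Foundation API; every name in it is invented for illustration.

```python
# Minimal illustration of a linear scientific workflow: each activity is a
# function that consumes the previous step's output. This is NOT the Trident
# or Windows Workflow Foundation API; all names here are hypothetical.

def capture(sensor_readings):
    """Stand-in for data capture from sensors or a simulation."""
    return list(sensor_readings)

def clean(samples):
    """Drop obviously bad samples (e.g., negative depths)."""
    return [s for s in samples if s >= 0]

def visualize(samples):
    """Stand-in for a visualization step: here we just summarize."""
    return f"{len(samples)} samples, mean = {sum(samples) / len(samples):.2f}"

# A workflow "schedule" here is simply the ordered list of activities.
workflow = [capture, clean, visualize]

def run(workflow, data):
    for activity in workflow:
        data = activity(data)
    return data

if __name__ == "__main__":
    print(run(workflow, [3.1, -1.0, 4.2, 2.8]))  # -> "3 samples, mean = 3.37"
```

In Trident’s terms, the ordered list of activities plays the role of a workflow schedule; branching and feedback loops would add conditions around the steps rather than a straight chain.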

“It’s been an interesting journey,” Barga smiles. “When we started this a year and a half ago, in the oceanographic community the response was, ‘What’s workflow?’ It took a long dialogue and a series of demonstrations.

“Fast forward 16 months, and people are keen to embrace a workflow system. They’re actually thinking about their problems as workflows and repeating them back to us: ‘I have a workflow. Let me explain it to you.’ Their awareness has been raised significantly in the oceanographic community.”

The deluge of scientific data requires not only tools to enable data management, but also ways to harness the vast computing resources of data centers. Another Microsoft Research technology, DryadLINQ, can help in that regard.

“Researchers need to have automated pipelines to convert that data into useful research objects,” Barga explains. “That’s where tools like workflow and Trident come into play. Then researchers have a very large cluster, but no means by which to efficiently program against it. That’s where DryadLINQ comes into play. They can take a sequential program and schedule that thing over 3,000 nodes in a cluster and get very high distributed throughput.

“We envision a world where the two actually work together. All that data may invoke a very large computation, may require very detailed analysis or cleaning. If we use DryadLINQ over a cluster, we may be able to do data-parallel programming and bring the result back into the workflow.”

A group of researchers at Microsoft Research Silicon Valley has been working on the Dryad and DryadLINQ projects for more than four years. The goal of their research is to make distributed data-parallel computing easily accessible to all developers. Developers write programs using LINQ and .NET as if they were programming for a single computer; Dryad and DryadLINQ automatically take care of the hard problems of parallelization and distributed execution on clusters consisting of thousands of computers. Dryad and DryadLINQ have been used on a wide variety of applications, including relational queries, large-scale log mining, Web-graph analysis, and machine learning. The tools are available as no-cost downloads to academic researchers and scientists.
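DryadLINQ itself is a .NET and LINQ technology, so the following Python fragment is only an analogy for the programming model described above: the code reads like a sequential query while a pool of local worker processes does the partitioned work. The log-mining example and all names in it are hypothetical, and a real DryadLINQ job would run across a cluster rather than one machine.

```python
# Toy analogy for the data-parallel model the article describes: the "query"
# reads like sequential code, while a pool of workers handles the partitions.
# This is a local multiprocessing sketch, not DryadLINQ.
from multiprocessing import Pool

def count_errors(log_lines):
    """Per-partition work: count lines flagged as errors."""
    return sum(1 for line in log_lines if "ERROR" in line)

def partition(items, n):
    """Split a list into roughly n equal chunks."""
    k = max(1, len(items) // n)
    return [items[i:i + k] for i in range(0, len(items), k)]

if __name__ == "__main__":
    logs = ["ok", "ERROR: pump", "ok", "ERROR: sensor", "ok"] * 1000
    with Pool(processes=4) as pool:
        # Looks like a sequential map-and-sum; the pool distributes the chunks.
        total = sum(pool.map(count_errors, partition(logs, 4)))
    print(total)  # 2000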

The objectives of Project Trident are to enable researchers to dig into large-scale projects and to analyze impenetrably complex problems.

“Beyond showing that Windows Workflow could be used as an underlying engine,” Barga says, “the goal was to explore new services that we could build on top of workflow, such as automatic provenance capture. Project Trident has a feature that allows the researcher to generate a result, an image or a gif or a chart, and to export it to a Word document. Not only do you get the image that you want to put into the document, but you get all the inputs required to rerun that workflow at a later date, should somebody want to reproduce the research. Project Trident has mechanisms by which it versions workflows, versions the data, records all this information, and exports enough information to make it possible to come back and rerun everything.

“This is a new capability that you don’t see in other systems.”
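One way to picture the export-with-provenance capability Barga describes is as a record bundled with the result that captures the workflow version, input parameters, and data versions needed to rerun it later. The sketch below is hypothetical and does not reflect Trident’s actual file formats.

```python
# Hypothetical sketch of exporting a result together with the provenance
# needed to rerun it: workflow version, input parameters, and data versions
# are recorded alongside the figure. Not Trident's actual format.
import json
from datetime import datetime, timezone

def export_with_provenance(figure_path, workflow_id, workflow_version,
                           inputs, data_versions):
    record = {
        "figure": figure_path,
        "workflow": {"id": workflow_id, "version": workflow_version},
        "inputs": inputs,                 # parameters needed to rerun
        "data_versions": data_versions,   # which data snapshots were read
        "exported": datetime.now(timezone.utc).isoformat(),
    }
    with open(figure_path + ".provenance.json", "w") as f:
        json.dump(record, f, indent=2)
    return record

# Example: a chart exported from a hypothetical salinity workflow.
export_with_provenance("salinity_chart.png", "salinity-profile", 7,
                       {"depth_max_m": 2000, "site": "axial-seamount"},
                       {"ctd_stream": "2009-06-30T00:00Z"})
```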

In addition to the practicality of such technology, there also is a specific research objective.

“As we move into more complex architectures (multicore, programming data centers), workflows are a wonderful abstraction for specifying the exact work that needs to be done and the order in which it needs to be done, a very natural way for users to express their intent. It also leaves this beautiful artifact called a schedule, which is an XML representation of these constraints. You could analyze it and then figure out how to schedule all that work on a multicore machine. You might have an eight- or 12-core machine sitting behind you, but on your next iteration of that same workflow, you may have a 30-node cluster. The scheduler can look at that XML representation of the work and do the scheduling.”
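The “schedule” mentioned here is an XML description of activities and their ordering constraints, which a scheduler can analyze and map onto however many cores or nodes happen to be available. The small example below illustrates that kind of analysis; the XML layout is invented for illustration and is not Trident’s schema.

```python
# Hypothetical illustration of analyzing an XML workflow schedule: activities
# with dependency constraints are grouped into rounds that could run in
# parallel on the available cores or nodes. The XML layout is invented.
import xml.etree.ElementTree as ET

SCHEDULE = """
<schedule>
  <activity id="capture"/>
  <activity id="clean"     after="capture"/>
  <activity id="align"     after="capture"/>
  <activity id="visualize" after="clean align"/>
</schedule>
"""

def parallel_rounds(xml_text):
    acts = {a.get("id"): set((a.get("after") or "").split())
            for a in ET.fromstring(xml_text).findall("activity")}
    done, rounds = set(), []
    while acts:
        ready = [a for a, deps in acts.items() if deps <= done]
        rounds.append(ready)
        done.update(ready)
        for a in ready:
            del acts[a]
    return rounds

print(parallel_rounds(SCHEDULE))
# [['capture'], ['clean', 'align'], ['visualize']]
```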

The project also supports runtime adaptation: If an underwater earthquake occurs, the technology can pause its normal workflows and initiate higher-priority workflows to process and visualize data from the event. Project Trident can also help estimate the time and system resources a run will require.
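That event-driven behavior amounts to priority-based preemption: routine workflows yield whenever a higher-priority event workflow arrives. A minimal, hypothetical sketch of the pattern, not Trident’s API:

```python
# Minimal sketch of priority-based preemption for workflows: routine work is
# interrupted between steps whenever a higher-priority event workflow (for
# example, earthquake processing) is queued. Hypothetical; not Trident's API.
import heapq

queue = []  # (priority, name) pairs; lower number = more urgent

def submit(priority, name):
    heapq.heappush(queue, (priority, name))

def run_all():
    while queue:
        priority, name = heapq.heappop(queue)
        print(f"running {name} (priority {priority})")
        if name == "routine-monitoring":
            # Simulate an earthquake detected mid-run: queue urgent work,
            # then requeue the interrupted routine workflow.
            submit(0, "earthquake-visualization")
            submit(5, "routine-monitoring-resumed")

submit(5, "routine-monitoring")
run_all()
# running routine-monitoring (priority 5)
# running earthquake-visualization (priority 0)
# running routine-monitoring-resumed (priority 5)
```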

\"Project

Roger Barga (center) and the Project Trident team: (back row, left to right) Dean Guo, Richard Roberts, David Koop, Matt Valerio, Nitin Gautan, and (front row, left to right), Nelson Araujo, Jared Jackson, Satya Sahoo, and Eran Chinthaka.<\/p><\/div>\n

“The team has done a very nice job of building a tool,” Barga says, crediting Hey for supporting the project and Dan Fay, director of Earth, Energy, and Environment for External Research, for providing financial support. Jared Jackson worked on the initial prototype and served as development lead for the project. Nelson Araujo was the software architect for Project Trident’s key features, and Dean Guo managed the dev team. Partner firm Aditi tested the code, built it, and made it robust.

Now that the code has been released, Barga and his team will focus on building a community around Project Trident.

“We have been invited to take it on a handful of major science studies,” he says. “We would love to engage these researchers with it, help them write the workflows to carry out these projects. If we do two or three, the body of workflows and activities is only going to continue to grow. We’re not going to try to push it into any new areas right now. We’re just going to go deeper in oceanography and try to build a community of users around it.”

And that, he hopes, will lead to Project Trident being incorporated into the tool kit available to 21st-century scientists.

“Future scientific-workflow systems will not be built from the ground up,” Barga says, “but instead will leverage commercial workflow engines, and researchers will only build what they need. We’ll see more sustainability for scientific-workflow systems, which will validate the thesis we started with. You’re going to see conventional workflow systems start to take the requirements and features we built in Project Trident: data flow, programming, provenance, versioning.”

Such developments are eagerly awaited.

“We’d like to see entire communities of oceanographers sharing workflows,” Barga concludes. “The NEPTUNE Canadian team came to visit about a month ago, and that’s what they were most excited about, thinking about deploying it in their environments. They could share workflows internationally, from Canada down to the U.S. and other installations around the world.

“That would be fantastic.”
