{"id":306122,"date":"2010-07-12T09:00:01","date_gmt":"2010-07-12T16:00:01","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=306122"},"modified":"2016-10-15T19:48:34","modified_gmt":"2016-10-16T02:48:34","slug":"terapixel-project-lots-data-expertise","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/terapixel-project-lots-data-expertise\/","title":{"rendered":"Terapixel Project: Lots of Data, Expertise"},"content":{"rendered":"

By Rob Knies, Managing Editor, Microsoft Research<\/em><\/p>\n

How can you achieve the impossible? Easy\u2014as long as you have the right people and the right tools. The Terapixel project from Microsoft Research Redmond is proof positive.<\/p>\n

The effort\u2014to create the largest, seamless spherical image ever made of the night sky\u2014has tantalized astronomers for decades, but the sheer volume of data and the challenges in data manipulation have proved frustrating.<\/p>\n

Until now, that is. A small but energetic team from the External Research division of Microsoft Research has found a way to use a collection of Microsoft technologies to produce the largest, clearest sky image ever assembled. The project was unveiled July 12, the opening day of Microsoft Research\u2019s 11th annual Faculty Summit.<\/p>\n

The image, available on both Microsoft Research\u2019s WorldWide Telescope and Bing Maps, surpasses the gargantuan size of 1,000,000<sup>2<\/sup> pixels\u2014one terapixel. Dan Fay, director of Earth, Energy, and Environment for External Research, says that to view every pixel of the image, you\u2019d need a half-million high-definition televisions. Alternatively, if you were to attempt to print the image, the document would extend the length of a football field.<\/p>\n
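The television comparison can be checked with a few lines of arithmetic. The sketch below uses the side length quoted above; the 1920 x 1080 screen resolution is an assumption, since the article does not say which high-definition format Dan Fay had in mind.

```python
# Pixel budget of a terapixel image, using the side length quoted in the article.
SIDE_PIXELS = 1_000_000            # the image is described as 1,000,000 squared pixels
HD_WIDTH, HD_HEIGHT = 1920, 1080   # assumption: 'high-definition' means a 1080p screen

total_pixels = SIDE_PIXELS ** 2
pixels_per_tv = HD_WIDTH * HD_HEIGHT
tvs_needed = -(-total_pixels // pixels_per_tv)  # ceiling division

print(f'total pixels: {total_pixels:,}')
print(f'pixels per 1080p screen: {pixels_per_tv:,}')
print(f'screens needed to show every pixel: {tvs_needed:,}')
```

Under that assumption the count comes out a little under 500,000 screens, consistent with the half-million figure.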

And it\u2019s not just big. It\u2019s effective, too. Astronomers who have received an early look at the image have been astounded.<\/p>\n

\u201cIt is absolutely gorgeous,\u201d says Brian McLean, an astronomer at the Space Telescope Science Institute who supplied the original images for the project. \u201cIt is now truly a seamless all-sky image. As someone who has worked on the creation and processing of the [Digitized Sky Survey] for the last 25 years\u2014using it for both science and telescope operations\u2014I can appreciate how the application of the new workflow and high-performance-computing technology has made this possible.\u201d<\/p>\n

Roy Williams of the California Institute of Technology also has worked with the researchers involved in the Terapixel project, and he, too, is impressed with the results.<\/p>\n

\u201cYou\u2019ve done a fabulous job making the new all-sky mosaic. \u2026 For the first time the full glory of the original data is revealed. The new image layer is a real improvement in clarity and beauty, done with smart algorithms and a lot of computing!\u201d<\/p>\n

A view of the Milky Way (left) before Terapixel processing displays tiled artifacts, but the post-processing image is virtually seamless.<\/p><\/div>\n

In May 2008, when the WorldWide Telescope launched, bringing an interactive visualization of the heavens to anybody with a computer, the Digitized Sky Survey images were used. Like many panoramas, however, the stitched collection included artifacts from varying exposure levels across the individual parts of the panorama. There were differences in brightness levels and color saturation, as well, and while you could pan around and zoom into mind-boggling galactic imagery, the panorama was not seamless. The varying exposure levels led to edges that didn\u2019t match up, and the resultant stitchings sometimes looked like a checkerboard.<\/p>\n

That didn\u2019t please Jonathan Fay, a principal software architect who works with his namesake, Dan. (The two are unrelated.)<\/p>\n

\u201cThe original vision of creating this seamless stitch was something I wanted to do myself for a very long time,\u201d Jonathan Fay says. \u201cEventually, it had to be put on the back burner, because it was so much work and we didn\u2019t have the infrastructure to do it. This is something I was really passionate about, but I knew getting the resources to do this would be challenging.\u201d<\/p>\n

In December 2009, however, the two Fays put their heads together.<\/p>\n

\u201cWe were trying to think of things that the ARTS [Advanced Research Tools and Services] team could do, work that is helpful to WorldWide Telescope but somewhat relatively independent,\u201d Jonathan Fay recalls. \u201cI shared with them this passion to finish this and do all our sky panoramas as seamless stitches. It sounded like the Trident workflow was something that could be useful for it. That kind of resounded.\u201d<\/p>\n

It certainly did with Dean Guo, senior program manager for the ARTS team. He, along with lead developer Christophe Poulain, swung into action.<\/p>\n

Dean Guo (left) and Christophe Poulain led the Terapixel project from inception to public release in just seven months.<\/p><\/div>\n

\u201cChristophe and I were interested in large-data-set computation and processing,\u201d Guo says. \u201cWe were looking for a showcase study. This one came along, and we decided, \u2018OK, this looks very interesting,\u2019 so we signed up for that. But we didn\u2019t know what we had signed up for.\u201d<\/p>\n

In the end, though, they got the study they were seeking. \u201cThere is massive computation involved,\u201d Guo says.<\/p>\n

\u201cIt\u2019s a huge endeavor,\u201d Dan Fay says. \u201cThe challenge would be how much data and how much space we would have and how to move the data around.\u201d<\/p>\n

The Digitized Sky Survey produces photographic plates of overlapping regions of the sky, using images collected from a pair of telescopes, the Palomar Observatory in San Diego County, Calif., and the Anglo-Australian Telescope in the Australian state of New South Wales. Over a period of 50 years, the two telescopes combined had captured 1,791 pairs of red-light and blue-light images that covered Earth\u2019s entire night sky. Those images were scanned over a 15-year period into a series of plates, compressed and stored as tiles. The result was a collection of 3,120,100 files that, even when 10-fold compression was applied, took up 417 gigabytes of storage.<\/p>\n
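To get a feel for those storage figures, the sketch below derives the approximate uncompressed size and the average compressed file size. It assumes decimal gigabytes and treats the 10-fold compression as exact; both are simplifications made for illustration.

```python
# Back-of-the-envelope sizes from the figures quoted in the article.
FILE_COUNT = 3_120_100
COMPRESSED_BYTES = 417 * 10**9   # assumption: 417 decimal gigabytes
COMPRESSION_RATIO = 10           # assumption: the 10-fold ratio is exact

uncompressed_bytes = COMPRESSED_BYTES * COMPRESSION_RATIO
avg_file_bytes = COMPRESSED_BYTES / FILE_COUNT

print(f'uncompressed size: about {uncompressed_bytes / 10**12:.2f} TB')
print(f'average compressed file: about {avg_file_bytes / 1000:.0f} KB')
```

The point is the order of magnitude: a few terabytes of raw pixels spread over millions of small files, which is why moving the data around was a concern in its own right.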

How could a small team cope with such voluminous data? Not easily, and probably not at all a few years ago. But that was before Project Trident, a scientific-workflow workbench that debuted in the summer of 2009. Trident makes complex data visually manageable, increasing the scale at which scientific exploration can be conducted.<\/p>\n

\u201cPart of the challenge,\u201d Dan Fay says, \u201cis just the sheer magnitude of the data. It had to be distributed across machines to process, to create an image pipeline. That\u2019s where the benefit of the Trident workflow came in. It enabled Dean and Christophe to rerun the data multiple times so we could improve the quality of the image and the smoothness of it.\u201d<\/p>\n

Guo and Poulain used Trident workflows and DryadLINQ to manage code running in parallel across a Windows High Performance Computing (HPC) cluster. All of this was enabled by the researchers\u2019 access to the Microsoft Research Shared Computing Infrastructure. Between the faster program execution delivered by the .NET parallel extensions on multicore machines and the processing of the data on a 64-node cluster, the time it took to run the job shrank from weeks to a few hours.<\/p>\n

Other issues appeared, as well, such as vignetting, which is a darkening of the edges and the corners of each plate. Each plate required correction to contribute to a clear, seamless image, and Dinoj Surendran, data curator for the project, proved invaluable in this effort.<\/p>\n

The stitching and smoothing process also presented challenges, particularly in adapting the telescope data to a spherical model to avoid the distortions near the poles common to two-dimensional maps. To address these concerns, the project team worked with Hugues Hoppe and Michael \u201cMisha\u201d Kazhdan.<\/p>\n

For a few years, Hoppe, a principal researcher and manager of the Computer Graphics Group at Microsoft Research, had worked on various projects with Kazhdan, a professor in the Department of Computer Science at Johns Hopkins University. In 2008, they wrote the paper <em>Streaming Multigrid for Gradient-Domain Processing on Large Images<\/em>, which showed how to assemble photographs efficiently to form seamless gigapixel panoramas. Then in 2010, along with Surendran, they published the paper <em>Distributed Gradient-Domain Processing of Planar and Spherical Images<\/em>. Their contribution was to generalize seamless panoramic stitching to the case of spherical images and to make it scalable on even larger data.<\/p>\n

Hugues Hoppe<\/p><\/div>\n

\u201cWe had a big optimization to do,\u201d Hoppe says, \u201cwith lots of unknowns. The unknowns are the pixel colors. We\u2019re trying to solve for the pixel colors of the stitched montage, such that all the seams will disappear. The constraints are that the neighboring pixels should relate very much like in the original images, and at the seams, where you have different images merging together, the colors should be almost identical so you don\u2019t perceive any discontinuity.<\/p>\n

\u201cIt\u2019s a giant optimization problem, and our approach is about making that efficient. There are many heuristic techniques that people have used before. That helps attenuate the seams, but it doesn\u2019t fix them correctly.\u201d<\/p>\n

Hoppe and Kazhdan turned to Poisson image editing, the result of work performed at Microsoft Research Cambridge in 2003.<\/p>\n

\u201cOur work is similar to that,\u201d Hoppe says, \u201cbut it\u2019s about making it efficient in very large domains. Our initial work in 2008 demonstrated results on gigapixel panoramas, and we were impressed that we were getting results there. Later that year, we went up to the terapixel, a thousand times larger.\u201d<\/p>\n
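The optimization Hoppe describes can be illustrated on a toy problem: one row of pixels made from two exposures that disagree in brightness. The sketch below is not the streaming multigrid solver from the papers. It is a minimal one-dimensional stand-in that uses plain Gauss-Seidel sweeps, and the function name and sample values are hypothetical. The unknowns are the pixel values, the targets are the neighbor differences of each source image, and the target difference across the seam is zero.

```python
def stitch_1d(left, right, sweeps=2000):
    '''Blend two 1D pixel rows so the brightness jump at their seam disappears.'''
    composite = list(left) + list(right)
    n = len(composite)
    seam = len(left) - 1  # index of the gradient that crosses the seam

    # Target gradients: keep each image's own neighbor differences,
    # but ask for no jump between the last left pixel and the first right pixel.
    grad = [composite[i + 1] - composite[i] for i in range(n - 1)]
    grad[seam] = 0.0

    # Unknowns are the pixel values x. x[0] stays pinned to fix overall brightness.
    x = [float(v) for v in composite]
    for _ in range(sweeps):
        for i in range(1, n):
            if i < n - 1:
                # Least-squares condition for an interior pixel.
                x[i] = 0.5 * (x[i - 1] + x[i + 1] + grad[i - 1] - grad[i])
            else:
                # The last pixel only has a left neighbor.
                x[i] = x[i - 1] + grad[i - 1]
    return x


row = stitch_1d([10, 12, 15, 14], [24, 26, 25, 27])
print([round(v, 3) for v in row])
```

In this example the right-hand exposure is 10 units brighter at the seam, so the solver shifts its pixels down until the jump vanishes while every within-image difference is preserved. In two dimensions, and on a sphere, the gradients generally cannot all be satisfied at once, which is what makes the real problem a large least-squares solve rather than a simple shift.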

The mapping of the sphere was handled in part by TOAST, for Tessellated Octahedral Adaptive Subdivision Transform, the spherical projection system used in the WorldWide Telescope.<\/p>\n

\u201cThis particular parameterization, it\u2019s all very nice and continuous, except you have to deal with the boundary conditions,\u201d Hoppe says. \u201cWhat\u2019s tricky is that one edge of the planar domain has to be a mirror image of another, and that\u2019s true across all of the boundaries. If you try to solve for a seamless image not respecting these constraints, you would see these discontinuities across four of the edges.\u201d<\/p>\n

Plate data is transformed into a grid of tiles. The grid is divided into columns, called strips, and each is sent to one HPC node for processing in parallel. The data is distributed over a cluster, necessary because of the massive amounts of data involved.<\/p>\n

\u201cThese difficult boundary conditions,\u201d Hoppe explains, \u201care handled by the fact that they\u2019re local in the memory.\u201d<\/p>\n

As the strips are processed, the nodes talk to each other.<\/p>\n

\u201cIf we were just to do the optimization separately,\u201d Hoppe says, \u201cit would create seams, because the optimization results wouldn\u2019t be interacting. We have communication that\u2019s happening between these nodes over the network of the computer cluster.\u201d<\/p>\n
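The division of labor described above, with columns of tiles grouped into strips and neighboring nodes exchanging data, can be sketched as follows. The function names and the column count are hypothetical, and the sketch ignores the mirrored boundary conditions of the spherical mapping; it only shows how strips might be assigned and which nodes share a boundary.

```python
def split_into_strips(num_columns, num_nodes):
    '''Assign contiguous ranges of tile columns (strips) to nodes, as evenly as possible.'''
    base, extra = divmod(num_columns, num_nodes)
    strips, start = [], 0
    for node in range(num_nodes):
        width = base + (1 if node < extra else 0)
        strips.append(range(start, start + width))
        start += width
    return strips


def neighbor_pairs(strips):
    '''Pairs of node indices whose strips touch and so must exchange boundary data.'''
    return [(i, i + 1) for i in range(len(strips) - 1)]


# Hypothetical example: 10 tile columns spread over 4 nodes.
strips = split_into_strips(10, 4)
for node, strip in enumerate(strips):
    print(f'node {node}: columns {strip.start}..{strip.stop - 1}')
print('nodes that communicate:', neighbor_pairs(strips))
```

Keeping each strip contiguous means a node only has to talk to the nodes holding the adjacent strips, which limits network traffic during the solve.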

A view of the constellation Sagittarius, before Terapixel processing …<\/p><\/div>\n

Guo, Poulain, and their development agency, Aditi, also found themselves reliant on Jonathan Fay, chief architect for the WorldWide Telescope project and an avid astronomer.<\/p>\n

\u201cWhen we started the project, we knew nothing about astronomy,\u201d Guo smiles. \u201cOur domain knowledge was really limited. But during the project, we learned quite a bit.\u201d<\/p>\n

Fay, of course, was happy to help.<\/p>\n

\u201cThey really dug into this,\u201d he says. \u201cThe more they dug into it, the more they wanted to know. We realized they were going to have to be bootstrapped on astronomy concepts, coordinate spaces, the Digital Sky Survey\u2014all these astronomy terms that I took for granted. I would sketch out the architecture on the whiteboard, and they would work on it and ask me refining questions.<\/p>\n

… and after the Terapixel smoothing.<\/p><\/div>\n

\u201cThey put the sweat equity in to make this happen. I just used the astronomical knowledge in my head and my image-processing knowledge to give them some course correction and guidance.\u201d<\/p>\n

That effort includes all the work to provide the ability to zoom into and out of the galaxies the image data represents. The sky-image pyramid includes more than 16 million tiles, each one 256 pixels square. Such scale\u2014and the scientific opportunities it offers\u2014is the key concept behind the recent book <em>The Fourth Paradigm: Data-Intensive Scientific Discovery<\/em>.<\/p>\n
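The tile count and the terapixel figure fit together if the pyramid is a quadtree in which every level doubles the resolution along each axis. That structure, and the choice of level 12 as the deepest level, are assumptions made for this sketch; the article gives only the tile size and the approximate tile count.

```python
TILE_SIZE = 256  # pixels per tile edge, as stated in the article


def pyramid_stats(deepest_level):
    '''Tile counts per level and full-resolution side length for a quadtree pyramid.'''
    tiles_per_level = [4 ** level for level in range(deepest_level + 1)]
    side_pixels = TILE_SIZE * 2 ** deepest_level
    return tiles_per_level, side_pixels


# Assumption: level 12 is the deepest level of the pyramid.
tiles, side = pyramid_stats(12)
print(f'tiles at the deepest level: {tiles[-1]:,}')
print(f'tiles across all levels: {sum(tiles):,}')
print(f'full-resolution image: {side:,} x {side:,} = {side * side:,} pixels')
```

Under these assumptions the deepest level alone holds about 16.8 million tiles and the full-resolution image is slightly more than a million pixels on a side, which agrees with both figures quoted in the article.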

In fact, several individuals associated with the Terapixel project suggest that the effort could serve as a model for other work involving big data for scientific exploration.<\/p>\n

\u201cThe parallel extensions, how you use the workflow to manage the clusters \u2026 that\u2019s not just limited to the Terapixel project,\u201d Guo says. \u201cIt could be any large, data-intensive, and computationally intensive project.\u201d<\/p>\n

Jonathan Fay agrees wholeheartedly.<\/p>\n

\u201cThis is stunning evidence of what can be done with Trident and clustered, high-performance computing, and Dryad,\u201d he says. \u201cThere are probably a lot of stories of people who have said, \u2018I\u2019ve got a bunch of data that needs a bunch of processing with a sophisticated pipeline.\u2019 There are a lot of those stories out there, and they\u2019re not just astronomy-image-processing issues.\u201d<\/p>\n

Thus, the Terapixel project remains a cutting-edge research effort.<\/p>\n

\u201cI\u2019ve shown this to a few other astronomers,\u201d Fay says, \u201cand they all think this is absolutely beautiful.<\/p>\n

\u201cI\u2019m really indebted to Dean and Christophe and the whole crew,\u201d he adds. \u201cI know what a phenomenally difficult project this is, and while Trident made it a whole lot easier, it didn\u2019t make it free. They still had to do a lot of work themselves. I really appreciate what they\u2019ve put into it.\u201d<\/p>\n

So, apparently, do others, including Tony Hey, corporate vice president of External Research.<\/p>\n

\u201cThis was a tour de force,\u201d Hey says of the Terapixel effort. \u201cCan we make the tour de force into the routine? That\u2019s the challenge now for my team.\u201d<\/p>\n

Something suggests that such a challenge might be met successfully.<\/p>\n

\u201cIn a way, it\u2019s why you come to work at Microsoft,\u201d Poulain says. \u201cYou get to work on very interesting problems using state-of-the-art solutions, and you create a solution that millions of people potentially will be able to look at. That\u2019s pretty awesome.\u201d<\/p>\n

Guo concurs.<\/p>\n

\u201cI can\u2019t imagine we could have solved this problem two years ago,\u201d he says. \u201cWith the new technologies available, a terapixel is no longer a big number, so we are able to solve this problem today. It\u2019s very exciting.<\/p>\n

\u201cWho knows? Maybe five years from now, this project will look small. At Microsoft Research, we are able to do things like this to really showcase what we\u2019re capable of.\u201d<\/p>\n","protected":false},"excerpt":{"rendered":"

By Rob Knies, Managing Editor, Microsoft Research How can you achieve the impossible? Easy\u2014as long as you have the right people and the right tools. The Terapixel project from Microsoft Research Redmond is proof positive. The effort\u2014to create the largest, seamless spherical image ever made of the night sky\u2014has tantalized astronomers for decades, but the […]<\/p>\n","protected":false},"author":39507,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"categories":[194474,194480],"tags":[214538,193563,214532,195355,214541,214529,214526,214535,197416,197427,187311],"research-area":[13563,13551],"msr-region":[],"msr-event-type":[],"msr-locale":[268875],"msr-post-option":[],"msr-impact-theme":[],"msr-promo-type":[],"msr-podcast-series":[],"class_list":["post-306122","post","type-post","status-publish","format-standard","hentry","category-data-visulalization","category-graphics-and-multimedia","tag-all-sky-mosaic","tag-bing-maps","tag-data-manipulation","tag-digitized-sky-survey","tag-interactive-visualization","tag-night-sky","tag-seamless-spherical-image","tag-space-telescope-science-institute","tag-terapixel","tag-the-fourth-paradigm","tag-worldwide-telescope","msr-research-area-data-platform-analytics","msr-research-area-graphics-and-multimedia","msr-locale-en_us"],"msr_event_details":{"start":"","end":"","location":""},"podcast_url":"","podcast_episode":"","msr_research_lab":[199561,199565],"msr_impact_theme":[],"related-publications":[],"related-downloads":[],"related-videos":[],"related-academic-programs":[],"related-groups":[],"related-projects":[170492,169537],"related-events":[],"related-researchers":[],"msr_type":"Post","byline":"","formattedDate":"July 12, 2010","formattedExcerpt":"By Rob Knies, Managing 
Editor, Microsoft Research How can you achieve the impossible? Easy\u2014as long as you have the right people and the right tools. The Terapixel project from Microsoft Research Redmond is proof positive. The effort\u2014to create the largest, seamless spherical image ever made…","locale":{"slug":"en_us","name":"English","native":"","english":"English"},"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/306122"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/users\/39507"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/comments?post=306122"}],"version-history":[{"count":4,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/306122\/revisions"}],"predecessor-version":[{"id":306152,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/posts\/306122\/revisions\/306152"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=306122"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/categories?post=306122"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/tags?post=306122"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=306122"},{"taxonomy":"msr-region","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-region?post=306122"},{"taxonomy":"msr-event-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-event-type?post=306122"},{"taxono
my":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=306122"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=306122"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=306122"},{"taxonomy":"msr-promo-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-promo-type?post=306122"},{"taxonomy":"msr-podcast-series","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-podcast-series?post=306122"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}