{"id":15102,"date":"2016-03-10T08:30:59","date_gmt":"2016-03-10T16:30:59","guid":{"rendered":"https:\/\/blogs.technet.microsoft.com\/dataplatforminsider\/?p=15102"},"modified":"2024-01-22T22:50:25","modified_gmt":"2024-01-23T06:50:25","slug":"mapping-the-universe-with-sql-server","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/sql-server\/blog\/2016\/03\/10\/mapping-the-universe-with-sql-server\/","title":{"rendered":"Mapping the Universe with SQL Server"},"content":{"rendered":"

This blog post was co-authored by Joseph Sirosh, Corporate Vice President, and Rimma V. Nehme, Principal Software Engineer, at the Data Group at Microsoft.<\/em><\/p>\n

\u00a0<\/p>\n

\"SzalayFigure 1: Visible objects of the Sloan Digital Sky Survey (SDSS) DR7 dataset.<\/span><\/em><\/p>\n

Over the last 15 years a database helped revolutionize an entire field of science. Astronomical discovery and sophisticated analyses of properties of the aggregate universe was turbocharged by a vast public mapping effort of the sky, called the Sloan Digital Sky Survey, whose data was served in a public database built with Microsoft SQL Server. This was the first in the field and opened up an entirely new window into the Universe.<\/p>\n

\"TheFigure 2: The Fourth Paradigm: Data-Intensive Scientific Discovery book dedicated to Jim Gray.<\/em><\/span><\/p>\n

Now scientists in every field, from astronomy to zoology, are recognizing that the rate of accumulation of data in their fields are greatly outstripping the rate of accumulation of interpretation, i.e. the rate at which the scientific community can assimilate data into an interpretive framework. And there\u2019s widespread recognition that powerful scientific discoveries lie hidden in such massive data. The Fourth Paradigm of scientific discovery, driven by novel techniques for analyzing massive data, is a driving force in science like never before.<\/p>\n

Sloan Digital Sky Survey: The Cosmic Genome Project<\/h2>\n

It all started in the early 90\u2019s when Dr. Alex Szalay together with the late Dr. Jim Gray took on a daring endeavor to build what could be called the first \u201cDataScope\u201d \u2013 an efficient data intensive computing infrastructure for astronomers called the Sloan Digital Sky Survey (SDSS)<\/a> using Microsoft SQL Server<\/a> as the back-end database.<\/p>\n

\"SuperFigure 3: Jim Gray, Alex Szalay and other astronomers at Super Computing 2003<\/em><\/span><\/p>\n

SDSS had a bold goal \u2013 to create a map of the universe in a database for exploration by all. It is often referred to as the Cosmic Genome Project. A dedicated 2.5-m-diameter telescope in New Mexico used a 120-megapixel camera to image more than one-quarter of the entire night sky, 1.5 square degrees of sky at a time, about eight times the area of the full Moon, both inside and outside of the Milky Way, and helped create a three-dimensional (3D) map of millions of galaxies, quasars and stars.<\/p>\n

The SDSS maps sparked a revolution in the way astronomy is practiced. No longer did scientists have to wait months for access to a telescope to learn about the night sky; instead, entire research projects could be accomplished by querying the online database. The SDSS made its entire data set available through SkyServer<\/a> database – an online portal for public use,\u00a0 and invited volunteer contributions to scienti\ufb01c research. Prior to SDSS, only the leading scientists and astronomers had telescopes and instruments to collect data for serious research, with most others largely excluded from direct and active engagement with astronomy. Now, with access to the visual data that SkyServer offers, anyone with Internet access could explore the universe with data just as the top scientists do.<\/p>\n

\"SDSSIV_MilkyWay\"Figure 4: SDSS-IV can view the whole Milky Way<\/em><\/span><\/p>\n

SkyServer’s architecture was fairly simple to start with: a front-end IIS web server accepted HTTP requests processed by JavaScript Active Server Pages (ASP). These scripts used Active Data Objects (ADO) to query the backend Microsoft SQL Server database. SQL Server returned record sets that the JavaScript formatted into pages. The website was about 40,000 lines of code and was originally built by two people as a spare-time activity.<\/p>\n

Why Microsoft SQL Server?<\/h2>\n

While building applications to study the correlation properties of galaxies, Szalay and his team have discovered that many of the patterns in their statistical analysis involved tasks that were much better performed inside the database engine than outside, on flat files. The Microsoft SQL Server gave them high-speed sequential search of complex predicates using multiple CPUs, multiple disks and large main memories. It also had sophisticated indexing and data joining algorithms far outperforming hand-written programs against flat files. Many of the multi-day batch files were replaced with database queries that ran in minutes thanks to the sophisticated query optimizer.<\/p>\n

Impact<\/h2>\n

\"Sloan<\/a>
\nFigure 5: Dr. Jim Gray in front of the Sloan telescope in Apache Point, NM<\/span><\/em><\/p>\n

The most recent version of the database has a 15TB queryable public dataset, with about 150TB additional raw and calibrated files. A recent scan of the logs showed more than 1.6 billion web hits in the past 14 years and more than four million distinct IP addresses accessing the site. The total number of professional astronomers worldwide is only about 15,000. Furthermore, the multiuser collaborative environment in SDSS called CasJobs<\/a> which allows users to launch extensive analyses has more than 6,820 registered users – almost half of the professional astronomy community.<\/p>\n

SDSS has been successful in generating new scienti\ufb01c discoveries, including the measurements of thousands of asteroids, maps of the complicated merger history of the outer Milky Way, and the \ufb01rst detection of the baryon acoustic peak – a measurement of how structure formed from ultra-low frequency standing sound waves in the early universe. These surveys have produced data to support 5,800 papers with more than 245,000 citations.\u00a0 This has made SDSS one of the highest impact projects in the field of astronomy.<\/p>\n

SkyServer data<\/h2>\n

The amount of astronomical data in SkyServer is truly unprecedented. When the SDSS began in 1998, astronomers had data for less than 200,000 galaxies. Within five years after SDSS began, SkyServer had data on 200 million galaxies in the database. Today, the SDSS data exceeds 150 terabytes, covering more than 220 million galaxies and 260 million stars. The images alone include 2.5 trillion pixels of original raw data. SkyServer allows users to search for stars at a given position in the sky, or they can search for galaxies brighter than a certain limit. Users can also enter queries to the database in SQL directly, which allows more \ufb02exible and sophisticated searches.<\/p>\n

Examples of queries users can ask in SkyServer:<\/p>\n