From FITS to SQL – Loading and Publishing the SDSS Data
- Aniruddha R. Thakar
- Alexander S. Szalay
- Jim Gray
Astronomical Data Analysis Software and Systems XIII, ASP Conference Series
For large astronomical databases like the SDSS Science Archive, data loading is potentially the most time-consuming and labor-intensive part of archive operations, and it is also the most critical: it is the last chance to examine the data before they are published. We have tried to automate this task as far as possible and to make data and loading errors easy to diagnose. We describe the sqlLoader, a distributed workflow system of modules that check, load, validate and publish the data to the databases. The workflow is described by a directed acyclic graph whose nodes are the processing modules. It is designed for parallel loading and is controlled from a web interface (the Load Monitor). The validation stage performs a systematic and thorough scrubbing of the data. Finally, the different data products are merged into a set of linked tables that can be searched efficiently using specialized indices and pre-computed joins.
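To make the idea of a workflow described by a directed acyclic graph more concrete, here is a minimal sketch in Python. It is not the sqlLoader's implementation: the module names (`load_photo`, `load_spectro`, etc.) and their dependencies are hypothetical, chosen only to echo the check, load, validate and publish steps named above. The sketch uses Kahn's algorithm, a standard topological-sort technique, to group modules into stages. Every module in a stage has all of its prerequisites finished in earlier stages, so the modules within one stage could in principle run in parallel.

```python
# Hypothetical workflow: each module maps to the modules it depends on.
# Names and edges are illustrative assumptions, not the real sqlLoader graph.
WORKFLOW = {
    "check": [],
    "load_photo": ["check"],
    "load_spectro": ["check"],
    "validate": ["load_photo", "load_spectro"],
    "publish": ["validate"],
}


def parallel_stages(graph):
    """Group DAG nodes into stages using Kahn's algorithm.

    Every node in a stage depends only on nodes in earlier stages, so the
    nodes within one stage could be run in parallel. Assumes every
    dependency is also a key of `graph`. Raises ValueError on a cycle.
    """
    # Count unmet dependencies per node and record reverse edges.
    indegree = {node: len(deps) for node, deps in graph.items()}
    dependents = {node: [] for node in graph}
    for node, deps in graph.items():
        for dep in deps:
            dependents[dep].append(node)

    # Nodes with no dependencies form the first stage.
    ready = sorted(n for n, d in indegree.items() if d == 0)
    stages = []
    done = 0
    while ready:
        stages.append(ready)
        done += len(ready)
        next_ready = []
        for node in ready:
            # Finishing `node` satisfies one dependency of each dependent.
            for child in dependents[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    # Any node never reached must lie on a cycle.
    if done != len(graph):
        raise ValueError("workflow graph has a cycle")
    return stages


if __name__ == "__main__":
    for i, stage in enumerate(parallel_stages(WORKFLOW)):
        print(i, stage)
```

For the hypothetical graph above, the two load modules land in the same stage because both depend only on `check`. A production scheduler would dispatch ready modules as soon as their prerequisites finish instead of waiting for a whole stage. Python's standard `graphlib.TopologicalSorter` (3.9+) supports that pattern through its `get_ready()` and `done()` methods.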