Location, Location, Location! Modeling Data Proximity in the Cloud
- Birjodh Tiwana
- Mahesh Balakrishnan
- Marcos K. Aguilera
- Hitesh Ballani
- Z. Morley Mao
9th ACM Workshop on Hot Topics in Networks (HotNets 2010)
Published by ACM
Cloud applications increasingly rely on distributed storage systems that hide the complexity of handling network and node failures behind simple, data-centric interfaces (such as PUTs and GETs on key-value pairs). While these interfaces are easy to use, the application is completely oblivious to the location of its data in the network; as a result, it has no way to optimize the placement of data or computation. In this paper, we propose exposing the network location of data to applications. The primary challenge is that data does not usually exist at a single point in the network; it can be striped, replicated, cached, and coded across different locations, in arbitrary ways that vary across storage systems. For example, an item that is synchronously mirrored in both Seattle and London will appear equally far from both locations for writes, but equally close to both locations for reads. Accordingly, we describe Contour, a system that allows applications to query and manipulate the location of data without requiring them to be aware of the physical machines storing the data, the replication protocols used, or the underlying network topology.
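The Seattle and London example can be illustrated with a minimal sketch. It is not Contour's interface or model; the function names, the round-trip times, and the assumption that a client contacts every replica directly and in parallel are all hypothetical. Under those assumptions, a read is as fast as the nearest replica, while a synchronous write is as slow as the farthest one.

```python
# Hypothetical round-trip times in milliseconds (illustrative values only,
# not measurements from the paper).
RTT_MS = {
    ("Seattle", "Seattle"): 1,
    ("Seattle", "London"): 140,
    ("London", "Seattle"): 140,
    ("London", "London"): 1,
}

# The item is synchronously mirrored at both sites.
REPLICAS = ["Seattle", "London"]


def read_latency(client, replicas=REPLICAS):
    """Assumed read model: any single replica can serve the read,
    so the client pays only for the nearest one."""
    return min(RTT_MS[(client, replica)] for replica in replicas)


def write_latency(client, replicas=REPLICAS):
    """Assumed write model: the write completes only after every replica
    acknowledges, so the client pays for the farthest one."""
    return max(RTT_MS[(client, replica)] for replica in replicas)


if __name__ == "__main__":
    for client in ("Seattle", "London"):
        print(client, "read:", read_latency(client), "ms,",
              "write:", write_latency(client), "ms")
```

With these assumed numbers, both cities see the same short read distance and the same long write distance, which is why a single network coordinate cannot describe where the item "is".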