Making Every Bit Count in Wide-area Analytics
- Ariel Rabkin
- Matvey Arye
- Siddhartha Sen
- Vivek Pai
- Michael Freedman
Proc. 14th Conference on Hot Topics in Operating Systems (HotOS), 6 pages
Many data sets, such as system logs, are generated at
widely distributed locations. Current distributed systems
often discard this data because they lack the ability to
backhaul it efficiently, or to do anything meaningful with
it at the distributed sites. This leads to lost functionality,
efficiency, and business opportunities. Traditional backhaul
approaches are slow and costly, and they require analysts to
define the data they are interested in up front. We propose a new architecture that
stores data at the edge (i.e., near where it is generated) and
supports rich real-time and historical queries on this data,
while adjusting data quality to cope with the vagaries of
wide-area bandwidth. In essence, this design transforms
a distributed data collection system into a distributed data
analysis system, where decisions about collection do not
preclude decisions about analysis.