Tornado: A Distributed Spatio-Textual Stream Processing System
- Ahmed R. Mahmood
- Ahmed M. Aly
- Thamir Qadah
- Elkindi Rezig
- Anas Daghastani
- Amgad Madkour
- Ahmed S. Abdelhamid
- Mohamed S. Hassan
- Walid G. Aref
- Saleh Basalamah
VLDB Demonstration Track
The widespread use of location-aware devices, together with the increased popularity of micro-blogging applications (e.g., Twitter), has led to the creation of large streams of spatio-textual data. To serve real-time applications, the processing of these large-scale spatio-textual streams needs to be distributed. However, existing distributed stream processing systems (e.g., Spark and Storm) are not optimized for spatial/textual content. In this demonstration, we introduce Tornado, a distributed in-memory spatio-textual stream processing server that extends Storm. To process spatio-textual streams efficiently, Tornado adds a spatio-textual indexing layer to the architecture of Storm. The indexing layer is adaptive, i.e., it dynamically redistributes the processing across the system according to changes in the data distribution and/or query workload. In addition to keywords, higher-level textual concepts are identified and semantically matched against spatio-textual queries. Tornado also provides data deduplication and fusion to eliminate redundant textual data. We demonstrate a prototype of Tornado running against real Twitter streams, where users can register continuous or snapshot spatio-textual queries through a map-assisted query interface.
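To make the notion of a continuous spatio-textual query concrete, the following is a minimal single-process sketch in Python. It is not Tornado's index: the abstract does not describe the index structure, so a plain uniform grid is assumed here, and the names `Query`, `GridIndex`, `register`, and `match` are hypothetical. Each registered query pairs a rectangle with a set of required keywords, and each incoming geotagged message is checked only against the queries stored in the grid cell that contains it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A continuous spatio-textual query: a rectangle plus required keywords."""
    qid: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    keywords: frozenset


class GridIndex:
    """Hypothetical uniform grid over [0, extent) x [0, extent) (an assumption)."""

    def __init__(self, extent=100.0, cells_per_side=10):
        self.cell = extent / cells_per_side
        self.n = cells_per_side
        self.cells = defaultdict(list)  # (col, row) -> queries overlapping the cell

    def _clamp(self, v):
        # Map a coordinate to a cell number, clamped to the grid bounds.
        return max(0, min(self.n - 1, int(v // self.cell)))

    def register(self, q):
        # Store the query in every grid cell that its rectangle overlaps.
        for col in range(self._clamp(q.min_x), self._clamp(q.max_x) + 1):
            for row in range(self._clamp(q.min_y), self._clamp(q.max_y) + 1):
                self.cells[(col, row)].append(q)

    def match(self, x, y, text):
        # Inspect only the cell containing the point, then verify each candidate
        # exactly: the point lies in the rectangle and all keywords are present.
        words = set(text.lower().split())
        candidates = self.cells.get((self._clamp(x), self._clamp(y)), [])
        return sorted(
            q.qid
            for q in candidates
            if q.min_x <= x <= q.max_x
            and q.min_y <= y <= q.max_y
            and q.keywords <= words
        )
```

In a distributed setting, one could imagine assigning grid cells to different workers and reassigning them as the data or query load shifts. That is only an illustration of the adaptivity the abstract mentions, not a description of how Tornado implements it.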