Tornado: A Distributed Spatio-Textual Stream Processing System

Ahmed R. Mahmood; Ahmed M. Aly; Thamir Qadah; Elkindi Rezig; Anas Daghastani; Amgad Madkour; Ahmed S. Abdelhamid; Mohamed S. Hassan; Walid G. Aref; Saleh Basalamah

VLDB Demonstration Track | May 2015

The widespread use of location-aware devices, together with the increased popularity of micro-blogging applications (e.g., Twitter), has led to the creation of large streams of spatio-textual data. In order to serve real-time applications, the processing of these large-scale spatio-textual streams needs to be distributed. However, existing distributed stream processing systems (e.g., Spark and Storm) are not optimized for spatial/textual content. In this demonstration, we introduce Tornado, a distributed in-memory spatio-textual stream processing server that extends Storm. To efficiently process spatio-textual streams, Tornado introduces a spatio-textual indexing layer to the architecture of Storm. The indexing layer is adaptive, i.e., it dynamically redistributes the processing across the system according to changes in the data distribution and/or query workload. In addition to keywords, higher-level textual concepts are identified and semantically matched against spatio-textual queries. Tornado provides data deduplication and fusion to eliminate redundant textual data. We demonstrate a prototype of Tornado running against real Twitter streams, where users can register continuous or snapshot spatio-textual queries using a map-assisted query interface.
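
To make the idea of matching a stream against registered continuous spatio-textual queries concrete, the following is a minimal single-process sketch in Python. It is not Tornado's implementation and does not use Storm. The `Query` and `GridIndex` names, the uniform grid, the fixed `cell_size`, and the rule that a query matches when the point lies in its rectangle and all of its keywords appear in the text are all assumptions chosen for illustration. The adaptive redistribution, semantic concept matching, and deduplication described above are not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical continuous query: a bounding box plus required keywords."""
    qid: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    keywords: frozenset


class GridIndex:
    """Illustrative uniform grid that maps cells to the queries overlapping them."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.cells = defaultdict(list)

    def _cell(self, x, y):
        # Floor division keeps negative coordinates in the correct cell.
        return (int(x // self.cell_size), int(y // self.cell_size))

    def register(self, query):
        # Attach the query to every grid cell its rectangle overlaps.
        cx0, cy0 = self._cell(query.min_x, query.min_y)
        cx1, cy1 = self._cell(query.max_x, query.max_y)
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                self.cells[(cx, cy)].append(query)

    def match(self, x, y, text):
        # Only queries registered in the object's cell are candidates.
        tokens = set(text.lower().split())
        hits = []
        for q in self.cells.get(self._cell(x, y), []):
            inside = q.min_x <= x <= q.max_x and q.min_y <= y <= q.max_y
            if inside and q.keywords <= tokens:
                hits.append(q.qid)
        return hits
```

In this sketch, a query is registered once with `register`, and `match` is then called for each incoming geotagged message, returning the identifiers of the queries it satisfies. In a distributed setting, one could imagine assigning groups of cells to different workers, which is the kind of partitioning an adaptive indexing layer would rebalance as the data or query workload shifts.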