Tornado: A Distributed Spatio-Textual Stream Processing System

  • Ahmed R. Mahmood
  • Ahmed M. Aly
  • Thamir Qadah
  • Elkindi Rezig
  • Anas Daghastani
  • Ahmed S. Abdelhamid
  • Mohamed S. Hassan
  • Walid G. Aref
  • Saleh Basalamah

VLDB Demonstration Track

The widespread use of location-aware devices, together with the increased popularity of micro-blogging applications (e.g., Twitter), has led to the creation of large streams of spatio-textual data. To serve real-time applications, the processing of these large-scale spatio-textual streams needs to be distributed. However, existing distributed stream processing systems (e.g., Spark and Storm) are not optimized for spatial/textual content. In this demonstration, we introduce Tornado, a distributed in-memory spatio-textual stream processing server that extends Storm. To efficiently process spatio-textual streams, Tornado introduces a spatio-textual indexing layer to the architecture of Storm. The indexing layer is adaptive, i.e., it dynamically redistributes the processing across the system according to changes in the data distribution and/or query workload. In addition to keywords, higher-level textual concepts are identified and semantically matched against spatio-textual queries. Tornado provides data deduplication and fusion to eliminate redundant textual data. We demonstrate a prototype of Tornado running against real Twitter streams, where users can register continuous or snapshot spatio-textual queries using a map-assisted query interface.
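
To make the idea of matching a stream against registered continuous spatio-textual queries concrete, the following is a minimal single-process sketch. It is not Tornado's implementation and does not use Storm: the `Query` and `GridIndex` names, the uniform grid, the fixed coordinate extent, and the all-keywords-must-appear matching rule are assumptions chosen only for illustration. The abstract does not specify the index structure, and this sketch is neither adaptive nor distributed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """Hypothetical continuous query: a bounding box plus required keywords."""
    qid: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    keywords: frozenset


class GridIndex:
    """Toy uniform grid over [0, extent] x [0, extent] (an assumption)."""

    def __init__(self, extent=100.0, cells=10):
        self.cell_size = extent / cells
        self.cells = cells
        # Maps (col, row) to the queries whose box overlaps that cell.
        self.buckets = defaultdict(list)

    def _cell(self, x, y):
        # Clamp so out-of-range or boundary points fall into an edge cell.
        col = min(max(int(x // self.cell_size), 0), self.cells - 1)
        row = min(max(int(y // self.cell_size), 0), self.cells - 1)
        return col, row

    def register(self, query):
        """Attach the query to every grid cell its bounding box overlaps."""
        c0, r0 = self._cell(query.min_x, query.min_y)
        c1, r1 = self._cell(query.max_x, query.max_y)
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                self.buckets[(col, row)].append(query)

    def match(self, x, y, text):
        """Return ids of queries whose box contains (x, y) and whose
        keywords all appear in the whitespace-tokenized, lowercased text."""
        words = set(text.lower().split())
        hits = []
        for q in self.buckets.get(self._cell(x, y), []):
            inside = q.min_x <= x <= q.max_x and q.min_y <= y <= q.max_y
            if inside and q.keywords <= words:
                hits.append(q.qid)
        return hits


index = GridIndex()
index.register(Query("q1", 10, 10, 30, 30, frozenset({"coffee"})))
index.register(Query("q2", 0, 0, 50, 50, frozenset({"traffic", "jam"})))

print(index.match(20, 20, "Great coffee downtown"))       # ['q1']
print(index.match(20, 20, "Traffic jam on main street"))  # ['q2']
print(index.match(80, 80, "coffee far away"))             # []
```

Because each incoming object is checked only against the queries registered in its own cell, the per-object cost depends on local query density rather than on the total number of queries. In a distributed setting, cells could in principle be assigned to different workers, which is the kind of partitioning an adaptive layer would rebalance as data or query load shifts.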