How Hierarchical Topics Evolve in Large Text Corpora

IEEE Transactions on Visualization and Computer Graphics (IEEE InfoVis'14) |

Published by IEEE

Publication

overview2Using a sequence of topic trees to organize documents is a popular way to represent hierarchical and evolving topics in text corpora. However, following evolving topics in the context of topic trees remains difficult for users. To address this issue, we present an interactive visual text analysis approach to allow users to progressively explore and analyze the complex evolutionary patterns of hierarchical topics. The key idea behind our approach is to exploit a tree cut to approximate each tree and allow users to interactively modify the tree cuts based on their interests. In particular, we propose an incremental evolutionary tree cut algorithm with the goal of balancing 1) the fitness of each tree cut and the smoothness between adjacent tree cuts; 2) the historical and new information related to user interests. A time-based visualization is designed to illustrate the evolving topics over time. To preserve the mental map, we develop a stable layout algorithm. As a result, our approach can quickly guide users to progressively gain profound insights into evolving hierarchical topics. We evaluate the effectiveness of the proposed method on Amazon’s Mechanical Turk and real-world news data. The results show that users are able to successfully analyze evolving topics in text data.