Linguistic Visualization for Fun and Profit

Information visualization is an emerging and powerful technique for understanding language. Text visualization tools are immensely popular online for collaboration, analysis, and entertainment. Through our ethnographic research, we have found that visualization is also an important tool for natural language engineers. Drawing on both information visualization and natural language processing research, my thesis work has approached the visualization of language from several perspectives, including visual text analytics and the use of visualization to explain computational linguistic models. In this talk, I will present an overview of several recent projects.

With the ever-growing body of electronic text, opportunities for text visualization in information retrieval and text analytics are expanding. The DocuBurst visualization of document content uses the WordNet ontology as the basis for interactive visual document summaries, which can be explored to understand the content and character of very long electronic documents. While DocuBurst provides deep visualizations of one document at a time, we have also created an analysis system for discovering and exploring the linguistic and content differences among the hundreds of thousands of decisions of the US Courts of Appeals. Early feedback from legal scholars has been quite positive.

Computational linguistics often uses statistical models of language: models with uncertainty built in. However, end users often see only the result of applying these models, presented as a single text string. I will describe our interactive visualization for exposing and explaining the uncertainty in statistical machine translation, embedded within a cross-lingual chat system. To further aid computational linguistic analysis, we have also created a method for comparing and relating multiple 2D visualizations in a restricted 3D space.

Speaker Details

Christopher Collins is a PhD candidate in computational linguistics, information visualization, and human-computer interaction at the University of Toronto, where he received his M.Sc. in computational linguistics. Collins is the 2006 recipient of the University of Toronto Graduate Award of Excellence. His thesis research investigates interactive visualizations of linguistic data, with a focus on the convergence and coordination of multiple views of data to provide enhanced insight. He has developed various methods for generating, reading, and comparing visual summaries of text data for both everyday users and data analysts. He works with Gerald Penn and Sheelagh Carpendale (University of Calgary) and recently completed an internship with Martin Wattenberg and Fernanda Viégas of IBM Research.

Date:
Speakers:
Christopher Collins
Affiliation:
Department of Computer Science, University of Toronto