Improving Unsupervised Query Segmentation using Parts-of-Speech Sequence Information
- Rishiraj Saha Roy
- Yogarshi Vyas
- Niloy Ganguly
- Monojit Choudhury
Proceedings of the 37th Annual ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '14)
Published by ACM - Association for Computing Machinery
We present a generic method for augmenting unsupervised query segmentation with Parts-of-Speech (POS) sequence information, so that meaningful but rare n-grams can be detected. Our initial experiments cover three sources of POS information: an existing English POS tagger used with two different POS tagsets, and an unsupervised POS induction technique specifically adapted for queries. In all three cases, POS information significantly improves query segmentation performance.
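The abstract does not say how POS evidence is combined with n-gram statistics, so the following is only a toy illustration of the general idea, not the authors' algorithm. It assumes a standard dynamic-programming segmenter in which each candidate segment is scored from two hypothetical tables: one for word n-grams (where rare n-grams are simply missing) and one for POS-tag sequences. The table contents, the `pos_weight` and `penalty` parameters, the Penn Treebank-style tags, and the function names `segment_score` and `segment` are all illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical cohesion scores for word n-grams (e.g. from query-log
# statistics). Rare n-grams are simply absent from this table.
NGRAM_SCORE: Dict[Tuple[str, ...], float] = {
    ("new", "york"): 2.0,
}

# Hypothetical cohesion scores for POS-tag sequences that often form a unit.
POS_SCORE: Dict[Tuple[str, ...], float] = {
    ("JJ", "NN"): 1.0,
    ("NNP", "NNP"): 1.0,
}


def segment_score(words: Sequence[str], tags: Sequence[str],
                  pos_weight: float, penalty: float) -> float:
    """Score one candidate segment; single words are neutral (0.0)."""
    if len(words) == 1:
        return 0.0
    word_evidence = NGRAM_SCORE.get(tuple(words), 0.0)
    pos_evidence = POS_SCORE.get(tuple(tags), 0.0)
    # A per-extra-word penalty stops words with no supporting evidence
    # from being merged into one segment.
    return word_evidence + pos_weight * pos_evidence - penalty * (len(words) - 1)


def segment(words: List[str], tags: List[str], pos_weight: float = 1.0,
            penalty: float = 0.5, max_len: int = 3) -> List[List[str]]:
    """Return the highest-scoring segmentation via dynamic programming."""
    n = len(words)
    best = [0.0] + [float("-inf")] * n  # best[i]: best score for words[:i]
    back = [0] * (n + 1)                 # back[i]: start index of last segment
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            s = best[j] + segment_score(words[j:i], tags[j:i], pos_weight, penalty)
            if s > best[i]:
                best[i], back[i] = s, j
    # Follow the back-pointers to recover the segments in order.
    segments: List[List[str]] = []
    i = n
    while i > 0:
        segments.append(words[back[i]:i])
        i = back[i]
    return segments[::-1]


if __name__ == "__main__":
    words = ["vintage", "typewriter", "repair"]
    tags = ["JJ", "NN", "NN"]
    print(segment(words, tags))                  # POS evidence enabled
    print(segment(words, tags, pos_weight=0.0))  # n-gram evidence only
```

In this toy setup "vintage typewriter" has no n-gram entry, so with `pos_weight=0.0` every word ends up as its own segment. With POS evidence enabled, the cohesive JJ NN pattern outweighs the merge penalty, and the rare n-gram is kept together as one segment.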
© ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version can be found at http://dl.acm.org.