Capturing Semantically Meaningful Word Dependencies with an Admixture of Poisson MRFs

Abstract: We develop a fast algorithm for the Admixture of Poisson MRFs (APM) topic model and propose a novel metric to directly evaluate this model. The APM topic model, recently introduced by Inouye et al. (2014), is the first topic model that allows for word dependencies within each topic, unlike previous topic models such as LDA, which assume independence between words within a topic. Research on both the semantic coherence of topic models (Mimno et al. 2011, Newman et al. 2010) and measures of model fitness (Mimno & Blei 2011) provides strong support that explicitly modeling word dependencies, as in APM, could be both semantically meaningful and essential for appropriately modeling real text data. Though APM shows significant promise for providing a better topic model, it has high computational complexity because $O(p^2)$ parameters must be estimated, where $p$ is the number of words (Inouye et al. could only provide results for datasets with $p=200$). In light of this, we develop a parallel alternating Newton-like algorithm for training the APM model that can handle $p=10^4$, an important step towards scaling to large datasets. In addition, Inouye et al. only provided tentative and inconclusive results on the utility of APM. Thus, motivated by simple intuitions and previous evaluations of topic models, we propose a novel evaluation metric based on human evocation scores between word pairs (i.e., how much one word "brings to mind" another word; Boyd-Graber et al. 2006). We provide compelling quantitative and qualitative results on the BNC corpus that demonstrate the superiority of APM over previous topic models for identifying semantically meaningful word dependencies.
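To make the $O(p^2)$ growth concrete, here is a minimal sketch that counts the parameters of one topic, assuming (for illustration only) that each topic is a Poisson MRF with one node parameter per word plus a symmetric word-by-word dependency matrix with a zero diagonal. The function apm_param_count and its arguments are hypothetical and are not part of the released code.

```python
def apm_param_count(p: int, k: int = 1) -> int:
    """Count free parameters for k topics over a vocabulary of p words.

    Illustrative assumption: each topic is a Poisson MRF with p node
    parameters and a symmetric p x p dependency matrix whose diagonal is
    zero, so only the p * (p - 1) / 2 upper-triangular entries are free.
    """
    node_params = p                   # one node parameter per word
    edge_params = p * (p - 1) // 2    # one weight per unordered word pair
    return k * (node_params + edge_params)


if __name__ == "__main__":
    # Vocabulary sizes mentioned in the abstract.
    for p in (200, 10_000):
        print(f"p = {p:>6}: {apm_param_count(p):,} parameters per topic")
```

Under these assumptions, moving from p = 200 (20,100 parameters per topic) to p = 10^4 (50,005,000 per topic) grows the vocabulary 50-fold but the per-topic parameter count roughly 2,500-fold, which illustrates why the high computational complexity of APM called for a faster training algorithm.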

Download: pdf, poster, code, software

Citation

Capturing Semantically Meaningful Word Dependencies with an Admixture of Poisson MRFs
D. Inouye, P. Ravikumar, I. Dhillon.
In Neural Information Processing Systems (NIPS), pp. 3158-3166, December 2014.

Bibtex:
@inproceedings{inouye2014capturing,
  author = "David I. Inouye and Pradeep Ravikumar and Inderjit S. Dhillon",
  title = "Capturing Semantically Meaningful Word Dependencies with an Admixture of Poisson MRFs",
  booktitle = "Neural Information Processing Systems (NIPS)",
  pages = "3158--3166",
  year = "2014",
  month = dec,
  abstract = "We develop a fast algorithm for the Admixture of Poisson MRFs (APM) topic model and propose a novel metric to directly evaluate this model. The APM topic model, recently introduced by Inouye et al. (2014), is the first topic model that allows for word dependencies within each topic, unlike previous topic models such as LDA, which assume independence between words within a topic. Research on both the semantic coherence of topic models (Mimno et al. 2011, Newman et al. 2010) and measures of model fitness (Mimno \& Blei 2011) provides strong support that explicitly modeling word dependencies, as in APM, could be both semantically meaningful and essential for appropriately modeling real text data. Though APM shows significant promise for providing a better topic model, it has high computational complexity because $O(p^2)$ parameters must be estimated, where $p$ is the number of words (Inouye et al. could only provide results for datasets with $p=200$). In light of this, we develop a parallel alternating Newton-like algorithm for training the APM model that can handle $p=10^4$, an important step towards scaling to large datasets. In addition, Inouye et al. only provided tentative and inconclusive results on the utility of APM. Thus, motivated by simple intuitions and previous evaluations of topic models, we propose a novel evaluation metric based on human evocation scores between word pairs (i.e., how much one word ``brings to mind'' another word; Boyd-Graber et al. 2006). We provide compelling quantitative and qualitative results on the BNC corpus that demonstrate the superiority of APM over previous topic models for identifying semantically meaningful word dependencies."
}

Associated Projects

Topic Models with Word Dependencies

Center for Big Data Analytics
