Topic Word Selection for Topics Modeled with Latent Dirichlet Allocation  
Author Laura Kölbl

 

Co-Author(s) Michael Grottke

 

Abstract Topic modeling methods such as Latent Dirichlet Allocation (LDA) can uncover topics in large text collections. To use this information efficiently, we need a method that automatically assesses how useful each topic is for applications such as the detection of innovations. This paper presents a novel method for automatically evaluating the topics produced by LDA. The approach focuses on finding topics whose topic words are not only coherent but also specific. By using the documents associated with each word to calculate background topics, we set a baseline for each topic word that helps assess whether the word's context fits the topic well. Experiments indicate that the resulting topics are easier to interpret. Moreover, we show that the approach can be used to detect weak signals.
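The abstract describes the background-topic baseline only at a high level. The Python sketch below illustrates one possible reading of it and is not the authors' implementation. It assumes that a word's background is the pooled word distribution of all documents containing that word, and that fit is measured by the cosine similarity between this background and the topic's word distribution. The function names (`background_distribution`, `cosine`, `score_topic_words`) and the toy corpus are hypothetical.

```python
from collections import Counter
import math


def background_distribution(word, docs):
    """Hypothetical 'background' for a word: pool the tokens of every
    document that contains the word into one normalized distribution."""
    counts = Counter()
    for doc in docs:
        if word in doc:
            counts.update(doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse distributions stored as dicts."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def score_topic_words(topic, docs):
    """Score each topic word by how well its background fits the topic
    (assumed scoring rule; the paper's actual measure may differ)."""
    return {w: cosine(topic, background_distribution(w, docs)) for w in topic}


if __name__ == "__main__":
    # Toy tokenized corpus and topic-word distribution (illustrative only).
    docs = [
        ["solar", "panel", "energy"],
        ["solar", "energy", "grid"],
        ["panel", "discussion", "meeting"],
    ]
    topic = {"solar": 0.4, "energy": 0.4, "panel": 0.2}
    scores = score_topic_words(topic, docs)
    for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{word}: {score:.3f}")
```

In this toy corpus, "panel" also appears in an unrelated meeting document, so its background diverges from the topic and it scores lower (about 0.71) than "solar" and "energy" (about 0.95). Under these assumptions, such low-scoring words would be candidates for removal from the topic's word list.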

 

Keywords Text Mining, Topic Modeling, Weak Signals, Topic Coherence

Article #: DSIS19-17
 
Proceedings of ISSAT International Conference on Data Science & Intelligent Systems
August 1-3, 2019 - Las Vegas, NV, U.S.A.