International Society of Science and Applied Technologies
Topic Word Selection for Topics Modeled with Latent Dirichlet Allocation
Author | Laura Kölbl
Co-Author(s) | Michael Grottke
Abstract | With topic modeling methods, such as Latent Dirichlet Allocation (LDA), we can find topics in large text collections. To employ this information efficiently, a method is needed that automatically analyzes the topics with respect to their usefulness for applications such as the detection of innovations. This paper presents a novel method to automatically evaluate topics produced by LDA. The new approach focuses on finding topics whose topic words are not only coherent but also specific. By using the documents associated with each word to calculate background topics, a baseline can be set for each topic word that helps assess whether its context fits the topic well. Experiments indicate that the resulting topics are easier to interpret. Moreover, we show that the approach can be used to detect weak signals.
Keywords | Text Mining, Topic Modeling, Weak Signals, Topic Coherence
Article #: DSIS19-17
August 1-3, 2019 - Las Vegas, NV, U.S.A.