Topic Model Augmentation and Extension Method using LDA and BERTopic

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2022, v.39 no.3, pp.99-132
https://doi.org/10.3743/KOSIM.2022.39.3.099
SeonWook Kim
Kiduk Yang

Abstract

This study proposes AET (Augmented and Extended Topics), a novel method that synthesizes LDA and BERTopic results, and applies it experimentally to recently published LIS articles. To this end, 55,442 abstracts from 85 LIS journals indexed in the WoS database, published between January 2001 and October 2021, were analyzed. AET first constructs a WORD2VEC-based cosine similarity matrix between the LDA and BERTopic results. It then extracts AT (Augmented Topics) by repeating the matrix reordering and segmentation procedures for as long as the semantic relations remain valid. Finally, it determines ET (Extended Topics) by removing any LDA-related residual subtopics from the matrix and ordering the remaining ones by (BERTopic topic size rank, inverse cosine similarity rank). Compared with the baseline LDA result, AT effectively concretized the original LDA topic model, and ET discovered new, meaningful topics that LDA did not. In the qualitative performance evaluation, AT performed better than LDA, while ET performed similarly except in a few cases.
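
The similarity matrix and the final ordering step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy three-dimensional vectors stand in for trained WORD2VEC embeddings, each topic is represented by the mean of its top-word vectors, the cutoff `LDA_RELATED_THRESHOLD` and all data values are hypothetical, and the rank pair is read here as a lexicographic sort key (larger BERTopic topics first, ties broken by lower similarity to any LDA topic). The matrix reordering and segmentation procedure that produces AT is not shown.

```python
import numpy as np

def topic_vector(words, embeddings):
    """Represent a topic as the mean of its top-word vectors (an assumption)."""
    return np.mean([embeddings[w] for w in words], axis=0)

def cosine_similarity_matrix(a, b):
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Toy vectors standing in for trained WORD2VEC embeddings (hypothetical values).
embeddings = {
    "library":  np.array([1.0, 0.0, 0.0]),
    "catalog":  np.array([0.9, 0.1, 0.0]),
    "citation": np.array([0.0, 1.0, 0.0]),
    "impact":   np.array([0.1, 0.9, 0.0]),
    "privacy":  np.array([0.0, 0.0, 1.0]),
    "security": np.array([0.0, 0.1, 0.9]),
}

# Hypothetical topics, each given by its top words.
lda_topics = [["library", "catalog"], ["citation", "impact"]]
bertopic_topics = [
    ["catalog", "library"],
    ["impact", "citation"],
    ["privacy", "security"],
    ["privacy"],
]
bertopic_sizes = [500, 300, 120, 120]  # hypothetical documents per BERTopic topic

lda_vecs = np.array([topic_vector(t, embeddings) for t in lda_topics])
bert_vecs = np.array([topic_vector(t, embeddings) for t in bertopic_topics])

# Rows are LDA topics, columns are BERTopic topics.
sim = cosine_similarity_matrix(lda_vecs, bert_vecs)

# Highest similarity of each BERTopic topic to any LDA topic.
best_lda_sim = sim.max(axis=0)

# Hypothetical cutoff: BERTopic topics at or above it count as LDA-related
# and are dropped before the extended topics are ordered.
LDA_RELATED_THRESHOLD = 0.5
candidates = [j for j in range(len(bertopic_topics))
              if best_lda_sim[j] < LDA_RELATED_THRESHOLD]

# Sorting by (-size, similarity) is equivalent to ordering by
# (size rank, inverse similarity rank) under the lexicographic reading.
et_order = sorted(candidates, key=lambda j: (-bertopic_sizes[j], best_lda_sim[j]))

for j in et_order:
    print(j, bertopic_topics[j], bertopic_sizes[j], round(float(best_lda_sim[j]), 3))
```

In this toy data the two privacy-related topics survive the cutoff with equal sizes, so the similarity key alone decides their order. With real embeddings, the vectors would come from a trained model rather than a hand-written dictionary.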

Keywords
library and information science, research trends, topic modeling, matrix reordering, synthesis, LDA, BERT, BERTopic, WORD2VEC, AET
Submission Date
2022-08-13
Revised Date
2022-09-01
Accepted Date
2022-09-16
