Topic-Network based Topic Shift Detection on Twitter

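This study analyzed Twitter data, a medium that produces vast amounts of text owing to its high accessibility and brevity, to identify when and how topics shift. We first extracted keywords related to a specific product name and then applied co-word analysis to represent them as a network whose nodes and edges make topics and their related keywords intuitively recognizable. To validate the network analysis results, we then conducted occurrence-frequency-based time-series analysis and LDA topic modeling. In addition, a comparison of topic shifts on Twitter with news article search results confirmed that Twitter responds immediately to news reports and spreads negative issues quickly. Companies can therefore be expected to use the proposed method as a tool for quickly identifying negative public opinion and for making prompt decisions and responses.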

keywords: LDA, latent Dirichlet allocation, Twitter, topic detection, topic tracking, co-word analysis, network-based analysis, time-series graph

Abstract

This study identified topic shifts and their patterns over time by analyzing Twitter data, which is produced in enormous volume because of the platform's high accessibility and brevity. First, we extracted keywords for a specific product and used co-word analysis to represent them as a topic network, in which nodes and edges allow an intuitive understanding of the keywords associated with each topic. To examine the results of the network analysis, we conducted temporal analysis of term co-occurrence as well as topic modeling. In addition, a comparison of topic shifts on Twitter with the corresponding retrieval results from newspapers confirms that Twitter responds immediately to news media and spreads negative issues quickly. Our findings suggest that companies can use the proposed technique to identify negative public opinion as quickly as possible and to support timely decision making and effective responses to their customers.
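The co-word step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy tweets, the keyword set, the helper names `coword_edges` and `edge_shift`, and the use of Jaccard distance between edge sets as a shift signal are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def coword_edges(tweets, keywords):
    """Count how often each pair of keywords appears in the same tweet.

    Each keyword is a node; each counted pair is a weighted edge.
    """
    edges = Counter()
    for text in tweets:
        tokens = set(text.lower().split())
        present = sorted(k for k in keywords if k in tokens)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


def edge_shift(before, after):
    """Jaccard distance between the edge sets of two time windows.

    0.0 means identical edge sets, 1.0 means no edge in common.
    (Assumed measure; the source does not specify one.)
    """
    a, b = set(before), set(after)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


# Hypothetical toy data: two time windows of tweets about one product.
keywords = {"phone", "battery", "recall", "camera", "price"}
week1 = [
    "new phone camera is great",
    "phone price too high",
    "camera and price compared",
]
week2 = [
    "phone battery recall announced",
    "battery recall news",
    "phone recall",
]

net1 = coword_edges(week1, keywords)
net2 = coword_edges(week2, keywords)
print(net1.most_common())
print(net2.most_common())
print(edge_shift(net1, net2))
```

In this toy case the two windows share no edges, so the distance is at its maximum, which is the kind of abrupt change a topic-shift detector would flag. A real pipeline would need proper tokenization (especially for Korean text) and a principled threshold.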

Vol.30 No.1

Journal of the Korean Society for Information Management (정보관리학회지)