
  • P-ISSN1013-0799
  • E-ISSN2586-2073

텍스트 마이닝 기법을 이용한 연관용어 선정에 관한 실험적 연구

An Experimental Study on Selecting Association Terms Using Text Mining Techniques

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2006, v.23 no.3, pp.147-165
https://doi.org/10.3743/KOSIM.2006.23.3.147
김수연 (Yonsei University)
정영미 (Yonsei University)


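Abstract (translated from the Korean)

This study conducted association term selection experiments using association rule mining and term clustering, in order to find the best technique for selecting terms associated with an initial query term from the entire document collection. The Apriori algorithm was used for association rule mining. For term clustering, the GSS coefficient, the Jaccard coefficient, the cosine coefficient, Sokal & Sneath 5, and mutual information were used as association measures. Performance was evaluated by association term precision and association term overlap ratio, and the Apriori algorithm and the GSS coefficient performed best.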
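As a rough illustration of the association measures named above, the sketch below computes four of them from document co-occurrence counts. It is an assumption-laden example, not the authors' implementation: the toy `docs` collection and all function names are hypothetical, GSS is written in its common contingency-table form (ad - bc) / n², mutual information is taken to be the pointwise variant, and Sokal & Sneath 5 is omitted because the abstract does not give the formula used.

```python
import math

def cooccurrence_counts(docs, x, y):
    """Return (a, b, c, d) for terms x and y over a list of term sets.

    a: documents containing both x and y
    b: documents containing x but not y
    c: documents containing y but not x
    d: documents containing neither term
    """
    a = sum(1 for terms in docs if x in terms and y in terms)
    b = sum(1 for terms in docs if x in terms and y not in terms)
    c = sum(1 for terms in docs if y in terms and x not in terms)
    d = len(docs) - a - b - c
    return a, b, c, d

def jaccard(a, b, c, d):
    # Shared documents over documents containing either term.
    denom = a + b + c
    return a / denom if denom else 0.0

def cosine(a, b, c, d):
    # Shared documents over the geometric mean of the two document frequencies.
    denom = math.sqrt((a + b) * (a + c))
    return a / denom if denom else 0.0

def gss(a, b, c, d):
    # Assumed form: P(x,y)P(not x,not y) - P(x,not y)P(not x,y) = (ad - bc) / n^2.
    n = a + b + c + d
    return (a * d - b * c) / (n * n) if n else 0.0

def pointwise_mi(a, b, c, d):
    # Assumed pointwise form: log2( P(x,y) / (P(x) P(y)) ).
    n = a + b + c + d
    if a == 0:
        return float("-inf")
    return math.log2(a * n / ((a + b) * (a + c)))

# Hypothetical toy collection: each document is a set of index terms.
docs = [
    {"retrieval", "query", "index"},
    {"retrieval", "query"},
    {"retrieval", "mining"},
    {"mining", "cluster"},
]

counts = cooccurrence_counts(docs, "retrieval", "query")
for measure in (jaccard, cosine, gss, pointwise_mi):
    print(measure.__name__, round(measure(*counts), 4))
```

Each measure takes the same four counts, so candidate terms can be ranked under any of them by swapping the scoring function.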

Abstract

In this study, association term selection experiments were conducted to identify the best method for selecting additional terms related to an initial query term. Association term sets were generated using the support, confidence, and lift measures of the Apriori algorithm, and using five similarity measures: the GSS coefficient, the Jaccard coefficient, the cosine coefficient, Sokal & Sneath 5, and mutual information. The term selection methods were evaluated by the precision of the association terms and by the overlap ratio between the association terms and the indexing terms of relevant documents. The Apriori algorithm and the GSS coefficient achieved the best performance.
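The support, confidence, and lift measures mentioned above can be sketched for a single rule of the form "query term → candidate term". This is a minimal example under stated assumptions, not the procedure used in the study: it scores only one-to-one rules and does not perform Apriori's level-wise candidate generation, and the toy `transactions`, the function names, and the threshold defaults are all hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Each transaction is the set of index terms of one document.
    """
    n = len(transactions)
    if n == 0:
        return 0.0, 0.0, 0.0
    n_a = sum(1 for t in transactions if antecedent in t)
    n_c = sum(1 for t in transactions if consequent in t)
    n_ac = sum(1 for t in transactions if antecedent in t and consequent in t)
    support = n_ac / n                           # P(A and C)
    confidence = n_ac / n_a if n_a else 0.0      # P(C | A)
    lift = confidence / (n_c / n) if n_c else 0.0  # P(C | A) / P(C)
    return support, confidence, lift

def association_terms(transactions, query_term,
                      min_support=0.1, min_confidence=0.5):
    """Rank candidate terms for query_term by lift, after threshold filtering."""
    vocabulary = set().union(*transactions) - {query_term}
    scored = []
    for term in sorted(vocabulary):
        s, c, l = rule_measures(transactions, query_term, term)
        if s >= min_support and c >= min_confidence:
            scored.append((term, s, c, l))
    return sorted(scored, key=lambda row: row[3], reverse=True)

# Hypothetical toy collection of indexed documents.
transactions = [
    {"retrieval", "query", "index"},
    {"retrieval", "query"},
    {"retrieval", "mining"},
    {"mining", "cluster"},
]

for term, s, c, l in association_terms(transactions, "retrieval"):
    print(term, round(s, 3), round(c, 3), round(l, 3))
```

A lift above 1 indicates that the candidate term occurs with the query term more often than independence would predict, which is why the sketch ranks by lift after filtering on support and confidence.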

