A Study on the Frequency Level Preference Tendency of Association Measures

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2004, v.21 no.4, pp.281-294
https://doi.org/10.3743/KOSIM.2004.21.4.281

Abstract

Association measures are used in a variety of applications, including information retrieval and data mining. Each association measure deserves close examination of its tendency to prefer high or low frequency levels, because this tendency has a significant impact on application performance. This paper examines the frequency level preference (FLP) tendency of several popular association measures using artificially generated cooccurrence data and evaluates the results. A method for adjusting the FLP tendency of major association measures, such as the cosine coefficient, is then proposed. The method is tested on cooccurrence-based query expansion in information retrieval, and the results suggest that it is useful. Based on these analytical and experimental results, implications for related disciplines are identified.
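As a minimal sketch of the kind of contrast the paper analyzes (not the paper's actual procedure or data), the Python snippet below compares the cosine coefficient with pointwise mutual information on two hypothetical term pairs that have the same proportional overlap but different frequency levels; all counts and function names here are illustrative assumptions.

```python
import math

def cosine_coefficient(f_xy: int, f_x: int, f_y: int) -> float:
    """Cosine association of two terms from their joint and marginal frequencies."""
    return f_xy / math.sqrt(f_x * f_y)

def pointwise_mutual_information(f_xy: int, f_x: int, f_y: int, n: int) -> float:
    """PMI of two terms, given the total number of observations n."""
    return math.log2((f_xy * n) / (f_x * f_y))

# A high-frequency pair and a low-frequency pair with identical proportional overlap.
print(cosine_coefficient(80, 400, 400), pointwise_mutual_information(80, 400, 400, 10000))
print(cosine_coefficient(8, 40, 40), pointwise_mutual_information(8, 40, 40, 10000))
```

In this toy example the cosine coefficient assigns both pairs the same score (0.2), while pointwise mutual information scores the low-frequency pair higher (about 5.64 versus 2.32), illustrating how different association measures can prefer different frequency levels.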

Keywords

Association Measures, Cooccurrence Analysis, Information Retrieval, Data Mining

