Improving the Retrieval Effectiveness by Incorporating Word Sense Disambiguation Process

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2005, v.22 no.2, pp.125-145
https://doi.org/10.3743/KOSIM.2005.22.2.125


Abstract

This paper presents a semantic vector space retrieval model that incorporates a word sense disambiguation algorithm in an attempt to improve retrieval effectiveness. Nine Korean homonyms were selected for the sense disambiguation and retrieval experiments. The raw test collection consists of approximately 120,000 news articles, and 18 queries containing homonyms as query words were used for the retrieval experiments. A Naive Bayes classifier and the EM algorithm, representing supervised and unsupervised learning respectively, were used for the disambiguation process. The Naive Bayes classifier achieved 92% disambiguation accuracy, while the clustering performance of the EM algorithm was 67% on average. The semantic vector space model incorporating the Naive Bayes classifier reached 39.6% precision, an improvement of about 7.4% over baseline retrieval without disambiguation. In contrast, the retrieval effectiveness of the EM algorithm-based semantic retrieval was 3% lower than that baseline. Notably, both disambiguation and retrieval performance depend on the distribution patterns of the homonyms being disambiguated as well as on the characteristics of the queries.
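The supervised part of the approach described above can be illustrated with a minimal sketch. It is not the paper's implementation: the English homonym "bank", the toy training contexts, the sense labels, and the function names (`train_naive_bayes`, `disambiguate`, `sense_tagged_tokens`) are all assumptions made for illustration. The sketch trains a Naive Bayes classifier with add-one smoothing on sense-labeled context words, then replaces the homonym in a token list with its predicted sense label, so that a vector space index could treat each sense as a separate term.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (sense_label, context_words) pairs (toy, hypothetical data)."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, words in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def disambiguate(context, model):
    """Return the sense maximizing log P(sense) + sum of log P(word | sense)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = math.log(n / total)
        # Add-one (Laplace) smoothing over the training vocabulary.
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in context:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

def sense_tagged_tokens(tokens, homonym, model):
    """Replace the homonym with its predicted sense label before indexing."""
    context = [t for t in tokens if t != homonym]
    sense = disambiguate(context, model)
    return [sense if t == homonym else t for t in tokens]

# Hypothetical training contexts for the English homonym "bank".
TRAIN = [
    ("bank/finance", ["loan", "money", "deposit"]),
    ("bank/finance", ["account", "money", "interest"]),
    ("bank/river", ["river", "water", "fishing"]),
    ("bank/river", ["river", "mud", "shore"]),
]
MODEL = train_naive_bayes(TRAIN)
```

Applying `sense_tagged_tokens` to both documents and queries means that a query about one sense only matches documents tagged with the same sense, which is the intuition behind indexing senses rather than surface forms.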

Keywords
information retrieval, word sense disambiguation, Naive Bayes classifier, EM algorithm, clustering, retrieval effectiveness

