A Semantic-Based Feature Expansion Approach for Improving the Effectiveness of Text Categorization by Using WordNet

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2009, v.26 no.3, pp.261-278
https://doi.org/10.3743/KOSIM.2009.26.3.261


Abstract

Identifying optimal feature sets in Text Categorization (TC) is crucial for improving its effectiveness. In this study, feature expansion experiments were conducted using author-provided keyword sets and article titles from typical scientific journal articles. The tool used to expand the feature sets is WordNet, a lexical database of English words. Given this data set and lexical tool, the study showed that feature expansion based on synonym relationships significantly improved TC results. In particular, when feature sets were expanded with synonyms of the classifier (category) names, TC effectiveness improved considerably regardless of whether word sense disambiguation was applied.
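The abstract does not specify the classifier or the exact expansion procedure, so the following is only a minimal sketch of the general idea of expanding category names with synonyms. The toy synonym table stands in for WordNet (in practice it could be populated from WordNet, for example via NLTK's `wordnet.synsets()` and `Synset.lemma_names()`). The names `TOY_SYNONYMS`, `tokenize`, `expand`, and `classify`, and the overlap-count scoring rule, are hypothetical illustrations and not the study's method. Synonyms from all senses of a word are pooled, which mirrors expansion without word sense disambiguation.

```python
# Hypothetical toy synonym table standing in for WordNet synsets.
# Synonyms from ALL senses of a word are pooled together, i.e. no
# word sense disambiguation is attempted.
TOY_SYNONYMS: dict[str, set[str]] = {
    "medicine": {"medical_specialty", "medication", "drug"},
    "drug": {"medicine", "medication"},
    "law": {"jurisprudence", "legal_philosophy"},
    "court": {"tribunal", "judicature"},
}

PUNCT = ".,;:()"


def tokenize(text: str) -> list[str]:
    """Deliberately simple tokenizer: split on whitespace, strip punctuation, lower-case."""
    return [w for w in (t.strip(PUNCT).lower() for t in text.split()) if w]


def expand(terms: list[str], synonyms: dict[str, set[str]]) -> set[str]:
    """Return the terms plus every synonym listed for any of them."""
    expanded = set(terms)
    for term in terms:
        expanded |= synonyms.get(term, set())
    return expanded


def classify(title: str, keywords: list[str], categories: list[str],
             synonyms: dict[str, set[str]]) -> str:
    """Hypothetical scoring rule: choose the category whose synonym-expanded
    name shares the most terms with the document's features
    (title words plus author-provided keywords)."""
    features = set(tokenize(title)) | {k.lower() for k in keywords}
    scores = {c: len(features & expand([c], synonyms)) for c in categories}
    # max() returns the first category among equal scores,
    # so the order of `categories` acts as the tie-breaker.
    return max(categories, key=lambda c: scores[c])


if __name__ == "__main__":
    title = "Adverse effects of a new drug"
    keywords = ["medication"]
    print(classify(title, keywords, ["law", "medicine"], TOY_SYNONYMS))  # with expansion
    print(classify(title, keywords, ["law", "medicine"], {}))            # no expansion
```

With the toy table, the expanded name "medicine" matches both "drug" (from the title) and "medication" (from the keywords), so the document is assigned to "medicine". Passing an empty table disables expansion: neither category name then appears among the features, both score zero, and the tie falls to the first category, "law".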

keywords
WordNet, text categorization, semantics, semantic-based, feature selection, feature expansion
