
A Study on the Reclassification of Author Keywords for Automatic Assignment of Descriptors

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2012, v.29 no.2, pp.225-246
https://doi.org/10.3743/KOSIM.2012.29.2.225



Abstract

This study investigates whether descriptors can be assigned automatically by reclassifying author keywords in domestic scholarly databases. In the first stage, we compared the characteristics of machine learning classifiers and selected the optimal classifiers and parameters for the reclassification. In the second stage, the classifiers learned the author keywords assigned to a selected set of articles on reading, and those keywords were then automatically added to another set of relevant articles. We examined whether the reclassified author keywords provide vocabulary control in the way descriptors do, by collocating documents on the same topic. The results show that author keyword reclassification is capable of automatic descriptor assignment.

Keywords
automatic classification, text categorization, reclassification, vocabulary control, descriptors, author keywords
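
The second stage described in the abstract (learning author keywords from one set of articles, then adding them to other relevant articles) can be illustrated with a minimal sketch. The abstract does not name the classifier or parameters that were selected, so everything here is an assumption made for illustration: a simple centroid-based classifier with cosine similarity, a hypothetical threshold of 0.3, hypothetical function names, and made-up toy data. It is not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Bag-of-words term-frequency vector for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(labeled_docs):
    """Build one averaged term vector per author keyword.

    labeled_docs is a list of (text, keywords) pairs; a document with
    several keywords contributes to each keyword's centroid.
    """
    sums = defaultdict(Counter)
    counts = Counter()
    for text, keywords in labeled_docs:
        vec = tf_vector(text)
        for kw in keywords:
            sums[kw].update(vec)
            counts[kw] += 1
    return {
        kw: {t: w / counts[kw] for t, w in total.items()}
        for kw, total in sums.items()
    }


def assign_keywords(text, centroids, threshold=0.3):
    """Return (sorted) every keyword whose centroid is similar enough to text.

    The threshold is an assumed value, not one taken from the study.
    """
    vec = tf_vector(text)
    return sorted(
        kw for kw, centroid in centroids.items()
        if cosine(vec, centroid) >= threshold
    )


# Toy, made-up training data: (article text, author keywords).
train = [
    ("reading instruction for children in school libraries", ["reading education"]),
    ("children reading programs and reading instruction", ["reading education"]),
    ("support vector machine text classification accuracy", ["text categorization"]),
    ("text classification with machine learning classifiers", ["text categorization"]),
]
centroids = train_centroids(train)
assigned = assign_keywords("machine learning for text classification", centroids)
```

Because a keyword learned from one article can be attached to any sufficiently similar article, articles on the same topic end up sharing the same keyword, which is the collocation effect the abstract attributes to descriptors.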

