Journal of the Korean Society for Information Management (정보관리학회지), Korean Society for Information Management

P-ISSN1013-0799
E-ISSN2586-2073
KCI

김나름 (Yonsei University); 김태수 (Yonsei University), 2006, Vol.23, No.4, pp.69-87, https://doi.org/10.3743/KOSIM.2006.23.4.069

Abstract

Access to literary works, including fiction, has centered on descriptive elements, and subject access has been confined to denotative elements that appear in the work, such as subject matter, character names, and place names. This practice misses the essence of a novel's subject and does not reflect the subject needs of users who seek an aesthetic experience. To extend subject access to fiction, this study examines the concepts of symbol and motif and their potential as subject access points. It then builds symbol and motif schemes using glossaries of these terms as information sources, applies the schemes to 20th-century Korean fiction, and discusses their usability and limitations.
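The abstract proposes symbol and motif schemes as subject access points alongside the usual denotative ones (character names, place names, subject matter). The Python sketch below is a minimal illustration, assuming a simple inverted index keyed by (facet, term), of how such connotative access points could sit next to denotative ones while being restricted to a controlled scheme. The class names `WorkRecord` and `SubjectIndex`, the facet labels, and the sample terms are hypothetical and do not reproduce the scheme built in the study.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class WorkRecord:
    """Hypothetical catalogue record for one work of fiction."""
    work_id: str
    # Denotative access points: character names, place names, subject matter.
    denotative: set[str] = field(default_factory=set)
    # Connotative access points drawn from a symbol scheme and a motif scheme.
    symbols: set[str] = field(default_factory=set)
    motifs: set[str] = field(default_factory=set)


class SubjectIndex:
    """Inverted index from (facet, term) to the IDs of works indexed with it."""

    def __init__(self, symbol_scheme: set[str], motif_scheme: set[str]) -> None:
        # Controlled vocabularies, e.g. compiled from a glossary of symbols and motifs.
        self.symbol_scheme = symbol_scheme
        self.motif_scheme = motif_scheme
        self.postings: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, rec: WorkRecord) -> None:
        # Reject symbol/motif terms outside the schemes before touching the index.
        unknown = (rec.symbols - self.symbol_scheme) | (rec.motifs - self.motif_scheme)
        if unknown:
            raise ValueError(f"terms not in the scheme: {sorted(unknown)}")
        for facet, terms in (
            ("denotative", rec.denotative),
            ("symbol", rec.symbols),
            ("motif", rec.motifs),
        ):
            for term in terms:
                self.postings[(facet, term)].add(rec.work_id)

    def search(self, facet: str, term: str) -> set[str]:
        # .get() does not create a key in a defaultdict; return a copy of the postings.
        return set(self.postings.get((facet, term), set()))


if __name__ == "__main__":
    index = SubjectIndex(symbol_scheme={"moon", "road"}, motif_scheme={"journey", "homecoming"})
    index.add(WorkRecord("work-1", denotative={"Seoul"}, symbols={"road"}, motifs={"journey"}))
    index.add(WorkRecord("work-2", denotative={"Busan"}, symbols={"moon"}, motifs={"journey", "homecoming"}))
    print(sorted(index.search("motif", "journey")))  # ['work-1', 'work-2']
```

A search on the motif facet returns works that share a motif even when they share no character or place name, which is the kind of access the abstract argues denotative indexing alone cannot provide.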

Correctness of Categories Pre-assigned to Training Documents and Text Categorization Performance

심경 (Systems R&D Center, Iris.Net); 정영미 (Yonsei University), 2006, Vol.23, No.2, pp.265-285, https://doi.org/10.3743/KOSIM.2006.23.2.265

Abstract

Text categorization assumes that the subject categories assigned to training documents are correct to a certain level, yet this assumption is made without solid knowledge of real-world collections. This research examines the correctness of pre-assigned subject categories in a real-world collection and the relationship between the correctness of category assignment in the training set and text categorization performance. In particular, it asks to what extent categorization performance can be improved by raising the quality (i.e., correctness) of the categories assigned to training documents through manual re-indexing. A collection of 1,150 abstract records in computer science was re-indexed by a group of experts; after 15 duplicates were removed, it was divided into 907 training documents and 227 test documents. The categorization performance of the collection before re-indexing (the Initial set) and after re-indexing (the Recat-1 and Recat-2 sets) was compared using a kNN classifier. The average correctness of the categories assigned in the Initial set was 16%, and categorization with the Initial set achieved an F1 of 17%. In contrast, the Recat-1 set, whose category correctness had been improved, achieved an F1 of 61%, about 3.6 times that of the Initial set.
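The experiment compares kNN categorization performance, measured by F1, across training sets whose categories differ in correctness. The Python sketch below is a minimal illustration under stated assumptions: raw term-frequency vectors, cosine similarity, similarity-weighted voting over the k nearest neighbours with only the top category assigned, and micro-averaged F1. The abstract does not specify the term weighting, the value of k, the category-assignment rule, or the F1 averaging, so these choices, the function names, and the toy documents are all hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict

Vector = dict[str, float]


def to_vector(text: str) -> Vector:
    # Assumed weighting: raw term frequency of whitespace-separated tokens.
    return {t: float(c) for t, c in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def knn_categorize(train: list[tuple[Vector, set[str]]], doc: Vector, k: int = 3) -> set[str]:
    # Rank training documents by similarity to `doc` and keep the k nearest.
    nearest = sorted(((cosine(doc, vec), labels) for vec, labels in train),
                     key=lambda pair: pair[0], reverse=True)[:k]
    # Each neighbour votes for its categories with weight equal to its similarity.
    scores: dict[str, float] = defaultdict(float)
    for sim, labels in nearest:
        for label in labels:
            scores[label] += sim
    # Assumed rule: assign only the single top-scoring category.
    return {max(scores, key=scores.get)} if scores else set()


def micro_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    # Pool true/false positives and false negatives over all test documents.
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def evaluate(train_docs, test_docs, k: int = 3) -> float:
    # train_docs / test_docs: lists of (text, set_of_categories).
    train = [(to_vector(text), labels) for text, labels in train_docs]
    gold = [labels for _, labels in test_docs]
    pred = [knn_categorize(train, to_vector(text), k) for text, _ in test_docs]
    return micro_f1(gold, pred)


if __name__ == "__main__":
    test_docs = [("neural network gradient", {"AI"}), ("database index query optimization", {"DB"})]
    clean = [("neural network training gradient", {"AI"}), ("deep neural network model", {"AI"}),
             ("database query index", {"DB"}), ("relational database transaction", {"DB"})]
    # Same texts, but the first training document carries a wrong category.
    noisy = [(clean[0][0], {"DB"})] + clean[1:]
    print(evaluate(clean, test_docs), evaluate(noisy, test_docs))  # 1.0 0.5
```

On this toy data, giving a single training document the wrong category halves micro-F1, a small-scale echo of the abstract's finding rather than a reproduction of the study. For the reported figures, 0.61 / 0.17 ≈ 3.59, which matches the stated improvement of about 3.6 times.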
