Journal of the Korean Society for Information Management, Korean Society for Information Management

1

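A Study on Automatic Classification of Social Science Books Using Table of Contents Information and the kNN Classifier

이용구 (Associate Professor, Department of Library and Information Science, Keimyung University) 2020, Vol.37, No.1, pp.1-21 https://doi.org/10.3743/KOSIM.2020.37.1.001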

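Abstract (translated from Korean)

This study applied automatic classification, using table of contents information, to 6,253 social science books from a university library's list of newly arrived books. The classifier used the kNN algorithm, and the categories for automatic classification were the divisions of DDC class 300 that the library had assigned to the books. The classification features were the books' titles and tables of contents; the tables of contents were obtained from an Internet bookstore through an Open API. The automatic classification experiments showed that table of contents features are good features that improve both classification recall and classification precision. The table of contents, being a rich source of features, was also found to mitigate the overfitting problem of imbalanced data. The study also found that law and education have high specificity within the social sciences, so that title features alone yield good classification performance.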

Abstract

This study applied automatic classification using table of contents (TOC) text to 6,253 social science books from a university library's list of newly arrived books. The k-nearest neighbors (kNN) algorithm was used as the classifier, and the ten divisions at the second level of DDC main class 300, as assigned to the books by the library, were used as classes (labels). The features were keywords extracted from the titles and TOCs of the books. The TOCs were obtained from an Internet bookstore through its Open API. The results showed that TOC features improve both classification recall and precision. With its rich features, the TOC was also shown to reduce the overfitting problem of imbalanced data. Law and education have high topic specificity within the social sciences, so title features alone can yield good classification performance in these fields.
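The setup described in this abstract (kNN over title and TOC keywords, with DDC divisions as labels) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration only: the four toy records, the raw term-count weighting, the cosine similarity measure, and `k=3` are not taken from the paper, which does not state them in the abstract.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Turn whitespace-separated keywords into a term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(train, query_vec, k=3):
    """Majority vote among the k training books most similar to the query."""
    ranked = sorted(train, key=lambda item: cosine(item[0], query_vec), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy records: (title keywords, TOC keywords, DDC division label).
BOOKS = [
    ("contract law", "torts liability courts statutes", "340"),
    ("criminal law", "courts evidence trial statutes", "340"),
    ("school teaching", "curriculum classroom students assessment", "370"),
    ("learning theory", "students curriculum teachers classroom", "370"),
]

def features(title, toc, use_toc):
    """Build the feature vector from the title alone or from title plus TOC."""
    return bag_of_words(title + " " + toc if use_toc else title)

def classify(title, toc, use_toc=True, k=3):
    """Classify one book against the toy collection."""
    train = [(features(t, c, use_toc), label) for t, c, label in BOOKS]
    return knn_predict(train, features(title, toc, use_toc), k)
```

Calling `classify` with `use_toc=False` restricts both the training vectors and the query to title keywords, which is one simple way to compare title-only features against title plus TOC features on the same data.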

2

Improvements to the Classification System for Architecture in the 6th Edition of KDC

김송이 (Ewha Womans University) ; 정연경 (Ewha Womans University) 2014, Vol.31, No.3, pp.7-27 https://doi.org/10.3743/KOSIM.2014.31.3.007

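Abstract (translated from Korean)

In the 5th edition of the Korean Decimal Classification (KDC), the field of architecture was classified under two separate headings, architectural engineering and architecture as an art, but in the 6th edition (2013) the two were merged into a single heading, 'Architecture'. This study examines the architecture field of the revised KDC 6th edition by comparing the KDC 5th and 6th editions and by comparing them with DDC, NDC, and UDC, and it proposes improvements. The comparison with the major decimal classifications showed that the merger of headings created a need for reclassification and lengthened the class numbers for the history of architecture, and that the classification of traditional Korean architecture needs further development. Improvements addressing these issues are proposed.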

Abstract

In the 5th edition of the Korean Decimal Classification (KDC), the field of architecture was divided into architectural engineering and architecture, but the two were combined in the 6th edition, published in 2013. This study aims to identify problems and suggest modifications by comparing the 5th and 6th editions of KDC with the Dewey Decimal Classification, Nippon Decimal Classification, and Universal Decimal Classification. To improve the 6th edition of KDC, the study identifies the need for reclassification, the long class numbers for the history of architecture, and the need to add categories for traditional building and architectural engineering, and it suggests modifications that address these problems.

3

A Study on the Statistical Characteristics of Table of Contents Text in Social Science Books

이용구 (Keimyung University) 2019, Vol.36, No.2, pp.255-273 https://doi.org/10.3743/KOSIM.2019.36.2.255

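Abstract (translated from Korean)

This study carried out descriptive statistics and comparative analysis of tables of contents, which are increasingly accessed and used, from the perspectives of part of speech and subject. To this end, social science books were extracted from a university library's acquisition list; for those books, DDC class numbers were taken from a union catalog and table of contents information from an Internet bookstore. Titles and tables of contents were morphologically analyzed, and descriptive statistics and frequency analysis were performed on the noun-centered vocabulary. The results showed that, in terms of morphemes, nouns account for roughly half of both titles and tables of contents; that tables of contents contain about 50 times more nouns than titles; and that 95.2% of the nouns appearing in tables of contents occur only in the tables of contents. The length of the table of contents was also found to differ by social science discipline.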

Abstract

Recently, the table of contents (TOC) has become increasingly accessible and widely used. This study conducted descriptive statistics and comparative analysis of TOC text in terms of parts of speech and subject. For this purpose, the study selected social science books from the acquisition lists of an academic library, obtained the Dewey class numbers of the target books from the KERIS union catalog, and extracted TOC data from an online bookstore. Morphological analysis was performed on each book's title and TOC, and descriptive statistics and frequency analysis were carried out. The results showed that nouns made up roughly half of the morphemes in both titles and TOCs, and that TOCs had about 50 times more nouns than titles. Nouns that appeared only in the TOC were estimated to account for 95.2% of the TOC's total nouns. The length of the TOC also differed by field of social science.

4

A Study on the Modification and Expansion of DDC in the Field of Immigration Policy

정연경 (Ewha Womans University) 2011, Vol.28, No.4, pp.33-48 https://doi.org/10.3743/KOSIM.2011.28.4.033

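Abstract (translated from Korean)

This study reviewed the related literature and surveyed various library classification schemes in order to develop a classification schedule that can effectively organize diverse information on immigration policy, a complex subject area, and support optimal information services. First, the academic concept and scope of immigration policy were defined through a literature review, and core subject areas were selected on that basis. Next, the structure, expanded headings, and characteristics of the immigration policy field were compared across the Dewey Decimal Classification, the Library of Congress Classification, the Korean Decimal Classification, and the Universal Decimal Classification. Then, based on DDC 23, which is the most widely used of these schemes worldwide and is revised regularly, the study proposed design principles, main schedules, and an auxiliary table for modifying and expanding the immigration policy field. The modified and expanded DDC can be applied in the main fields that deal with immigration policy, and it is expected to serve as a basis for specialized research institutes and libraries to classify and organize their holdings effectively and to manage specialized immigration policy information in an integrated way.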

Abstract

This study investigated and analyzed various library classification systems and related literature in order to suggest modifications to and an expansion of the Dewey Decimal Classification, 23rd edition (DDC 23), in the area of immigration policy, an interdisciplinary subject, for the best information organization and services. First, the definitions and scope of immigration policy were addressed, and its primary subject areas were selected. Next, DDC, the Library of Congress Classification, the Korean Decimal Classification, and the Universal Decimal Classification were compared and analyzed in terms of their structures, headings, and characteristics. Finally, modified classification schedules for immigration policy were proposed for DDC 23, the most frequently used scheme and one that is revised regularly, together with their design principles, main schedules, and an auxiliary table. The result can be used for effective information organization in the area of immigration policy and will be useful for many libraries and research institutes concerned with immigration policy.

5

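A Study on Functional Requirements for Developing a Web Version of the Korean Decimal Classification

양정윤 (Master's graduate, Department of Library and Information Science, Graduate School, Pusan National University; Librarian, Chinju National University of Education Library) 2023, Vol.40, No.4, pp.147-165 https://doi.org/10.3743/KOSIM.2023.40.4.147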

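Abstract (translated from Korean)

New technologies representative of the Fourth Industrial Revolution are already being implemented in library services. However, there has been little research on introducing new technology to improve the efficiency of 'classification', a traditional librarian task that must continue in the future. The overseas web-based classification tools WebDewey, Classification Web, and UDC Online were developed in the early 2000s, and the web versions are now used more actively than the print editions; since 2018 the Dewey Decimal Classification (DDC) has no longer been published in print. This study analyzed the cases of WebDewey, Classification Web, and UDC Online, derived the functions needed to develop a web version of the Korean Decimal Classification (KDC), and proposed, through AHP analysis, the final set of functions appropriate for developing the KDC web version.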

Abstract

New technologies representing the Fourth Industrial Revolution are already being implemented in library services. There is, however, little active research on increasing work efficiency by introducing new technology into “classification” work, a traditional librarian task that will need to continue in the future. The Dewey Decimal Classification (DDC) has not issued a print version since 2018. This study analyzes the cases of WebDewey, Classification Web, and UDC Online. The functions required for developing a web version of the Korean Decimal Classification (KDC) were derived, and the final functions suitable for the KDC web version were proposed through AHP analysis.

6

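A Study on Expanding the Headings of the Korean Decimal Classification through an Analysis of Classification Systems for Internet Resources in the Field of Food Culture

정연경 (Ewha Womans University) ; 이미화 (Hansung University) 2010, Vol.27, No.4, pp.49-69 https://doi.org/10.3743/KOSIM.2010.27.4.049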

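Abstract (translated from Korean)

The Dewey Decimal Classification (DDC) is used not only in libraries as a library classification scheme but also as a basis for classifying Internet resources, because DDC maintains currency and practicality through periodic and continuous expansion of its terminology. By contrast, the Korean Decimal Classification (KDC) is revised at irregular intervals, so its terminology is less current and less practical. For KDC to be usable for classifying Internet resources as well as library materials, practical headings must be reflected in it. This study compared the directory classification systems of Internet resources with the headings used in KDC and proposed additional headings for KDC. Terms in the field of food culture were analyzed in the directory classification systems of Naver, Yahoo, Kyobo Book Centre, and Amazon, and ways of applying them to KDC were proposed with reference to other classification schemes. The areas of KDC that needed additional headings were food hygiene, beverage technology, food engineering, food and drink, meals and table setting, and kitchen and dining spaces, and the missing headings consisted mainly of food-related terms and names of Korean dishes. By presenting KDC's missing headings and ways to apply them, this study laid a foundation for KDC to be used for classifying both library and Internet resources.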

Abstract

Library classification systems are based on academic disciplines; however, they are difficult to apply to Internet resources because they lack currency and practicality. In particular, the headings of the Korean Decimal Classification (KDC) need to reflect practical aspects, and KDC should also be developed for classifying web-based resources. The purposes of this study are to analyze the structures of directory classifications of Internet resources and to suggest additional headings for KDC, so that it can serve both as a practical library classification and as a classification system for Internet resources. The directory classification systems of Naver, Yahoo, the Kyobo Internet bookstore, and Amazon were selected, and their food and culture subjects were analyzed. The headings of KDC were compared with them, and possible new headings in the area of food and culture were suggested with reference to NDC and DDC. This study provides a way of developing KDC into a classification system for Internet resources as well as library materials.

7

A Study on Improving Fiction Classification to Enhance Middle School Students' Access to Fiction

조혜전 (Ewha Womans University) ; 정연경 (Ewha Womans University) 2018, Vol.35, No.1, pp.61-82 https://doi.org/10.3743/KOSIM.2018.35.1.061

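Abstract (translated from Korean)

Fiction is the collection that students browse and borrow most in school libraries. KDC has limitations when students try to find the various kinds of fiction they want. This study therefore examined the varied practices of fiction classification used in libraries, bookstores, and publishers, surveyed middle school students' fiction use behavior, and proposed an improved fiction classification that meets user needs. In addition to the KDC number, color bands for each fiction genre were attached so that users could easily find the fiction they want. These and the additional measures are expected to improve middle school students' access to and discovery of fiction, and may serve as reference material for the subdivision of fiction used in libraries, bookstores, and publishers in the future.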

Abstract

Fiction is the collection that students read and borrow most in school libraries. KDC has several limitations when students look for the fiction books they need. Accordingly, we examined various cases of fiction classification used in libraries, bookstores, and publishers and surveyed middle school students' fiction use behavior. Based on the results, we proposed a better way of classifying fiction books according to user needs. In addition to the KDC number, color bands were attached according to genre so that users could easily find the desired books. These suggestions and other information will enhance the accessibility and discoverability of fiction books for middle school students and may be used as reference material for fiction classification in libraries, bookstores, and publishers in the future.

8

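A Performance Comparison of Automatic Classification by Word Embedding of Monograph Titles

이용구 (Department of Library and Information Science, Kyungpook National University) 2023, Vol.40, No.4, pp.307-327 https://doi.org/10.3743/KOSIM.2023.40.4.307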

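Abstract (translated from Korean)

To analyze the effect of word embedding on titles, which are short texts, this study used the Word2vec, GloVe, and fastText models to generate embedding vectors from monograph titles and applied them as classification features in automatic classification. The classifier used the k-nearest neighbors (kNN) algorithm, and the categories for automatic classification were based on the divisions of DDC class 300 that libraries had assigned to the books. In the automatic classification experiments applying word embedding to titles, the Skip-gram models of Word2vec and fastText outperformed TF-IDF features in the automatic classification performance of the kNN classifier. In hyperparameter optimization experiments across the three models, the Skip-gram model of fastText performed well overall. In particular, for this model, performance improved with hierarchical softmax and with larger embedding dimensions. In terms of performance, fastText uses the n-gram approach and can therefore generate embeddings for substrings or subwords, which was found to raise recall. By contrast, the Skip-gram model of Word2vec performed well mainly at a low dimension (size 300) and with small negative sampling sizes (3 or 5).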

Abstract

To analyze the impact of word embedding on book titles, this study used word embedding models (Word2vec, GloVe, fastText) to generate embedding vectors from book titles. These vectors were then used as classification features for automatic classification. The classifier was the k-nearest neighbors (kNN) algorithm, and the categories for automatic classification were based on the divisions of DDC (Dewey Decimal Classification) main class 300 assigned to the books by libraries. In the automatic classification experiments applying word embeddings to book titles, the Skip-gram architectures of Word2vec and fastText gave better kNN classification performance than TF-IDF features. In the optimization of various hyperparameters across the three models, the Skip-gram architecture of fastText performed well overall. Specifically, this model performed better with hierarchical softmax and larger embedding dimensions. fastText can generate embeddings for substrings or subwords using the n-gram method, which was shown to increase recall. The Skip-gram architecture of Word2vec generally performed well at a low dimension (size 300) and with small negative sampling sizes (3 or 5).
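The remark about fastText's subword n-grams can be made concrete with a small sketch of character n-gram extraction. It only illustrates how a word is broken into subwords with `<` and `>` boundary markers; it does not train or load any embedding model, and the helper names `char_ngrams` and `shared_subwords` are hypothetical. The 3 to 6 range mirrors fastText's default `minn` and `maxn` settings, and the English example words are chosen purely for illustration.

```python
def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    ]

def shared_subwords(a, b, n_min=3, n_max=6):
    """Subword n-grams that two words have in common."""
    return set(char_ngrams(a, n_min, n_max)) & set(char_ngrams(b, n_min, n_max))

# Two related title words share several trigrams even though the full words differ.
common = shared_subwords("classification", "classifier", 3, 3)
```

Because a word's vector in a subword model is built from the vectors of its n-grams, words that never co-occurred in training can still end up close together when they share n-grams, which is one plausible reading of why the abstract associates the n-gram method with higher recall.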

9

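A Comparative Analysis of Subjects Related to Special Sociologies in the KDC 4th Edition and the DDC 21st Edition

배영활 (Keimyung University) ; 오동근 (Keimyung University) 2002, Vol.19, No.4, pp.53-76 https://doi.org/10.3743/KOSIM.2002.19.4.053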

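Abstract (translated from Korean)

To support efficient classification of documents that examine a specific subject field by applying sociological theories and methods, this study compared and analyzed the special sociology headings of the Korean Decimal Classification (KDC) and the Dewey Decimal Classification (DDC), the schemes most widely used in Korea. In particular, the headings related to special sociologies were divided into six areas (religion, arts and physical education, science, language, society, and region) and analyzed. The analysis revealed problems arising from differences in how headings are established and problems arising from headings not being established at all. On this basis, the study aims to bring consistency to the application of theories and methods of specific subjects, to assist the judgment of classification practitioners, for example through additional expansions for assigning class numbers, and ultimately to contribute to the revision toward the 5th edition of KDC.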

Abstract

This study compares and analyzes the classes for the major special areas of sociology, called "branch sociology", in the Korean Decimal Classification, 4th edition, and the Dewey Decimal Classification, 21st edition. In particular, it analyzes, class by class, the classes related to these specified areas of sociology, including arts and sports, sciences, languages, society, and region. The analysis shows that the two systems differ considerably both in the classes they include and in where the same classes are located. The analysis can be useful for the future revision of KDC.

10

A Study on Hierarchical Clustering for Automatic Classified Browsing in OPAC

노정순 (Hannam University) 2004, Vol.21, No.1, pp.93-117 https://doi.org/10.3743/KOSIM.2004.21.1.093

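Abstract (translated from Korean)

This study was conducted to find the optimal hierarchical clustering model for classifying library holdings into a hierarchical structure for browsing, by applying hierarchical clustering in an OPAC. A test collection was built from monographs and theses in library and information science, and experiments varied the indexing technique (automatic indexing of title words, and indexing that combines controlled terms), the term weighting method (absolute frequency and binary frequency), the similarity coefficient (Dice, Jaccard, Pearson, cosine, and squared Euclidean), and the clustering technique (between-groups average linkage, within-groups average linkage, and complete linkage). Apart from between-groups average linkage and squared Euclidean similarity, the remaining similarity coefficients and clustering techniques produced relatively good clusters; the best clusters were produced when the combined controlled-term index was weighted by binary frequency and clustered with complete linkage or between-groups average linkage. However, between-groups average linkage with the Jaccard similarity coefficient was more similar to the decimal structure.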

Abstract

This study aims to develop a hierarchical clustering model for document classification and browsing in OPAC systems. Two automatic indexing techniques (with and without controlled terms), two term weighting methods (term frequency and binary weight), five similarity coefficients (Dice, Jaccard, Pearson, cosine, and squared Euclidean), and three hierarchical clustering algorithms (between-groups average linkage, within-groups average linkage, and complete linkage) were tested on a collection of 175 books and theses in library and information science. The best document clusters resulted from the between-groups average linkage or complete linkage method with the Jaccard or Dice coefficient, applied to automatic indexing with controlled terms represented as binary vectors. The clusters from between-groups average linkage with the Jaccard coefficient more closely resembled a decimal classification structure.
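As a rough illustration of the best-performing combination named in this abstract (binary term vectors, Jaccard or Dice similarity, between-groups average linkage), the following sketch clusters documents represented as sets of index terms. The four toy documents, the stopping rule based on a target number of clusters, and the tie-breaking (the first best pair wins) are assumptions for illustration, not details from the study.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard coefficient on binary term sets: |A and B| / |A or B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def dice(a, b):
    """Dice coefficient on binary term sets: 2|A and B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 0.0

def average_linkage(docs, sim, n_clusters):
    """Agglomerative clustering that repeatedly merges the two clusters with
    the highest mean pairwise similarity between their members."""
    clusters = [[i] for i in range(len(docs))]

    def between(pair):
        # Mean similarity over all cross-cluster document pairs.
        x, y = pair
        return sum(sim(docs[i], docs[j]) for i in x for j in y) / (len(x) * len(y))

    while len(clusters) > n_clusters:
        x, y = max(combinations(clusters, 2), key=between)
        clusters = [c for c in clusters if c is not x and c is not y] + [x + y]
    return clusters

# Hypothetical documents, each reduced to a binary set of index terms.
DOCS = [
    {"cataloging", "classification"},
    {"classification", "ddc"},
    {"retrieval", "indexing"},
    {"indexing", "search"},
]

result = average_linkage(DOCS, jaccard, n_clusters=2)
```

Passing `dice` instead of `jaccard` swaps the similarity coefficient without changing the linkage rule, which is how the coefficient and the clustering method can be varied independently in an experiment like the one described.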
