정보관리학회지, 한국정보관리학회

11

단어 임베딩(Word Embedding) 기법을 적용한 키워드 중심의 사회적 이슈 도출 연구: 장애인 관련 뉴스 기사를 중심으로

최가람(경기대학교) ; 최성필(경기대학교) 2018, Vol.35, No.1, pp.231-250 https://doi.org/10.3743/KOSIM.2018.35.1.231

초록

본 논문에서는 온라인 뉴스 기사에서 자동으로 추출된 키워드 집합을 활용하여 특정 시점에서의 세부 주제별 토픽을 추출하고 정형화하는 새로운 방법론을 제시한다. 이를 위해서, 우선 다량의 텍스트 집합에 존재하는 개별 단어들의 중요도를 측정할 수 있는 복수의 통계적 가중치 모델들에 대한 비교 실험을 통해 TF-IDF 모델을 선정하였고 이를 활용하여 주요 키워드 집합을 추출하였다. 또한 추출된 키워드들 간의 의미적 연관성을 효과적으로 계산하기 위해서 별도로 수집된 약 1,000,000건 규모의 뉴스 기사를 활용하여 단어 임베딩 벡터 집합을 구성하였다. 추출된 개별 키워드들은 임베딩 벡터 형태로 수치화되고 K-평균 알고리즘을 통해 클러스터링 된다. 최종적으로 도출된 각각의 키워드 군집에 대한 정성적인 심층 분석 결과, 대부분의 군집들이 레이블을 쉽게 부여할 수 있을 정도로 충분한 의미적 집중성을 가진 토픽들로 평가되었다.

Abstract

In this paper, we propose a new methodology for extracting and formalizing topics for each detailed subject at a specific point in time, using a set of keywords extracted automatically from online news articles. To do this, we first compared several statistical weighting schemes that measure the importance of individual words in a large collection of texts, selected the TF-IDF model, and used it to extract a set of major keywords. To calculate the semantic relatedness between the extracted keywords effectively, we constructed a set of word embedding vectors from about 1,000,000 separately collected news articles. Each extracted keyword was represented as an embedding vector and clustered with the K-means algorithm. A qualitative in-depth analysis of the resulting keyword clusters showed that most of them were topics with enough semantic cohesion to be labeled easily.
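The pipeline described in this abstract (TF-IDF keyword extraction, then K-means clustering of keyword vectors) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy documents, and the two-dimensional "embedding" vectors are all hypothetical stand-ins, and a real run would use vectors trained on a large news corpus.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_n=2):
    """Return the top_n terms with the highest TF-IDF score for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords.append(ranked[:top_n])
    return keywords

def kmeans(vectors, initial_centroids, iterations=10):
    """Plain Lloyd's algorithm; returns one cluster index per input vector."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(len(centroids)), key=lambda k: math.dist(v, centroids[k]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                centroids[k] = [sum(axis) / len(members) for axis in zip(*members)]
    return labels

# Hypothetical toy corpus (already tokenized).
docs = [
    ["welfare", "budget", "welfare", "policy"],
    ["employment", "quota", "employment", "policy"],
    ["welfare", "employment", "policy", "news"],
]
# Hypothetical 2-D vectors standing in for trained word embeddings.
toy_vectors = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]

print(tfidf_keywords(docs))
print(kmeans(toy_vectors, [toy_vectors[0], toy_vectors[2]]))
```

Terms that occur in every document (here "policy") receive a TF-IDF score of zero, which is why such a weighting tends to surface document-specific keywords.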

12

기술과학 분야 학술문헌에 대한 학습집합 반자동 구축 및 자동 분류 통합 연구

김선우(경기대학교 문헌정보학과) ; 고건우(경기대학교 문헌정보학과) ; 최원준(한국과학기술정보연구원 콘텐츠 큐레이션센터) ; 정희석(한국과학기술정보연구원 콘텐츠 큐레이션센터) ; 윤화묵(한국과학기술정보연구원 콘텐츠큐레이션센터) ; 최성필(경기대학교) 2018, Vol.35, No.4, pp.141-164 https://doi.org/10.3743/KOSIM.2018.35.4.141

초록

최근 학술문헌의 양이 급증하고, 융복합적인 연구가 활발히 이뤄지면서 연구자들은 선행 연구에 대한 동향 분석에 어려움을 겪고 있다. 이를 해결하기 위해 우선적으로 학술논문 단위의 분류 정보가 필요하지만 국내에는 이러한 정보가 제공되는 학술 데이터베이스가 존재하지 않는다. 이에 본 연구에서는 국내 학술문헌에 대해 다중 분류가 가능한 자동 분류 시스템을 제안한다. 먼저 한국어로 기술된 기술과학 분야의 학술문헌을 수집하고 K-Means 클러스터링 기법을 활용하여 DDC 600번 대의 중분류에 맞게 매핑하여 다중 분류가 가능한 학습집합을 구축하였다. 학습집합 구축 결과, 메타데이터가 존재하지 않는 값을 제외한 총 63,915건의 한국어 기술과학 분야의 자동 분류 학습집합이 구축되었다. 이를 활용하여 심층학습 기반의 학술문헌 자동 분류 엔진을 구현하고 학습하였다. 객관적인 검증을 위해 수작업 구축한 실험집합을 통한 실험 결과, 다중 분류에 대해 78.32%의 정확도와 72.45%의 F1 성능을 얻었다.

Abstract

Recently, as the volume of academic literature has grown rapidly and interdisciplinary research has become more active, researchers have had difficulty analyzing trends in previous research. Solving this problem first requires classification information at the level of individual academic papers, but no academic database in Korea provides such information. In this paper, we propose an automatic classification system that can assign domestic academic literature to multiple classes. First, academic documents in the technical sciences written in Korean were collected and mapped to the divisions of DDC class 600 using the K-Means clustering technique, in order to construct a training set that supports multiple classification. Excluding records with missing metadata, the resulting training set contains 63,915 Korean documents in the technical sciences. Using this training set, we implemented and trained a deep learning-based automatic classification engine for academic documents. In experiments on a manually constructed test set, built for objective verification, the system achieved 78.32% accuracy and a 72.45% F1 score for multiple classification.
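For readers unfamiliar with how accuracy and F1 can be computed when one document may carry several class labels, here is a small sketch. It assumes the "multiple classification" above is multi-label classification, and it uses example-based (Jaccard) accuracy and micro-averaged F1; the abstract does not say which definitions the authors used, so these choices, the function name, and the toy labels are assumptions for illustration only.

```python
def multilabel_scores(gold, predicted):
    """Return (example-based accuracy, micro-averaged F1) for lists of label sets."""
    tp = sum(len(g & p) for g, p in zip(gold, predicted))  # correctly predicted labels
    fp = sum(len(p - g) for g, p in zip(gold, predicted))  # predicted but not in gold
    fn = sum(len(g - p) for g, p in zip(gold, predicted))  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Example-based accuracy: |gold ∩ predicted| / |gold ∪ predicted|, averaged over documents.
    accuracy = sum(len(g & p) / len(g | p) if (g | p) else 1.0
                   for g, p in zip(gold, predicted)) / len(gold)
    return accuracy, f1

# Toy labels using DDC 600-class division numbers (illustrative data only).
gold = [{"610", "620"}, {"630"}, {"640", "660"}]
predicted = [{"610"}, {"630"}, {"640", "650"}]

accuracy, f1 = multilabel_scores(gold, predicted)
print(f"accuracy={accuracy:.4f}, micro-F1={f1:.4f}")
```

Under these definitions a partially correct prediction earns partial credit, so accuracy and F1 can differ noticeably, as they do in the reported results.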

13

공공도서관 서비스 품질 평가를 통한 특화서비스에 대한 이용자 인식 연구

정대근(전남대학교 문헌정보학과) ; 노영희(건국대학교) 2018, Vol.35, No.4, pp.51-75 https://doi.org/10.3743/KOSIM.2018.35.4.051

초록

본 연구는 서비스 품질 평가 도구인 LibQUAL+를 활용하여 특화서비스를 제공여부에 따른 도서관 서비스 품질의 차이를 통해 공공도서관 특화서비스에 대한 이용자의 인식을 확인하고자 하였다. 분석결과 서비스 수준에 대한 평가 결과 특화서비스를 제공하는 도서관과 그렇지 않는 도서관 차이에 최소수준 및 인식수준에 있어 차이가 나타났으며, 기대수준은 차이가 없는 것으로 나타났다. 서비스 정도는 특화서비스를 제공하는 도서관 그렇지 않는 도서관 보다 적정성 갭 및 우위성 갭에서 대부분 높은 이용자 인식을 나타내고 있어, 공공도서관을 이용하는 이용자들은 특화서비스를 제공하는 도서관이 그렇지 않는 도서관 보다 더 좋은 서비스를 제공하고 있다고 인식하는 것으로 나타났다.

Abstract

This study used LibQUAL+, a service quality assessment tool, to examine users' perceptions of specialized services in public libraries by comparing library service quality according to whether such services are provided. In the evaluation of service levels, libraries that provide specialized services differed from those that do not in the minimum and perceived levels, while the expected level showed no difference. As for the degree of service, libraries that provide specialized services showed higher user perception on most items in the Adequacy Gap and the Superiority Gap than libraries that do not. Users of public libraries therefore perceive libraries that offer specialized services as providing better service than those that do not.
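The two gap scores mentioned above follow the usual LibQUAL+ convention: the adequacy gap is the perceived level minus the minimum level, and the superiority gap is the perceived level minus the desired level (called the expected level in this abstract). The sketch below illustrates the arithmetic; the function name and the scores are made up for the example and are not taken from the study.

```python
def service_gaps(minimum, desired, perceived):
    """Compute LibQUAL+-style gap scores from three mean ratings."""
    return {
        "adequacy": perceived - minimum,     # positive: service exceeds the minimum acceptable level
        "superiority": perceived - desired,  # negative: service falls short of the desired level
    }

# Hypothetical mean ratings on a 9-point scale.
with_specialized = service_gaps(minimum=6.0, desired=8.0, perceived=7.5)
without_specialized = service_gaps(minimum=5.5, desired=8.0, perceived=6.0)

print(with_specialized)
print(without_specialized)
```

A library is usually read as performing within users' "zone of tolerance" when the adequacy gap is positive and the superiority gap is negative.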

14

공공도서관 어린이 독서프로그램의 성과 측정을 위한 프레임워크 개발에 관한 연구

박성재(한성대학교) ; 한상우(광주대학교) 2018, Vol.35, No.3, pp.311-325 https://doi.org/10.3743/KOSIM.2018.35.3.311

초록

본 연구는 공공도서관에서 어린이를 대상으로 진행하는 독서프로그램의 성과를 측정하기 위한 프레임워크 개발을 목적으로 한다. 프레임워크 개발을 위한 이론적 토대로 성과 평가에 기반 한 로직모델을 적용하였다. 로직모델의 요소로 제안된 6개 요소 중에서 가정과 외부적 요인을 제외한 투입, 활동, 산출, 성과 요인을 중심으로 프로그램 평가 프레임워크를 개발하였다. 연구결과로, 서울 시내 한 공공도서관에서 연구기간 동안 진행된 4개의 프로그램에 대한 평가 프레임워크와 성과측정을 위한 지표를 제안하였다. 프로그램별로 다양한 성과지표의 개발이 가능하지만 본 연구에서는 도서관 데이터를 기반으로 측정 가능한 지표를 중심으로 제안하였다. 본 연구 결과가 사례 연구로 진행되었지만 대상 프로그램이 공공도서관에서 일반적으로 진행하는 프로그램이라는 점에서 타 도서관의 어린이 대상 프로그램의 평가 프레임워크로 활용될 수 있을 것으로 기대된다.

Abstract

The purpose of this study is to develop frameworks for evaluating children's reading programs provided by a public library. A logic model based on outcome evaluation was applied as the theoretical basis for framework development. While the logic model is generally composed of six factors, the frameworks developed in this study have four: input, activity, output, and outcome. Additionally, this study suggests outcome indicators derived from library data. Although the evaluation frameworks were developed from specific programs operated by one public library, they could be used to evaluate other libraries’ programs for children, since the target programs are commonly provided by public libraries.

15

중학생의 소설 접근성을 증진시키기 위한 소설 분야 분류 개선 방안에 관한 연구

조혜전(이화여자대학교) ; 정연경(이화여자대학교) 2018, Vol.35, No.1, pp.61-82 https://doi.org/10.3743/KOSIM.2018.35.1.061

초록

소설은 학교도서관에서 학생들이 가장 많이 열람하고 대출하는 장서이다. KDC는 학생들이 원하는 다양한 소설을 찾는데 제한점을 가진다. 이에 본 연구는 도서관과 서점, 출판사 등에서 사용하고 있는 소설 분류의 다양한 사례와 중학생의 소설 이용 행태를 설문 조사하여 이용자 요구에 맞게 소설 분류 개선안을 제안하였다. KDC 기호에 더하여 소설의 장르별 색띠를 부착하여 이용자들이 손쉽게 원하는 소설을 찾을 수 있도록 하였으며 추가적인 사항은 중학생들의 소설 접근성과 발견성을 향상시키고 향후 도서관이나 서점, 출판사에서 사용하는 소설 분야 세분에 대한 참고자료로 활용될 수 있을 것이다.

Abstract

Fiction is the collection that students read and borrow most in school libraries. The KDC has limitations when students look for the variety of fiction they want. We therefore examined various cases of fiction classification used in libraries, bookstores, and publishers, and surveyed middle school students' fiction use behavior. Based on the results, we proposed an improved fiction classification that reflects user needs. In addition to the KDC number, color bands were attached according to genre so that users could easily find the books they want. These suggestions and other information will enhance the accessibility and discoverability of fiction for middle school students and may serve as reference material for subdividing fiction in libraries, bookstores, and publishers in the future.

16

빅데이터 연구 논문의 주제 분야 연관관계 분석: 동시 인용 관계를 적용하여

곽철완(강남대학교) 2018, Vol.35, No.1, pp.13-32 https://doi.org/10.3743/KOSIM.2018.35.1.013

초록

본 연구의 목적은 빅데이터 연구 논문의 주제 분야 간의 연관관계를 분석하는데 있다. 동시 인용 관계를 적용하여 분석 대상의 주제 분야를 추출하였으며, R 프로그램의 Apriori 알고리즘을 이용하여 연관관계의 규칙을 분석하고, arulesViz 패키지를 사용하여 시각화하였다. 연구 결과 22개 주제 분야가 추출되었는데, 이들 주제 분야는 3가지 군집으로 구분되었다. 주제 분야의 연관관계 유형을 분석한 결과, 연관관계의 복잡성에 따라 ‘전문형’, ‘일반형’, ‘확대형’으로 구분되었다. 전문형에는 문헌정보학, 신문방송학 등이 포함되었고, 일반형에는 정치외교학, 무역학, 관광학 등이 포함되었고, 확대형에는 기타인문학, 사회과학일반, 관광학일반 등이 포함되었다. 이 연관관계는 빅데이터 연구자가 한 주제 분야를 인용할 때 관계가 있는 다른 주제 분야를 인용하는 경향을 보여주는 것으로, 도서관에서 학술정보서비스를 위해 연관관계를 활용한 서비스를 고려해야 할 필요가 있다.

Abstract

The purpose of this study is to analyze the associations among the subject areas of big data research papers. The subject areas of the units of analysis were extracted by applying co-citation relationships; association rules were analyzed using the Apriori algorithm in R and visualized using the arulesViz package. As a result, 22 subject areas were extracted and divided into three clusters. An analysis of the association types of the subject areas classified them as ‘professional type’, ‘general type’, or ‘expanded type’ depending on the complexity of the associations. The professional type included library and information science and journalism. The general type included politics & diplomacy, trade, and tourism. The expanded type included other humanities, general social sciences, and general tourism. These associations show that big data researchers who cite one subject area tend to cite other, related subject areas, and libraries should consider services that use these associations for academic information services.
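The study ran Apriori in R; purely to illustrate the idea, the following Python sketch mines frequent itemsets level by level and then scores one rule by support, confidence, and lift. The transactions (sets of subject areas cited together), the support threshold, and the function names are hypothetical and do not reproduce the study's data or the arules API.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(transactions, min_support):
    """Return {frequent itemset: support}, growing itemsets one item at a time."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    frequent = {}
    while level:
        for s in level:
            frequent[s] = support(s, transactions)
        size = len(level[0]) + 1
        # Join step: merge frequent itemsets that differ by exactly one item.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        # Prune step: every (size - 1)-subset must already be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in frequent for sub in combinations(c, size - 1))
                 and support(c, transactions) >= min_support]
    return frequent

def rule_metrics(antecedent, consequent, transactions):
    """Support, confidence, and lift of the rule antecedent -> consequent."""
    both = support(antecedent | consequent, transactions)
    confidence = both / support(antecedent, transactions)
    lift = confidence / support(consequent, transactions)
    return both, confidence, lift

# Hypothetical transactions: subject areas co-cited within one paper.
transactions = [
    {"LIS", "Journalism"},
    {"LIS", "Journalism", "Tourism"},
    {"LIS", "Trade"},
    {"Tourism", "Trade"},
]

frequent = apriori(transactions, min_support=0.5)
print(sorted(tuple(sorted(s)) for s in frequent))
print(rule_metrics(frozenset({"LIS"}), frozenset({"Journalism"}), transactions))
```

A lift above 1 means the two subject areas are cited together more often than would be expected if they were independent.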

17

연구데이터 관리서비스의 구현 시 고려사항에 관한 연구

김성훈(성균관대학교 문헌정보학과) ; 오삼균(성균관대학교 문헌정보학과) 2018, Vol.35, No.2, pp.141-165 https://doi.org/10.3743/KOSIM.2018.35.2.141

초록

본 연구의 목적은 연구데이터 관리서비스 구현 시 성공적인 서비스를 위한 고려사항을 도출하는 것이다. 이를 위해 선행연구를 활용하여 연구데이터 관리서비스의 영역을 파악하였고, 미국, 독일, 호주에서 연구데이터 관리서비스를 시행중인 대학도서관 6곳과 1개의 기관에서 담당자 8명을 대상으로 연구데이터 서비스에 관한 질문의 답변을 이메일을 통해 수집하였다. 또 해외서비스를 대상으로 수집한 고려사항이 국내에 적용가능한지 국내 연구데이터 관리서비스 전문가와 검토하였다. 연구데이터 서비스 영역은 총 9개의 카테고리로 구분하여 분석하였는데, 연구서비스와 연구데이터 관리서비스 연계, 국가/대학/기관 차원의 협약, 메타데이터 입력주체 및 필수 요소, 직원의 전문화 방안, 이용자 요구분석을 통한 주요서비스 영역 선정, 연구데이터와 연구결과물의 효과적인 연결방안, 이용자와 유관기관과 긴밀한 공조 등의 연구데이터 관리서비스 구축 시 고려사항을 도출할 수 있었다.

Abstract

The purpose of this study is to determine the crucial factors to consider in ensuring the successful implementation of research data management (RDM) services. The study begins by identifying the areas of research data management services from previous research. It then collects answers to questions about research data services by e-mail from eight individuals overseeing research data management services at six university libraries and one institution in the United States, Germany, and Australia. Because these factors were gathered from overseas services, domestic experts in research data management services reviewed whether they are applicable in Korea. The areas of research data services were analyzed in nine categories. The crucial factors to consider in RDM services are the connection between research services and research data management services; national/university-level/institutional agreements; metadata entry personnel and required elements; strategies for the provision of specialized staff; major service area selection through user demand analysis; effective linkage between research data and research results; and close cooperation with users and related organizations.

18

종합목록DB를 이용한 국내 대학도서관 서양서 소장 실태 분석

이지원(대구가톨릭대학교) ; 이재윤(명지대학교) 2018, Vol.35, No.1, pp.205-229 https://doi.org/10.3743/KOSIM.2018.35.1.205

초록

본 연구는 국내 대학도서관 서양서 장서 개발의 변화를 살펴보기 위해 2003년과 2013년에 출판된 서양서 소장 실태를 KERIS 종합목록을 통해 분석하였다. 이를 위해 새로운 장서 지표로 소장 h-지수, 장서 고유성 지수, 그리고 공통장서 확보율을 제안하고 기본 지표인 종수 및 책수, 그리고 종당 책수와 함께 활용하였다. 분석 결과 2003년에 비해서 2013년에 출판된 서양서의 전체 소장 종수는 16.1% 감소하고 소장 책수는 42.2% 감소하여 소장 책수가 더 크게 감소하였다. 여러 도서관이 공통적으로 소장하는 공통 장서, 또는 기본 장서의 규모를 나타내는 공통장서 확보율은 줄어들었고, 장서고유성은 증가하였다. DDC 주류 중에서는 컴퓨터 관련 도서가 급감한 0XX(총류) 분야의 감소율이 가장 컸다. 도서관별 장서량 측면에서는 2003년에 비해서 2013년 출판도서의 경우에 상위 도서관이 더욱 과점하는 빈익빈 부익부 현상이 심화되었다.

Abstract

This study analyzed Korean university libraries’ holdings of Western language books published in 2003 and 2013 using the KERIS union catalog, with a view to investigating changes in the libraries' collection development of Western language books. To do so, new collection indexes were proposed, namely the holding h-index, CUI (Collection Uniqueness Index), and CCHR (Common Collection Holding Ratio), and they were used together with basic indexes such as the number of titles, the number of books, and the number of books per title. The analysis reveals that, compared with books published in 2003, the number of titles held decreased by 16.1% for books published in 2013, and the number of books dropped more sharply, by 42.2%. Also, for 2013, CCHR decreased while CUI increased. In terms of subject, among the DDC main classes, 0XX (Generalities) showed the greatest rate of decrease because of a sharp drop in computer-related books. In terms of each library’s holdings, the concentration of holdings in the top libraries intensified for books published in 2013 compared with 2003.
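The abstract names a holding h-index but does not define it. As a loudly labeled assumption, the sketch below borrows the standard Hirsch h-index logic: a collection scores h if h of its titles are each held by at least h libraries. The function name and the holding counts are hypothetical, and the authors' actual definition may differ.

```python
def holding_h_index(holding_counts):
    """Largest h such that h titles are each held by at least h libraries.

    ASSUMPTION: this mirrors the citation h-index; the abstract does not
    give the formula used in the study.
    """
    ranked = sorted(holding_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical data: number of libraries holding each of six titles.
print(holding_h_index([10, 8, 5, 4, 3, 1]))
```

Like its citation counterpart, an index of this kind rewards a collection for having many titles that are each widely held, rather than a few extremely common ones.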

19

토픽 모델링 기반 내용 분석을 통한 학제 간 융합기술 도출 방법

정도헌(덕성여자대학교) ; 주황수(덕성여자대학교) 2018, Vol.35, No.3, pp.77-100 https://doi.org/10.3743/KOSIM.2018.35.3.077

초록

본 연구는 텍스트 마이닝 기법을 활용하여 대량의 데이터로부터 학제 간 융합 기술을 발굴하는 일련의 과정을 제시하는 것을 목표로 한다. 바이오공학 기술(BT) 분야와 정보통신 기술(ICT) 분야 간의 융합 연구를 위해 (1) BT 분야의 기술용어 목록을 작성하여 대량의 학술논문 메타데이터를 수집한 후 (2) 패스파인더 네트워크 척도 알고리즘을 이용해 유망 기술의 지식 구조를 생성하고 (3) 토픽 모델링 기법을 사용하여 BT분야 중심의 내용 분석을 수행하였다. 다음 단계인 BT-ICT 융합 기술 아이템 도출을 위해, (4) BT-ICT 관련 정보를 얻기 위해 BT 기술용어 목록을 상위 개념으로 확장한 후 (5) OpenAPI 서비스를 이용하여 두 분야가 관련된 학술 정보의 메타데이터를 자동 수집하여 (6) BT-ICT 토픽 모델의 내용 분석을 실시하였다. 연구를 통해 첫째, 융합 기술의 발굴을 위해서는 기술 용어 목록의 작성이 중요한 지식 베이스가 된다는 점과 둘째, 대량의 수집 문헌을 분석하기 위해서는 데이터의 차원을 줄여 분석을 용이하게 해주는 텍스트 마이닝 기법이 필요하다는 점을 확인하였다. 본 연구에서 제안한 데이터 처리 및 분석 과정이 학제 간 융합 연구의 가능성이 있는 기술 요소들을 발굴하는 데 효과적이었음을 확인할 수 있었다.

Abstract

The objective of this study is to present a process for discovering interdisciplinary convergence technologies from large volumes of data using text mining. For convergence research between biotechnology (BT) and information and communications technology (ICT), the following steps were performed: (1) compiling a list of BT technical terms and collecting a large amount of research article metadata based on it; (2) generating the knowledge structure of emerging technologies using the Pathfinder network scaling algorithm; and (3) analyzing BT-centered content with topic modeling. Three further steps were used to derive BT-ICT convergence technology items: (4) expanding the BT terminology list into superordinate concepts to obtain BT-ICT related information; (5) automatically collecting metadata of research articles related to both fields using an OpenAPI service; and (6) analyzing the content of the BT-ICT topic models. The study yields two findings. First, a terminology list is an important knowledge base for discovering convergence technologies. Second, analyzing a large quantity of collected literature requires text mining techniques that facilitate the analysis by reducing the dimensionality of the data. The data processing and analysis procedure proposed here proved effective in discovering technology elements with potential for interdisciplinary convergence research.
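Step (2) relies on Pathfinder network scaling, which prunes a dense term network down to its most salient links. The sketch below shows one common configuration, PFNET(r = ∞, q = n − 1), where a path's length is its largest link weight and a direct link survives only if no indirect path is shorter. The abstract does not state which parameters the authors used, so this setting, the function name, and the toy distance matrix are assumptions.

```python
def pathfinder_edges(dist):
    """Return the links kept by PFNET(r=inf, q=n-1) for a symmetric distance matrix."""
    n = len(dist)
    # minimax[i][j]: smallest achievable "largest link" over all paths from i to j.
    minimax = [row[:] for row in dist]
    for k in range(n):  # Floyd-Warshall style relaxation with max in place of sum
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    # Keep a direct link only if no indirect path beats it.
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if dist[i][j] <= minimax[i][j]]

# Hypothetical distances between three terms (smaller = more closely related).
dist = [
    [0, 1, 5],
    [1, 0, 2],
    [5, 2, 0],
]
print(pathfinder_edges(dist))
```

In this toy case the direct link between terms 0 and 2 (distance 5) is dropped because the path through term 1 has a largest link of only 2.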

20

도서관의 오픈 데이터 품질측정모델 개발

박진호((주) 리스트) 2018, Vol.35, No.1, pp.33-59 https://doi.org/10.3743/KOSIM.2018.35.1.033

초록

본 연구는 최근 열린 정부 데이터에 대한 다차원 척도, 모델 개발 연구가 시작되고 있으나, 도서관에서는 관련 연구가 부족하다는 점을 고려하여 도서관에 적용할 수 있는 오픈 데이터 품질측정 모델개발을 목적으로 하였다. 본 연구는 모델개발과 모델평가 두 단계로 수행하였다. 모델개발은 델파이 기법을 적용하였으며, 모델평가는 도서관 오픈 데이터 이용자를 대상으로 설문조사를 실시하여 모델의 타당도와 신뢰도를 측정하였다. 모델개발은 델파이 기법을 적용하여 총 4차례 수행하여 3개 차원, 18개 요인, 133개 측정요소로 구성된 모델을 도출하였다. 모델평가는 델파이 기법으로 완성한 모델을 도서관 오픈 데이터 이용자인 국내․외 사서, 개발자, 오픈 데이터 활동가를 대상으로 적합성 설문조사를 실시하여 모델의 타당도와 신뢰도를 검증하였다. 그 결과 당초 18개 요인, 133개 측정요소는 15개 요인, 54개 측정요소가 타당성을 확보한 것으로 나타났다. 신뢰도는 차원별, 측정요인별로 모두 기준치인 0.6 이상의 결과를 보여주고 있어 높은 신뢰도를 확보한 것으로 나타났다. 모델평가를 통한 이용자 타당도, 신뢰도 분석으로 전문가가 구성한 평가모델은 현장에서 즉시 활용될 수 있을 정도로 정제되었다.

Abstract

This study draws on the recent emergence of research on multidimensional scales and model development for open government data. Taking into consideration the paucity of such research in the library field, it develops a quality measurement model applicable to library open data. The model was developed using the Delphi method and verified for validity and reliability on the basis of a survey administered to library open data users. The results of the fourth Delphi round exhibited an average of 4.00 for all measured elements and a minimum validity of .75, rendering the model appropriate for use in quality assessments of library open data. The convergence and stability results provided by the expert panel fell below .50, confirming that there was no need to conduct further Delphi rounds. The model's reliability likewise garnered results of .60 and above in all three dimensions. The model completed with the input of the Delphi panel was then put through a verification process in which library open data users such as domestic and international librarians, developers, and open data activists reviewed it for validity and reliability. In this user evaluation, validity was secured for only 15 of the original 18 factors and 54 of the 133 measurement elements, because not all factors and elements loaded on the three dimensions. Reliability results, on the other hand, were at 0.6 and above for all dimensions and measurement factors.
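The abstract reports validity, convergence, and stability figures for the Delphi rounds without giving formulas. The sketch below uses definitions that are common in Delphi studies, stated here as assumptions: Lawshe's content validity ratio (CVR), convergence as half the interquartile range, and stability as the coefficient of variation. The function name, the 5-point scale, the "valid" threshold of 4, and the ratings are all hypothetical.

```python
import statistics

def delphi_item_stats(ratings, valid_threshold=4):
    """Summary statistics for one Delphi item rated by a panel (assumed formulas)."""
    n = len(ratings)
    n_valid = sum(r >= valid_threshold for r in ratings)
    # Lawshe's CVR: (n_valid - N/2) / (N/2), ranging from -1 to 1.
    cvr = (n_valid - n / 2) / (n / 2)
    q1, _, q3 = statistics.quantiles(ratings, n=4, method="inclusive")
    return {
        "mean": statistics.mean(ratings),
        "cvr": cvr,
        "convergence": (q3 - q1) / 2,  # closer to 0: opinions have converged
        "stability": statistics.stdev(ratings) / statistics.mean(ratings),  # coefficient of variation
    }

# Hypothetical ratings from an eight-member panel on a 5-point scale.
stats = delphi_item_stats([4, 4, 5, 5, 4, 5, 3, 4])
print(stats)
```

Under these definitions, an item is typically retained when its CVR clears the critical value for the panel size and when low convergence and stability values indicate that another round would be unlikely to change the ratings.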
