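Journal of the Korean Society for Information Management (정보관리학회지), Korean Society for Information Management (한국정보관리학회)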

11

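A Study on the Quality of Library Open Government Data: Focusing on the Book Details API of 도서관 정보나루

양수완 (Chung-Ang University, Department of Library and Information Science, doctoral coursework completed) 2020, Vol.37, No.4, pp.181-206 https://doi.org/10.3743/KOSIM.2020.37.4.181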

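Abstract (translated from Korean)

As open government data is increasingly released and provided, data that public libraries produce in the course of their work, such as bibliographic data and loan histories, are being offered as library open government data. This paper diagnoses the quality of library open government data and, based on the results, proposes measures to improve it. First, it surveys research on open government data in library and information science. Next, it diagnoses the completeness and accuracy of library open government data obtained through the open API of 도서관 정보나루, the platform for opening library open government data. Finally, it derives improvement measures based on the diagnosis results. The completeness diagnosis found many blanks in bibliographic elements essential for identifying and searching for books. The accuracy diagnosis found inaccurate bibliographic elements that did not follow the value type, value range, or constraints. Based on the data quality diagnosis, this study recommends improving the data collection procedure of 도서관 정보나루, building a schema for each data set, providing guidance on data collection and data processing, and releasing the raw data.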

Abstract

With the popularization of open government data, library-related open government data is also being opened to the public and utilized. The purpose of this paper is to diagnose the quality of library-related open government data and to propose measures to improve its quality based on the diagnosis results. The completeness diagnosis identified a number of blanks in the bibliographic elements essential for identifying and searching for a book. The accuracy diagnosis identified bibliographic elements that do not comply with the data schema. Based on the results of the data quality diagnosis, this study suggests improving the data collection procedure, establishing a data set schema, providing details on data collection and data processing, and publishing raw data.
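
The two diagnoses described above (blank values for completeness; type, range, and constraint violations for accuracy) can be sketched roughly as follows. This is a minimal illustration, not the paper's procedure: the field names, the validators in `SCHEMA`, and the sample records are all hypothetical and are not taken from the actual API.

```python
import re

# Hypothetical schema (an assumption for illustration): field -> validator.
SCHEMA = {
    # Constraint: exactly 13 ASCII digits.
    "isbn13": lambda v: re.fullmatch(r"[0-9]{13}", v) is not None,
    # Constraint: any non-empty string.
    "title": lambda v: len(v) > 0,
    # Type and range: a four-digit year between 1000 and 2100.
    "publication_year": lambda v: bool(re.fullmatch(r"[0-9]{4}", v))
    and 1000 <= int(v) <= 2100,
}


def diagnose(records):
    """Return completeness and accuracy ratios for each schema field.

    completeness = share of records with a non-blank value
    accuracy     = share of non-blank values that pass the validator
    """
    report = {}
    for field, is_valid in SCHEMA.items():
        values = [str(r.get(field) or "").strip() for r in records]
        filled = [v for v in values if v]
        completeness = len(filled) / len(records) if records else 0.0
        accuracy = sum(is_valid(v) for v in filled) / len(filled) if filled else 0.0
        report[field] = {"completeness": completeness, "accuracy": accuracy}
    return report


# Made-up sample records containing blanks and malformed values.
records = [
    {"isbn13": "9788901234567", "title": "Book A", "publication_year": "2019"},
    {"isbn13": "", "title": "Book B", "publication_year": "20l9"},
    {"isbn13": "97889", "title": "", "publication_year": None},
    {"isbn13": "9788907654321", "title": "Book D", "publication_year": "2020"},
]
report = diagnose(records)
for field, scores in report.items():
    print(field, scores)
```

Reporting the two ratios separately keeps a missing value from being counted as an inaccurate one, which mirrors the separation of the two diagnoses in the abstract.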

12

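A Study on the Functionality of the Title Page in the Priority of Sources for Transcribing Bibliographic Data Elements

남태우 (Chung-Ang University) 2004, Vol.21, No.1, pp.55-92 https://doi.org/10.3743/KOSIM.2004.21.1.055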

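Abstract (translated from Korean)

This paper examines the functionality of the title page, which has enjoyed almost sacrosanct status in the transcription of bibliographic data elements. To that end, it first considers the background to the emergence of the title page in bibliographic control and the establishment of the related concepts, and clarifies how the title page has been treated in the cataloging process. It also analyzes the de-bibliographic process affecting the title page in the hypertext environment.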

Abstract

The title page of a book is a reliable source, since it, together with its verso, usually contains all bibliographically significant data. Generally, the title page is a page at the beginning of a book giving its title and the names of the author and publisher. Prescribing a source of information from which data elements should be derived is a way of specifying how an entity can represent itself. In simpler times, when bibliographic entities were for the most part books published in Western countries, the choice of source was obviously the title page, the "face of the book".

13

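A Study on Management Improvement through a Survey of University Library Use: Focusing on an Analysis of the Circulation Records of Users of C University Library

유경종 (Ministry of Education, Science and Technology) ; 박일종 (Keimyung University) 2007, Vol.24, No.3, pp.93-117 https://doi.org/10.3743/KOSIM.2007.24.3.093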

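Abstract (translated from Korean)

This paper collected and analyzed the collection, circulation records, and customer-related data held in the Library Automation System (LAS) of C University Library and proposed ways to apply the results to customer relationship management (CRM). The collected data comprised bibliographic data for a total of 269,387 circulating monograph volumes held by C University Library, data on 12,281 customers, and 39,269 user circulation records. From the circulation record data, user status, circulation frequency, number of volumes borrowed and number of loans, and publication year were extracted as relational variables, analyzed with data mining techniques, and verified with correlation coefficients.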

Abstract

This paper collected and analyzed the book and circulation-related data in the Library Automation System (LAS) of C academic library and suggested a method for applying the results to Customer Relationship Management (CRM). The collected data were 269,387 bibliographic records of books, 12,281 patron records, and 39,269 circulation records. User identity, circulation frequency, total number of circulated books, and publication year were extracted from the circulation records as relation factors, then analyzed and verified by correlation coefficient.

14

A Study on the Availability of Large-Print Books for Readers with Presbyopia

장혜란 (Sangmyung University) 2015, Vol.32, No.3, pp.341-360 https://doi.org/10.3743/KOSIM.2015.32.3.341

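Abstract (translated from Korean)

In an aging society, services for the elderly are a new challenge for libraries. This study examines presbyopia and the barriers to reading faced by the elderly, and investigates the publication, acquisition, and availability of Korean large-print books, which make reading easier for the elderly. The analysis of the publication and distribution of large-print books is based on the bibliographic list of the National Library of Korea's holdings and the stock list of the Kyobo Book Center. The analysis of large-print books accessible in public libraries is based on the list of large-print books distributed by the Korean Library Association and data from the existing survey of library services for the disabled. From these sources, the types, numbers of titles and volumes, publication years, subjects, and duplication of the available large-print books were analyzed. Based on the results, problems are identified, and recommendations are made on bibliographic control of large-print books, expansion of acquisition, ways for libraries to promote reading among the elderly, and follow-up research.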

Abstract

Services for the elderly are a new challenge for libraries in an aging society. This study reviewed presbyopia and the obstacles faced by older readers, and analyzed the state of large-print books to understand and estimate the availability of Korean large-print books. Based on the bibliographic list of large-print books collected by the National Library of Korea, the large-print book stock list of the Kyobo Book Center, the large-print book lists supplied to libraries by the Korean Library Association, and the data of the previous Library Survey for the Disabled, the number of titles and volumes, publication year, duplication, and subject field of the available large-print books are analyzed. Based on the results, problems are identified, and recommendations for bibliographic control, collection development, reading promotion, and further research are suggested.

15

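An Analysis of the Main Paths and Intellectual Structure of the Data Literacy Research Field

이재윤 (Myongji University, Department of Library and Information Science) 2023, Vol.40, No.4, pp.403-428 https://doi.org/10.3743/KOSIM.2023.40.4.403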

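Abstract (translated from Korean)

This study sought to identify the development path, intellectual structure, and emerging promising topics of research in the data literacy field. Among the data literacy articles retrieved from Web of Science for this purpose, articles in education and in library and information science accounted for nearly 60% of the total. First, in the citation network analysis, the PageRank algorithm was used to identify core papers with high citation influence across a variety of topics. To identify the development path of data literacy research, the existing main path analysis method was applied, but it had the limitation of including only research papers from education. A PageRank main path analysis method was developed as a new technique to overcome this limitation, and it identified a development path that includes core papers from both education and library and information science. To analyze the intellectual structure of data literacy research, keyword bibliographic coupling analysis was performed. Applying the parallel nearest neighbor clustering (PNNC) algorithm to identify the detailed structure and clusters of the resulting keyword bibliographic coupling network revealed two large clusters and the seven small clusters belonging to them. To derive emerging promising topics, the growth index and mean publication year of each keyword and cluster were measured. The analysis showed that, against the backdrop of the pandemic and the rise of AI chatbots, critical data literacy for social justice is rapidly emerging in higher education. In addition, the PageRank main path analysis technique newly developed in this study as a means of identifying the development path of research was effective in discovering two or more research streams developing in parallel in different areas.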

Abstract

This study investigates the development path and intellectual structure of data literacy research, aiming to identify emerging topics in the field. A comprehensive search for data literacy-related articles on the Web of Science reveals that the field is primarily concentrated in Education & Educational Research and Information Science & Library Science, accounting for nearly 60% of the total. Citation network analysis, employing the PageRank algorithm, identifies key papers with high citation impact across various topics. To accurately trace the development path of data literacy research, an enhanced PageRank main path algorithm is developed, which overcomes the limitations of existing methods confined to the Education & Educational Research field. Keyword bibliographic coupling analysis is employed to unravel the intellectual structure of data literacy research. Utilizing the PNNC algorithm, the detailed structure and clusters of the derived keyword bibliographic coupling network are revealed, including two large clusters, one with two smaller clusters and the other with five smaller clusters. The growth index and mean publishing year of each keyword and cluster are measured to pinpoint emerging topics. The analysis highlights the emergence of critical data literacy for social justice in higher education amidst the ongoing pandemic and the rise of AI chatbots. The enhanced PageRank main path algorithm, developed in this study, demonstrates its effectiveness in identifying parallel research streams developing across different fields.
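
As a rough illustration of the citation network step mentioned above, the sketch below runs plain PageRank by power iteration over a tiny made-up citation graph. It is not the paper's PageRank main path method, whose details the abstract does not give. The paper identifiers, the damping factor of 0.85, and the even redistribution of rank from papers that cite nothing are assumptions of this sketch.

```python
def pagerank(citations, damping=0.85, iterations=100):
    """Plain PageRank by power iteration.

    `citations` maps each paper to the list of papers it cites, so rank
    flows from the citing paper to the cited paper.
    """
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    papers = sorted(papers)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            out = citations.get(p, [])
            if out:
                # Split this paper's rank equally among the papers it cites.
                share = damping * rank[p] / len(out)
                for q in out:
                    new_rank[q] += share
            else:
                # A paper with no references spreads its rank over all papers.
                share = damping * rank[p] / n
                for q in papers:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: P4 cites P2 and P3, and so on.
citations = {
    "P4": ["P2", "P3"],
    "P3": ["P1", "P2"],
    "P2": ["P1"],
    "P1": [],
}
scores = pagerank(citations)
top = max(scores, key=scores.get)
print(top, {p: round(s, 3) for p, s in scores.items()})
```

In this toy graph the oldest paper, which the others cite directly or indirectly, receives the highest score. That is the sense in which PageRank surfaces papers with high citation influence.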

16

Applying Machine Learning Techniques to Korean Author Name Disambiguation

강인수 (Kyungsung University) 2008, Vol.25, No.3, pp.27-39 https://doi.org/10.3743/KOSIM.2008.25.3.027

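Abstract (translated from Korean)

The fact that different real-world people share the same personal name creates the problem of identifying the entity referred to by a personal name on the Internet. When this problem is restricted to author name entities in scholarly information, it is called author resolution. Author resolution consists of a step that computes the similarity between the author name entities to be identified, that is, the author similarity, and a subsequent step that clusters the author name entities. Author similarity is computed from the feature similarities of author resolution features such as coauthors, paper titles, and publication venue information, and both supervised and unsupervised methods have been used for this. Supervised methods, which use author-resolved training samples, have the advantage over unsupervised methods of being able to automatically learn an optimal author similarity function that combines diverse author resolution features. However, previous studies of supervised methods have tried only a few machine learning techniques, such as SVM and MEM. This paper compares the performance, errors, and efficiency of various machine learning techniques for author resolution, and provides a comparative analysis of several techniques for extracting feature values and computing feature similarities for the coauthor and paper title features.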

Abstract

In bibliographic data, the use of personal names to indicate authors makes it difficult to specify a particular author, since there are numerous authors whose personal names are the same. Resolving same-name author instances into different individuals is called author resolution, which consists of two steps: calculating author similarities and then clustering same-name author instances into different person groups. Author similarities are computed from similarities of author-related bibliographic features such as coauthors, titles of papers, and publication information, using supervised or unsupervised methods. Supervised approaches employ machine learning techniques to automatically learn the author similarity function from author-resolved training samples. So far, however, only a few machine learning methods have been investigated for author resolution. This paper provides a comparative evaluation of a variety of recent high-performing machine learning techniques on author resolution, and compares several methods of processing author resolution features such as coauthors and titles of papers.
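
The two-step structure described above (compute pairwise author similarities, then cluster same-name instances) can be sketched as follows. This is an unsupervised stand-in under stated assumptions, not the supervised methods the paper compares: the Jaccard feature similarities, the hand-picked weights, the 0.3 threshold, the single-link clustering, and the sample records are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of tokens."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def author_similarity(x, y, w_coauthor=0.6, w_title=0.4):
    """Weighted sum of feature similarities (weights are an assumption).

    A supervised method would learn this combination from labelled
    samples instead of fixing the weights by hand.
    """
    coauthor_sim = jaccard(x["coauthors"], y["coauthors"])
    title_sim = jaccard(x["title"].lower().split(), y["title"].lower().split())
    return w_coauthor * coauthor_sim + w_title * title_sim


def cluster(instances, threshold=0.3):
    """Single-link clustering: link any pair at or above the threshold."""
    parent = list(range(len(instances)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(instances)):
        for j in range(i + 1, len(instances)):
            if author_similarity(instances[i], instances[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(instances)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Three made-up papers that all list an author with the same name.
instances = [
    {"coauthors": ["Lee", "Park"], "title": "Machine learning for author disambiguation"},
    {"coauthors": ["Lee", "Choi"], "title": "Author disambiguation with machine learning"},
    {"coauthors": ["Jung"], "title": "Protein folding simulation"},
]
clusters = cluster(instances)
print(clusters)
```

Each resulting group of indices is treated as one real-world person. Replacing `author_similarity` with a trained classifier score would turn this into the supervised setting the abstract discusses.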

17

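A Study on the Influence of the IFLA FRAD Model on Related Standards

안영희 (Baekseok University, Academic Information Center) ; 이성숙 (Chungnam National University) 2009, Vol.26, No.1, pp.279-303 https://doi.org/10.3743/KOSIM.2009.26.1.279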

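Abstract (translated from Korean)

This study aims to clearly understand "Functional Requirements for Authority Data" (FRAD), which is being studied by IFLA, focusing on how it changed from FRAR. It also establishes the relationship between FRAD and related rules by analyzing FRAD's influence on RDA and MARC21, and, in light of IFLA's activities for authority control, reviews the cataloging rules and formats related to authority control in Korea and the status of major authority database construction. Based on this analysis, it examines considerations for Korean authority control standards, including access point control, expansion of the scope of application, introduction of new approaches such as the entity-relationship model, and a stronger role for the national bibliographic agency. The results of this study can be used as basic data for authority control.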

Abstract

This study aims to clearly understand "Functional Requirements for Authority Data" (FRAD), which is being studied by IFLA, focusing on how it changed from FRAR. In addition, it establishes the relationship between FRAD and related rules by analyzing the effect of FRAD on RDA and MARC21, and reviews the cataloguing rules, formats, and status of major authority DB implementations related to domestic authority control in light of IFLA's activities for authority control. Based on the analysis, it examines considerations for domestic authority control standards, such as access point control methods, expansion of the application scope, introduction of new approaches such as the entity-relationship model, and reinforcement of the role of the national bibliographic agency. The results of this study can be used as basic data for authority control.

18

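Quality Evaluation of Dublin Core Metadata Automatically Generated by ChatGPT: Focusing on Korean Books

김선욱 (Kyungpook National University, College of Social Sciences, Department of Library and Information Science) ; 이혜경 (Kyungpook National University, Department of Library and Information Science) ; 이용구 (Kyungpook National University) 2023, Vol.40, No.2, pp.183-209 https://doi.org/10.3743/KOSIM.2023.40.2.183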

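Abstract (translated from Korean)

The purpose of this study is to confirm ChatGPT's metadata generation ability and its potential by evaluating the quality of Dublin Core records that ChatGPT generated from book cover, title page, and colophon data. To this end, cover, title page, and colophon data were collected for 90 books and entered into ChatGPT to generate Dublin Core, and the performance of the output was assessed using completeness and accuracy measures. Across all data, completeness was 0.87 and accuracy was 0.71, a satisfactory level. By element, Title, Creator, Publisher, Date, Identifier, Rights, and Language performed relatively better than the other elements. The Subject and Description elements showed somewhat lower completeness and accuracy, but these elements confirmed the generative ability known as a strength of ChatGPT. Meanwhile, the accuracy of the Contributor element was somewhat low in the DDC main classes of social sciences and technology. This was judged to result from ChatGPT's errors in extracting statements of responsibility, omissions in the data itself of the bibliographic description content needed for metadata elements, and the English-centered composition of ChatGPT's training data.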

Abstract

The purpose of this study is to evaluate the Dublin Core metadata generated by ChatGPT using book covers, title pages, and colophons from a collection of books. To achieve this, we collected book covers, title pages, and colophons from 90 books and entered them into ChatGPT to generate Dublin Core metadata. The performance was evaluated in terms of completeness and accuracy. The overall results showed a satisfactory level of completeness at 0.87 and accuracy at 0.71. Among the individual elements, Title, Creator, Publisher, Date, Identifier, Rights, and Language exhibited higher performance. The Subject and Description elements showed relatively lower completeness and accuracy, but they confirmed the generation capability known as an inherent strength of ChatGPT. On the other hand, books in the social sciences and technology classes of DDC showed slightly lower accuracy in the Contributor element. This was attributed to ChatGPT's errors in extracting statements of responsibility, omissions in the original bibliographic description contents for metadata, and the language composition of the training data used by ChatGPT.

19

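A Study on the Analysis of Interlibrary Loan Collection Profiles of University Libraries

최원실 (Ewha Womans University, Graduate School, Department of Library and Information Science) ; 정은경 (Ewha Womans University) 2019, Vol.36, No.3, pp.109-129 https://doi.org/10.3743/KOSIM.2019.36.3.109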

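Abstract (translated from Korean)

The deteriorating finances of universities have led to cuts in university library budgets and have particularly affected materials acquisition budgets. As a solution, resource sharing among university libraries has been discussed, and studies analyzing interlibrary loan data have been conducted. In line with this research stream, this study sought to identify the interlibrary loan collection profiles of Korean four-year university libraries. To this end, it analyzed the state of interlibrary loan from 2011 to 2017 using the bibliographic and interlibrary loan data of the KERIS union catalog. The results are as follows. First, in 2011 Western books were supplied through interlibrary loan mainly by large university libraries, but from 2014 the proportion of unique holdings gradually increased, the range of major regions within the interlibrary loan network expanded, and many institutions with growing influence within their regions emerged. Second, in 2012, the more Western titles a library held and the higher its proportion of common holdings, the greater its influence in the interlibrary loan network. In 2016, in addition to this tendency, influence on the supply side increased as the proportion of unique holdings rose. Third, six clusters of university libraries were identified through hierarchical cluster analysis based on Western book holdings and interlibrary loan indexes. These results are expected to be useful in establishing future policies for resource sharing among university libraries.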

Abstract

Since the recent financial crisis in universities has reduced academic library budgets, resource sharing has been considered, and inter-library loan (ILL) data have been analyzed, as a way to address the shortfall. This study aims to identify the collection profiles of western monographs in the ILL data of 4-year academic libraries. To this end, it analyzes ILL data from 2011 to 2017 using the bibliographic data and ILL transactions of the KERIS union catalog. First, the findings show that western monographs were provided mainly by large-scale academic libraries in 2011; however, the extent of major regions expanded and the number of influential institutions rose in 2016. Second, in 2012, influence in the ILL network increased with the number of western monograph holdings and the proportion of common collections. In 2016, influence in terms of provision in the ILL network also increased with the proportion of unique collections. Lastly, the ILL participating academic libraries were classified into six clusters by a hierarchical clustering analysis of holdings and ILL indexes.

20

A Study on the Evaluation of Information Sources for Improving the Efficiency of Text Categorization

정은경 (Ewha Womans University) 2007, Vol.24, No.4, pp.305-321 https://doi.org/10.3743/KOSIM.2007.24.4.305

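Abstract (translated from Korean)

The purpose of this study is to treat the various document components that indexers consult during subject indexing as information sources for text categorization and to examine their effect on text categorization performance. Unlike existing text categorization research, which concentrates on full text, it evaluates the influence of information sources as document components to determine whether they can be used efficiently for text categorization. When typical science and technology journal and proceedings papers are used as the data set, information sources can be divided into those centered on body text and those centered on document components. Body-text sources consist of the main body itself and the introduction and conclusion, while document-component sources are the title, citations, source, abstract, and keywords. The experimental results show that the citation, source, and title information sources do not differ significantly from the body-text source, whereas the keyword information source does differ significantly from it. These results show that information sources in the form of document components consulted by indexers can be used efficiently for text categorization in place of the body text.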

Abstract

The purpose of this study is to examine whether the information resources referenced by human indexers during the indexing process are effective for text categorization. More specifically, information resources from bibliographic information as well as full text information were explored in the context of a typical scientific journal article data set. The experimental results showed that information resources such as citation, source title, and title were not significantly different from full text, whereas keyword was found to be significantly different from full text. The findings of this study indicate that information resources referenced by human indexers can be considered good candidates for text categorization for automatic subject term assignment.
