Journal of the Korean Society for Information Management (정보관리학회지), Korean Society for Information Management

1

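Integrated Study of Semi-Automatic Training Set Construction and Automatic Classification for Academic Literature in the Technical Sciences

김선우 (Department of Library and Information Science, Kyonggi University) ; 고건우 (Department of Library and Information Science, Kyonggi University) ; 최원준 (Content Curation Center, Korea Institute of Science and Technology Information) ; 정희석 (Content Curation Center, Korea Institute of Science and Technology Information) ; 윤화묵 (Content Curation Center, Korea Institute of Science and Technology Information) ; 최성필 (Kyonggi University) 2018, Vol.35, No.4, pp.141-164 https://doi.org/10.3743/KOSIM.2018.35.4.141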


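Abstract (translated from Korean)

The volume of academic literature has grown rapidly in recent years and interdisciplinary research has become more active, which makes it difficult for researchers to analyze trends in prior work. Addressing this first requires classification information at the level of individual papers, but no academic database in Korea provides such information. This study therefore proposes an automatic classification system that can assign Korean academic literature to multiple classes. Korean-language academic documents in the technical sciences were first collected and then mapped to the divisions of DDC class 600 using K-Means clustering, yielding a training set that supports multiple classification. After excluding records with missing metadata, the resulting training set for automatic classification comprised 63,915 Korean documents in the technical sciences. A deep-learning-based automatic classification engine for academic literature was then implemented and trained on this set. For objective validation, experiments on a manually constructed test set yielded 78.32% accuracy and 72.45% F1 for multiple classification.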

Abstract

The volume of academic literature has grown rapidly and interdisciplinary research has become more active, so researchers have difficulty analyzing trends in previous research. Solving this problem requires classification information at the level of individual academic papers, but no academic database in Korea provides such information. In this paper, we propose an automatic classification system that can classify domestic academic literature into multiple classes. To this end, academic documents in the technical sciences written in Korean were collected and mapped to the divisions of DDC class 600 using the K-Means clustering technique, producing a training set that supports multiple classification. The resulting training set contains 63,915 Korean documents in the technical sciences, excluding records with missing metadata. Using this training set, we implemented and trained a deep-learning-based automatic classification engine for academic documents. Experiments on a manually constructed test set showed 78.32% accuracy and 72.45% F1 for multiple classification.
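
The abstract above reports accuracy and F1 for multiple classification without stating how they were computed. The following is a minimal sketch of one common way to score such output, assuming that each document may carry more than one label, that accuracy is the example-based (Jaccard) variant, and that F1 is micro-averaged. These choices, the function name `multilabel_scores`, and the sample labels are illustrative assumptions, not the authors' evaluation procedure.

```python
from typing import List, Set, Tuple


def multilabel_scores(gold: List[Set[str]], pred: List[Set[str]]) -> Tuple[float, float]:
    """Return (example-based accuracy, micro-averaged F1) for multi-label output."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    jaccard_sum = 0.0
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        union = g | p
        # Per-document Jaccard overlap; two empty label sets count as a match.
        jaccard_sum += len(g & p) / len(union) if union else 1.0
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in the gold set
        fn += len(g - p)   # gold labels that were missed
    accuracy = jaccard_sum / len(gold) if gold else 0.0
    denom = 2 * tp + fp + fn
    micro_f1 = 2 * tp / denom if denom else 0.0
    return accuracy, micro_f1


# Made-up DDC 600-class division labels for three documents.
gold = [{"610"}, {"620", "660"}, {"630"}]
pred = [{"610"}, {"620"}, {"630", "640"}]
accuracy, micro_f1 = multilabel_scores(gold, pred)
print(f"accuracy={accuracy:.4f}, micro-F1={micro_f1:.4f}")
```

Other definitions (exact-match accuracy, macro-averaged F1) would give different numbers on the same predictions, so reported figures are only comparable when the definition is known.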

2

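Reference Metadata Recognition Using a Bidirectional Gated Recurrent Unit Model and a Conditional Random Field Model Based on Pre-trained Language Models

지선영 (Department of Library and Information Science, Graduate School, Kyonggi University) ; 최성필 (Department of Library and Information Science, Kyonggi University) 2021, Vol.38, No.1, pp.221-242 https://doi.org/10.3743/KOSIM.2021.38.1.221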


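Abstract (translated from Korean)

This study investigated the automatic recognition of the metadata that make up references, using a bidirectional gated recurrent unit (GRU) model and a conditional random field (CRF) model built on pre-trained language models. The experimental data consist of 161,315 references extracted by rule-based analysis from 53,562 academic documents in PDF format collected from 40 journals published in 2018. To build this experimental set, a parallel effort analyzed the references in the PDF documents and automatically extracted their metadata. The study identified the language model with the highest performance, ran additional experiments on that model to compare recognition performance by training set size, and finally examined performance for each metadata field.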

Abstract

This study addresses reference metadata recognition using a bidirectional GRU-CRF model based on pre-trained language models. The experimental data consist of 161,315 references extracted by rule-based analysis from 53,562 academic documents in PDF format collected from 40 journals published in 2018. To construct this experimental set, a parallel study was conducted to automatically extract references and their metadata from academic literature in PDF format. Through this study, the language model with the highest performance was identified, and additional experiments were conducted on that model to compare recognition performance according to the size of the training set. Finally, the performance for each metadata field was examined.
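
A sequence labeler such as the bidirectional GRU-CRF described above emits one tag per token, and those tags still have to be grouped into metadata fields. The sketch below shows that grouping step only, assuming a BIO tagging scheme (`B-`, `I-`, `O`). The abstract does not state the tag scheme or field names, so the scheme, the function `bio_to_fields`, the labels, and the sample reference are all hypothetical.

```python
from typing import List, Tuple


def bio_to_fields(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group token-level BIO tags into (field label, text) pairs."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    fields: List[Tuple[str, str]] = []
    label = None          # label of the field currently being collected
    buffer: List[str] = []

    def flush() -> None:
        # Emit the field collected so far, if any.
        if label is not None:
            fields.append((label, " ".join(buffer)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == label:
            buffer.append(token)              # continue the current field
        elif tag.startswith(("B-", "I-")):
            flush()                           # new field (a stray I- opens one too)
            label, buffer = tag[2:], [token]
        else:
            flush()                           # "O": token belongs to no field
            label, buffer = None, []
    flush()
    return fields


# Made-up reference string and tags, for illustration only.
tokens = ["Kim,", "S.", "(2018).", "Deep", "learning", "basics.", "Journal", "of", "Examples."]
tags = ["B-AUTHOR", "I-AUTHOR", "B-YEAR", "B-TITLE", "I-TITLE", "I-TITLE",
        "B-JOURNAL", "I-JOURNAL", "I-JOURNAL"]
fields = bio_to_fields(tokens, tags)
for name, text in fields:
    print(f"{name}: {text}")
```

Treating a stray `I-` tag as the start of a new field is one lenient design choice; a stricter decoder could reject such sequences instead.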

3

Constructing Research Fronts Using a Factor Graph Model: Based on Biomedical Literature

김혜진 (Yonsei University) ; 송민 (Yonsei University) 2017, Vol.34, No.1, pp.177-195 https://doi.org/10.3743/KOSIM.2017.34.1.177


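Abstract (translated from Korean)

A research front is a research area in which citations among papers occur frequently and development is ongoing. Early prediction of research fronts that are likely to develop into core research areas, where research activity concentrates, is a useful social resource that can greatly benefit academia, industry, government agencies, and national scientific and technological development. This study attempts to present a model that infers research fronts using heterogeneous features. For research front inference, documents were represented with heterogeneous features so that documents likely to develop into core research areas could be included, and those features were learned in depth to predict the likelihood that newly published documents would belong to a research front. Documents were represented with a heterogeneous feature set comprising bibliographic, network, and content features, and a probabilistic factor graph model was applied to infer documents likely to be highly cited. The extracted features were expressed as variables of the factor graph, and research fronts were inferred by applying the sum-product algorithm and the junction tree algorithm. The research fronts inferred and constructed with the factor graph probabilistic model differed substantially from the baseline research fronts built with a bibliographic coupling strength of 4 or higher. The factor-graph-based research front group showed stronger direct connections between documents than the bibliographic-coupling-based group, and also stronger betweenness, that is, the degree to which a document links two otherwise unconnected documents.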

Abstract

This study attempts to infer research fronts (RFs) using a factor graph (FG) model based on heterogeneous features. The proposed model infers research fronts containing documents with the potential to be cited many times in the future. To this end, the documents are represented by bibliographic, network, and content features. Bibliographic features contain bibliographic information such as the number of authors, the number of institutions to which the authors belong, proceedings, the number of keywords the authors provide, funds, the number of references, the number of pages, and the journal impact factor. Network features include degree centrality, betweenness, and closeness within the document network. Content features are keywords extracted from the title and abstract using keyphrase extraction techniques. The model learns these features of a publication and infers whether the document belongs to an RF using the sum-product algorithm and the junction tree algorithm on a factor graph. We experimentally demonstrate that the FG model predicted more densely connected documents than RFs constructed using a traditional bibliometric approach. Our results also indicate that FG-predicted documents exhibit stronger degree centrality and betweenness among RFs.
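
The sum-product algorithm named above computes marginals on a factor graph by passing local messages instead of enumerating every joint assignment. The following toy sketch runs it on a three-variable chain and checks the result against brute-force enumeration. The graph shape, the binary variables, and all factor values are made-up illustrations and are unrelated to the features or model used in the study.

```python
from itertools import product

# Binary variables x1, x2, x3 in a chain: x1 - f12 - x2 - f23 - x3.
# All numbers below are arbitrary illustrative factor values.
unary = {
    "x1": [0.6, 0.4],
    "x2": [0.5, 0.5],
    "x3": [0.3, 0.7],
}
f12 = [[0.9, 0.1], [0.2, 0.8]]  # pairwise factor indexed as f12[x1][x2]
f23 = [[0.7, 0.3], [0.4, 0.6]]  # pairwise factor indexed as f23[x2][x3]


def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def marginal_x3_sum_product():
    """Marginal of x3 via two messages passed along the chain."""
    # Message into x2: sum out x1 from its unary factor times f12.
    m12 = [sum(unary["x1"][a] * f12[a][b] for a in range(2)) for b in range(2)]
    # Message into x3: sum out x2, combining its unary factor, m12, and f23.
    m23 = [
        sum(unary["x2"][b] * m12[b] * f23[b][c] for b in range(2))
        for c in range(2)
    ]
    # Belief at x3: its own unary factor times the incoming message.
    return normalize([unary["x3"][c] * m23[c] for c in range(2)])


def marginal_x3_brute_force():
    """Marginal of x3 by summing the full joint over all 8 assignments."""
    totals = [0.0, 0.0]
    for a, b, c in product(range(2), repeat=3):
        totals[c] += (
            unary["x1"][a] * unary["x2"][b] * unary["x3"][c]
            * f12[a][b] * f23[b][c]
        )
    return normalize(totals)


sp = marginal_x3_sum_product()
bf = marginal_x3_brute_force()
print("sum-product:", sp)
print("brute force:", bf)
```

On a tree-structured graph like this chain the two results agree exactly; graphs with cycles are where a junction tree construction becomes relevant.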

4

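Factors Influencing University Students' Choice of Learning Space: Suggestions for Effective Space Configuration in University Libraries

이나리 (Librarian Education Major, Graduate School of Education, Yonsei University) ; 박지홍 (Yonsei University) 2022, Vol.39, No.2, pp.61-86 https://doi.org/10.3743/KOSIM.2022.39.2.061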


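Abstract (translated from Korean)

The purpose of this study is to use the servicescape concept, a tool for measuring the quality of the physical environment, to examine how the servicescape factors of learning spaces affect user satisfaction and continuance intention, and to identify the moderating effect of learning activity type. Six learning space servicescape factors (cleanliness, comfort, convenience, aesthetics, accessibility, and flexibility) were selected through a literature review and in-depth interviews, and a survey was conducted among university students in the Seoul metropolitan area. The results show that cleanliness, comfort, convenience, and accessibility significantly affect user satisfaction, and that user satisfaction significantly affects continuance intention. Learning activity type was also found to have a negative (-) moderating effect on the relationships between cleanliness and comfort and user satisfaction. The significance of this study lies in providing baseline data for configuring a physical environment that raises user satisfaction with the university library as a learning space.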

Abstract

The purpose of this study is to investigate the effect of the learning space Servicescape on user satisfaction and continuance intention and to identify the moderating effect of learning activity type. Six Servicescape factors were selected after a literature review and in-depth interviews: cleanliness, comfort, convenience, aesthetics, accessibility, and flexibility. An online survey was given to students at four-year private universities in the Seoul metropolitan area. The results show that, among the learning space Servicescape factors, cleanliness, comfort, convenience, and accessibility have a significant impact on user satisfaction, and that user satisfaction in turn determines the continuance intention toward the learning space. Learning activity type was also found to have a negative moderating effect on the relationship between the cleanliness and comfort factors and user satisfaction. The results offer a basis for arranging university library spaces so that they better support students' learning experience.

5

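Effects of Subject Teachers' Application and Perception of the 2015 Revised National Curriculum on Their Educational Information Needs: Focusing on High School Common Courses

계민정 (Graduate School of Education, Yonsei University) ; 김기영 (Yonsei University) 2019, Vol.36, No.1, pp.169-190 https://doi.org/10.3743/KOSIM.2019.36.1.169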


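Abstract (translated from Korean)

This study aims to identify how subject teachers' field application and perception of the 2015 revised curriculum affect their educational information needs, focusing on high school common courses. To this end, individual in-depth interviews and a questionnaire survey were conducted with teachers of Korean language, mathematics, English, social studies, and science subjects (subject groups) working at general public high schools in Incheon Metropolitan City. The analysis confirmed that the field application and perception of the 2015 revised curriculum have a partially significant effect on educational information needs. In particular, new forms of demand for duplicate copies and new patterns of demand for learning-material information sources were found in connection with the application of the 2015 revised curriculum. Based on these findings, the study proposes school library operation measures such as forming small-scale school library consortia within a regional scope, providing referral services, and strengthening the gateway role.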

Abstract

This study aims to identify the effects of high school teachers' recognition and application of the 2015 revised national curriculum on their educational information needs. For this purpose, several in-depth interviews and a questionnaire survey were conducted with teachers in charge of common courses, such as Korean language, mathematics, English, social studies, and science, at general public high schools in Incheon. The results show that the teachers' recognition and application partly affected their educational information needs. In particular, new demands for small sized copies and learning information sources related to the application of the 2015 revised national curriculum were identified. Based on the results, we propose several improvements to school library operations, such as small local consortia for sharing resources and the provision of referral services, in order to strengthen the gateway role of school libraries.

6

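A Study on the Information Needs and Information Use Behavior of Students at Science Schools for the Gifted

박해인 (Graduate School of Education, Yonsei University) ; 이지연 (Yonsei University) 2023, Vol.40, No.2, pp.33-57 https://doi.org/10.3743/KOSIM.2023.40.2.033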


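Abstract (translated from Korean)

This study aims to analyze information needs and information use behavior through in-depth interviews with students enrolled at science schools for the gifted. The study was designed on the basis of prior research, and semi-structured interviews were conducted with 10 students enrolled at six of the eight science schools for the gifted nationwide to explore their information needs and information use behavior as a whole. The students' information needs could be identified in the areas of curricular and extracurricular activities, and their information use behavior in classes and learning and in research activities, the main topics of interest to the students, was examined on the basis of the ISP model. Preferred information sources were identified across the whole process of information use, and these findings were synthesized to discuss the distinctive features and implications of the students' information use behavior. The study is significant in that it can serve as baseline data for research on libraries at schools for the gifted and as a resource for providing services to students with deep interest and talent in science subject areas.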

Abstract

This study aims to analyze the information needs and information-seeking behavior of students at science schools for the gifted through in-depth interviews. The research was designed on the basis of previous studies. Ten students from six of the eight science schools for the gifted in Korea were interviewed about their information needs and overall information-seeking behavior. The results showed the students' information needs in the areas of curricular and extracurricular activities, and, based on the ISP model, their information-seeking behavior in classes, learning, and research activities, which were the main topics of interest to the students. Based on these results, we identified the preferred information sources in the information-seeking process and discussed the peculiarities and implications of the students' information-seeking behavior. The research is meaningful in that it can be used as a basis for further research on libraries at science schools for the gifted and as a resource for providing services to students with deep interests and talents in science subject areas.

7

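Topic Modeling of Insomnia Social Data Using BERTopic and Construction of a Deep Learning Model for Automatic Classification of Insomnia-Tendency Documents

고영수 (Master's student, Department of Library and Information Science, Yonsei University) ; 이수빈 (Doctoral student, Department of Library and Information Science, Yonsei University) ; 차민정 (Social Omics Research Center, Yonsei University) ; 김성덕 (Master's student, Department of Library and Information Science, Yonsei University) ; 이주희 (Master's student, Department of Library and Information Science, Yonsei University) ; 한지영 (Master's student, Department of Library and Information Science, Yonsei University) ; 송민 (Department of Library and Information Science, Yonsei University) 2022, Vol.39, No.2, pp.111-129 https://doi.org/10.3743/KOSIM.2022.39.2.111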


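Abstract (translated from Korean)

Insomnia is a chronic disease of modern society, with the number of patients increasing by more than 20% over the past five years. Because the individual and social problems caused by lack of sleep are serious and the factors that trigger insomnia act in combination, it is a condition for which diagnosis and treatment are important. This study collected 5,699 items of data from 'insomnia', the insomnia community on 'Reddit', a social media platform where opinions are freely expressed, and built an insomnia corpus by tagging them as insomnia-tendency or non-tendency documents according to the International Classification of Sleep Disorders (ICSD-3) criteria and guidelines prepared in consultation with a psychiatrist. Five deep learning language models (BERT, RoBERTa, ALBERT, ELECTRA, XLNet) were trained with the constructed corpus as training data, and in the performance evaluation RoBERTa showed the highest accuracy, precision, recall, and F1 score. To analyze the insomnia social data in depth, topic modeling was carried out with BERTopic, a newer method that compensates for weaknesses of the widely used LDA. Hierarchical clustering analysis identified eight topic groups ('negative emotions', 'advice, help, and gratitude', 'insomnia-related diseases', 'sleeping pills', 'exercise and eating habits', 'physical characteristics', 'activity characteristics', 'environmental characteristics'). Users expressed negative emotions and sought help and advice in the insomnia community. They also mentioned diseases related to insomnia, discussed the use of sleeping pills, and expressed interest in exercise and eating habits. The insomnia-related characteristics found included physical characteristics such as breathing, pregnancy, and the heart; activity characteristics such as zombies, hypnic jerks, and grogginess; and environmental characteristics such as sunlight, blankets, temperature, and naps.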

Abstract

Insomnia is a chronic disease in modern society, with the number of new patients increasing by more than 20% in the last 5 years. It requires diagnosis and treatment because the individual and social problems caused by a lack of sleep are serious and the triggers of insomnia are complex. This study collected 5,699 posts from 'insomnia', a community on 'Reddit', a social media platform where opinions are freely expressed. Based on the International Classification of Sleep Disorders (ICSD-3) standard and guidelines prepared with the help of experts, an insomnia corpus was constructed by tagging the posts as insomnia tendency documents or non-insomnia tendency documents. Five deep learning language models (BERT, RoBERTa, ALBERT, ELECTRA, XLNet) were trained using the constructed insomnia corpus as training data. In the performance evaluation, RoBERTa showed the highest performance, with an accuracy of 81.33%. For an in-depth analysis of the insomnia social data, topic modeling was performed using the newly emerged BERTopic method, which compensates for the weaknesses of LDA, a method widely used in the past. The analysis identified 8 subject groups ('Negative emotions', 'Advice and help and gratitude', 'Insomnia-related diseases', 'Sleeping pills', 'Exercise and eating habits', 'Physical characteristics', 'Activity characteristics', 'Environmental characteristics'). Users expressed negative emotions and sought help and advice from the Reddit insomnia community. In addition, they mentioned diseases related to insomnia, shared discourse on the use of sleeping pills, and expressed interest in exercise and eating habits. As insomnia-related characteristics, we found physical characteristics such as breathing, pregnancy, and heart; activity characteristics such as zombies, hypnic jerks, and grogginess; and environmental characteristics such as sunlight, blankets, temperature, and naps.
