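Journal of the Korean Society for Information Management (정보관리학회지), Korean Society for Information Management (한국정보관리학회)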

41

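Topic Modeling of Insomnia Social Data Using BERTopic and Construction of a Deep Learning Model for Automatic Classification of Insomnia-Tendency Documents

고영수 (Yonsei University, Department of Library and Information Science, master's student) ; 이수빈 (Yonsei University, Department of Library and Information Science, doctoral student) ; 차민정 (Yonsei University, Social Omics Research Center) ; 김성덕 (Yonsei University, Department of Library and Information Science, master's student) ; 이주희 (Yonsei University, Department of Library and Information Science, master's student) ; 한지영 (Yonsei University, Department of Library and Information Science, master's student) ; 송민 (Yonsei University, Department of Library and Information Science) 2022, Vol.39, No.2, pp.111-129 https://doi.org/10.3743/KOSIM.2022.39.2.111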

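Abstract (translated from the Korean)

Insomnia is a chronic disease of modern society whose patient numbers have increased by more than 20% over the last five years. Because the individual and social problems caused by lack of sleep are serious and the factors that trigger insomnia act in combination, it is a condition for which diagnosis and treatment are important. This study collected 5,699 data items from ‘insomnia’, the insomnia community on the social media site ‘Reddit’, where opinions are expressed freely, and built an insomnia corpus by tagging them as insomnia-tendency or non-tendency documents according to the International Classification of Sleep Disorders (ICSD-3) criteria and guidelines prepared in consultation with a psychiatrist. Five deep learning language models (BERT, RoBERTa, ALBERT, ELECTRA, XLNet) were trained with the corpus as training data, and in the performance evaluation RoBERTa scored highest in accuracy, precision, recall, and F1 score. To analyze the insomnia social data in depth, topic modeling was carried out with BERTopic, a newer method that addresses weaknesses of the widely used LDA. Hierarchical clustering analysis identified eight topic groups (‘negative emotions’, ‘advice, help, and gratitude’, ‘insomnia-related diseases’, ‘sleeping pills’, ‘exercise and eating habits’, ‘physical characteristics’, ‘activity characteristics’, ‘environmental characteristics’). Users expressed negative emotions and sought help and advice in the insomnia community. They also mentioned insomnia-related diseases, discussed the use of sleeping pills, and expressed interest in exercise and eating habits. The insomnia-related characteristics found were physical characteristics such as breathing, pregnancy, and the heart; activity characteristics such as zombies, hypnic jerks, and grogginess; and environmental characteristics such as sunlight, blankets, temperature, and naps.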

Abstract

Insomnia is a chronic disease in modern society, with the number of new patients increasing by more than 20% in the last 5 years. It is a serious disease that requires diagnosis and treatment because the individual and social problems caused by lack of sleep are serious and the triggers of insomnia are complex. This study collected 5,699 data items from ‘insomnia’, a community on ‘Reddit’, a social media site where opinions are expressed freely. Based on the International Classification of Sleep Disorders (ICSD-3) standard and guidelines prepared with the help of experts, an insomnia corpus was constructed by tagging the items as insomnia-tendency documents and non-insomnia-tendency documents. Five deep learning language models (BERT, RoBERTa, ALBERT, ELECTRA, XLNet) were trained using the constructed insomnia corpus as training data. In the performance evaluation, RoBERTa showed the highest performance, with an accuracy of 81.33%. To analyze the insomnia social data in depth, topic modeling was performed using the newly emerged BERTopic method, which compensates for the weaknesses of LDA, a method widely used in the past. The analysis confirmed 8 subject groups (‘Negative emotions’, ‘Advice and help and gratitude’, ‘Insomnia-related diseases’, ‘Sleeping pills’, ‘Exercise and eating habits’, ‘Physical characteristics’, ‘Activity characteristics’, ‘Environmental characteristics’). Users expressed negative emotions and sought help and advice from the Reddit insomnia community. In addition, they mentioned diseases related to insomnia, shared discourse on the use of sleeping pills, and expressed interest in exercise and eating habits. As insomnia-related characteristics, we found physical characteristics such as breathing, pregnancy, and heart; activity characteristics such as zombies, hypnic jerks, and grogginess; and environmental characteristics such as sunlight, blankets, temperature, and naps.

42

Conceptual Modeling of Records Management Metadata

이현실 (Wonkwang University) ; 한성국 (Wonkwang University) 2006, Vol.23, No.3, pp.23-48 https://doi.org/10.3743/KOSIM.2006.23.3.023

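Abstract (translated from the Korean)

A records management metadata schema must have a robust structure that can express not only the information elements inherent in records themselves but also the management elements needed for tasks such as managing the life cycle of records in the course of records work. To this end, a metadata schema requires an information model of the records domain and a metadata framework that supports the semantic refinement and data element specialization required by records management tasks and applications. This study analyzes the main principles and characteristics of metadata schemas and presents an approach for developing a records management metadata schema systematically and effectively. To that end, it develops a records management information model grounded in the records management guidelines and metadata practices set out in ISO 15489 and ISO 23081, presents core data elements, and shows how to implement a records management framework.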

Abstract

A record management metadata schema should have a robust structure to represent not only the elements innate in records themselves but also the management elements for the life cycle of records according to business activities. To meet these requirements, an information model for the record domain is needed, along with a metadata framework supporting the semantic refinement and data element specialization required in record management business or applications. This study analyzes the main principles and characteristics of metadata schemas and suggests a novel method to develop a schema systematically and effectively. It proposes an information model and a set of core data elements for records management based on ISO 15489 and ISO 23081, and shows how to implement the record management framework.

43

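A Study on Automatic Alignment of Distributed Ontologies Using Semantic Distance Measurement

황상규 (Hongik University, Department of Computer Engineering) ; 변영태 (Hongik University) 2009, Vol.26, No.4, pp.319-336 https://doi.org/10.3743/KOSIM.2009.26.4.319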

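Abstract (translated from the Korean)

The Semantic Web is an evolved form of the current World Wide Web that uses ontology technology, a knowledge database that computers can understand, so that computers and humans can collaborate. However, even if data representation formats are standardized so that the meaning of information can be understood and processed with ontologies, data written against ontologies that different developers built under different conceptualizations can cause mutual inconsistency. Alignment is therefore needed between ontologies built under different conceptualizations. In automated semantic alignment between concept nodes of different ontologies, alignment errors arise from false negatives, that is, cases where a fact that a human expert judges to be true is wrongly judged to be false. This study develops and presents a new algorithm that can minimize the false negative errors that arise during semantic alignment between concept nodes of different ontologies.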

Abstract

Semantic web technology is an evolution of the current World Wide Web that includes ontology, a machine-understandable knowledge database, and may enable machines and people to work together. However, problems arise when we try to communicate using data annotated with different ontologies created by different people with different concepts. Thus, to communicate across ontologies, heterogeneous ontologies need to be aligned. When concept nodes of heterogeneous ontologies are aligned, one of the main problems is misalignment caused by false negatives in automatic ontology mapping. In this paper, we present a new method to minimize false negative errors in the process of aligning concept nodes of different ontologies.

44

A Comparative Study of Public Library Valuation Using Contingent Valuation Methods: Focusing on Payment Vehicles

표순희 (Sungkyunkwan University) 2012, Vol.29, No.2, pp.173-191 https://doi.org/10.3743/KOSIM.2012.29.2.173

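Abstract (translated from the Korean)

This study was conducted to analyze how the payment vehicle, a required element in designing the CVM (Contingent Valuation Method) commonly used to measure the value of libraries, affects the estimated value. Because CVM estimates are subject to bias depending on detailed design choices such as the hypothetical scenario, question type, and payment vehicle, these choices need to be verified; the payment vehicle in particular, as the mechanism for expressing the value of the good, strongly affects the value. The use value of the same public library was therefore measured with three types of payment vehicle: tax, donation, and usage fee. Donation showed the highest value, with individuals willing to pay 14,542.3 won per month, while willingness to pay through tax was 8,577.5 won. The usage fee measured lowest, at a willingness to pay of 1,612.7 won per visit, but when converted to a monthly basis it was at a level similar to tax.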

Abstract

CVM (Contingent Valuation Method) has been the most widely used method for the valuation of public libraries. However, there has been debate on the validity of CVM because many kinds of bias can arise from its hypothetical nature, the type of questions, payment vehicles, and so on. To ensure the validity and reliability of public library valuation, this study analyzed the effects of payment vehicles on valuation using CVM. Three types of payment vehicle (tax, donation, and fee) were used for payment in a hypothetical market. The payment vehicles yielded different WTP (willingness to pay) estimates, and donation produced the highest WTP, 14,542 won.

45

Design and Performance Evaluation of a Key-Frame Extraction Algorithm for Building Video Abstracts

김현희 (Myongji University) 2008, Vol.25, No.4, pp.131-148 https://doi.org/10.3743/KOSIM.2008.25.4.131

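Abstract (translated from the Korean)

This study designed and evaluated an algorithm that extracts key frames that represent the meaning of a video well. Specifically, to establish a theoretical framework for selecting key frames for video abstracts, previous studies and users' key-frame recognition patterns were investigated and analyzed. Based on this framework, an algorithm that extracts key frames from video using a hybrid method was designed, and its efficiency was evaluated through an experiment. Finally, ways to use the experimental results for video retrieval and browsing in digital library and Internet environments were proposed.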

Abstract

The purposes of this study are to design a key-frame extraction algorithm for constructing virtual storyboard surrogates and to evaluate the efficiency of the proposed algorithm. To do this, the theoretical framework was first built by conducting two tasks. One was to investigate previous studies on relevance and on image recognition and classification. The other was to conduct an experiment to identify the key-frame recognition patterns of 20 participants. From these, the key-frame extraction algorithm was constructed. The efficiency of the proposed algorithm (a hybrid method) was then evaluated in an experiment with 42 participants. In the experiment, the proposed algorithm was compared, in terms of accuracy in summarizing or indexing a video, with a random method in which key frames were extracted simply at intervals of a few seconds (or minutes). Finally, ways to use the proposed algorithm in digital libraries and the Internet environment were suggested.

46

A Study on Improving the Document Classification Performance of SVM Classifiers Using Inter-Document Similarity

이재윤 (Kyonggi University) 2005, Vol.22, No.3, pp.261-287 https://doi.org/10.3743/KOSIM.2005.22.3.261

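Abstract (translated from the Korean)

The purpose of this paper is to improve the performance of SVM (support vector machine) classifiers by using inter-document similarity. It proposes automatic document classification with SVMs based on a document-vector feature representation. The proposed approach uses document vectors instead of index terms as classification features, and vector similarities instead of weights as feature values. Experiments showed that the proposed approach could improve SVM classifier performance. To improve run-time efficiency, a method for selecting document-vector features and a method using category centroid vectors were proposed. Experiments showed that even a small set of vector features achieved better performance than the conventional approach using index-term features.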

Abstract

The purpose of this paper is to explore ways to improve the performance of SVM (Support Vector Machine) text classifiers using inter-document similarities. SVMs are a powerful machine learning technique for automatic document classification. In this paper, a text categorization approach via SVMs based on feature representation with document vectors is suggested. In this approach, document vectors are used as features instead of index terms, and vector similarities are used as feature values instead of term weights. Experiments show that an SVM classifier with document vector features can improve document classification performance. For the sake of run-time efficiency, two methods are developed: one is to select document vector features, and the other is to use category centroid vector features instead. Experiments on these two methods show that they can outperform conventional methods that use index term features.
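The category-centroid variant described in this abstract can be illustrated with a minimal sketch. It assumes raw term-frequency vectors, whitespace tokenization, and cosine similarity; the abstract does not state the weighting scheme or similarity measure, and the function names (`tf_vector`, `cosine`, `centroids`, `similarity_features`) and sample data are hypothetical. The sketch only builds the feature vectors; an SVM would then be trained on them, which is omitted here.

```python
import math
from collections import Counter, defaultdict

def tf_vector(text):
    """Term-frequency vector; whitespace tokenization is an assumption."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroids(train_docs, labels):
    """Sum the term vectors of each category.

    Cosine similarity ignores vector length, so the sum stands in
    for the mean (centroid) without changing the feature values.
    """
    sums = defaultdict(Counter)
    for doc, label in zip(train_docs, labels):
        sums[label].update(tf_vector(doc))
    return dict(sums)

def similarity_features(doc, cents):
    """One feature per category: similarity of doc to that category's centroid."""
    vec = tf_vector(doc)
    return [cosine(vec, cents[label]) for label in sorted(cents)]

# Hypothetical toy data: two categories, one training document each.
train_docs = ["svm kernel margin", "library catalog book"]
labels = ["ml", "lib"]
cents = centroids(train_docs, labels)

# Feature order follows sorted labels: ["lib", "ml"].
features = similarity_features("svm margin", cents)
print(features)
```

With this representation the feature space has one dimension per category rather than one per index term, which is consistent with the run-time motivation the abstract gives for the centroid method.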

47

A Study on Ontology-Based Context-Awareness Modeling: Focusing on u-Convention

김성혁 (Sookmyung Women's University) 2011, Vol.28, No.3, pp.123-139 https://doi.org/10.3743/KOSIM.2011.28.3.123

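Abstract (translated from the Korean)

Context awareness, a key technology of ubiquitous computing, requires a context-awareness model that understands and processes context information delivered from the various kinds of information devices that make up the environment, and that can be applied flexibly to diverse domains. Because ontologies based on Semantic Web technology use a structured common format and can express semantic information, they enable effective context awareness by letting systems share, understand, and reason over context information. Ontology-based context-awareness models have therefore been presented in a number of studies. Based on an analysis of these existing studies, this paper layers the ontology structure for the generality and extensibility of the context-awareness model, implements a context-awareness system on that basis, and applies it to a real u-Convention domain. It also presents a method for reasoning effectively about compound situations by combining the description logic of OWL-DL with SWRL rule inference.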

Abstract

Context-awareness, a key technology of ubiquitous computing, needs a context model that understands and processes situational information coming from diverse sensors and devices and that can be applied flexibly in various domains. Semantic web based ontologies use a structured standard format and express the meaning of information, so they make effective context awareness possible by allowing the system to share information and understand situations by inference. In this paper, we propose a layered ontology model to support the generality and scalability of the context-awareness system, and apply the model to the u-Convention domain. In addition, we propose an effective reasoning method to handle compound situations by combining OWL-DL and SWRL rules.

48

A Study on the Development of a Bibliographic Information Interface Based on the FRBR Model

서은경 (Hansung University) 2006, Vol.23, No.4, pp.317-339 https://doi.org/10.3743/KOSIM.2006.23.4.317

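Abstract (translated from the Korean)

As works with the same content but different forms and modes of expression came to be created frequently in the digital information environment, IFLA proposed the FRBR (Functional Requirements for Bibliographic Records) model, a new recommendation for bibliographic description that accommodates diverse media, applications, and functions and can satisfy users' information-seeking needs. Accordingly, several institutions are experimentally implementing FRBR-based bibliographic information systems that let users more easily find, identify, select, obtain, and navigate to the information they want. This study proposes approaches to developing bibliographic information interfaces that can help when such systems are developed in earnest. To do so, it first reviews the overall characteristics of FRBR-based bibliographic information systems that offer new search and display interfaces, and then compares and analyzes the search interface and the display interface that each system provides.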

Abstract

A new concept of bibliographic data and its schema is needed to accommodate the changes resulting from the emergence of new forms of electronic publishing and the advent of networked access to information resources. The FRBR model was developed to define the functions performed by bibliographic data with respect to various media, various applications, and various user needs. Several institutions, including OCLC and RLG, as well as vendors have tried to implement FRBR on OPAC systems. The purpose of this study is to propose strategies for developing a bibliographic interface based on the FRBR model. The study reviews representative FRBRized systems and compares them with regard to their search interfaces and display interfaces.

49

A Study on Building an Automatic Summarization System - Focusing on News Articles on the Web -

이태영 (Jeonbuk National University) 2006, Vol.23, No.4, pp.41-67 https://doi.org/10.3743/KOSIM.2006.23.4.041

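Abstract (translated from the Korean)

To build an automatic summarization system for news articles on the web, a writing-structure frame and various rules applying discourse structure and knowledge-based techniques were created. The frame includes slots for the roles of paragraphs, sentences, and clauses; the properties of paragraphs and sentences; discrimination rules that distinguish roles; key-sentence extraction rules; and summary-writing rules. It is also equipped with ‘if-needed’ facets, which point to context definitions, proper nouns, and the like, and ‘if-changed’ facets, which report changed slot values. In extracting and expressing the actual values of slots and facets, the rhetorical roles of phrases, the top-level categories of words, and plot units were consulted. The reconstruction step, which merges, splits, and synthesizes summary sentences while keeping the flow of meaning connected, used a similarity formula, syntactic information, rules derived from discourse structure and knowledge-based methods, and context definitions, and it generated new sentences such as critiques.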

Abstract

The writing frame and various rules based on discourse structure and knowledge-based methods were applied to construct an automatic Ext/Sums (extracts and summaries) system for straight news on the web. The frame contains slots and facets representing the roles of paragraphs, sentences, and clauses in news, and the rules determining the type of slot. Rearrangement of the candidate summary sentences, such as unification, separation, and synthesis, while maintaining coherence of meaning, used the rules derived from similarity measurement, syntactic information, discourse structure, and knowledge-based methods, together with context plots defined by the syntactic/semantic signatures of nouns and verbs and the categories of verb suffixes. An attempt was also made to insert critique sentences into the summary.

50

Public Library Marketing Using Geographic Information Systems

이성신 (Kyungpook National University) 2011, Vol.28, No.3, pp.179-195 https://doi.org/10.3743/KOSIM.2011.28.3.179

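Abstract (translated from the Korean)

The purpose of this study is to explore, from a marketing perspective, the implications of geographic information systems (GIS) for public library materials selection and service development. A GIS is a computer system that can collect, manipulate, and display geographic information. Through GIS, public libraries can collect transportation-related, political, legal, demographic, economic, social, cultural, and educational information about the community. Public libraries can therefore use GIS in market research, that is, user analysis, which is the first step of marketing, to select materials and develop services that meet users' needs. This will also help achieve the ultimate goal of marketing, which is building lasting relationships with users.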

Abstract

The purpose of this study is to investigate the implications that GIS (Geographic Information Systems) can have for public library collection selection and service development from a marketing perspective. GIS is a computer system capable of assembling, storing, manipulating, and displaying geographically referenced information. Through the understanding and utilization of GIS, we can collect geographical, transportation, political, legal, demographic, economic, social, cultural, educational, and recreational information about the community. Public libraries can use GIS for market research, including customer analysis, to select library collections and develop library services based on library users' needs. As a result, public libraries can find a way to build lasting relationships with users, which is the final goal of marketing activities.
