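Journal of the Korean Society for Information Management (정보관리학회지), Korean Society for Information Management (한국정보관리학회)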

41

KONG-DB: A Korean Novel Geo-name DB, Search, and Visualization System Using Web-based Vocabulary Dictionaries

박성희 (Hannam University) 2016, Vol.33, No.3, pp.321-343 https://doi.org/10.3743/KOSIM.2016.33.3.321

Abstract (translated from Korean)

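This study aims to 1) build a database (DB) of geo-names appearing in novels, 2) update the database by automatically extracting geo-names so that the geo-name DB remains extensible, and 3) implement a pilot system that searches and visualizes the novel geo-names and their usage examples stored in the database. Automatically extracting named entities that are no longer in common use, such as geo-names in novels, is very difficult because a corpus to serve as training data is hard to obtain. To secure a training corpus for effective geo-name extraction, this paper uses web knowledge that has already been built by hand (a vocabulary dictionary) to obtain a training corpus large enough for learning. With this training corpus and the trained automatic extraction module, we built a geo-name database extension tool that finds and adds new geo-name usage examples, and we designed a system that visualizes novel geo-names on a map. We also implemented a pilot system to demonstrate its validity experimentally. Finally, we describe the points where the current system needs improvement.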

Abstract

This study aimed to design a semi-automatic web-based pilot system 1) to build a Korean novel geo-name database, 2) to update the database using automatic geo-name extraction so that it remains scalable, and 3) to retrieve and visualize the usage of old geo-names on a map. In particular, extracting geo-names from novels, many of which are now obsolete, is difficult because obtaining a corpus to use as training data is burdensome. To build the training corpus, an admin tool (an HTML crawler and parser written in Python) crawled geo-names and their usages from a vocabulary dictionary for the Korean New Novel, collecting enough data to train a named entity tagger that can extract even novel geo-names that do not appear in the training corpus. With the training corpus and the automatic extraction tool, the geo-name database was made scalable. In addition, the system can visualize geo-names on a map. The study also designed and implemented a prototype and empirically verified the validity of the pilot system. Lastly, items to be improved are discussed.
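The idea of turning dictionary entries into tagger training data can be illustrated with a minimal sketch. It assumes each crawled entry is a (geo-name, usage sentence) pair and that labels follow a BIO scheme with a single `LOC` type; the function names `bio_tag` and `build_corpus`, the label names, and the sample sentence are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]


def bio_tag(sentence: str, geo_name: str) -> Tagged:
    """Label whitespace tokens with B-LOC / I-LOC / O for one known geo-name."""
    tokens = sentence.split()
    name_tokens = geo_name.split()
    tags = ["O"] * len(tokens)
    if not name_tokens:
        return list(zip(tokens, tags))
    *head, last = name_tokens
    n = len(name_tokens)
    for i in range(len(tokens) - n + 1):
        # Korean particles attach to the noun ("한성" + "에서"), so the last
        # name token is matched as a prefix rather than exactly. This is a
        # simplifying assumption and can over-match longer words.
        if tokens[i:i + n - 1] == head and tokens[i + n - 1].startswith(last):
            tags[i] = "B-LOC"
            for j in range(i + 1, i + n):
                tags[j] = "I-LOC"
    return list(zip(tokens, tags))


def build_corpus(entries: List[Tuple[str, str]]) -> List[Tagged]:
    """Turn (geo_name, usage_sentence) dictionary entries into tagged sentences."""
    return [bio_tag(usage, name) for name, usage in entries]


if __name__ == "__main__":
    # Hypothetical dictionary entry: headword plus one usage sentence.
    sample = [("한성", "그는 한성에서 평양으로 떠났다")]
    for sentence in build_corpus(sample):
        print(sentence)
```

Each tagged sentence could then be fed to whichever sequence tagger is used; the sketch covers only the labeling step, not crawling or training.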

42

A Study on Shot Segmentation and Indexing of Language Education Videos Based on Content-based Feature Analysis

한희준 (Department of Library and Information Science, Graduate School, Kyonggi University) 2017, Vol.34, No.1, pp.219-239 https://doi.org/10.3743/KOSIM.2017.34.1.219

Abstract

As information technology develops rapidly and personal smart devices become more widespread, video is increasingly used as a medium of information transmission among audiovisual materials. Video has become an indispensable form of information service content and is delivered in various ways, such as one-way broadcasting through TV, interactive services over the Internet, and lending from audiovisual libraries. On the Internet in particular, information providers try to reduce the effort and cost of processing the video they serve to smart devices. Users, for their part, want to use only the parts they need because of the burden of excessive network usage and constraints of time and space. It is therefore necessary to improve the usability of video by automatically classifying, summarizing, and indexing similar parts of its content. In this paper, we propose a method that automatically segments the shots making up a video by analyzing the content and characteristics of language education videos, and that indexes the detailed content of those videos by combining visual features. The semantic-based shot segmentation achieved high accuracy and can be effectively applied to summary services for language education videos.
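The abstract does not say which visual features the method uses. As a generic illustration of shot segmentation, the sketch below uses a common baseline: it flags a shot boundary wherever the intensity histogram changes sharply between consecutive frames. Representing a frame as a flat list of 0-255 pixel values, the bin count, and the threshold are all assumptions made for the example.

```python
from typing import List, Sequence


def histogram(frame: Sequence[int], bins: int = 4, max_value: int = 256) -> List[float]:
    """Normalized intensity histogram of one frame (frame must be non-empty)."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return [c / len(frame) for c in counts]


def shot_boundaries(frames: Sequence[Sequence[int]], threshold: float = 0.5) -> List[int]:
    """Return indices i where frame i starts a new shot.

    The difference between consecutive histograms is half the L1 distance,
    which lies in [0, 1]; values above `threshold` count as a cut.
    """
    if not frames:
        return []
    boundaries = []
    previous = histogram(frames[0])
    for i in range(1, len(frames)):
        current = histogram(frames[i])
        difference = sum(abs(a - b) for a, b in zip(previous, current)) / 2
        if difference > threshold:
            boundaries.append(i)
        previous = current
    return boundaries


if __name__ == "__main__":
    dark = [10, 20, 30, 40]        # toy 4-pixel "frames"
    bright = [200, 210, 220, 230]
    print(shot_boundaries([dark, dark, bright, bright, dark]))
```

A content-based method like the one the paper describes would add further cues on top of such low-level differences; this sketch covers only the boundary detection step.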

43

A Study on the Web Accessibility of University Library Homepages in Korea

김영곤 (Kyungnam University) ; 오창규 (Kyungnam University) 2011, Vol.28, No.3, pp.197-217 https://doi.org/10.3743/KOSIM.2011.28.3.197

Abstract (translated from Korean)

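Web accessibility means enabling everyone to receive information services without discrimination, regardless of disability. This study examines the web accessibility of the homepages of 153 university libraries in Korea and proposes the relevant laws and regulations for ensuring web accessibility, along with an effective practical approach to complying with them. The survey was carried out in two stages. First, the automated evaluation tool KADO-WAH 2.0 was used to gauge the general level of the 153 university library homepages. The detailed inspection of individual check items was then limited to the homepages of the 19 university libraries judged to have a 100% compliance rate in the first evaluation. The evaluation found no university library homepage in Korea that fully complied with web accessibility requirements. The results of this study can therefore serve as reference material for university libraries in Korea seeking to meet the requirements of KWCAG. In particular, basic requirements such as voice service, screen magnification and reduction, and highlighting should be implemented first and applied to almost all content.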

Abstract

Web accessibility refers to the practice of providing equal access to web sites for people with and without disabilities. This study investigates the web content accessibility of 153 university library web sites in Korea and suggests an approach to implementing university library web sites that meet the Korean Web Content Accessibility Guidelines (KWCAG). The survey was conducted in two steps. The first step assessed the general level of web accessibility of all the university library web sites using an automatic appraisal tool, KADO-WAH 2.0. The detailed examination of individual web accessibility check items was then limited to the 19 web sites that proved excellent in the automatic appraisal. None of the web sites examined turned out to be fully accessible. University libraries are therefore advised to make use of these findings to meet all the requirements of KWCAG. In particular, basic requirements such as voice service, text resizing, and text highlighting should be fulfilled first and applied to almost all content.
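To give a sense of what an automated check item can look like, the sketch below tests one widely used criterion: whether every `<img>` element carries an `alt` attribute. It is not the KADO-WAH 2.0 implementation and makes no claim about how that tool works. The class name `AltTextChecker` and function `check_alt_text` are hypothetical, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class AltTextChecker(HTMLParser):
    """Counts <img> tags and records those with no alt attribute at all."""

    def __init__(self) -> None:
        super().__init__()
        self.total_images = 0
        self.missing_alt: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        self.total_images += 1
        attr_map = dict(attrs)
        # alt="" is accepted here because an empty alt is a common way to
        # mark decorative images; only a missing attribute is flagged.
        if "alt" not in attr_map:
            self.missing_alt.append(attr_map.get("src") or "")


def check_alt_text(html: str) -> Tuple[float, List[str]]:
    """Return (compliance rate in [0, 1], list of src values missing alt)."""
    checker = AltTextChecker()
    checker.feed(html)
    if checker.total_images == 0:
        return 1.0, []
    rate = 1 - len(checker.missing_alt) / checker.total_images
    return rate, checker.missing_alt


if __name__ == "__main__":
    page = '<img src="a.png" alt="logo"><img src="b.png"><img src="c.png" alt="">'
    print(check_alt_text(page))
```

A 100% rate on a check like this only shows that the attribute exists, not that the text is meaningful, which may be why the study followed the automated pass with a detailed inspection of individual check items.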

44

Improving the Classification Performance of Time Series Keyword Data Using a Moving Average-based Dynamic Time Warping Method

정도헌 (Assistant Professor, Department of Library and Information Science, Duksung Women's University) 2019, Vol.36, No.4, pp.83-105 https://doi.org/10.3743/KOSIM.2019.36.4.083

Abstract (translated from Korean)

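This study aims to propose an effective method for automatically classifying keywords that show similar trends by comparing the pattern similarity of time series data. To this end, a large number of web news articles were collected, keywords were extracted, and time series data with 120 intervals were generated. To build a test set for evaluating the proposed model, 440 major keywords were manually assigned categories according to eight trend types. Building on dynamic time warping (DTW), a technique widely used in time series analysis, this study proposes MA-DTW, an applied model that additionally applies the moving average (MA) technique, which captures the tendency of a trend well, to DTW. When the k-nearest neighbor (kNN) algorithm was applied to evaluate automatic classification performance, Euclidean distance (ED) and DTW reached best micro-averaged F1 scores of 48.2% and 66.6%, respectively, whereas the proposed model achieved a best score of 74.3%. The comprehensive performance evaluation confirmed that the proposed model outperformed the existing ED and DTW on every measured indicator.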

Abstract

This study aims to suggest an effective method for automatically classifying keywords with similar patterns by calculating the pattern similarity of temporal data. For this purpose, a large volume of news articles on the Web was collected, and time series data composed of 120 time segments were built. To build a labeled data set for the performance test of the proposed model, 440 representative keywords were manually classified into 8 trend types. This study starts from the Dynamic Time Warping (DTW) method, which has been commonly used in time series analysis, and proposes an applied model, MA-DTW, which adds a Moving Average (MA) method that captures the tendency of a trend curve well. In automatic classification with the k-Nearest Neighbor (kNN) algorithm, Euclidean Distance (ED) and DTW achieved maximum micro-averaged F1 scores of 48.2% and 66.6%, respectively, whereas the proposed model achieved a best micro-averaged F1 score of 74.3%. Across all measures in the comprehensive experiments, the proposed model outperformed ED and DTW.
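The abstract does not spell out how MA and DTW are combined. One plausible reading is sketched below, under stated assumptions: each series is first smoothed with a simple moving average, the classic DTW distance is computed on the smoothed series, and a query is labeled by its nearest neighbor (kNN with k = 1). The window size, the absolute-difference local cost, and the function names are illustrative choices, not the paper's settings.

```python
from typing import List, Sequence, Tuple


def moving_average(series: Sequence[float], window: int) -> List[float]:
    """Simple moving average; the output has len(series) - window + 1 points."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def ma_dtw(a: Sequence[float], b: Sequence[float], window: int = 3) -> float:
    """DTW distance computed on moving-average-smoothed series (assumed design)."""
    return dtw_distance(moving_average(a, window), moving_average(b, window))


def classify_1nn(query: Sequence[float],
                 labeled: Sequence[Tuple[Sequence[float], str]],
                 window: int = 3) -> str:
    """Return the label of the training series closest to `query` under MA-DTW."""
    return min(labeled, key=lambda item: ma_dtw(query, item[0], window))[1]


if __name__ == "__main__":
    training = [([1, 2, 3, 4, 5, 6], "rising"), ([6, 5, 4, 3, 2, 1], "falling")]
    print(classify_1nn([1, 1, 2, 3, 5, 6], training))
```

Smoothing before warping keeps short-lived spikes from dominating the alignment, which matches the abstract's description of MA as capturing the tendency of a trend curve.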

45

A Study on the Construction of an Ontology for Library and Information Science Journals

노영희 (Konkuk University) 2011, Vol.28, No.2, pp.177-193 https://doi.org/10.3743/KOSIM.2011.28.2.177

Abstract

This study constructed an ontology for journal articles and evaluated its performance. The performance of the triple-structured ontology was also compared with that of an inverted index file designed for a simple keyword search engine. The data set covered three years of articles published in the Journal of the Korean Society for Information Management, from 2007 to 2009. Protégé was used to construct the ontology, and an inverted index file was built for the performance comparison. The concept ontology was established manually and the bibliography ontology was constructed automatically, producing an OWL concept ontology and an OWL bibliography ontology, respectively. The study compared the performance of the ontology knowledge base, queried with the Jena search engine, with that of the inverted index file, queried with the Lucene search engine. Lucene showed a higher precision rate, whereas Jena showed a higher recall rate.
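The two retrieval styles being compared can be contrasted with a toy sketch. It uses neither the Jena nor the Lucene API. Plain Python dictionaries and tuples stand in for the inverted index and the triple store, and the predicate names `subClassOf` and `hasSubject` and all sample data are hypothetical. The precision and recall definitions are the standard ones.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased term to the set of document ids containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def keyword_search(index: Dict[str, Set[str]], term: str) -> Set[str]:
    """Exact-term lookup, as a simple keyword engine would do."""
    return set(index.get(term.lower(), set()))


def expand_concept(triples: Iterable[Triple], concept: str) -> Set[str]:
    """Collect `concept` plus every concept below it via subClassOf links."""
    triples = list(triples)
    concepts = {concept}
    changed = True
    while changed:
        changed = False
        for subject, predicate, obj in triples:
            if predicate == "subClassOf" and obj in concepts and subject not in concepts:
                concepts.add(subject)
                changed = True
    return concepts


def triple_search(triples: Iterable[Triple], concept: str) -> Set[str]:
    """Find documents whose subject is `concept` or any narrower concept."""
    triples = list(triples)
    concepts = expand_concept(triples, concept)
    return {s for s, p, o in triples if p == "hasSubject" and o in concepts}


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    docs = {"d1": "ontology construction", "d2": "owl modeling"}
    triples = [("owl", "subClassOf", "ontology"),
               ("d1", "hasSubject", "ontology"),
               ("d2", "hasSubject", "owl")]
    print(keyword_search(build_inverted_index(docs), "ontology"))
    print(triple_search(triples, "ontology"))
```

The toy data only shows the mechanism by which concept expansion can retrieve documents a literal term match misses; it does not reproduce the study's measurements.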

46

An Experimental Study on the Automatic Classification of Opinionated Documents Using Supervised Latent Semantic Indexing (LSI)

이지혜 (Yonsei University) ; 정영미 (Yonsei University) 2009, Vol.26, No.3, pp.451-462 https://doi.org/10.3743/KOSIM.2009.26.3.451

Abstract (translated from Korean)

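This study conducted classification experiments using latent semantic indexing, a form of concept indexing, to improve the automatic classification of opinionated documents that express opinions or emotions. The 1,000 opinionated documents collected for the experiments comprise 500 positive and 500 negative documents. Through morphological analysis of the document text, we generated a set of content words in noun form and a set of opinion words consisting of predicates, adverbs, and roots. When opinionated documents were classified with the different feature sets, the best performance was obtained with the opinion word set for term indexing, with the combined set of content and opinion words for latent semantic indexing, and with the content word set for supervised latent semantic indexing. Overall, latent semantic indexing classified opinionated documents better than term indexing, and supervised latent semantic indexing in particular achieved the best classification performance.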

Abstract

The aim of this study is to apply latent semantic indexing (LSI) techniques to the efficient automatic classification of opinionated documents. For the experiments, we collected 1,000 opinionated documents such as reviews and news, 500 of them labelled as positive and the remaining 500 as negative. Sets of content words and sentiment words were extracted using a POS tagger in order to identify the optimal feature set for opinion classification. The findings showed that LSI techniques were more effective than a term indexing method in sentiment classification. The best performance was achieved by a supervised LSI technique.

47

A Study on Reference Metadata Recognition Using a Bidirectional Gated Recurrent Unit Model and a Conditional Random Field Model Based on Pre-trained Language Models

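지선영 (Department of Library and Information Science, Graduate School, Kyonggi University) ; 최성필 (Department of Library and Information Science, Kyonggi University) 2021, Vol.38, No.1, pp.221-242 https://doi.org/10.3743/KOSIM.2021.38.1.221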

Abstract (translated from Korean)

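This study investigated the automatic recognition of the metadata that make up references, using a bidirectional gated recurrent unit model and a conditional random field model built on a pre-trained language model. The experimental data consist of 161,315 references extracted by rule-based analysis from 53,562 academic documents in PDF format, collected from 40 journals published in 2018. To build the experimental set, we also carried out work on analyzing references in PDF-format academic documents and automatically extracting their metadata. Through this study, we identified the language model with the highest performance, ran additional experiments on that model to compare recognition performance by training set size, and finally examined performance for each metadata type.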

Abstract

This study performed reference metadata recognition using a bidirectional GRU-CRF model based on a pre-trained language model. The experimental data consist of 161,315 references extracted, using rules, from 53,562 academic documents in PDF format collected from 40 journals published in 2018. To construct the experimental set, the study also carried out automatic extraction of references from academic literature in PDF format. Through this study, the language model with the highest performance was identified, and additional experiments were conducted on that model to compare recognition performance according to the size of the training set. Finally, the performance for each metadata type was confirmed.

48

A Study on the Development of Metadata for an eBook Library

하진희 (Sookmyung Women's University) ; 임순범 (Sookmyung Women's University) ; 김성혁 (Sookmyung Women's University) 2003, Vol.20, No.3, pp.1-16 https://doi.org/10.3743/KOSIM.2003.20.3.001

Abstract (translated from Korean)

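Because eBook service providers do not supply sufficient data about their eBooks, most libraries cannot automatically download eBook catalog information into their own catalogs. To solve this problem, this paper developed metadata for an eBook library that ensures compatibility and interoperability among various metadata schemes. To this end, we compared and analyzed the metadata used by eBook service providers, KS X 6100 metadata, Dublin Core, MARC, and the TEI Header, and derived the metadata elements they have in common. The common elements were defined as core descriptive elements, and the remaining elements that express characteristics unique to eBooks were defined as detailed and additional descriptive elements.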

Abstract

Information about eBooks cannot be added automatically to library catalogs because eBook service providers do not provide enough eBook metadata. This paper therefore developed metadata for an eBook library in order to maintain compatibility and interoperability among various metadata standards. To accomplish this, we comparatively analyzed eBook service providers' metadata, KS X 6100 metadata, Dublin Core, MARC, and the TEI Header, and extracted the metadata elements they have in common. We defined these common metadata elements as core elements and added other elements that uniquely describe eBook characteristics as detailed and additional elements for eBook metadata.

49

Improving the Collection and Analysis of Consortium-based E-journal Usage Statistics

정영임 (Korea Institute of Science and Technology Information) ; 김정환 (Korea Institute of Science and Technology Information) 2012, Vol.29, No.2, pp.7-25 https://doi.org/10.3743/KOSIM.2012.29.2.007

Abstract (translated from Korean)

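As the use of e-journals has grown rapidly, libraries have become increasingly interested in how much and in what ways the e-journals they purchase are being used. Likewise, the organizations that manage electronic information consortia need to analyze usage statistics for the scholarly resources distributed within the consortium in order to understand the national circulation of electronic scholarly journals and to develop user-centered information collection policies. However, comprehensive and in-depth analysis of usage is not possible with the existing manual collection of usage statistics and the journal usage reports provided by publishers alone. To lay the groundwork for collecting and analyzing large volumes of usage statistics, this study implemented an automatic e-journal usage statistics collection system that applies screen scraping and the SUSHI protocol. We also proposed a way to generate in-depth usage analysis by linking journal bibliographic information with the consortium contract database.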

Abstract

The proliferating use of e-journals has led to increasing interest in collecting and analyzing usage statistics. However, the existing manual collection method and the simple journal usage reports provided by publishers hinder the effective collection of large-scale usage statistics and their comprehensive, in-depth analysis. We therefore propose a hybrid automatic method of collecting e-journal usage statistics based on screen scraping and the SUSHI protocol. In addition, this study suggests a method of generating summary statistics presented in graphs, charts, and tables. By utilizing the suggested system and analysis data, librarians can compose various reports on library budgets or operations.

50

A Study on Extracting Metadata Related to Emotional Information Contained in Email

백우진 (Konkuk University) 2006, Vol.23, No.2, pp.167-183 https://doi.org/10.3743/KOSIM.2006.23.2.167

Abstract (translated from Korean)

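This study applied a natural language processing approach to extracting emotional-information metadata from email. Personalization information was extracted from emails exchanged between investment analysts and their clients. Personalization means creating, growing, and sustaining relationships online by providing content to users in a personally meaningful way. For e-commerce and online business, this study implemented a system that automatically extracts metadata from text such as emails, discussion board postings, and chat records using natural language processing techniques, so that information meaningful to an individual can be selected from a large volume of information and used in personalization services. The value of this research lies in its attempt to extract such emotion-related metadata automatically in settings such as online business, where communication matters and it is important to grasp the intent of exchanged messages and the emotions of the other party.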

Abstract

This paper describes a metadata extraction technique based on natural language processing (NLP) that extracts personalized information from email communications between financial analysts and their clients. Personalization means connecting users with content in a personally meaningful way to create, grow, and retain online relationships. Personalization often results in the creation of user profiles that store individuals' preferences regarding goods or services offered by various e-commerce merchants. We developed an automatic metadata extraction system designed to process textual data such as emails, discussion group postings, or chat group transcriptions. The focus of this paper is the recognition of emotional content, such as mood and urgency, embedded in business communications as metadata.
