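Journal of the Korean Society for Information Management (정보관리학회지), Korean Society for Information Management (한국정보관리학회)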

1

남궁 황(합동참모본부) 2004, Vol.21, No.1, pp.231-251 https://doi.org/10.3743/KOSIM.2004.21.1.231

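Abstract (translated from Korean)

The purpose of this study is to acquire and manage metadata at the creation stage so that official documents, the most common and representative records produced by public institutions, can be managed and used efficiently. Official documents are the source of administrative information and the principal means by which the creating institution expresses and carries out its decisions, so a system is needed that manages created official documents systematically while also allowing them to be used efficiently. To this end, the structure of the main forms associated with official documents was analyzed and the relevant data elements were extracted item by item. The extracted elements were then compared and analyzed against the data elements of the international standard for archival description, and metadata elements for official documents that adequately reflect the context and intent of creation and the characteristics of the documents were selected and grouped by area. The results can serve as useful groundwork for building standardized records metadata suited to the Korean environment.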

Abstract

This study aims to collect and manage metadata at the creation stage in order to manage and use official documents, which are typical and common records, effectively. To that end, data elements were extracted by analyzing the structure of official document formats. Metadata elements reflecting the creation background, the publisher's intention, and the characteristics of official documents were then selected by evaluating and comparing the extracted elements with the data elements defined in the ISAD rules. The result could serve as draft data for constructing a standardized metadata structure for records in Korea.

2

Automatic Extraction of a Terminology Dictionary Using Definitional Sentence Patterns in Technical Documents, and Ways to Use It

한희정(전북대학교) ; 김태영(전북대학교) ; 두효철(전북대학교) ; 오효정(전북대학교) 2017, Vol.34, No.4, pp.81-99 https://doi.org/10.3743/KOSIM.2017.34.4.081

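Abstract (translated from Korean)

Technical documents are important research outputs produced in the knowledge and information society. To make proper use of them, they need to be made easier to use through improved information processing methods such as summarization and information extraction. As a way of extracting the core information of technical documents, this study proposed a system that automatically extracts terms and their definitional sentences based on document structure and definitional sentence patterns, and builds a terminology dictionary from them. It further proposed personalized services based on the terminology dictionary so that the dictionary can be used more broadly as a knowledge memory. Building the dictionary through automatic extraction of terms and definitions allows newly emerging terms to be absorbed quickly, so users can find up-to-date information more easily. Providing users with a personalized terminology dictionary can also maximize the dictionary's value and usability and the efficiency of retrieval.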

Abstract

Technical documents are important research outputs generated by the knowledge and information society. To use technical documents properly, it is necessary to apply advanced information processing techniques such as summarization and information extraction. In this paper, to extract core information, we automatically extracted terms and their definitions based on definitional sentence patterns and the structure of technical documents. On this basis, we proposed a system for building a specialized terminology dictionary. We further suggested personalized services so that users can use the terminology dictionary in various ways as a knowledge memory. The results of this study will allow users to find up-to-date information faster and more easily. In addition, providing a personalized terminology dictionary to users can maximize the value, usability, and retrieval efficiency of the dictionary.
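To make the idea of definitional sentence patterns concrete, the following is a minimal sketch in Python. The abstract does not list the patterns the authors used (and their corpus is Korean), so the three English patterns, the name `DEFINITION_PATTERNS`, and the function `extract_definitions` are illustrative assumptions, not the paper's method.

```python
import re

# Hypothetical English definitional patterns, for illustration only.
# The patterns used in the paper are not given in the abstract.
DEFINITION_PATTERNS = [
    re.compile(r"^(?P<term>[A-Z][\w\- ]*?) is defined as (?P<definition>.+?)\.?$"),
    re.compile(r"^(?P<term>[A-Z][\w\- ]*?) refers to (?P<definition>.+?)\.?$"),
    re.compile(r"^(?P<term>[A-Z][\w\- ]*?) means (?P<definition>.+?)\.?$"),
]


def extract_definitions(sentences):
    """Return a {term: definition} glossary built from matching sentences."""
    glossary = {}
    for sentence in sentences:
        text = sentence.strip()
        for pattern in DEFINITION_PATTERNS:
            match = pattern.match(text)
            if match:
                term = match.group("term").strip()
                # Keep the first definition seen for each term.
                glossary.setdefault(term, match.group("definition").strip())
                break
    return glossary


sample = [
    "Metadata is defined as data that describes other data.",
    "Information extraction refers to pulling structured facts out of free text.",
    "We ran three experiments.",
]
glossary = extract_definitions(sample)
for term, definition in glossary.items():
    print(f"{term}: {definition}")
```

A real system would presumably also use document structure, as the abstract notes, for example by giving more weight to sentences in a glossary or definitions section; that part is not sketched here.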

3

A Study on Preservation Description Elements for National Records Based on the PDI (Preservation Description Information) of the OAIS Model

우학명(국회도서관) ; 김희정(국제백신연구소) 2009, Vol.26, No.4, pp.227-248 https://doi.org/10.3743/KOSIM.2009.26.4.227

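Abstract (translated from Korean)

This study compared and analyzed the types of description elements for digital materials set out in the Preservation Description Information (PDI) of the OAIS reference model (ISO 14721) against the description elements that the National Archives of Korea and the National Assembly Archives use to preserve digital documents, and proposed improvements, with the aim of fostering a standards-based foundation for the long-term preservation of digital documents in the national records. For the National Archives of Korea, the 2009 records management guidelines and the 2007 records management metadata standard were examined; for the National Assembly Archives, the 2009 records management manual, an internal document, and the metadata rules applied in the National Assembly records management system. Group interviews centered on practitioners were conducted in parallel to identify the preservation description elements that need to be expanded. The analysis showed that the preservation description currently applied to digital documents at both institutions is skewed toward particular elements. Drawing on the OAIS PDI concepts and the PDI elements presented in the studies of Calanag, Russell, and others, the study therefore defined in detail the core elements and the sub-elements belonging to each.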

Abstract

In this study, the description elements of the National Archives of Korea (NAK) and the National Assembly Archives (NAA) were collected and analysed against the PDI (Preservation Description Information) of the OAIS Reference Model (ISO 14721). For NAK, the records management guideline published in 2009 and the metadata standards published in 2007 were analysed. For NAA, the records management manual published in 2009 and the metadata applied in the National Assembly records management system were analysed, together with group interviews. As a result, improved metadata details and sub-elements were suggested based on the OAIS PDI concepts and the research of Calanag and Russell.

4

A Study on Designing a Business Records Content Model for Applying the PREMIS Data Model

문주영(숭의여자대학) ; 김태수(연세대학교) 2011, Vol.28, No.1, pp.43-68 https://doi.org/10.3743/KOSIM.2011.28.1.043

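Abstract (translated from Korean)

This study developed a content model for business records in order to apply to them the PREMIS data model and data dictionary, the de facto preservation metadata standard that elaborates the OAIS reference model in concrete terms. The target documents are the 'documents on the overseas petroleum business and oil field development of company A in country B', a collection whose value warrants permanent preservation at the national level and beyond. To apply the PREMIS data model to business records concretely, the study sought to define and understand, at the document level, the intellectual entity within the PREMIS model. That is, it designed the principles and structure for dividing document content into levels and, following them, built a hierarchical model of business record content, from which the business records content model was derived. Archival description rules were observed throughout.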

Abstract

This study presents a content model designed for business records that require long-term preservation. The content model is based on the PREMIS (Preservation Metadata: Implementation Strategies) data model and ISAD(G) (General International Standard Archival Description). The study selected as its record collection “the records of the overseas petroleum business and oil field development of A company located in B country.” This collection requires permanent preservation at the national level and beyond. The study attempted to establish the concept of intellectual objects in the PREMIS data model in order to apply the model specifically to business records. In other words, it established the principles for differentiating levels within record content and the hierarchy structure, and on those principles developed a hierarchy model of business record content from which the business records content model was derived.

5

A Study on Evaluating Information Sources to Improve the Efficiency of Text Categorization

정은경(이화여자대학교) 2007, Vol.24, No.4, pp.305-321 https://doi.org/10.3743/KOSIM.2007.24.4.305

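Abstract (translated from Korean)

This study treats the document components that indexers consult during subject indexing as information sources for text categorization and examines how they affect categorization performance. Whereas existing text categorization research concentrates on full text, this study evaluates the effect of document components as information sources to determine whether they can be used efficiently for categorization. With typical science and technology journal and proceedings papers as the data set, the information sources divide into body-centered and component-centered groups. The body-centered group consists of the main body itself and the introduction and conclusion; the component-centered group comprises title, citations, source, abstract, and keywords. In the experiments, the citation, source, and title sources showed no significant difference from the body text source, while the keyword source did differ significantly. These results indicate that the document components indexers consult can be used efficiently in place of the body text for text categorization.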

Abstract

The purpose of this study is to examine whether the information resources that human indexers refer to during the indexing process are effective for text categorization. More specifically, information resources drawn from bibliographic information as well as from full text were explored using a data set of typical scientific journal articles. The experimental results indicated that information resources such as citation, source title, and title were not significantly different from full text, whereas keyword was found to be significantly different from full text. The findings suggest that information resources referenced by human indexers are good candidates for text categorization in automatic subject term assignment.

6

Design and Implementation of an Authentication Method for Secure Distribution and Use of Electronic Documents in Online Environments

김용(전북대학교) 2008, Vol.25, No.1, pp.75-98 https://doi.org/10.3743/KOSIM.2008.25.1.075

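Abstract (translated from Korean)

With the spread of the Internet, an open network, and advances in Web technology, electronic documents of many kinds are being produced and circulated. An electronic document (e-Document) is a kind of record containing information or content intended to trigger a series of actions at the organization that produced it. This study proposed a security algorithm needed for the safe use and distribution of such documents. In particular, the proposed method can be applied to generating digital signatures that guarantee the authenticity, reliability, and integrity of an electronic document, and to the authentication process that establishes a user's legitimacy. In addition, using a smart card, which is highly dependable in both security and storage, achieved higher security than existing methods. Experiments were carried out to verify the efficiency and reliability of the proposed method.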

Abstract

With the explosive growth of the Internet and IT services, various types of e-documents are being generated and circulated. An e-document is a kind of electronic record through which an organization carries out its work and goals. In this study, we propose a security algorithm for the secure use and distribution of e-documents. In particular, the proposed method can be applied to generating digital signatures that guarantee the authenticity, integrity, and confidentiality of an e-document and to authenticating authorized users. We also obtain a higher security level by using a smart card, which provides high storage capacity and security. We carried out an experiment to verify the efficiency and security of the proposed method.

7

An Opinionated Document Retrieval System Based on a Hybrid Method

이승욱(고려대학교 정보통신대학원) ; 송영인(고려대학교 정보통신대학원) ; 임해창(고려대학교) 2008, Vol.25, No.4, pp.115-129 https://doi.org/10.3743/KOSIM.2008.25.4.115

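Abstract (translated from Korean)

As the Web has become widespread and open, it has grown from a place for simply obtaining information into a forum for expressing and exchanging opinions, and the need for technology that automatically retrieves people's opinions on a given topic from the Web is growing accordingly. Opinion retrieval is hard to solve with conventional information retrieval methods, which consider only the relevance between the user's query and documents; it calls for a more advanced system that can analyze whether a document contains opinion. This paper proposes a system that performs opinion retrieval effectively within the architecture of an existing retrieval system. For in-document opinion analysis it proposes a new hybrid method combining the existing dictionary-based and machine-learning-based approaches, and experiments showed that the method improves retrieval performance.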

Abstract

With its recent growth and popularization, the Web has changed from a space for seeking information into a place where people express, share, and debate their opinions. Accordingly, the need to search for opinions expressed on the Web is also increasing. However, it is difficult to meet this need with a classical information retrieval system that considers only the relevance between the user's query and documents. Instead, a more advanced system that captures subjective information in documents is required. The proposed system retrieves opinionated documents effectively by building on an existing information retrieval system. This paper proposes a hybrid method that uses both a dictionary-based opinion analysis technique and a machine-learning-based opinion analysis technique. Experimental results show that the proposed method is effective in improving retrieval performance.
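One way to picture a hybrid of dictionary-based and machine-learning-based opinion analysis layered on an existing retrieval system is the Python sketch below. The abstract does not say how the two signals are combined, so the linear interpolation, the weights `mix` and `relevance_weight`, the tiny `OPINION_LEXICON`, and the classifier probability passed in as a plain number are all assumptions made for illustration.

```python
# Hypothetical opinion lexicon; a real system would use a much larger one.
OPINION_LEXICON = {"love", "hate", "great", "terrible", "disappointing", "recommend"}


def lexicon_opinion_score(tokens):
    """Fraction of tokens that appear in the opinion lexicon."""
    if not tokens:
        return 0.0
    return sum(token in OPINION_LEXICON for token in tokens) / len(tokens)


def hybrid_opinion_score(tokens, classifier_prob, mix=0.5):
    """Blend the lexicon score with a classifier's opinion probability.

    classifier_prob stands in for the output of a trained model (assumed).
    """
    return mix * lexicon_opinion_score(tokens) + (1 - mix) * classifier_prob


def rerank(results, relevance_weight=0.6):
    """Re-rank (doc_id, relevance, tokens, classifier_prob) tuples.

    The final score interpolates topical relevance from the existing
    retrieval system with the hybrid opinion score.
    """
    scored = []
    for doc_id, relevance, tokens, classifier_prob in results:
        opinion = hybrid_opinion_score(tokens, classifier_prob)
        final = relevance_weight * relevance + (1 - relevance_weight) * opinion
        scored.append((doc_id, final))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


results = [
    ("spec", 0.9, "the phone has a 6 inch screen".split(), 0.1),
    ("review", 0.8, "i love this phone great screen".split(), 0.9),
]
ranking = rerank(results)
print(ranking)
```

In this toy input the factual specification page is slightly more relevant to the query, but the review carries far more opinion, so it moves to the top after re-ranking.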

8

A Study on Integrating Recognition Technologies for Core Entities in Science and Technology

최윤수(한국과학기술정보연구원) ; 정창후(한국과학기술정보연구원) ; 조현양(경기대학교) 2011, Vol.28, No.1, pp.89-104 https://doi.org/10.3743/KOSIM.2011.28.1.089

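Abstract (translated from Korean)

Extracting information from large document collections is very useful not only for information retrieval but also for question answering and summarization. Information extraction automatically derives structured information from unstructured data and comprises named-entity recognition, term recognition, anaphora resolution, and relation extraction. Because these technologies have so far been studied independently, their input and output formats differ structurally, and the language processing engines beneath them were developed in widely varying environments, which makes integrated use difficult. Many science and technology documents mix named entities and technical terms, so an approach built on existing results is constrained by awkward integration of outputs and by processing speed. This study develops a base framework that analyzes science and technology literature and extracts named entities and technical terms together. To that end, it develops base language analysis modules such as automatic sentence splitting, part-of-speech tagging, and base phrase recognition, together with a named-entity recognizer and a term recognizer built on them, and proposes a recognition system for core science and technology entities that integrates them into a single platform.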

Abstract

Large-scale information extraction plays an important role in advanced information retrieval as well as in question answering and summarization. Information extraction can be defined as the process of converting unstructured documents into formalized, tabular information, and it consists of named-entity recognition, terminology extraction, coreference resolution, and relation extraction. Since these elementary technologies have so far been studied independently, integrating all the necessary processes of information extraction is not trivial because of the diversity of their input/output formats and operating environments. As a result, it is difficult to process scientific documents so as to extract both named entities and technical terms at once. In order to extract these entities from scientific documents automatically and in a single pass, we developed a framework for scientific core entity extraction that embraces all the pivotal language processors, a named-entity recognizer, and a terminology extractor.

9

An XML Document Editing System for Structural Processing of Electronic Documents Containing Mathematical Expressions

윤화묵(한국과학기술정보연구원) ; 정회경(배재대학교) ; 김창수(연세대학교) ; 유범종(한국과학기술정보연구원) 2002, Vol.19, No.4, pp.96-111 https://doi.org/10.3743/KOSIM.2002.19.4.096

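Abstract (translated from Korean)

Institutions and organizations have accumulated vast amounts of data, but most of it remains in formats fixed by each institution or organization, and information fixed in this way is hard to exchange and share. To overcome this drawback, the new concept of knowledge information resource management was introduced, and knowledge information resources are being digitized so that accumulated data can be shared and managed. In science and technology and in education and scholarship in particular, there is a move to adopt XML to process structurally the data needed to exchange and share knowledge information resources. Yet the many mathematical expressions used in electronic documents in these fields are handled as unstructured data such as images or text, which limits search, indexing, and reuse. Attention has therefore focused on processing mathematical expressions with MathML, and solutions are needed that can handle MathML easily and efficiently within structured documents. This paper implemented an XML document editing system that eases the structural processing of electronic documents intended as knowledge information resources and lets users create and render MathML in structured documents without specialist knowledge of MathML.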

Abstract

A large amount of data has accumulated within institutions and organizations, but most of it remains in forms standardized by each institution or organization, which makes information difficult to exchange and share. The new concept of knowledge information resource management was introduced to overcome this disadvantage, and knowledge information resources are being digitized in order to share and manage the accumulated data. In science and technology and in education and scholarship in particular, there is a tendency to adopt XML to process structurally the data needed for exchanging and sharing knowledge information resources. However, because the many mathematical expressions used in electronic documents in these fields are processed as unstructured data such as images or text, search, indexing, and reuse are limited. Interest has therefore converged on processing mathematical expressions with MathML, and a solution is required that can process MathML easily and efficiently in structured documents. In this paper, we designed and implemented an XML document editing system that eases the structural processing of electronic documents for knowledge information resources and that creates and displays MathML easily in structured documents without expert knowledge of MathML.
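As a small illustration of why structured MathML is easier to search and reuse than an image of a formula, the Python sketch below builds presentation MathML for an expression of the form x^n + c with the standard library. The helper name `power_plus_constant` is hypothetical and the code is not taken from the editing system described in the paper; only the MathML element names (`math`, `mrow`, `msup`, `mi`, `mn`, `mo`) and the namespace URI come from the MathML standard.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the W3C MathML specification.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def power_plus_constant(base, exponent, constant):
    """Return presentation MathML markup for base^exponent + constant."""
    math = ET.Element("math", {"xmlns": MATHML_NS})
    row = ET.SubElement(math, "mrow")
    # <msup> takes two children: the base and the superscript.
    msup = ET.SubElement(row, "msup")
    ET.SubElement(msup, "mi").text = base
    ET.SubElement(msup, "mn").text = str(exponent)
    ET.SubElement(row, "mo").text = "+"
    ET.SubElement(row, "mn").text = str(constant)
    return ET.tostring(math, encoding="unicode")


markup = power_plus_constant("x", 2, 1)
print(markup)

# Because the expression is structured, its parts can be queried directly.
root = ET.fromstring(markup)
identifiers = [node.text for node in root.iter(f"{{{MATHML_NS}}}mi")]
print(identifiers)
```

An editor of the kind the abstract describes would hide this markup behind a form or palette, so that authors never type the element names themselves.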

10

Automatic Document Classification Using a Hybrid Multiple-Model Learning Method

명순희(용인송담대학) ; 김인철(경기대학교) 2002, Vol.19, No.4, pp.35-51 https://doi.org/10.3743/KOSIM.2002.19.4.035

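Abstract (translated from Korean)

This paper describes research and experimental results on improving the performance and reliability of automatic document classification with multiple-model machine learning. Whereas existing multiple-model learning methods aim to overcome errors caused by bias in either the training data or the learning algorithm, the hybrid multiple-model approach using meta-learning proposed here seeks to address both sources of error at once. In experiments on a variety of document collections, the proposed hybrid method generally outperformed conventional multiple-model learning methods, and meta-learning proved more effective than voting as a way of combining multiple models.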

Abstract

Inductive learning and classification techniques have been employed in various research and applications that organize textual data to solve the problem of information access. In this study, we develop hybrid model combination methods that incorporate the concepts and techniques of multiple modeling algorithms to improve the accuracy of text classification, and we conduct experiments to evaluate the performance of the proposed schemes. Boosted stacking, one of the extended stacking schemes proposed in this study, yields higher accuracy than conventional model combination methods and single classifiers.
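The difference between combining models by voting and by meta-learning (stacking) can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the boosted stacking scheme of the paper: the meta-learner here is just a lookup table learned from held-out base predictions, and the labels and data are made up for illustration.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Return the label that most base classifiers agree on."""
    return Counter(predictions).most_common(1)[0][0]


def train_meta_learner(base_outputs, true_labels):
    """Learn, for each tuple of base predictions, the most frequent true label.

    base_outputs holds the base classifiers' predictions on held-out data.
    """
    table = defaultdict(Counter)
    for outputs, label in zip(base_outputs, true_labels):
        table[tuple(outputs)][label] += 1
    return {pattern: counts.most_common(1)[0][0] for pattern, counts in table.items()}


def meta_predict(meta, outputs):
    """Use the learned table; fall back to voting for unseen patterns."""
    return meta.get(tuple(outputs), majority_vote(outputs))


# Made-up held-out predictions from three base classifiers. The third one is
# right about "sports" even when the other two say "politics".
held_out_outputs = [
    ("politics", "politics", "sports"),
    ("politics", "politics", "sports"),
    ("politics", "politics", "politics"),
    ("sports", "sports", "sports"),
]
held_out_labels = ["sports", "sports", "politics", "sports"]

meta = train_meta_learner(held_out_outputs, held_out_labels)
new_outputs = ("politics", "politics", "sports")
print(majority_vote(new_outputs))
print(meta_predict(meta, new_outputs))
```

Voting treats every base classifier as equally trustworthy, while the meta-learner can pick up that one classifier is reliable in a particular situation, which is one intuition for why meta-learning may outperform voting as the abstract reports.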
