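정보관리학회지 (Journal of the Korean Society for Information Management), 한국정보관리학회 (Korean Society for Information Management)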

1

정은경 (Ewha Womans University) 2009, Vol.26, No.3, pp.261-278 https://doi.org/10.3743/KOSIM.2009.26.3.261

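Korean abstract (translated)

In machine-learning-based text categorization, constructing an optimal feature set is important for improving performance. This study conducted feature-expansion experiments on author-provided keywords and article titles, which are essential components of journal articles. Feature expansion is typically performed by applying a semantic dictionary tool such as WordNet to an initially selected feature set. This study expanded keyword and title features with terms in WordNet synonym relations, and the experiments showed that categorization performance improved markedly compared with results obtained without feature expansion. The factors identified as contributing to this improvement were a refined feature base and synonym expansion anchored on category (classifier) terms. Both applying and not applying word sense disambiguation contributed to the improvement. The study thus shows that synonym feature expansion anchored on category terms, using keywords and article titles, is a positive factor in improving text categorization performance.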

Abstract

Identifying optimal feature sets in Text Categorization (TC) is crucial for improving effectiveness. In this study, experiments on feature expansion were conducted using author-provided keyword sets and article titles from typical scientific journal articles. The tool used for expanding feature sets is WordNet, a lexical database of English words. Given a data set and a lexical tool, this study showed that feature expansion with synonymous relationships significantly improved TC results. The experimental results indicated that when feature sets were expanded with synonyms based on classifier (category) names, the effectiveness of TC improved considerably, regardless of whether word sense disambiguation was applied.
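The synonym-based feature expansion described above can be sketched in a few lines. This is a minimal illustration, not the author's implementation: the small `SYNONYMS` table is a hypothetical stand-in for WordNet synonym sets, and `overlap` is a toy score rather than a trained classifier. It shows only why expansion can help: a document and a category that use different words for the same concept share no feature until synonyms are added.

```python
from collections import Counter

# Hypothetical stand-in for WordNet synonym sets (term -> synonyms).
# With NLTK and its WordNet data installed, a comparable table could be
# built from nltk.corpus.wordnet.synsets(term) and Synset.lemma_names().
SYNONYMS = {
    "car": {"auto", "automobile", "motorcar"},
    "film": {"movie", "picture"},
}


def expand_features(terms, synonyms=SYNONYMS):
    """Return a bag of features: the original terms plus their synonyms."""
    features = Counter(terms)
    for term in terms:
        for syn in synonyms.get(term, ()):
            features[syn] += 1
    return features


def overlap(features, category_terms):
    """Toy score: count feature occurrences shared with a category's vocabulary."""
    return sum(count for term, count in features.items() if term in category_terms)


# Keywords of a hypothetical article and the vocabulary of a hypothetical category.
doc = ["car", "engine"]
category = {"automobile", "motor"}
print(overlap(Counter(doc), category))          # 0: no shared term before expansion
print(overlap(expand_features(doc), category))  # 1: "automobile" added via "car"
```

Restricting which synonyms are added (for example, only those tied to category names, as the abstract's classifier-name expansion suggests) would amount to filtering the result of `synonyms.get(term, ())`; the exact rule the study used is not given here.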

2

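A Study on Building an Automatic Summarization System: Focusing on News Reports on the Web (자동요약시스템 구축에 대한 연구 - 웹 상의 보도기사를 중심으로 -)

이태영 (Chonbuk National University) 2006, Vol.23, No.4, pp.41-67 https://doi.org/10.3743/KOSIM.2006.23.4.041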

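Korean abstract (translated)

To build an automatic summarization system for news reports on the web, a text-structure frame and a set of rules were developed by applying discourse structure and knowledge-based techniques. The frame includes slots for the roles of paragraphs, sentences, and clauses; the properties of paragraphs and sentences; discrimination rules that distinguish these roles; key-sentence extraction rules; and summary-generation rules. It also provides an 'if-needed' facet that supplies context definitions, proper nouns, and similar information, and an 'if-changed' facet that reports changed slot values. In extracting and representing the actual values of slots and facets, the rhetorical roles of phrases, the top-level categories of words, and plot units were consulted. Restructuring, which merges, splits, and synthesizes summary sentences while preserving the coherence of the flow of meaning, used context definitions together with rules derived from a similarity formula, syntactic information, discourse structure, and knowledge-based methods; new sentences, such as critical commentary, were also generated.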

Abstract

A writing-structure frame and various rules based on discourse structure and knowledge-based methods were applied to construct an automatic Ext/Sums (extracts & summaries) system for straight news on the web. The frame contains slots and facets representing the roles of paragraphs, sentences, and clauses in news articles, as well as the rules that determine the type of each slot. Rearrangement of the candidate summary sentences, such as unification, separation, and synthesis, was performed while maintaining the coherence of meaning; it used rules derived from similarity measurement, syntactic information, discourse structure, and knowledge-based methods, together with context plots defined by the syntactic/semantic signatures of nouns and verbs and the categories of verb suffixes. New sentences, such as critical commentary, were also inserted into the summaries.
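The slot/facet organization mentioned above follows the classic frame representation, in which an if-needed facet supplies a value on demand and an if-changed facet reacts to updates. Below is a minimal sketch of that general mechanism, assuming Python; `Frame`, its slot names, and `GLOSSARY` are hypothetical and do not reproduce the paper's actual frame or rules.

```python
class Frame:
    """A minimal frame: named slots, each holding a value plus two optional facets.

    if_needed(frame) computes a value on first access when none is stored;
    if_changed(frame, slot, old, new) is called after a slot value is set.
    """

    def __init__(self, name):
        self.name = name
        self.slots = {}

    def add_slot(self, slot, value=None, if_needed=None, if_changed=None):
        self.slots[slot] = {"value": value, "if_needed": if_needed, "if_changed": if_changed}

    def get_value(self, slot):
        facets = self.slots[slot]
        if facets["value"] is None and facets["if_needed"] is not None:
            facets["value"] = facets["if_needed"](self)  # compute on demand, then cache
        return facets["value"]

    def set_value(self, slot, value):
        facets = self.slots[slot]
        old = facets["value"]
        facets["value"] = value
        if facets["if_changed"] is not None:
            facets["if_changed"](self, slot, old, value)  # notify about the change


# Hypothetical glossary that an if-needed facet consults for proper nouns.
GLOSSARY = {"KOSIM": "Korean Society for Information Management"}
changes = []  # record of if-changed notifications

news = Frame("straight_news")
news.add_slot("lead", value="KOSIM held its annual conference.")
news.add_slot(
    "context",
    if_needed=lambda f: {
        w: GLOSSARY[w] for w in f.get_value("lead").rstrip(".").split() if w in GLOSSARY
    },
)
news.add_slot("summary", if_changed=lambda f, s, old, new: changes.append((s, old, new)))

news.set_value("summary", news.get_value("lead"))
print(news.get_value("context"))  # {'KOSIM': 'Korean Society for Information Management'}
print(changes)  # [('summary', None, 'KOSIM held its annual conference.')]
```

Keeping facets as plain callables makes the demon-style behavior explicit; a real system would attach the paper's discrimination and extraction rules here instead of the toy lambdas.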

3

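An Experimental Study on Automatic Summarization of Multiple Newspaper Articles (복수의 신문기사 자동요약에 관한 실험적 연구)

김용광 (Yonsei University); 정영미 (Yonsei University) 2006, Vol.23, No.1, pp.83-98 https://doi.org/10.3743/KOSIM.2006.23.1.083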

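Korean abstract (translated)

This study proposes a template-based summarization technique that uses the semantic categories of sentences to summarize multiple newspaper articles automatically. In the training phase, the semantic categories of the core information to be included in summaries of event/accident news articles are first identified, and cue words are then selected for each slot of the template. In the summarization phase, the input news articles are categorized by event/accident, key sentences are extracted from each article, and the template slots are filled. Finally, sentences are split into simple sentences, the template contents are revised, and a summary is generated from the template. Evaluation of the automatically generated summaries showed a summary precision of 0.541, a summary recall of 0.581, and a summary-sentence redundancy rate of 0.116.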

Abstract

This study proposes a template-based method for automatically summarizing multiple news articles using the semantic categories of sentences. First, the semantic categories of the core information to be included in a summary are identified from a training set of documents and their summaries. Then, cue words are selected for each slot of the template so that news sentences can later be classified into the relevant slots. When a news article is input, its event/accident category is identified, and key sentences are extracted from the article and placed into the relevant slots. The template, filled with simple sentences rather than the original long sentences, is used to generate a summary for the event/accident. In the user evaluation of the generated summaries, essential information extraction showed a precision of 54.1% and a recall of 58.1%, with a redundancy ratio of 11.6%.
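Below is a toy sketch of the two steps described above, cue-word slot filling and set-based evaluation. It rests on several assumptions: the slot names and cue words in `CUES` are hypothetical (the study learns its own from training summaries), sentences arrive in rank order, and `redundancy` uses an assumed definition (share of summary sentences that repeat an earlier one), since the abstract does not define its redundancy measure.

```python
# Hypothetical template for event/accident news: slot -> cue words.
CUES = {
    "event": {"fire", "collision", "explosion"},
    "casualties": {"killed", "injured"},
    "cause": {"caused", "because", "due"},
}


def tokenize(sentence):
    return sentence.lower().replace(",", " ").replace(".", " ").split()


def fill_template(sentences, cues=CUES):
    """Put each sentence into the slot whose cue words it matches most often.

    Sentences are assumed to be in rank order, so the first sentence assigned
    to a slot is kept; sentences matching no cue word are dropped.
    """
    template = {}
    for sentence in sentences:
        words = tokenize(sentence)
        scores = {slot: sum(w in cue_words for w in words) for slot, cue_words in cues.items()}
        best = max(scores, key=scores.get)
        if scores[best] > 0 and best not in template:
            template[best] = sentence
    return template


def precision_recall(extracted, reference):
    """Set-based precision and recall of extracted information units."""
    hits = len(set(extracted) & set(reference))
    return hits / len(extracted), hits / len(reference)


def redundancy(summary_sentences):
    """Assumed definition: share of summary sentences repeating an earlier one."""
    return 1 - len(set(summary_sentences)) / len(summary_sentences)


sentences = [
    "A fire broke out at a factory on Monday.",
    "Two workers were injured.",
    "Police said a gas leak caused it.",
    "The weather was sunny.",
]
print(fill_template(sentences))
```

On the sample input, the first three sentences fill the `event`, `casualties`, and `cause` slots, and the fourth, which matches no cue word, is dropped.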

4

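Using Query Word Senses and User Response Information to Improve Search Engine Precision (검색엔진의 정확률 향상을 위한 질의어 의미와 사용자 반응 정보의 이용)

윤성희 (Sangmyung University) 2009, Vol.26, No.4, pp.81-92 https://doi.org/10.3743/KOSIM.2009.26.4.081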

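Korean abstract (translated)

This paper introduces a method that improves retrieval performance by using query word-sense information and user feedback to resolve the ambiguity that arises when an information retrieval system searches on user queries and indexes. A retrieval process that disambiguates queries with semantic information can exclude many semantically irrelevant documents from the search results. To this end, a word-sense knowledge base is built on noun-centered semantic categories that serve as retrieval indexes, and documents are classified by index term and the corresponding semantic category. During retrieval, performance can be improved by reflecting the user's choice of query sense and the user's reference behavior toward relevant documents in the ranking of web pages.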

Abstract

This paper proposes a technique for improving performance in web information retrieval using word senses and user feedback, compared with retrieval based on ambiguous user queries and indexes. Disambiguation using query word senses can eliminate irrelevant pages from the search results. Based on the semantic categories of nouns used as retrieval indexes, we build a word-sense knowledge base and categorize the web pages. The precision of the retrieval system can then be improved with user feedback, namely the user's choice of query sense and information-seeking behavior toward pages.
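The retrieval flow described above, filtering by the user's chosen query sense and then reranking by user behavior, can be sketched as follows. The sense-tagged `INDEX`, the click-count feedback signal, and the reranking rule are illustrative assumptions, not the paper's ranking formula.

```python
from collections import defaultdict

# Hypothetical sense-tagged index: (noun, semantic category) -> pages, in base rank order.
INDEX = {
    ("bank", "finance"): ["p1", "p2", "p3"],
    ("bank", "river"): ["p4", "p5"],
}

# Feedback store: (noun, sense, page) -> number of times a user opened that page.
clicks = defaultdict(int)


def record_click(term, sense, page):
    clicks[(term, sense, page)] += 1


def search(term, sense):
    """Return only pages indexed under the sense the user chose, reranked so that
    more-clicked pages come first; ties keep the base order (sorted is stable)."""
    pages = INDEX.get((term, sense), [])
    return sorted(pages, key=lambda page: -clicks[(term, sense, page)])


record_click("bank", "finance", "p3")
record_click("bank", "finance", "p3")
record_click("bank", "finance", "p2")
print(search("bank", "finance"))  # ['p3', 'p2', 'p1']
print(search("bank", "river"))    # ['p4', 'p5']
```

Keying feedback by sense as well as page keeps clicks made under one meaning of a query from promoting pages for another.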

5

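A Qualitative Study on the Information Seeking of Administrative Surveillants of Public Institutions in Korea (우리나라 공공기관 행정감시자의 정보추구에 관한 질적 연구)

임진희 (한국국가기록연구원); 이준기 (Yonsei University) 2009, Vol.26, No.4, pp.249-276 https://doi.org/10.3743/KOSIM.2009.26.4.249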

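Korean abstract (translated)

This study aims to identify the procedures by which administrative surveillants of public institutions seek public information through accountability mechanisms such as National Assembly inspections and information disclosure requests, and the factors that affect that process. The participants were information seekers who professionally monitor public institutions, including aides to National Assembly members, civil society activists, and professional researchers on press investigative reporting teams. Data were collected through in-depth interviews and participant observation and analyzed following grounded theory. The analysis defined the surveillants' public-information-seeking procedure and derived 56 concepts, 17 categories, and 6 higher-level categories related to the seeking process; these categories were then grouped into background factors, situational factors, and procedural factors according to the scope of their influence on the process. The study makes a theoretical contribution by constructing an information behavior model of administrative surveillants, a group not previously studied; a methodological contribution by categorizing meaningful concepts from qualitative data on the distinctive patterns of their information seeking; and a practical contribution by giving surveillant groups the information they need for public-information-seeking strategies and by offering public institutions a basis for the strategic requirements of responding to surveillants' information requests.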

Abstract

The purpose of this study is to investigate the information-seeking procedure of surveillants of public sector organizations in Korea. These surveillants use accountability mechanisms such as National Assembly inspections and information disclosure requests to obtain the information they want. Examples of this group include social activists, professional researchers, aides to National Assembly members, and members of the press. Using data collected through in-depth interviews and participant observation, we studied their information-seeking behaviors and the factors that affect the procedure. Following the grounded theory approach, we first generated 56 concepts, 17 categories, and 6 super-categories concerning the participants' feelings, experiences, and perceptions related to their information seeking. We then developed a factor model from the generated concepts. The main contributions of this study are that (a) the results provide useful guidance for public information seekers, and (b) we derive requirements for enhancing the information management systems of public sector organizations.

6

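A Study on Extracting Evaluation Items for Ontology Quality Evaluation (온톨로지 품질평가를 위한 평가항목 추출에 관한 연구)

김성훈 (Sungkyunkwan University); 오삼균 (Sungkyunkwan University) 2015, Vol.32, No.2, pp.193-219 https://doi.org/10.3743/KOSIM.2015.32.2.193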

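Korean abstract (translated)

Ontologies are currently evaluated by comparing them with well-built existing ontologies, by applying them to the applications in which they will be used, or by judging their fit and relevance to the source data. Because these methods focus on the results obtained through an ontology, they make it difficult to evaluate intrinsic aspects such as the ontology's structure, semantic representation, and interoperability. This study derived items for ontology quality evaluation with the help of ontology experts. Categories for the intrinsic evaluation of ontologies were extracted through a literature review, evaluation items for each category were collected from experts through a Delphi survey, and the collected items were then re-validated. As a result, 53 evaluation items were finally selected from the 70 items initially collected. In addition, the reliability of the evaluation items was measured by applying them to an actual ontology evaluation.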

Abstract

The focus of traditional evaluations of ontologies is largely performance-based. A comparison of a new ontology with well-established ones, testing of ontologies in different applications, as well as any judgment of an ontology’s appropriateness and relatedness to source data heavily rely on what results that ontology seems to manifest. This study, on the other hand, is an attempt to evaluate the quality of a particular ontology as manifested by its structure, representation, and interoperability. To that end, major categories of quality evaluations were first identified through an extensive survey of literature. Evaluation questions were formulated from these categories using the Delphi method and were validated by ontology experts. The entire process produced a set of 53 evaluation questions, which was then employed to test the quality of a newly-developed smartphone ontology.

7

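A Study on Improving the Performance of a Concept-Based Information Retrieval System Using Subject-Distributed Knowledge Bases (주제별 분산 지식베이스에 의한 개념기반 정보검색시스템의 성능향상에 관한 연구)

노영희 (Ewha Womans University) 2002, Vol.19, No.1, pp.47-69 https://doi.org/10.3743/KOSIM.2002.19.1.047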

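Korean abstract (translated)

Concept-based information retrieval has shown higher performance than the simple matching function method and the P-norm retrieval method, both of which are regarded as having resolved the problems of Boolean retrieval. However, it has the drawback that building the semantic-network knowledge base essential for concept expansion takes too long. To address this problem, this study sought a way to shorten the time required to build the knowledge base, without degrading retrieval performance, by building separate knowledge bases for each subject category.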

Abstract

The concept-based retrieval model has shown higher performance than the simple matching function method and the P-norm retrieval method, which were introduced to compensate for the shortcomings of the Boolean retrieval model. However, it takes too long to create the semantic-net knowledge base that is essential for concept expansion. To address this shortcoming, the study sought a method of building knowledge bases distributed by subject, which reduces construction time without degrading retrieval performance.
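For reference, the P-norm method cited above as a baseline is the extended Boolean model of Salton, Fox, and Wu. Below is a minimal sketch of its unweighted OR and AND operators, assuming document term weights in [0, 1] and equal query-term weights. It illustrates the baseline only, not the concept-based method or the distributed knowledge bases.

```python
def pnorm_or(weights, p):
    """Unweighted P-norm OR: ((w_1^p + ... + w_n^p) / n) ** (1/p)."""
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1 / p)


def pnorm_and(weights, p):
    """Unweighted P-norm AND: 1 - (((1-w_1)^p + ... + (1-w_n)^p) / n) ** (1/p)."""
    n = len(weights)
    return 1 - (sum((1 - w) ** p for w in weights) / n) ** (1 / p)


# Document weights for query terms A and B: A fully present, B absent.
weights = [1.0, 0.0]
for p in (1, 2, 100):
    print(p, round(pnorm_or(weights, p), 4), round(pnorm_and(weights, p), 4))
```

With p = 1 both operators reduce to the average term weight (0.5 for the sample). As p grows, they approach strict Boolean OR (the maximum weight) and AND (the minimum weight). This is how the parameter trades off between vector-style and Boolean matching.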

8

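A Study on Improving the Performance of Automatic Document Classification Using Term Context (용어의 문맥활용을 통한 문헌 자동 분류의 성능 향상에 관한 연구)

송성전 (Yonsei University); 정영미 (Yonsei University) 2012, Vol.29, No.2, pp.205-224 https://doi.org/10.3743/KOSIM.2012.29.2.205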

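Korean abstract (translated)

BOW (bag of words), the usual way of representing documents in automatic classification, treats terms independently and therefore cannot reflect their surrounding context. This study defined, for each term, a profile of its contextual characteristics in each subject category, and compared these profiles with the term's context in actual documents in order to distinguish occurrences of the same word form by meaning or thematic background. The aim was to avoid misclassification caused solely by the occurrence of a particular term in documents whose subjects actually differ. The contextual factors were applied to three components (term weighting, classifier combination, and feature selection), and classification performance was measured. In all three cases, performance improved over the baseline, with classifier combination yielding the largest gain. The proposed method was also effective in alleviating the performance bias caused by large or small numbers of training documents.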

Abstract

One limitation of the BOW (bag-of-words) method is that each term is recognized only by its form, so the representation fails to capture the term's meaning or thematic background. To overcome this limitation, a separate profile was defined for each term in each thematic category, based on its contextual characteristics. In this study, a term was used as a classification feature according to its meaning or thematic background, determined by comparing its context in those profiles with its occurrences in an actual document. The experiment was conducted in three phases: term weighting, ensemble classifier implementation, and feature selection. Classification performance improved in all phases, with the ensemble classifier achieving the highest score. The results also showed that the proposed method was effective in reducing the performance bias caused by the number of training documents.
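The profile comparison described above can be sketched with co-occurrence windows and cosine similarity. This is an assumed, simplified reading: the window size, the raw-count profiles, the cosine measure, and the toy training data are illustrative choices. The sketch covers only the comparison step, not how the study fed the result into term weighting, classifier combination, or feature selection.

```python
import math
from collections import Counter, defaultdict


def contexts(tokens, term, window=2):
    """Yield the words within `window` positions of each occurrence of `term`."""
    for i, tok in enumerate(tokens):
        if tok == term:
            yield tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]


def build_profiles(training, term, window=2):
    """training: list of (tokens, category). Returns category -> Counter of context words."""
    profiles = defaultdict(Counter)
    for tokens, category in training:
        for ctx in contexts(tokens, term, window):
            profiles[category].update(ctx)
    return profiles


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def best_category_for_term(tokens, term, profiles, window=2):
    """Compare the term's context in this document with each category profile."""
    doc_ctx = Counter()
    for ctx in contexts(tokens, term, window):
        doc_ctx.update(ctx)
    return max(profiles, key=lambda c: cosine(doc_ctx, profiles[c]))


# Toy training data in which "virus" has different thematic backgrounds.
training = [
    ("the virus infected the computer network".split(), "IT"),
    ("a new virus spread through email attachments".split(), "IT"),
    ("the virus infected patients in the hospital".split(), "Medicine"),
    ("doctors isolated the virus in blood samples".split(), "Medicine"),
]
profiles = build_profiles(training, "virus")
print(best_category_for_term("the virus spread through the network".split(), "virus", profiles))
print(best_category_for_term("blood tests found the virus in patients".split(), "virus", profiles))
```

The same surface form "virus" is assigned to "IT" in the first test sentence and to "Medicine" in the second, purely from its neighboring words.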
