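Journal of the Korean Society for Information Management (정보관리학회지), Korean Society for Information Management (한국정보관리학회)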

11

A Study on the Automatic Assignment of Descriptors to Journal Articles Using the Rocchio Algorithm

김판준(신라대학교) 2006, Vol.23, No.3, pp.69-89 https://doi.org/10.3743/KOSIM.2006.23.3.069

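Abstract (translated from Korean)

This study re-examined several performance factors that have been applied in controlled-vocabulary automatic indexing and text categorization based on the Rocchio algorithm, and looked for basic ways to improve performance. It also compared, under identical conditions, the performance of the Rocchio-based method for controlled-vocabulary automatic indexing with that of other learning-based methods. According to the results, the Rocchio-based profile method retained its established advantages of ease of implementation and economy in computer processing time while performing about as well as, or better than, the other learning-based methods (SVM, VPT, NB). In particular, for semi-automatic indexing that supports the work of professional indexers, the Rocchio-based method merits first consideration, since it maintains a relatively high level of recall while its precision improves substantially as the training data grow.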

Abstract

Several performance factors that have been applied to controlled-vocabulary automatic indexing and text categorization based on the Rocchio algorithm were examined, and a simple method for improving their performance was tried. The results of the methods using the Rocchio algorithm were also compared with those of other learning-based methods under the same conditions. The Rocchio-based methods retained their strengths of easy implementation and computational efficiency while showing results equivalent to or better than those of the other learning-based methods (SVM, VPT, NB). In particular, for semi-automatic indexing (computer-aided indexing), the methods using the Rocchio algorithm, which maintain a high level of recall, could be considered first.
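The Rocchio-style profile approach named in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the raw term-frequency weighting, the weights `beta=16` and `gamma=4`, the clipping of negative weights, the toy training texts, and every function name below are assumptions made for illustration only.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector for whitespace-tokenized text."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Mean of a list of sparse vectors (an empty list gives an empty vector)."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    n = len(vectors)
    return {term: weight / n for term, weight in total.items()} if n else {}


def rocchio_profile(positives, negatives, beta=16.0, gamma=4.0):
    """Profile = beta * positive centroid - gamma * negative centroid.

    beta and gamma are assumed values; negative weights are dropped,
    which is a common simplification rather than a requirement.
    """
    pos_c, neg_c = centroid(positives), centroid(negatives)
    profile = {}
    for term in set(pos_c) | set(neg_c):
        weight = beta * pos_c.get(term, 0.0) - gamma * neg_c.get(term, 0.0)
        if weight > 0:
            profile[term] = weight
    return profile


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_descriptors(doc_vector, profiles):
    """Rank candidate descriptors by similarity to the document."""
    scores = [(d, cosine(doc_vector, p)) for d, p in profiles.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy training data (hypothetical): descriptor -> texts indexed with it.
training = {
    "information retrieval": ["query ranking retrieval index",
                              "retrieval query relevance"],
    "classification": ["classifier training labels",
                       "training classifier categories"],
}
vectors = {d: [tf_vector(t) for t in texts] for d, texts in training.items()}

profiles = {}
for descriptor, positives in vectors.items():
    negatives = [v for other, vs in vectors.items()
                 if other != descriptor for v in vs]
    profiles[descriptor] = rocchio_profile(positives, negatives)

ranked = rank_descriptors(tf_vector("query relevance ranking"), profiles)
print(ranked)
```

In a semi-automatic setting of the kind the abstract mentions, the ranked list would be shown to an indexer as suggestions rather than assigned directly.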

12

Development of a Thesaurus for Building a Large-Scale Korean Ontology

최석두(한성대학교) ; 이우범(한성대학교) ; 김이겸(광주대학교) ; 이정연(한국학술진흥재단 지식정보센터) ; 최상기(전북대학교) ; 한상길(대림대학교) 2006, Vol.23, No.4, pp.147-164 https://doi.org/10.3743/KOSIM.2006.23.4.147

Abstract

This paper reports an effort to construct a grand-scale Korean thesaurus that can be used to enhance retrieval performance in various fields. The thesaurus is currently being used for indexing and retrieval, and new terms are being added to it. As demands on retrieval performance increase in Korea, developing a grand-scale ontology appears to be necessary, so a project has been undertaken to convert the current thesaurus into an ontology system. The paper describes how the thesaurus was constructed and prepared to serve as the base for an ontology system.

13

A Study on Extending the Subject Access System for Fiction: Focusing on Symbols and Motifs

김나름(연세대학교) ; 김태수(연세대학교) 2006, Vol.23, No.4, pp.69-87 https://doi.org/10.3743/KOSIM.2006.23.4.069

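Abstract (translated from Korean)

Access to literary works, including fiction, has centered on descriptive elements, and subject access has likewise been limited to formal elements appearing in the work, such as subject matter, names of characters, and place names. This practice misses the essence of a novel's subject and fails to reflect the subject needs of users who seek aesthetic experience. To extend the subject access system for fiction, this study examined the concepts of symbol and motif and their potential as subject access points. It also used the relevant dictionaries of terms as information sources to construct symbol and motif schemes, applied them to 20th-century Korean novels, and discussed their usability and limitations.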

Abstract

Access to literary works, including fiction, has focused on descriptive elements, and subject access has been confined to denotative elements that appear in the work, such as subject matter, names of characters, and geographical names. This practice does not reach the essence of the subject of fiction and does not reflect the subject needs of users who pursue aesthetic experience. In this study, the concepts of symbol and motif and their potential use as subject access points are considered in order to enhance the subject access scheme. In addition, this study builds schemes of symbols and motifs using glossaries as the source of information. The resulting schemes are applied to 20th-century Korean fiction, and their usability and limits are discussed.

14

A Study on the Operational Status of University Institutional Repositories and Measures for Improvement: Focusing on dCollection

김현희(명지대학교) ; 정경희(한성대학교) ; 김용호(부경대학교) 2006, Vol.23, No.4, pp.17-39 https://doi.org/10.3743/KOSIM.2006.23.4.017

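Abstract (translated from Korean)

Institutional repositories are known as one of the core systems for realizing the open access movement. The Korea Education and Research Information Service (한국교육학술정보원) formed dCollection, a consortium of university institutional repositories, in 2003 as a shared space for scholarly information, and 62 national and private universities currently participate as members. Using the dCollection evaluation model built in 2005 as the survey instrument, this study examines the operational status of 40 university institutional repositories. Based on the survey results, it proposes ways to develop institutional repositories in Korea, focusing on raising the registration and use rates of dCollection materials, with the aim of building the infrastructure for an efficient national knowledge and information distribution system.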

Abstract

Building institutional repositories is known as one of the most powerful methods for realizing the open access movement. The Korea Education and Research Information Service (KERIS) proposed organizing institutional repositories into a consortium called "dCollection (Digital Collection)," which has been in place since 2003 and comprises 62 universities. The purpose of this study is to investigate the current state of 40 member universities of dCollection using an evaluation model comprising 4 categories and 39 indicators and, based on the survey outcomes, to pinpoint the procedural or performance weak points of the dCollection systems in order to find customized solutions focused on improving use and self-archiving rates.

15

Building a Mediating Gateway Model for Directory Services: Focusing on the News and Media Categories of Major Search Portals

김성원(공군사관학교) ; 김태수(연세대학교) 2006, Vol.23, No.1, pp.99-119 https://doi.org/10.3743/KOSIM.2006.23.1.099

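Abstract (translated from Korean)

Keyword search is the most commonly used method in Internet information retrieval, but it has several shortcomings in terms of precision and recall. To compensate for these shortcomings, many web portals offer directory search services. Because each portal uses a different classification scheme, however, these directory services inconvenience users, and the need has been raised for a mediating gateway that provides integrated search across directory services. Accordingly, this study built a model of a mediating gateway that provides integrated search over the directory services of major Korean portals, including Naver, Yahoo, and Empas, and evaluated its performance.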

Abstract

The most widely used information searching method in the current Internet environment is keyword-based search, which has certain limitations in terms of precision and recall. Most major Internet portals provide directory-based searching as a means of compensating for these limitations. However, the portals adopt different classification schemes, which causes significant inconvenience to users and suggests a need for a mapping gateway that provides cross-portal, or cross-directory, information searching. In this context, this study develops a prototype intermediary gateway for integrated search using the directory services of three major portals, Naver, Yahoo, and Empas, and tests its performance.

16

A Study on Measuring the Economic Value of Full-Text Information Services for Domestic Monographs

류희경(국립중앙도서관) ; 이두영(중앙대학교) 2006, Vol.23, No.4, pp.111-128 https://doi.org/10.3743/KOSIM.2006.23.4.111

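Abstract (translated from Korean)

The purpose of this study is to measure the economic value of full-text information services in order to determine whether it is worthwhile for libraries to invest heavily in building databases. The contingent valuation method was applied to measure economic value. To measure the value of the full-text information service for domestic monographs, which is a non-market good, a hypothetical scenario was designed. To increase the reliability of the survey, a pretest and expert review were conducted, and the double-bounded dichotomous choice format was chosen as the question method. The results showed that the use value each user was willing to pay for the service was 836 won per volume, and the non-use value was 236 won per year. The total annual economic value for all university students was calculated at 83.18 billion won (831.8억원).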

Abstract

The purpose of this study is to measure the economic value of full-text information services in order to determine whether it is worthwhile for libraries to invest a large amount of money in constructing databases in the first place. The study applied the contingent valuation method to measure this economic value. Hypothetical scenarios were designed to estimate the value of a non-market good; a pretest and expert review were carried out to raise the reliability of the survey; and the double-bounded dichotomous choice format was chosen as the question method. The use value that one user is willing to pay for domestic monograph full-text information services was 836 won per monograph, and the annual non-use value was 236 won. The total annual economic value for all university students was 83.18 billion won.

17

An Analysis of the Information Architecture and Labeling Systems of University Websites

이승민(Indiana University) ; 남태우(중앙대학교) ; 김성희(중앙대학교) 2006, Vol.23, No.2, pp.39-59 https://doi.org/10.3743/KOSIM.2006.23.2.039

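Abstract (translated from Korean)

To develop an information architecture and category labels for designing university websites as efficient information access tools, this study analyzed 17 current websites of library and information science departments in the United States in terms of main menu structure, subcategories, and labeling. First, the analysis indicated that the main menu should consist of the 9 categories provided in common by all 17 websites surveyed. Second, the next-level subcategories should be divided into 35 categories, taking into account the meaning of the content of the 9 main categories. Finally, the terms used as category labels should be those most frequently used across the 17 websites.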

Abstract

In this study we propose a new information structure and category labels to fully support the function of school websites as tools for accessing their contents. The proposed model has three main aspects. First, the main menu structure is the primary guide for accessing the information in a website, so the proposed main menu structure consists of 9 categories that are commonly provided by 17 existing school websites. Second, the first-level categories comprise a total of 35 categories under the 9 main menu categories. Each is placed under a main menu category according to its relationship with the meaning of that upper-level category. Third, the proposed model adopts general and comprehensive terms as category labels. These terms are based on an analysis of existing category labels, and the most frequently used terms were selected from the current school websites.

18

A Study on Query Reconstruction Using Document Structure in Question-Answer Document Retrieval

최상희(대구가톨릭대학교) ; 서은경(한성대학교) 2006, Vol.23, No.2, pp.229-243 https://doi.org/10.3743/KOSIM.2006.23.2.229

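Abstract (translated from Korean)

A question-answer document is a structured document consisting of a question entered by a user, a description of the question, and answers offered by other users who know the answer. Like web documents, it has recently become a commonly searched information source. This study sought to improve the retrieval effectiveness of question-answer documents by reconstructing queries based on the structural characteristics of these documents. The document components whose performance was compared in the query reconstruction experiments were the question and the answer content. In the question-based approach, a technique was applied that clusters similar questions already entered in the question-answer retrieval system. In the answer-based approach, a technique was applied that uses passage retrieval to select relevant sentences from the answers given to the most similar existing question. The experiments showed that reconstructing queries from answer information maintained precision while providing more diverse search results.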

Abstract

This study aims to suggest an effective way to enhance question-answer (QA) document retrieval performance by reconstructing queries based on the structural features of QA documents. A QA document is a structured document that consists of three components: a question from a questioner, a short description of the question, and answers chosen by the questioner. The study proposes methods for reconstructing a new query using the two major structural parts, question and answer, and examines which component of a QA document contributes more to improving query performance. The major finding is that using the answer document set is the most effective way to reconstruct a new query. That is, queries reconstructed from terms appearing in the answer document set provide the most relevant search results while reducing redundancy among the retrieved documents.

19

An Experimental Study on Automatic Summarization of Multiple News Articles

김용광(연세대학교) ; 정영미(연세대학교) 2006, Vol.23, No.1, pp.83-98 https://doi.org/10.3743/KOSIM.2006.23.1.083

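Abstract (translated from Korean)

This study presents a template-based summarization technique that uses the semantic categories of sentences to summarize multiple news articles automatically. In the training stage, the semantic categories of the core information to be included in summaries of event/accident news articles are first identified, and then cue words are selected for each slot of the template. In the automatic summarization stage, the input news articles are categorized by event/accident, and key sentences are extracted from each article to fill the template slots. Finally, the sentences are split into simple sentences to revise the template content, and a summary is generated from it. Evaluation of the automatically generated summaries showed summary precision and summary recall of 0.541 and 0.581, respectively, and a summary sentence redundancy rate of 0.116.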

Abstract

This study proposes a template-based method for automatic summarization of multiple news articles using the semantic categories of sentences. First, the semantic categories for the core information to be included in a summary are identified from a training set of documents and their summaries. Then, cue words for each slot of the template are selected for later classification of news sentences into the relevant slots. When a news article is input, its event/accident category is identified, and key sentences are extracted from the article and placed in the relevant slots. The template, filled with simple sentences rather than the original long sentences, is used to generate a summary of an event/accident. In the user evaluation of the generated summaries, the results showed a precision of 54.1% and a recall of 58.1% in essential information extraction, and a redundancy ratio of 11.6%.
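The slot-filling step described in this abstract might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' system: the slot names, the cue-word lists, the English example sentences, and the function names are all hypothetical, and the paper's sentence-simplification and summary-generation stages are omitted. Skipping exact duplicate sentences is only a crude stand-in for redundancy handling.

```python
import re

# Hypothetical slots and cue words. In the paper, cue words are
# selected from training articles and their summaries.
SLOT_CUES = {
    "when": {"monday", "yesterday", "morning", "night"},
    "where": {"highway", "street", "station", "factory"},
    "casualties": {"injured", "killed", "dead", "hospitalized"},
}


def split_sentences(text):
    """Split text into sentences after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def best_slot(sentence):
    """Return the slot whose cue words match the sentence most often,
    or None when no cue word matches at all."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    slot, hits = max(
        ((name, len(words & cues)) for name, cues in SLOT_CUES.items()),
        key=lambda pair: pair[1],
    )
    return slot if hits > 0 else None


def fill_template(articles):
    """Assign sentences from several articles to template slots,
    skipping sentences already placed in the same slot."""
    template = {name: [] for name in SLOT_CUES}
    for article in articles:
        for sentence in split_sentences(article):
            slot = best_slot(sentence)
            if slot and sentence not in template[slot]:
                template[slot].append(sentence)
    return template


articles = [
    "A fire broke out at a factory. Three workers were injured.",
    "Three workers were injured. The fire started yesterday morning.",
]
template = fill_template(articles)
for slot, sentences in template.items():
    print(slot, "->", sentences)
```

A summary could then be assembled by concatenating the slot contents in a fixed order, which is one simple design choice among several.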

20

A Study on Patent and Trademark Retrieval Based on Query Log Data

이지연(연세대학교) ; 백우진(건국대학교) 2006, Vol.23, No.2, pp.61-79 https://doi.org/10.3743/KOSIM.2006.23.2.061

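Abstract (translated from Korean)

This study set out to propose ways to improve patent and trademark retrieval. To that end, it analyzed log data for 100,016 queries written by 17,559 users of the patent technology information service of the Korea Institute of Patent Information (한국특허정보원) over 193 days. In addition to analyzing individual query logs, it analyzed 2,202 search sessions involving multiple queries and found additional clues for improving retrieval. According to the analysis, patent and trademark searching resembles general web searching, particularly in that queries are short. Patent and trademark searchers, however, used Boolean operators more than general web searchers. The multi-query analysis made it possible to propose search features that could help users rewrite their queries. The analysis of multi-query search sessions showed that users rewrote queries by paraphrasing, specializing, generalizing, replacing, and interrupting.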

Abstract

To develop recommendations for improving patent and trademark retrieval efficiency, 100,016 patent and trademark search requests issued by 17,559 unique users over a period of 193 days were analyzed. By analyzing 2,202 multi-query sessions, in which one user issued two or more queries consecutively, we discovered a number of clues for improving retrieval efficiency. The session analysis also led to suggestions for new system features to help users reformulate queries. Patent and trademark retrieval users were found to be similar to typical web users in certain respects, especially in issuing short queries. However, we also found that they used Boolean operators more than typical web search users. By analyzing the multi-query sessions, we found that users had five intentions in reformulating queries: paraphrasing, specialization, generalization, alternation, and interruption, which are also seen among web search engine users.
