Journal of the Korean Society for Information Management, Korean Society for Information Management

1

Jae Yun Lee (Kyonggi University) 2003, Vol.20, No.4, pp.233-248 https://doi.org/10.3743/KOSIM.2003.20.4.233

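Abstract (translated from Korean)

The inverse document frequency (IDF) weighting method rests on the assumption that the lower an index term's frequency in a document collection, the more important the term is. This, however, is at odds with other theories that regard medium-frequency terms as important. This study proposes a pivoted IDF weighting method that modifies the IDF formula on the assumption that medium-frequency terms are more important than low-frequency terms. To validate the proposed method, retrieval experiments were run on three test collections; the pivoted IDF weighting method improved performance over IDF weighting, particularly at the top of the ranked results.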

Abstract

The Inverse Document Frequency (IDF) weighting method is based on the hypothesis that the lower a term's frequency in the document collection, the more important the term is as a subject word. This well-known hypothesis is somewhat questionable, however, because some low-frequency terms turn out to be poor subject words. This study proposes the pivoted IDF weighting method for better retrieval effectiveness, on the assumption that medium-frequency terms are more important than low-frequency terms. We evaluated this method on three test collections, and it improved performance, especially at high ranks.
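The baseline hypothesis described above can be illustrated with a minimal sketch of plain IDF weighting. It assumes the common textbook form log(N / df); the abstract does not state which IDF variant the paper uses, and it does not give the pivoted IDF formula, so that formula is not reproduced here. The function name and the toy documents are hypothetical.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Return {term: log(N / df)} for a list of tokenized documents.

    Assumption: the plain textbook IDF form, not the paper's pivoted variant.
    """
    n_docs = len(documents)
    df = Counter()
    for tokens in documents:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log(n_docs / count) for term, count in df.items()}


# Hypothetical toy collection of four tokenized documents.
docs = [
    ["retrieval", "index", "term"],
    ["retrieval", "weight", "term"],
    ["retrieval", "pivot"],
    ["retrieval", "index"],
]

weights = idf_weights(docs)
# Print terms from highest to lowest weight: rarer terms come first.
for term in sorted(weights, key=weights.get, reverse=True):
    print(f"{term}: {weights[term]:.3f}")
```

Under this form the weight decreases monotonically as document frequency rises, which is the behaviour the study questions for very low-frequency terms.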

2

A Comparison of Circulation Measures Based on Circulation Data for Analyzing Library Collection Use by College

Sanghee Choi (Catholic University of Daegu) ; Jae Yun Lee (Myongji University) 2018, Vol.35, No.4, pp.125-140 https://doi.org/10.3743/KOSIM.2018.35.4.125

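Abstract (translated from Korean)

Circulation data accumulated in university libraries is an important resource that can be used for collection development and service improvement. This study compared various circulation measures based on circulation frequency, identified the characteristics of each measure, and assessed how suitable each is for library operations. Four measures were compared using circulation data for the 10 colleges served by the library of University A: circulation frequency, circulation entropy, circulation h-index, and circulation divergence. These measures were applied to analyze circulation by college, and they were compared in terms of how each represents the circulation topics characteristic of each college. The analysis showed that circulation entropy tends to surface topics commonly preferred by many colleges, whereas circulation divergence tends to surface topics that are borrowed specifically by a particular college.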

Abstract

Circulation data is a key data set for academic libraries in terms of collection development and service improvement. This study aims to identify the characteristics of circulation measures and assess their feasibility for library operations. It collected the circulation data of 10 colleges in a university and analyzed 4 measures based on that data: circulation frequency, circulation entropy, circulation h-index, and circulation divergence. These measures are intended to represent the circulation topics of each college. The study found that circulation entropy tends to surface general topics that are popular across many colleges, whereas circulation divergence tends to surface specific topics that are preferred by a particular college.
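Two of the measures named above can be sketched as follows. This is an illustration under stated assumptions, not the paper's definitions: the h-index is assumed to follow the usual rule (the largest h such that h items each have at least h loans), and circulation entropy is assumed to be Shannon entropy over how one topic's loans are spread across colleges. The function names and all counts are hypothetical.

```python
import math


def circulation_h_index(loan_counts):
    """Largest h such that h items were each loaned at least h times."""
    ranked = sorted(loan_counts, reverse=True)
    h = 0
    for rank, loans in enumerate(ranked, start=1):
        if loans >= rank:
            h = rank
        else:
            break
    return h


def shannon_entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical loans per book for one college.
loans_per_book = [9, 7, 4, 4, 1]
print("h-index:", circulation_h_index(loans_per_book))

# Hypothetical loans of two topics across four colleges.
evenly_borrowed_topic = [10, 10, 10, 10]
single_college_topic = [40, 0, 0, 0]
print("entropy (even):", shannon_entropy(evenly_borrowed_topic))
print("entropy (concentrated):", shannon_entropy(single_college_topic))
```

Under this assumed definition, a topic borrowed evenly by many colleges gets high entropy and a topic borrowed by a single college gets zero, which is consistent with the tendency the abstract reports for circulation entropy.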

3

A Study on the Frequency Level Preference Tendency of Association Measures

Jae Yun Lee (Kyonggi University) 2004, Vol.21, No.4, pp.281-294 https://doi.org/10.3743/KOSIM.2004.21.4.281

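Abstract (translated from Korean)

Association measures are used in a variety of fields, including information retrieval and data mining. The frequency level preference of an association measure, that is, whether it favors high or low frequencies, strongly affects the results of applying the measure and therefore calls for close examination. This study analyzed the frequency level preference of major association measures using synthetic data and presented the results. It also proposed a method for adjusting the frequency level preference of representative association measures, including the cosine coefficient. Applying this adjustment method to co-occurrence-based query expansion in information retrieval confirmed its usefulness. Finally, the implications of the analysis and experimental results for related fields are discussed.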

Abstract

Association measures are used in various applications, including information retrieval and data mining. Each association measure's tendency to prefer high or low frequency levels deserves close examination because it has a significant impact on application performance. This paper examines the frequency level preference (FLP) tendency of some popular association measures using artificially generated co-occurrence data and evaluates the results. It then proposes a method for adjusting the FLP tendency of major association measures such as the cosine coefficient. This method is tested on co-occurrence-based query expansion in information retrieval, and the results suggest that the method is useful. Based on these analytical and experimental results, implications for related disciplines are identified.
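As a rough illustration of what an association measure computes from co-occurrence data, the sketch below evaluates the cosine coefficient and, for comparison, the Jaccard coefficient from occurrence counts. The standard count-based forms are assumed; the abstract does not give the paper's adjustment method, so it is not shown. All counts and names are hypothetical.

```python
import math


def cosine_coefficient(n_a, n_b, n_ab):
    """Cosine coefficient from counts: n_ab / sqrt(n_a * n_b)."""
    return n_ab / math.sqrt(n_a * n_b)


def jaccard_coefficient(n_a, n_b, n_ab):
    """Jaccard coefficient from counts: n_ab / (n_a + n_b - n_ab)."""
    return n_ab / (n_a + n_b - n_ab)


# Hypothetical query term occurring in 100 documents, compared with a
# low-frequency candidate and a high-frequency candidate expansion term.
query_freq = 100
candidates = {
    "low_freq_term": {"freq": 10, "cooc": 10},
    "high_freq_term": {"freq": 400, "cooc": 80},
}

scores = {}
for name, c in candidates.items():
    scores[name] = (
        cosine_coefficient(query_freq, c["freq"], c["cooc"]),
        jaccard_coefficient(query_freq, c["freq"], c["cooc"]),
    )
    print(f"{name}: cosine={scores[name][0]:.3f}, jaccard={scores[name][1]:.3f}")
```

Varying the candidate frequencies while holding the co-occurrence ratio fixed is one simple way to see how a given measure responds to frequency level.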

4

A Study on the Utility of Structured Abstracts of Journal Articles for Document Clustering

Sanghee Choi (Catholic University of Daegu) ; Jae Yun Lee (Kyonggi University) 2012, Vol.29, No.1, pp.331-349 https://doi.org/10.3743/KOSIM.2012.29.1.331

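Abstract (translated from Korean)

Structured abstracts express the topic of a scholarly article and have been regarded as an important element in processing scholarly articles. This study analyzed the sub-fields of a structured abstract as four attributes and examined whether the structure of abstracts can be applied to document clustering. When the field attributes of structured abstracts were applied to document clustering, results varied across clustering methods, but the research purpose field, whose topicality is high relative to the amount of information it provides, had the greatest influence on clustering performance. The analysis also revealed field-dependent words that occur specifically in particular fields; excluding these words and applying the within-group average linkage method improved clustering performance.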

Abstract

Structured abstracts have been regarded as an essential element for representing the topics of journal articles. This study aims to provide an unconventional view of how to use structured abstracts through an in-depth analysis of their sub-fields. A structured abstract was segmented into four fields, namely purpose, design, findings, and values/implications, and the fields were compared in terms of document clustering performance. As a result, the purpose statement of an abstract affected journal article clustering performance more than any other field. Furthermore, certain types of keywords were identified whose exclusion improved clustering performance, especially with the within-group average clustering method. These keywords had a stronger relationship to a specific abstract field, such as research design, than to the topic of an article.

5

A Study on the Use of Unlabeled Documents in Automatic Classification Using Inter-Document Similarity

Pan Jun Kim (Silla University) ; Jae Yun Lee (Kyonggi University) 2007, Vol.24, No.1, pp.251-271 https://doi.org/10.3743/KOSIM.2007.24.1.251

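Abstract (translated from Korean)

We explored ways to improve classification performance by using unlabeled documents for training in classifiers that use inter-document similarity as features. Because manually obtaining large numbers of training documents for automatic classification is costly, using unlabeled documents is important in practice. Most semi-supervised learning algorithms that use unlabeled documents consist of a first step, in which manually labeled documents serve as training data to classify the unlabeled documents, and a second step, in which a classifier is trained on both the manually labeled and the automatically labeled documents. This paper examined two semi-supervised learning algorithms for the setting where document similarity features are applied. The one-step semi-supervised approach is simple because it uses unlabeled documents only to generate document similarity features; the two-step semi-supervised approach uses unlabeled documents as training examples as well as for generating document similarity features. Experiments with support vector machines and naive Bayes classifiers showed that both semi-supervised approaches outperform supervised learning that does not use unlabeled documents. In particular, when execution efficiency is considered, we concluded that the proposed one-step semi-supervised approach is a good way to improve classification performance using unlabeled documents.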

Abstract

This paper studies the problem of classifying documents with labeled and unlabeled learning data, especially with regard to using document similarity features. The problem of using unlabeled data is practically important because in many information systems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. A general semi-supervised learning algorithm has two steps. First, it trains a classifier using the available labeled documents and classifies the unlabeled documents. Then, it trains a new classifier using all the training documents, whether labeled manually or automatically. We suggest two types of semi-supervised learning algorithm that use document similarity features. One is one-step semi-supervised learning, which uses unlabeled documents only to generate document similarity features. The other is two-step semi-supervised learning, which uses unlabeled documents as learning examples as well as for similarity features. Experimental results, obtained using support vector machines and a naive Bayes classifier, show that a small set of labeled documents combined with a large set of unlabeled documents yields better performance than supervised learning that uses labeled data only. When the efficiency of a classifier system is considered, the one-step semi-supervised learning algorithm suggested in this study could be a good solution for improving classification performance with unlabeled documents.
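The general two-step procedure described above (train on labeled documents, label the unlabeled ones, then retrain on everything) can be sketched as follows. To stay self-contained, the sketch swaps in a simple nearest-centroid classifier rather than the support vector machines or naive Bayes classifier used in the paper, and it works on plain numeric vectors instead of document similarity features. All names and data are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(examples):
    """Fit a nearest-centroid model from (vector, label) pairs."""
    by_label = {}
    for vector, label in examples:
        by_label.setdefault(label, []).append(vector)
    return {label: centroid(vectors) for label, vectors in by_label.items()}


def predict(model, vector):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], vector))


def two_step_semi_supervised(labeled, unlabeled):
    # Step 1: train on the manually labeled examples and use that
    # classifier to assign labels to the unlabeled vectors.
    first_model = train(labeled)
    auto_labeled = [(v, predict(first_model, v)) for v in unlabeled]
    # Step 2: retrain on manually and automatically labeled examples together.
    return train(labeled + auto_labeled)


# Hypothetical data: one labeled example per class, four unlabeled vectors.
labeled = [([0.0, 0.0], "A"), ([4.0, 4.0], "B")]
unlabeled = [[1.0, 1.0], [0.0, 1.0], [3.0, 4.0], [5.0, 5.0]]

model = two_step_semi_supervised(labeled, unlabeled)
print(model)
print(predict(model, [0.5, 0.5]))
```

The retrained centroids shift toward the unlabeled data, which is how the second step lets unlabeled documents influence the final classifier.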

6

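A Study on Journal Citation Analysis for Information Services in an Interdisciplinary Field: Focusing on the Department of Biotechnology at Y University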

So Young Yu (Yonsei University) ; Jae Yun Lee (Kyonggi University) 2008, Vol.25, No.4, pp.283-308 https://doi.org/10.3743/KOSIM.2008.25.4.283

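Abstract (translated from Korean)

This study attempted to analyze journal citation patterns beyond simple citation counts through inter-citation and co-citation analysis of journals at the local institution. We examined whether, in assessing journal importance, structural analysis of the institution's citation network explains local citation patterns beyond citation frequency, and whether general citation indicator services other than the JIF provided by Web of Science need to be considered. We analyzed the citation network of articles published in 2006 and 2007 by current and former faculty of the Department of Biotechnology, College of Life Science and Biotechnology, Y University, and the relationships among general citation indicators from sources other than Web of Science. The results are as follows. First, the institution's research areas could be identified through its inter-citation network. Second, the institution's co-citation network indices can explain citation patterns that reflect the structural properties of the local citation network; they resemble citation frequency while adding explanatory power. Third, general citation indicators other than JIF, such as total-based indices and the h-index, vary in what they explain, so they should be considered together from multiple angles. In addition, in journal evaluation, differences in explanatory power by type of index were found to be larger than differences due to the coverage of the citation index database. Such analysis of an institution's citation network is expected to be useful in many areas of information service.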

Abstract

In this study, we test whether the structural attributes of a citation network can explain additional aspects of journal citation behavior and the importance of journals. We also examine various journal citation impact indicators, including JIF and h-index, to verify the differences among them, focusing on their ability to explain an institution's local citation behavior. An institutional citation network is derived from the articles published in 2006-2007 by the biotechnology faculty of Y University. Various journal citation impact indicators, including JIF, SJR, h-index, EigenFactor, and JII, are gathered from different service sites such as Web of Science, SCImago, EigenFactor.com, and Journal-Ranking.com. As a result, we can explain the institution's 5 research domains with the inter-citation network. We find that the structural features of the co-citation network can explain patterns of institutional journal citation behavior that differ from the institution's simple cited frequency or from patterns based on general citation indicators. We also find that journal ranks differ across citation indicators, which implies that total-based indices, average-based indices, and the hybrid index (h-index) explain different aspects of journal citation patterns. We further find that the coverage of the citation database is not a major factor in journal ranking. Analyzing the citation networks derived from an institution's research outputs can be a useful and effective method for developing several library services.
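The raw input to a journal co-citation network like the one described above is a count of how often two journals are cited together by the same article. A minimal sketch of that counting step follows; it assumes each article is represented as a list of cited journal names and that repeated citations to one journal within an article count once. The journal labels and the data are hypothetical, and the network indices used in the paper are not reproduced.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for each journal pair, the articles that cite both journals."""
    counts = Counter()
    for cited_journals in reference_lists:
        # De-duplicate within an article and sort so each pair has one key.
        for pair in combinations(sorted(set(cited_journals)), 2):
            counts[pair] += 1
    return counts


# Hypothetical reference lists of three articles (journals cited by each).
articles = [
    ["J1", "J2", "J3"],
    ["J1", "J2"],
    ["J2", "J3", "J3"],
]

counts = cocitation_counts(articles)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

The resulting pair counts can serve as edge weights when building the co-citation network for structural analysis.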
