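Journal of the Korean Society for Information Management (정보관리학회지), Korean Society for Information Management (한국정보관리학회)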

11

김수연 (Yonsei University); 정영미 (Yonsei University) 2006, Vol.23, No.3, pp.147-165 https://doi.org/10.3743/KOSIM.2006.23.3.147

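Abstract (translated from Korean)

In this study, association term selection experiments were carried out using association rule mining and term clustering techniques in order to find the best technique for selecting terms associated with an initial query term from the whole document collection. The Apriori algorithm was used for association rule mining, and the GSS coefficient, Jaccard coefficient, cosine coefficient, Sokal & Sneath 5, and mutual information were used as association measures for term clustering. Precision of association terms and the overlap ratio of association terms were used as performance measures, and in the experiments the Apriori algorithm and the GSS coefficient showed the best performance.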

Abstract

In this study, experiments on the selection of association terms were conducted to find the optimal method for selecting additional terms related to an initial query term. Association term sets were generated using the support, confidence, and lift measures of the Apriori algorithm, and using similarity measures such as GSS, the Jaccard coefficient, the cosine coefficient, Sokal & Sneath 5, and mutual information. To evaluate the term selection methods, the precision of association terms and the overlap ratio between association terms and relevant documents' indexing terms were used. The Apriori algorithm and GSS achieved the highest performance.
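Several of the measures named in this abstract can be computed directly from document frequencies. The following is a minimal sketch, not the authors' implementation: the toy collection, the function names, and the restriction to term pairs are all assumptions made for illustration. It covers support, confidence, lift, the Jaccard coefficient, and the cosine coefficient; it does not implement Apriori's frequent itemset search, GSS, Sokal & Sneath 5, or mutual information.

```python
from math import sqrt

# Toy collection (made-up data): each document is the set of its index terms.
docs = [
    {"retrieval", "query", "index"},
    {"retrieval", "query"},
    {"retrieval", "clustering"},
    {"query", "index"},
    {"clustering", "index"},
]


def doc_freq(term, docs):
    """Number of documents that contain the term."""
    return sum(1 for d in docs if term in d)


def co_freq(a, b, docs):
    """Number of documents that contain both terms."""
    return sum(1 for d in docs if a in d and b in d)


def association_measures(a, b, docs):
    """Pairwise association measures for the rule a -> b (hypothetical helper)."""
    n = len(docs)
    fa, fb, fab = doc_freq(a, docs), doc_freq(b, docs), co_freq(a, b, docs)
    support = fab / n                          # P(a and b)
    confidence = fab / fa if fa else 0.0       # P(b | a)
    lift = confidence / (fb / n) if fb else 0.0  # P(b | a) / P(b)
    union = fa + fb - fab
    jaccard = fab / union if union else 0.0
    cosine = fab / sqrt(fa * fb) if fa and fb else 0.0
    return {
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "jaccard": jaccard,
        "cosine": cosine,
    }


m = association_measures("retrieval", "query", docs)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Ranking candidate terms by any one of these scores against the initial query term gives a simple association term list; which measure ranks best is exactly the question the study evaluates.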

12

A Study on Constructing a Text-Level Measurement Formula for a Reading Education System

최인숙 (Sookmyung Women's University) 2005, Vol.22, No.3, pp.213-232 https://doi.org/10.3743/KOSIM.2005.22.3.213

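Abstract (translated from Korean)

The purpose of this paper is to identify the factors that affect the text level of reading materials for elementary school students [...] factors were examined for their correlation with the text-level scores assigned to the sample, and the number of characters, number of word segments (eojeol), number of unique word segments, number of sentences, and number of paragraphs were found to be factors that determine text level. Among the regression equations derived through simple regression analysis, the model based on the number of unique word segments proved to be the best formula, but multiple regression analysis showed that a model combining the unique word segment factor with the new word segment occurrence ratio factor had even greater explanatory power. Using the formula, materials suited to a reader's ability can be recommended.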

Abstract

The purpose of this study is to determine factors affecting text difficulty and to model objective formulas which measure readability scores. Some reada [...] total number of letters, total number of syllables, total number of unique syllables, total number of sentences and total number of paragraphs were found through [...] regression equations with these factors as their variables were produced through regression analysis. A model estimating readability score from total number of unique syllables was a [...] unique syllables and new syllable occurrence ratio, was a better, enhanced one. The readability score represents a detailed level, so we can recommend that students read texts corresponding to their reading levels.

13

A Study on Extracting Article Body Text from News Web Pages

이용구 (University of Pittsburgh) 2009, Vol.26, No.1, pp.305-320 https://doi.org/10.3743/KOSIM.2009.26.1.305

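Abstract (translated from Korean)

News pages provided through the web contain not only the needed information but also a great deal of unnecessary information. This unnecessary information degrades the performance and efficiency of systems that process news. To extract news content from web pages, this study proposed news article extraction methods based on sentences and on blocks, and sought ways of combining them to obtain the best performance. In the experiments, applying the sentence-based extraction method after removing hyperlink text from the web page was effective, and combining it with the block-based extraction method gave better results. The sentence-based extraction method was found to raise extraction recall.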

Abstract

News pages provided through the web contain unnecessary information, which lowers the performance and efficiency of news processing systems. In this study, news content extraction methods based on sentence identification and on block-level tags in news web pages were proposed. To obtain optimal performance, combinations of these methods were applied. The results showed good performance for an extraction method that applied sentence identification after eliminating hyperlink text from web pages. This method showed better results when combined with the extraction method that used block-level tags. Extraction methods that used sentence identification were effective in raising extraction recall.
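The pipeline described in this abstract (drop hyperlink text, group text by block-level tags, keep blocks that contain sentences) can be sketched with Python's standard `html.parser` module. This is only an illustration under stated assumptions: the tag sets, the sentence heuristic (`SENTENCE_RE`), and the class and function names are hypothetical and are not taken from the paper.

```python
import re
from html.parser import HTMLParser

# Assumed tag sets for this sketch; the paper's actual choices are not given here.
BLOCK_TAGS = {"p", "div", "td", "li", "article", "section"}
SKIP_TAGS = {"a", "script", "style"}  # hyperlink text and non-content elements

# Assumed heuristic: a "sentence" is 20+ characters ending in . ! or ?
SENTENCE_RE = re.compile(r"[^.!?]{20,}[.!?]")


class BlockTextParser(HTMLParser):
    """Collects text per block-level element, ignoring text inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buf = []
        self._skip_depth = 0

    def flush(self):
        # Normalize whitespace and store the block if anything is left.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buf.append(data)


def extract_article(html):
    """Return the text blocks that contain at least one sentence-like span."""
    parser = BlockTextParser()
    parser.feed(html)
    parser.close()
    parser.flush()
    return [block for block in parser.blocks if SENTENCE_RE.search(block)]


sample = """
<div><a href="/">Home</a> | <a href="/sports">Sports</a></div>
<div><p>The city council approved the new budget on Monday evening.</p>
<p>Officials said the plan takes effect next year. <a href="/more">Read more</a></p></div>
<div>Copyright 2009</div>
"""
for block in extract_article(sample):
    print(block)
```

On the sample page the navigation bar and the copyright line are dropped because they contain no sentence-like span once link text is removed, while the two article paragraphs survive.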

14

A Study on Standard Search-Related Terms for Web-Based Information Retrieval Systems

남영준 (Chung-Ang University) 2003, Vol.20, No.2, pp.199-217 https://doi.org/10.3743/KOSIM.2003.20.2.199

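Abstract (translated from Korean)

This study proposed standard search interface terms that can optimize user convenience in web-based information retrieval systems. To this end, the web pages of the National Library of Korea and of other institutions that provide major specialized information were surveyed and analyzed. Based on the results of the analysis, standard terms were proposed that can minimize user error and confusion and maximize search convenience in web-based information retrieval systems. The proposals were based on each term's frequency of use and meaning. Among the terms used in the basic search modules, the search scope setting modules, and the user support modules, the analysis covered only those terms belonging to functions offered by at least 50% of the institutions. The results of this study can serve as standard reference material for experts who design and build web-based search screens when they select search-related terms.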

Abstract

This research suggests a method of standardizing terms to raise the effectiveness of information retrieval. Especially for web search, I propose proper terms for use in retrieval, based on surveying and analysing the terms related to information retrieval interfaces. The proper terms will resolve equivocation for users and increase retrieval effectiveness. I expect the proposed terms to be used as standard data by designers who construct user interface systems.

15

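A Study on the Causal Relationship between Intellectual Capital Evaluation Indicators and Performance in Public Libraries

박성우 (Chonnam National University); 장우권 (Chonnam National University) 2011, Vol.28, No.4, pp.279-307 https://doi.org/10.3743/KOSIM.2011.28.4.279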

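Abstract (translated from Korean)

Intellectual capital is the internal driving force for maintaining the competitive advantage and sustainability of public libraries. Intellectual capital consists of human capital, structural capital, and social capital, which arise from the competences of internal members, the organizational structure those members form, and cooperation with users and stakeholders. The purpose of this study is to provide basic data for evaluating the intellectual capital of public libraries, based on background theory on its components (human capital, structural capital, and social capital), and at the same time to develop an experimental evaluation model. To this end, an interpretation of public library intellectual capital and a set of evaluation indicators were derived from the background theory. In addition, the causal relationship between the components of intellectual capital and performance was identified through empirical analysis, and an optimized intellectual capital evaluation model was presented.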

Abstract

Intellectual capital is the driving force for the competitive advantage and durability of the public library. This asset consists of the library members’ competences, the organizational structure constructed by the members, and the interrelationships among the people sharing the same interests. These are called human capital, structural capital, and social capital, respectively. The purpose of the study was to provide foundational information for the public library’s intellectual capital assessment as well as to create an experimental assessment model. It analysed the three characteristics of the capital, which generated an assessment index. In addition, the relationship between the components of intellectual capital and performance was identified through empirical study in order to improve the assessment system.

16

An Experimental Study on the Reclassification of Author Keywords for Automatic Descriptor Assignment

김판준 (Silla University); 이재윤 (Kyonggi University) 2012, Vol.29, No.2, pp.225-246 https://doi.org/10.3743/KOSIM.2012.29.2.225

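Abstract (translated from Korean)

This study explored the possibility of automatically assigning descriptors (controlled keywords) by reclassifying the author keywords (uncontrolled keywords) provided in the search services of major domestic scholarly databases. First, experiments comparing the characteristics of major machine learning classifiers were carried out to select the optimal classifier and parameters for reclassification. Next, experiments were carried out in which keywords were additionally assigned by reclassifying articles according to the results of learning the author keywords assigned to articles in domestic journals in the field of reading. We also examined whether the documents newly added by this reclassification show a vocabulary control effect of gathering articles on the same topic, as descriptors (controlled keywords) do. The results confirmed that reclassifying author keywords can achieve the effect of automatic descriptor assignment.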

Abstract

This study aimed to investigate the possibility of automatic descriptor assignment through the reclassification of author keywords in domestic scholarly databases. In the first stage, we selected optimal classifiers and parameters for the reclassification by comparing the characteristics of machine learning classifiers. In the next stage, after learning from the author keywords assigned to selected articles on reading, author keywords were automatically added to another set of relevant articles. We examined whether the author keyword reclassification had a vocabulary control effect, just as descriptors collocate documents on the same topic. The results showed that author keyword reclassification can achieve the effect of automatic descriptor assignment.

17

A Study on the Automatic Classification of Domestic Journal Articles Using Random Forest

김판준 (Silla University) 2019, Vol.36, No.2, pp.57-77 https://doi.org/10.3743/KOSIM.2019.36.2.057

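Abstract (translated from Korean)

Random Forest (RF), a representative ensemble technique, was applied to the automatic classification of journal articles in the field of library and information science. In particular, varied experiments were carried out on major factors such as the number of trees, feature selection, and training set size, in terms of classification performance in automatically assigning subject categories to domestic journal articles. Through these experiments, ways to optimize the performance of Random Forest (RF) on a real-world imbalanced dataset were explored. As a result, in the automatic classification of domestic journal articles, Random Forest (RF) can be expected to perform best when using a number of trees in the interval 100-1000 (C), a small feature set (10%) selected by the chi-square statistic (CHI), and most of the training set (9-10 years).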

Abstract

Random Forest (RF), a representative ensemble technique, was applied to the automatic classification of journal articles in the field of library and information science. In particular, I performed various experiments on main factors such as the number of trees, feature selection, and training set size, in terms of classification performance in automatically assigning class labels to domestic journal articles. Through these experiments, I explored ways to optimize the performance of Random Forest (RF) for imbalanced datasets in real environments. Consequently, for the automatic classification of domestic journal articles, Random Forest (RF) can be expected to achieve the best classification performance when using a number of trees in the interval 100-1000 (C), a small feature set (10%) selected by the chi-square statistic (CHI), and most of the training set (9-10 years).
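The feature selection step mentioned in this abstract, keeping a small fraction of terms ranked by the chi-square statistic, can be sketched in plain Python. This is a minimal illustration on made-up data, not the author's experimental setup: the function names, the binary term-presence representation, and the one-category-versus-rest scoring are assumptions. The random forest itself is not implemented here; in practice a library implementation such as scikit-learn's `RandomForestClassifier` would be trained on the selected features.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 term/category contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(docs, labels, category, ratio=0.10):
    """Rank terms by chi-square against one category and keep the top share.

    docs is a list of term sets; ratio=0.10 mirrors the 10% feature set
    reported in the abstract. Ties are broken alphabetically.
    """
    vocab = sorted({term for doc in docs for term in doc})
    scores = {}
    for term in vocab:
        a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == category)
        b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != category)
        c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == category)
        d = len(docs) - a - b - c
        scores[term] = chi_square(a, b, c, d)
    k = max(1, round(len(vocab) * ratio))
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


# Toy data (hypothetical): four documents in two subject categories.
toy_docs = [
    {"index", "query"},
    {"index", "rank"},
    {"loan", "query"},
    {"loan", "shelf"},
]
toy_labels = ["ir", "ir", "lib", "lib"]

selected, scores = select_features(toy_docs, toy_labels, "ir", ratio=0.4)
print(selected)
```

Note that the statistic is symmetric: a term that appears only outside the category ("loan" here) scores as high as one that appears only inside it, since both are equally informative about category membership.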

18

A Study on Semantic-Based Feature Expansion for Improving Document Categorization Performance

정은경 (Ewha Womans University) 2009, Vol.26, No.3, pp.261-278 https://doi.org/10.3743/KOSIM.2009.26.3.261

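Abstract (translated from Korean)

In machine learning-based document categorization, constructing an optimal feature set is important for improving performance. This study carried out feature expansion experiments on author-provided keywords and article titles, which are essential components of journal articles. Feature expansion generally starts from the selected features and uses a semantic lexical tool such as WordNet. This study performed feature expansion on keywords and article titles using WordNet synonym terms, and the experiments showed that document categorization performance improved markedly compared with the results obtained without feature expansion. The factors found to contribute positively to this improvement were a refined feature base and synonym feature expansion based on classification terms. Performance improved both with and without word sense disambiguation. The results of this study suggest that synonym feature expansion based on classification terms, using keywords and article titles, is a positive factor in improving document categorization performance.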

Abstract

Identifying optimal feature sets in text categorization (TC) is crucial for improving effectiveness. In this study, experiments on feature expansion were conducted using author-provided keyword sets and article titles from typical scientific journal articles. The tool used for expanding feature sets is WordNet, a lexical database for English words. Given a data set and a lexical tool, this study showed that feature expansion with synonym relationships was significantly effective in improving the results of TC. The experimental results indicated that when feature sets were expanded with synonyms based on classifier names, the effectiveness of TC improved considerably regardless of word sense disambiguation.

19

A Study on Hierarchical Clustering for Automatically Classified Browsing in OPACs

노정순 (Hannam University) 2004, Vol.21, No.1, pp.93-117 https://doi.org/10.3743/KOSIM.2004.21.1.093

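Abstract (translated from Korean)

This study was carried out to find the optimal hierarchical clustering model that can be used in an OPAC to classify holdings into a hierarchical structure for browsing. A test collection was built from monographs and theses in library and information science, and experiments were run with indexing techniques (automatic indexing of title words and combined indexing with controlled terms), term weighting techniques (absolute frequency and binary frequency), similarity coefficients (Dice, Jaccard, Pearson, cosine, squared Euclidean), and clustering techniques (between-groups average linkage, within-groups average linkage, complete linkage) as variables. The results showed that, except for the between-groups average linkage method and squared Euclidean similarity, the remaining similarity coefficients and clustering techniques generated relatively good clusters, but the best clusters were generated when combined indexing with controlled terms, weighted by binary frequency, was clustered with the complete linkage and between-groups average linkage methods. However, between-groups average linkage with the Jaccard similarity coefficient was more similar to the decimal structure.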

Abstract

This study aims to develop a hierarchical clustering model for document classification and browsing in OPAC systems. Two automatic indexing techniques (with and without controlled terms), two term weighting methods (based on term frequency and binary weight), five similarity coefficients (Dice, Jaccard, Pearson, Cosine, and Squared Euclidean), and three hierarchical clustering algorithms (Between Average Linkage, Within Average Linkage, and Complete Linkage) were tested on a document collection of 175 books and theses on library and information science. The best document clusters resulted from the Between Average Linkage or Complete Linkage method with the Jaccard or Dice coefficient on automatic indexing with controlled terms in binary vectors. The clusters from Between Average Linkage with Jaccard more closely resembled a decimal classification structure.
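Two of the ingredients named in this abstract, the Jaccard and Dice coefficients on binary term vectors and complete linkage clustering, can be sketched as follows. This is a naive illustration on four made-up records, not the author's experimental procedure: documents are represented as term sets (equivalent to binary vectors), the function names are hypothetical, and the quadratic search over cluster pairs is chosen for clarity rather than speed.

```python
def jaccard(a, b):
    """Jaccard coefficient of two term sets (binary vectors)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a, b):
    """Dice coefficient of two term sets (binary vectors)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def complete_linkage(docs, sim, n_clusters):
    """Naive agglomerative clustering with complete linkage.

    The similarity between two clusters is the minimum pairwise similarity
    of their members; the most similar pair of clusters is merged until
    n_clusters remain. Returns lists of document indices.
    """
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                link = min(
                    sim(docs[x], docs[y])
                    for x in clusters[i]
                    for y in clusters[j]
                )
                if best is None or link > best[0]:
                    best = (link, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy catalog records (made-up index terms).
records = [
    {"cataloging", "classification", "marc"},
    {"cataloging", "classification", "dewey"},
    {"retrieval", "query", "ranking"},
    {"retrieval", "query", "feedback"},
]

print(complete_linkage(records, jaccard, 2))
print(complete_linkage(records, dice, 2))
```

Swapping the `min` in the linkage computation for an average over the same pairs would give between-groups average linkage, the other method the abstract reports as producing the best clusters.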

20

A Study on the Automatic Classification of Domestic Journal Articles Based on Machine Learning

김판준 (Silla University) 2018, Vol.35, No.2, pp.37-62 https://doi.org/10.3743/KOSIM.2018.35.2.037

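Abstract (translated from Korean)

The factors affecting the performance of machine learning-based automatic classification were examined for a document collection of domestic journal articles in library and information science. In particular, the characteristics of major factors such as term weighting schemes, training set size, classification algorithms, and category assignment methods were examined through varied experiments, in terms of classification performance in automatically assigning subject categories to articles published in the Journal of the Korean Society for Information Management. As a result, it is effective to apply each factor appropriately according to the classification environment and the characteristics of the document collection, and a fairly good level of performance could be obtained with simpler models. In addition, for classifying domestic journal articles, multi-label classification, which assigns one or more categories to a given article, matches the real environment. Therefore, taking this environment into account, an optimal classification model using a simple and fast classification algorithm and a small training set was proposed.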

Abstract

This study examined the factors affecting the performance of automatic classification based on machine learning for domestic journal articles in the field of LIS. In particular, in terms of the performance of automatically assigning class labels to articles in the Journal of the Korean Society for Information Management, I investigated the characteristics of the key factors (weighting schemes, training set size, classification algorithms, label assigning methods) through diversified experiments. Consequently, it is effective to apply each element appropriately according to the classification environment and the characteristics of the document set, and a fairly good performance can be obtained by using a simpler model. In addition, the classification of domestic journal articles can be considered a multi-label classification problem that assigns one or more categories to a specific article. Therefore, considering this environment, I proposed an optimal classification model using a simple and fast classification algorithm and a small training set.
