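정보관리학회지 (Journal of the Korean Society for Information Management), 한국정보관리학회 (Korean Society for Information Management)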

11

정은경 (Professor, Department of Library and Information Science, College of Social Sciences, Ewha Womans University) 2020, Vol.37, No.1, pp.153-177 https://doi.org/10.3743/KOSIM.2020.37.1.153

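Korean abstract (English translation)

In the current of open science, data sharing and reuse are becoming important activities for researchers. Among the various discussions of data sharing and reuse, the publication of data journals and data papers is showing visible results. Data journals are published in many disciplines, and the number of papers is gradually increasing. Unlike data itself, data papers cite and are cited, and so they form an intellectual structure of their own. To identify the intellectual structure that data journals and data papers form in the scholarly community, this study analyzed 14 data journals indexed in Web of Science, 6,086 data papers, and 84,908 cited references. Together with author information, co-citation analysis and bibliographic coupling analysis were visualized as networks to identify the detailed subject areas formed by data papers. When authors, author affiliations, and countries were extracted and their frequencies examined, the pattern differed from that of traditional journal articles. This result can be interpreted as arising because data papers are published mainly by institutions and countries where data are easy to produce. In both the co-citation analysis and the bibliographic coupling analysis, analytical tools, databases, and genome composition emerged as the main detailed subject areas. The co-citation analysis yielded nine clusters, among which the areas that appeared as specific subject fields were water quality and climate. The bibliographic coupling analysis consisted of 27 components in total, in which detailed subject areas such as ocean and atmosphere were identified in addition to water quality and climate. Notably, subject areas from the social sciences also appeared.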

Abstract

In the context of open science, data sharing and reuse are becoming important activities for researchers. Among the discussions about data sharing and reuse, data journals and data papers show visible results. Data journals are published in many academic fields, and the number of papers is increasing. Unlike the data itself, data papers cite and receive citations, thus creating their own intellectual structures. This study analyzed 14 data journals indexed by Web of Science, 6,086 data papers, and 84,908 cited references to examine the intellectual structure of data journals and data papers in the academic community. Along with the authors’ details, the co-citation analysis and bibliographic coupling analysis were visualized as networks to identify the detailed subject areas. The results of the analysis show that the frequent authors, affiliated institutions, and countries differ from those of traditional journal papers. This result can be interpreted as arising mainly because data papers are published by authors who can easily produce data. In both the co-citation and bibliographic coupling analyses, analytical tools, databases, and genome composition were the main subtopic areas. The co-citation analysis resulted in nine clusters, with specific subject areas being water quality and climate. The bibliographic coupling analysis consisted of a total of 27 components, and detailed subject areas such as ocean and atmosphere were identified in addition to water quality and climate. Notably, subject areas from the social sciences also emerged.
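
The two citation measures named in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the paper and reference identifiers are invented toy data, and the function names are hypothetical. Co-citation counts how often two references appear together in one paper's reference list, while bibliographic coupling counts how many references two citing papers share.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each citing paper maps to the set of references it cites.
citations = {
    "paper_A": {"ref_1", "ref_2", "ref_3"},
    "paper_B": {"ref_2", "ref_3"},
    "paper_C": {"ref_3", "ref_4"},
}


def cocitation_counts(citations):
    """Count how often each pair of references is cited together by one paper."""
    counts = Counter()
    for refs in citations.values():
        for pair in combinations(sorted(refs), 2):
            counts[pair] += 1
    return counts


def coupling_strengths(citations):
    """Count the references shared by each pair of citing papers."""
    strengths = {}
    for a, b in combinations(sorted(citations), 2):
        shared = len(citations[a] & citations[b])
        if shared:
            strengths[(a, b)] = shared
    return strengths


cocited = cocitation_counts(citations)
coupled = coupling_strengths(citations)
print(cocited[("ref_2", "ref_3")])      # cited together by paper_A and paper_B
print(coupled[("paper_A", "paper_B")])  # number of references the two papers share
```

Either result can be read as a weighted edge list, which is the form a network visualization of subject clusters would typically start from.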

12

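Building a Web 2.0-Based Open Archiving Community for the Life Sciences

안부영 (Korea Institute of Science and Technology Information) ; 이응봉 (Chungnam National University) ; 한정민 (KISTI) 2006, Vol.23, No.4, pp.89-110 https://doi.org/10.3743/KOSIM.2006.23.4.089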

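Korean abstract (English translation)

Life science is one of the important disciplines that directly affect human life. Korean life science researchers are dispersed across industry, academia, and research institutes, where they carry out important research, and the results are produced in various forms (actual research outputs, papers, research notes, seminar materials, monographs, textbooks, and so on). To enable rapid acquisition of life science research information, KISTI has built an open archiving community (BioInfoNet) for sharing and exchanging life science information, and provides it as infrastructure so that researchers themselves can develop the community. In this study, building on Web 2.0, the recent notion of the web as a platform, we collected open-access life science bibliographic information and built a metadata database, and we designed and implemented the community so that users can voluntarily set up and run public subject-specific bulletin boards (BioBBS).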

Abstract

Life science is one of the most important fields which have direct influence on human life. Many domestic life scientists in the industries, educational organizations and research institutes have been producing important results in a variety of forms such as papers, research notes, presentation materials, books and teaching materials. Open Archiving Community has been constructed in order to share and exchange research information related to life science between researchers. The domestic life scientists can acquire valuable information through the community quickly and efficiently. In this study, the community system has been designed and implemented to provide free access to all data including metadata registry of the bibliographic information on life science and research results accumulated by researchers of their own accord. The community system also has been designed and implemented based on Web 2.0 and provides users with BBS by subjects.

13

A Study on Generating a Global Map of Science for Korea

이재윤 (Kyonggi University) 2007, Vol.24, No.3, pp.363-383 https://doi.org/10.3743/KOSIM.2007.24.3.363

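Korean abstract (English translation)

Global maps of science, which cover all disciplines, are used to visually analyze the structural relationships between fields. This study reviews prior research on global maps of science and then generates, with a new method, a global map of science that reflects scholarly activity in Korea. Research on global maps of science was initiated by Garfield and Small of ISI (now Thomson Scientific), and recently the SCImago research group at the University of Granada in Spain and Professor Börner's team at Indiana University in the United States have been actively publishing results. They call the maps they produce science maps or scientograms, and call the related activity scientography. Most existing global maps of science have been built on citation analysis between scholarly articles, but citation databases for Korean scholarly articles are still inadequate. This study therefore used the text of research proposals submitted to the Korea Research Foundation to build a global map of science for Korea. A map was first generated with the Pathfinder network (PFNet) algorithm, which is widely used to represent linkage information between fields as a network, and a second map was then generated with the clustering-based network (CBNet) algorithm, which was developed as an alternative. Finally, the CBNet map was revised to integrate the contrasting perspectives of the two maps, and the result is presented as a global map of science for Korea.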

Abstract

A global map of science, which visualizes large scientific domains, can be used to visually analyze the structural relationships between major areas of science. This paper reviews previous efforts on global science maps and then attempts to make a science map of Korea with some new methods. Several research groups work on global maps of science, including Dr. Small and Dr. Garfield of ISI (now Thomson Scientific), the SCImago research group at the University of Granada, and Dr. Börner's InfoVis Lab at Indiana University. They call their maps science maps or scientograms and call the activity of mapping science scientography. Most previous work is based on citations between scientific articles. However, a citation database for Korean journal articles is still under construction. This research therefore builds a Korean science map from the text of proposals submitted for funding to the Korea Research Foundation. Two methods for generating networks of scientific fields are used. One is the Pathfinder network (PFNet) algorithm, which has been used in several published bibliometric studies. The other is the clustering-based network (CBNet) algorithm, which was proposed recently as an alternative to PFNet. To take into account the views of both algorithms, the resulting maps are combined into a final science map of Korea.
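
The PFNet step mentioned in this abstract can be illustrated with a small sketch. It assumes the common PFNet(r = ∞, q = n − 1) setting and a symmetric matrix of distances between fields (smaller means more closely related); neither the parameters nor the input format is stated in the abstract, and the matrix below is invented toy data. Under that setting a direct link survives only if no indirect path offers a smaller "largest single step".

```python
import math


def pathfinder_network(dist):
    """Return the edges kept by PFNet(r=inf, q=n-1) for a symmetric distance matrix."""
    n = len(dist)
    # minimax[i][j]: the smallest achievable "largest single step" over all paths i -> j.
    minimax = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # Keep a direct link only if no indirect path beats it.
            if math.isfinite(dist[i][j]) and dist[i][j] <= minimax[i][j]:
                edges.add((i, j))
    return edges


# Hypothetical distances between four fields (0..3); the diagonal is zero.
toy_dist = [
    [0, 1, 3, 4],
    [1, 0, 2, 5],
    [3, 2, 0, 1],
    [4, 5, 1, 0],
]

edges = pathfinder_network(toy_dist)
print(sorted(edges))
```

In the toy matrix the direct link between fields 0 and 2 is dropped because the route through field 1 never takes a step longer than 2, which is shorter than the direct distance of 3.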

14

An Analysis of the Operation and Current State of Reference Databases in Korea

김홍렬 (Jeonju University) ; 정경희 (Chungbuk National University) 2005, Vol.22, No.2, pp.23-39 https://doi.org/10.3743/KOSIM.2005.22.2.023

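Korean abstract (English translation)

The purpose of this study is to analyze the current state of the reference databases being built in Korea, identify their problems, and, on that basis, propose measures for building reference databases that can properly serve both as an information source and as an evaluation tool. To this end, the study analyzed the reference databases built by specialized research information centers, KSCI of the Korea Institute of Science and Technology Information, KCI of the Korea Research Foundation, and KoMCI of the Korean Academy of Medical Sciences. Based on this analysis, measures for reference database construction projects in Korea were derived and presented. These results can serve as supporting material for setting institutional, policy, and technical directions for building reference databases.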

Abstract

The purpose of this study is to analyze the present condition of reference databases constructed in Korea. The analysis covers KSCI (Korean Science Citation Index) of KISTI, KCI (Korean Citation Index) of KRF, KoMCI (Korean Medical Citation Index) of the Korean Academy of Medical Sciences, and the reference database of KOSEF. Based on the results of this analysis, the paper proposes a plan to promote reference database construction. The proposed plan can serve as basic material for setting institutional, policy, and technical directions for reference database construction.

15

Current Status and Challenges of Open Data Policy in the Digital Age

신은자 (Sejong University) 2015, Vol.32, No.3, pp.49-68 https://doi.org/10.3743/KOSIM.2015.32.3.049

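Korean abstract (English translation)

In the past, even those who sympathized with open data had no suitable way to practice it, but today sharing digital research data through IT is no longer difficult. Nevertheless, many researchers feel burdened by the side effects of opening their data and by the additional work involved, and some other problems remain to be solved, so open data is not being practiced as actively as expected. It is being actively pursued only in some disciplines, such as earth science and meteorology, while the remaining disciplines appear to show little interest. The study found that learned societies, nonprofit organizations, universities, and research funders abroad promote open data in pursuit of the public interest, whereas major publishers promote it as a complementary measure for reviewing papers rigorously. Open data is important because it leads to follow-up research and serves as a stepping stone for the advancement of scholarship, and it seems clear that it is the direction to take. Accordingly, cases from abroad should be examined thoroughly and reflected in Korean policy, and researchers, universities, and libraries should all take an interest in the need for open data and in how the situation will unfold, and cooperate more actively; this study describes specific details in this regard.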

Abstract

There were not many ways to share research data in the past, but modern information technology has made sharing these data possible. Because data sharing has side effects, researchers’ attitudes toward it and their sharing practices vary by discipline. This study found that foreign learned societies, NGOs, universities, and research funders support data sharing from a utilitarian perspective, while major publishers demand it so that other researchers can verify the data in peer review. It is important that open data policy be established in the near future in order to stimulate further studies and encourage progress in science. To establish data sharing successfully in Korea, efforts should be made by researchers, universities, academic libraries, and governments as well as the stakeholder. This study also proposes specific ways to do so.

16

A Study on the Current State of Data Papers through an Analysis of the Journal Scientific Data

정은경 (Ewha Womans University) 2019, Vol.36, No.1, pp.117-135 https://doi.org/10.3743/KOSIM.2019.36.1.117

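Korean abstract (English translation)

Data journals and data papers have emerged as a scholarly practice of data sharing and reuse within the open science paradigm and continue to grow. This paper analyzes a total of 713 papers published in Scientific Data, an influential multidisciplinary data journal, in terms of authors, citations, and subject areas. The results show that the main subject areas of the authors are biotechnology, physics, and related fields, and that the average number of co-authors is 12. Viewed as a network, the co-authorship patterns show particular groups of researchers co-authoring in a closed manner. The subject areas of the cited works do not differ greatly from those of the data paper authors, but the high share of citations to journals that mainly deal with methodology can be seen as a characteristic of data papers. A co-word analysis network built from the data paper authors' keywords shows that the subject areas of the data papers center on biology, with specific detailed areas such as marine ecology, cancer, genome, database, and temperature. These results show that, although it is a data journal covering multidisciplinary fields, it is concentrated in biotechnology, a field that began discussing data journal publication early on.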

Abstract

Data journals and data papers have grown and are considered an important scholarly practice for data sharing and reuse in the paradigm of open science. This study investigates a total of 713 data papers published in Scientific Data in terms of authors, citations, and subject areas. The findings show that the subject areas of core authors are Biotechnology and Physics. The average number of co-authors is 12, and the co-authorship patterns form several closed sub-networks. In terms of citations, the subject areas of cited publications are highly similar to those of the data paper authors. However, the citation analysis indicates a considerable share of citations to journals specializing in methodology. The network of authors’ keywords identifies more detailed areas such as marine ecology, cancer, genome, database, and temperature. This result indicates that biology-oriented subjects are the primary areas in the journal, although Scientific Data is categorized as multidisciplinary science in the Web of Science database.

17

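A Study on Research Data Managers' Perceptions for Establishing the Korea Research Data Commons System

박성은 (Senior Engineer, Research Data Sharing Center, Korea Institute of Science and Technology Information) ; 이미경 (Principal Researcher, Research Data Sharing Center, Korea Institute of Science and Technology Information) ; 조민희 (Principal Researcher, Research Data Sharing Center, Korea Institute of Science and Technology Information) ; 송사광 (Principal Researcher, Research Data Sharing Center, Korea Institute of Science and Technology Information; Professor, Department of Applied AI, UST) ; 김다솔 (Engineer, Research Data Sharing Center, Korea Institute of Science and Technology Information) ; 임형준 (Director, Research Data Sharing Center, Korea Institute of Science and Technology Information) 2024, Vol.41, No.1, pp.465-486 https://doi.org/10.3743/KOSIM.2024.41.1.465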

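Korean abstract (English translation)

This study aimed to identify the current state of infrastructure and services for analyzing research data, and to survey perceptions regarding the establishment of the KRDC system, among research data managers at government-funded research institutes under the National Research Council for Science and Technology (NST), who will be the actual users of the Korea Research Data Commons (KRDC) being developed by the Korea Institute of Science and Technology Information (KISTI). To this end, a survey was conducted of the 24 government-funded research institutes other than KISTI, and interviews were conducted with the research data managers of the 9 institutes, out of the 15 that responded, that agreed to a follow-up interview. The survey showed that most institutes were providing related services, and that willingness was also high to adopt an integrated analysis framework for using research data and to provide a system in which externally released analysis software can be used. On the other hand, when the follow-up interviews examined whether each institute's analysis services were open to outside users, only a very small number of institutes had opened them. These findings indicate that there is demand to use analysis infrastructure and services if they are provided through the framework, but that institutes find it difficult to disclose and share the analysis resources they hold. Because sharing analysis infrastructure and services in the research field is essential to establishing the KRDC system, a change in perception in the field and, further, institutional change are needed, and efforts should be made to establish policies that take careful account of the system convenience, security, and reward schemes raised in the follow-up interviews.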

Abstract

The purpose of this study is to identify the current status of infrastructure and services for analyzing research data for research data managers at government-funded research institutions under the National Research Council for Science and Technology (NST) who will actually use the Korea Research Data Commons (KRDC), which is being developed by the Korea Institute of Science and Technology Information (KISTI) and to investigate the perceptions of research data managers related to the establishment of KRDC system. For the study, we conducted a survey targeting 24 government-funded research institutes, excluding KISTI, and interviewed research data managers from 9 of the 15 institutions surveyed who agreed to follow-up interviews. As a result of the survey, most institutions were providing related services, and their willingness to introduce an integrated analysis framework for the use of research data and provide a system for using externally released analysis software was also high. Meanwhile, when we investigated the external disclosure status of each institution’s analysis services through follow-up interviews, only a minimal number of institutions were disclosing them to the outside world. The findings reveal that there is a demand to utilize analysis infrastructure and services when provided through the framework. However, it is difficult to disclose and share the analysis resources held by each organization. In order to establish the KRDC system, it is essential to share research sites’ analysis infrastructure and services, and in addition, changes in the perception of research sites and institutional changes are necessary. Furthermore, there is a need to establish policies that consider the system’s convenience, security, and compensation system raised in the follow-up interviews.

18

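A Study on Using Criminal Justice Information as Big Data: From the Perspective of Structuring and Categorization

김미령 (Librarian, Seoul Metropolitan Police Agency) ; 노윤주 (Librarian, Korean National Police Agency) ; 김성훈 (Visiting Professor, Department of Library and Information Science, Sungkyunkwan University) 2019, Vol.36, No.4, pp.253-277 https://doi.org/10.3743/KOSIM.2019.36.4.253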

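Korean abstract (English translation)

In the era of the Fourth Industrial Revolution the importance of data is growing, yet data are often hard to use because of issues such as personal information protection. Criminal justice information is expected to have value for many uses, including crime prediction and prevention, more scientific criminal investigation, and rationalization of sentencing, but its use is currently quite restricted by issues of legal interpretation concerning personal information protection and criminal justice information. This study proposed converting criminal justice information into 'crime data' through structuring and categorization so that it can be used as big data; it had experts verify the legal issues, the use value, and the considerations in generating and using 'crime data', and it derived strategic plans for future development. The results showed that 'crime data' appears to resolve the personal information protection problem, although it needs to be specified in the laws governing criminal justice information, and that organizing the data in a standardized, analyzable form for big data use is urgent. As future directions, the study proposed deriving data elements, building a terminology dictionary and thesaurus, defining and grading sensitive personal information for data grading, and developing algorithms to give structure to unstructured data.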

Abstract

In the era of the 4th Industrial Revolution, the importance of data is growing, but in many cases data are not easy to use because of personal information protection. Although criminal justice information is expected to be valuable for crime prediction and prevention, more scientific criminal investigation, and rationalization of sentencing, its use is currently limited by questions of legal interpretation related to privacy protection and criminal justice information. This study proposed converting criminal justice information into ‘crime data’ through structuring and categorization and using it as big data. Legal issues, use value, and considerations for generating and using ‘crime data’ were verified by experts, and future strategic development plans were identified. We found that ‘crime data’ seems to resolve the privacy problem, but that it needs to be specified in the laws related to criminal justice information and urgently needs to be organized in a standardized form so that it can be analyzed as big data. Future directions are to derive data elements, construct a terminology dictionary and thesaurus, define and classify sensitive personal information for data grading, and develop algorithms for structuring unstructured data.

19

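Quality Evaluation of Dublin Core Metadata Automatically Generated by ChatGPT: A Study of Korean Books

김선욱 (Department of Library and Information Science, College of Social Sciences, Kyungpook National University) ; 이혜경 (Department of Library and Information Science, Kyungpook National University) ; 이용구 (Kyungpook National University) 2023, Vol.40, No.2, pp.183-209 https://doi.org/10.3743/KOSIM.2023.40.2.183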

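Korean abstract (English translation)

The purpose of this study is to assess ChatGPT's ability to generate metadata, and the potential of that ability, by evaluating the quality of the Dublin Core records it generated from the cover, title page, and colophon data of books. To this end, cover, title page, and colophon data were collected for 90 books and input to ChatGPT, which was asked to generate Dublin Core, and the output was measured on completeness and accuracy. Over the whole data set, completeness was 0.87 and accuracy 0.71, a respectable level. By element, Title, Creator, Publisher, Date, Identifier, Rights, and Language performed relatively better than the other elements. The Subject and Description elements performed somewhat lower on completeness and accuracy, but in these elements the generative ability known as a strength of ChatGPT could be confirmed. Meanwhile, the accuracy of the Contributor element was somewhat low for the DDC main classes of social sciences and technology, which was judged to result from ChatGPT's errors in extracting statements of responsibility, the absence from the data itself of bibliographic description content for metadata elements, and the English-centered composition of ChatGPT's training data.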

Abstract

The purpose of this study is to evaluate the Dublin Core metadata generated by ChatGPT using book covers, title pages, and colophons from a collection of books. To achieve this, we collected book covers, title pages, and colophons from 90 books and inputted them into ChatGPT to generate Dublin Core metadata. The performance was evaluated in terms of completeness and accuracy. The overall results showed a satisfactory level of completeness at 0.87 and accuracy at 0.71. Among the individual elements, Title, Creator, Publisher, Date, Identifier, Rights, and Language exhibited higher performance. Subject and Description elements showed relatively lower performance in terms of completeness and accuracy, but it confirmed the generation capability known as the inherent strength of ChatGPT. On the other hand, books in the sections of social sciences and technology of DDC showed slightly lower accuracy in the Contributor element. This was attributed to ChatGPT’s attribution extraction errors, omissions in the original bibliographic description contents for metadata, and the language composition of the training data used by ChatGPT.
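
The two quality measures reported here can be sketched as simple ratios. The definitions below are assumptions, since the abstract does not give the exact formulas: completeness is taken as the share of expected elements that received a non-empty value, and accuracy as the share of gold-standard elements whose generated value matches exactly. The record contents and function names are hypothetical.

```python
def completeness(generated, expected_elements):
    """Share of expected elements that have a non-empty generated value (assumed definition)."""
    filled = sum(1 for element in expected_elements if generated.get(element, "").strip())
    return filled / len(expected_elements)


def accuracy(generated, gold):
    """Share of gold elements whose generated value matches exactly (assumed definition)."""
    correct = sum(
        1 for element, value in gold.items()
        if generated.get(element, "").strip() == value.strip()
    )
    return correct / len(gold)


# Hypothetical gold record and model output for one book.
gold_record = {"Title": "Sample Book", "Creator": "Hong Gildong", "Publisher": "Sample Press", "Date": "2023"}
generated_record = {"Title": "Sample Book", "Creator": "Hong Gildong", "Publisher": "", "Date": "2022"}

completeness_score = completeness(generated_record, list(gold_record))
accuracy_score = accuracy(generated_record, gold_record)
print(completeness_score, accuracy_score)
```

Exact string matching is the strictest choice; free-text elements such as Subject and Description would likely need a looser comparison, which this sketch does not attempt.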

20

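A Study on Integrating Recognition Technologies for Core Scientific and Technical Entities

최윤수 (Korea Institute of Science and Technology Information) ; 정창후 (Korea Institute of Science and Technology Information) ; 조현양 (Kyonggi University) 2011, Vol.28, No.1, pp.89-104 https://doi.org/10.3743/KOSIM.2011.28.1.089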

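Korean abstract (English translation)

Extracting information from large volumes of documents is very useful not only in information retrieval but also in question answering and summarization. Information extraction is the task of automatically extracting structured information from unstructured data, and it consists of named entity recognition, terminology recognition, anaphora (coreference) resolution, relation extraction, and related tasks. Because each of these technologies has so far been studied independently, they have structurally different input and output schemes, and the underlying language processing engines have widely varying development environments depending on their characteristics, which makes integrated use difficult. Many scientific and technical documents mix named entities with technical terms, so approaching them with existing research results brings inconvenience in merging the outputs and many constraints on processing speed. This study develops a base framework that analyzes scientific and technical documents and extracts named entities and technical terms together. To this end, it develops base language analysis modules such as automatic sentence splitting, part-of-speech tagging, and base phrase recognition (chunking), together with a named entity recognizer and a terminology recognizer that use them, and proposes a core scientific and technical entity recognition system that integrates them into a single platform.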

Abstract

Large-scaled information extraction plays an important role in advanced information retrieval as well as question answering and summarization. Information extraction can be defined as a process of converting unstructured documents into formalized, tabular information, which consists of named-entity recognition, terminology extraction, coreference resolution and relation extraction. Since all the elementary technologies have been studied independently so far, it is not trivial to integrate all the necessary processes of information extraction due to the diversity of their input/output formation approaches and operating environments. As a result, it is difficult to handle scientific documents to extract both named-entities and technical terms at once. In order to extract these entities automatically from scientific documents at once, we developed a framework for scientific core entity extraction which embraces all the pivotal language processors, named-entity recognizer and terminology extractor.
