Inferring Undiscovered Public Knowledge by Using Text Mining-driven Graph Model

Journal of the Korean Society for Information Management, ISSN 1013-0799 (print); 2586-2073 (online)
2014, v.31 no.1, pp.231-250
https://doi.org/10.3743/KOSIM.2014.31.1.231
Go Eun Heo (Yonsei University)
Min Song (Yonsei University)

Abstract (Korean)

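With the development of information and communication technologies, the volume of scholarly information has grown exponentially, and with it the need for automated text processing that can handle massive amounts of text data. Biotext mining, which uncovers information such as biological meanings and therapeutic effects in the biomedical literature, saves considerable time and cost in the medical domain by finding meaningful associations among the concepts in that literature. Literature-based discovery research has yielded new biomedical hypotheses, but existing studies rely on semi-automated techniques that require expert intervention and are limited to revealing a single kind of relationship, that of cause and effect. This study therefore expands the intermediate concept B (the concept linking a source concept A to a target concept C) to multiple levels and identifies diverse relationships by extracting co-occurring entities and verbs. Graph-based path inference made it possible to analyze and identify the relationships between nodes systematically, and this new methodological approach opens the possibility of proposing new hypotheses that had not previously been uncovered.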
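The relationship-identification step described above, linking co-occurring entities and recording the verbs found alongside them, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes entity recognition has already been done upstream and that verbs are matched against a small word list; `VERBS`, `build_cooccurrence_graph`, and the sample sentences are hypothetical, made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical verb lexicon for illustration; a real pipeline would rely on a
# part-of-speech tagger or a verb classification resource instead.
VERBS = {"reduces", "increases", "inhibits", "causes", "induces"}


def build_cooccurrence_graph(sentences):
    """Map each unordered pair of co-occurring entities to the verbs seen with it.

    `sentences` is a list of (tokens, entities) pairs, where `entities` is the
    set of entity strings already recognized in that sentence (entity
    recognition is assumed to happen upstream).
    """
    graph = defaultdict(set)
    for tokens, entities in sentences:
        verbs = {t.lower() for t in tokens if t.lower() in VERBS}
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(entities), 2):
            graph[(a, b)].update(verbs)  # creates the edge even if no verb is found
    return dict(graph)


# Made-up example sentences, for illustration only.
SAMPLE = [
    ("Fish oil reduces blood viscosity".split(), {"fish oil", "blood viscosity"}),
    ("Blood viscosity increases in Raynaud's disease".split(),
     {"blood viscosity", "Raynaud's disease"}),
]

graph = build_cooccurrence_graph(SAMPLE)
for pair, verbs in sorted(graph.items()):
    print(pair, sorted(verbs))
```

Keeping the verbs as edge attributes, rather than only counting co-occurrences, is what lets a later step distinguish kinds of relationships instead of a single undifferentiated link.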

keywords
biotext mining, literature-based discovery, undiscovered public knowledge, graph model

Abstract

Owing to recent developments in Information and Communication Technologies (ICT), the number of research publications has increased exponentially. In response to this rapid growth, demand has risen for automated text processing methods that can handle massive amounts of text data. Biomedical text mining, which discovers hidden biological meanings and treatments in the biomedical literature, has become a pivotal methodology that helps medical disciplines reduce time and cost. Many researchers have conducted literature-based discovery studies to generate new hypotheses. However, existing approaches require either intensive manual processing or a semi-automatic procedure to find and select biomedical entities. In addition, they are limited to showing a single dimension, namely the cause-and-effect relationship between two concepts. This study therefore proposes a novel approach that discovers various relationships among source concepts, target concepts, and their intermediate concepts by expanding the intermediate concepts to multiple levels. It offers a distinct perspective on literature-based discovery: it not only discovers meaningful relationships among concepts in the biomedical literature through graph-based path inference but can also generate feasible new hypotheses.
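Graph-based path inference over multi-level intermediate concepts can be illustrated by enumerating paths from a source concept A to a target concept C through one or more intermediate B concepts. The following is a hedged Python sketch, not the method used in the study: `to_adjacency`, `find_paths`, the `max_intermediates` parameter, and the toy edges (loosely modeled on Swanson's fish oil and Raynaud's example) are illustrative assumptions, and any ranking or verb-based filtering the real system applies is not reproduced.

```python
from collections import deque


def to_adjacency(edges):
    """Build an undirected adjacency map from (concept, concept) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def find_paths(adj, source, target, max_intermediates=2):
    """Breadth-first enumeration of simple paths source -> B1 ... Bk -> target.

    Only paths with 1 <= k <= max_intermediates intermediate concepts are
    returned; a direct source-target edge is ignored, since literature-based
    discovery looks for links that are not stated explicitly.
    """
    paths = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        for nxt in sorted(adj.get(path[-1], ())):
            if nxt in path:
                continue  # keep paths simple (no revisited concepts)
            if nxt == target:
                if len(path) > 1:  # at least one intermediate concept
                    paths.append(path + [nxt])
            elif len(path) <= max_intermediates:
                # path[1:] are intermediates; extending adds one more level
                queue.append(path + [nxt])
    return paths


# Toy edges, loosely modeled on Swanson's fish oil / Raynaud's example;
# illustrative only, not results from the study.
EDGES = [
    ("fish oil", "blood viscosity"),
    ("blood viscosity", "Raynaud's disease"),
    ("fish oil", "platelet aggregation"),
    ("platelet aggregation", "vascular reactivity"),
    ("vascular reactivity", "Raynaud's disease"),
]

adj = to_adjacency(EDGES)
for p in find_paths(adj, "fish oil", "Raynaud's disease", max_intermediates=2):
    print(" -> ".join(p))
```

Breadth-first order returns shorter chains (fewer intermediate levels) first, and raising `max_intermediates` is the knob that corresponds to expanding B to more levels; the number of candidate paths can grow quickly as it increases.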

