
An Experimental Study on the Performance of Element-based XML Document Retrieval

Journal of the Korean Society for Information Management (정보관리학회지), ISSN (print) 1013-0799; (online) 2586-2073
2006, Vol. 23, No. 1, pp. 201-219
https://doi.org/10.3743/KOSIM.2006.23.1.201
윤소영 (National Institute of Korean History)
문성빈 (Yonsei University)

Korean Abstract

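To identify the most suitable element-based XML document retrieval technique, this study conducted experiments evaluating the retrieval performance of three language-model retrieval approaches: the divergence technique, the smoothing technique, and the hierarchical language model. Based on the experimental results, we propose hierarchical language model retrieval, which exploits the structural information of documents, as the most effective retrieval approach. In particular, the hierarchical language model performed best at the top ranks of the result list, which matter most in practical searching.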
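The smoothing approach can be illustrated with a minimal sketch, assuming Jelinek-Mercer (linear interpolation) smoothing, which is one common choice for language-model retrieval and not necessarily the method or setting used in the study. Each element is scored by the log-likelihood of the query under a mixture of the element's own term distribution and the collection's term distribution, so that a query term missing from a short element does not drive its score to zero. The function name `jm_score`, the weight `lam=0.8`, the XPath-like element names, and the toy text are all hypothetical.

```python
import math
from collections import Counter

def jm_score(query, element_terms, collection_counts, collection_len, lam=0.8):
    """Log query likelihood of one element under Jelinek-Mercer smoothing.

    P(t|e) = lam * tf(t, e) / |e| + (1 - lam) * cf(t) / |C|
    lam=0.8 is an illustrative assumption, not a value from the study.
    """
    tf = Counter(element_terms)
    elen = len(element_terms)
    score = 0.0
    for t in query:
        p_elem = tf[t] / elen if elen else 0.0
        p_coll = collection_counts[t] / collection_len
        p = lam * p_elem + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term absent from the whole collection
        score += math.log(p)
    return score

# Hypothetical toy collection: each key stands for one XML element.
elements = {
    "article[1]/sec[1]": "xml element retrieval with language models".split(),
    "article[1]/sec[2]": "evaluation of retrieval effectiveness".split(),
    "article[2]/sec[1]": "markov models for speech".split(),
}
coll = Counter(t for terms in elements.values() for t in terms)
clen = sum(coll.values())
query = ["xml", "retrieval"]

ranking = sorted(elements, key=lambda e: jm_score(query, elements[e], coll, clen), reverse=True)
print(ranking)  # ['article[1]/sec[1]', 'article[1]/sec[2]', 'article[2]/sec[1]']
```

The collection component is what lets `article[1]/sec[2]` outrank `article[2]/sec[1]` even though neither contains "xml": the smoothed probability of an unseen term is small but nonzero, and the matching "retrieval" term then decides the order.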

Keywords
XML, Element Retrieval, Content-oriented Retrieval, Hierarchical Structure, Language Model, Divergence, Smoothing, Hierarchical Language Model

Abstract

This experimental study proposes an element-based XML document retrieval method that retrieves highly relevant elements. The models compared are the divergence method, the smoothing method, and the hierarchical language model. The hierarchical language model proved the most effective for element-based XML document retrieval, improving exhaustivity, although at some cost to specificity.
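A hierarchical language model can be sketched, under stated assumptions, as a mixture over levels of the XML tree: the element itself, the document (root element) that contains it, and the whole collection. This is one plausible three-level form, not necessarily the model evaluated in the study. The function names `ml` and `hlm_score`, the weights `(0.5, 0.3, 0.2)`, and the toy data are hypothetical; the weights are chosen to sum to 1 so the mixture remains a probability distribution.

```python
import math
from collections import Counter

def ml(terms):
    """Maximum-likelihood unigram model, returned as a term -> probability function."""
    counts, n = Counter(terms), len(terms)
    return lambda t: counts[t] / n if n else 0.0

def hlm_score(query, element, document, collection, weights=(0.5, 0.3, 0.2)):
    """Log query likelihood under an element / enclosing-document / collection mixture.

    The weights (element, document, collection) are illustrative assumptions.
    """
    w_e, w_d, w_c = weights
    p_e, p_d, p_c = ml(element), ml(document), ml(collection)
    score = 0.0
    for t in query:
        p = w_e * p_e(t) + w_d * p_d(t) + w_c * p_c(t)
        if p == 0.0:
            return float("-inf")  # term absent at every level
        score += math.log(p)
    return score

# Two hypothetical elements with identical text, placed in different documents.
sec_a, sec_b = ["xml", "markup"], ["xml", "markup"]
doc_a = sec_a + ["retrieval", "ranking"]
doc_b = sec_b + ["speech", "audio"]
collection = doc_a + doc_b
query = ["xml", "retrieval"]

print(hlm_score(query, sec_a, doc_a, collection) > hlm_score(query, sec_b, doc_b, collection))  # True
```

The two elements receive different scores only because `sec_a` sits in a document that also mentions "retrieval". Drawing on the enclosing context in this way is the kind of structural information that a flat, per-element model ignores.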
