An Experimental Study on Topic Distillation Using Web Site Structure

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2007, v.24 no.3, pp.201-218
https://doi.org/10.3743/KOSIM.2007.24.3.201



Abstract

This study proposes a topic distillation algorithm that ranks relevant sites selected from retrieved web pages, and evaluates its performance. The algorithm calculates the topic score of a site from the site's hierarchical structure. The experiments use the TREC .GOV test collection and a set of TREC-2004 queries for the topic distillation task. In the experiments, the algorithm returned at least two relevant sites among the top ten results. An in-depth analysis of the list of relevant sites provided by TREC-2004 showed that the definition of topic distillation had not been strictly applied when the relevant sites were selected. When the retrieved sites and sub-sites were re-evaluated against a revised list of relevant sites, the performance of the proposed algorithm improved significantly.
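
The abstract does not give the scoring formula, so the following is only a minimal sketch of the general idea of scoring sites from their hierarchical structure. It assumes that each retrieved page carries a retrieval score and that a page contributes to every enclosing directory (treated here as a candidate site or sub-site), with a contribution that shrinks as the page sits deeper below that directory. The function names, the `decay` parameter, and the example URLs are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from urllib.parse import urlparse


def site_prefixes(url):
    """Return the host root plus every directory prefix of the URL, shallowest first."""
    parsed = urlparse(url)
    segments = [p for p in parsed.path.split("/") if p]
    if not parsed.path.endswith("/"):
        segments = segments[:-1]  # drop the file name, keep directories only
    prefixes = [parsed.netloc + "/"]
    for i in range(1, len(segments) + 1):
        prefixes.append(parsed.netloc + "/" + "/".join(segments[:i]) + "/")
    return prefixes


def rank_sites(page_scores, decay=0.5):
    """Aggregate page scores into site scores (assumed weighting, not the paper's).

    A page adds score * decay**d to each enclosing directory, where d is the
    number of levels between that directory and the page's own directory.
    """
    totals = defaultdict(float)
    for url, score in page_scores.items():
        prefixes = site_prefixes(url)
        deepest = len(prefixes) - 1
        for depth, prefix in enumerate(prefixes):
            totals[prefix] += score * decay ** (deepest - depth)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


def relevant_in_top_k(ranked, relevant, k=10):
    """Count how many of the top-k ranked sites appear in the relevant set."""
    return sum(1 for site, _ in ranked[:k] if site in relevant)


# Hypothetical retrieval output: page URL -> retrieval score.
example_scores = {
    "http://a.gov/health/index.html": 1.0,
    "http://a.gov/health/flu/tips.html": 0.8,
    "http://b.gov/about.html": 0.5,
}
ranked = rank_sites(example_scores)
```

With these made-up scores, `ranked` places the `a.gov/health/` sub-site ahead of the `a.gov/` root, which illustrates why a hierarchy-aware score can prefer a focused sub-site over its parent. `relevant_in_top_k` mirrors the kind of top-ten count reported in the abstract.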

Keywords
web search, topic distillation, site searching, web site structure, hyperlink

