Abstract: Localized search engines are small-scale systems that index a particular community on the web. They offer several benefits over their large-scale counterparts in that they are relatively inexpensive to build, and can provide more precise and complete search capability over their relevant domains. One disadvantage such systems have over large-scale search engines is the lack of global PageRank values. Such information is needed to assess the value of pages in the localized search domain within the context of the web as a whole. In this paper, we present well-motivated algorithms to estimate the global PageRank values of a local domain. The algorithms are all highly scalable in that, given a local domain of size n, they use O(n) resources that include computation time, bandwidth, and storage. We test our methods across a variety of localized domains, including site-specific domains and topic-specific domains. We demonstrate that by crawling as few as n or 2n additional pages, our methods can give excellent global PageRank estimates.
Download: pdf
Citation
- Estimating the Global PageRank of Web Communities (pdf, software)
J. Davis, I. Dhillon.
In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 116-125, August 2006.
Bibtex: