Fast Asynchronous Anti-TrustRank for Web Spam Detection

Joyce Whang, Yeonsung Jung, Inderjit Dhillon, Seonggoo Kang, Jungmin Lee

Abstract: Web spam detection is an important problem in Web search. Since Web spam pages tend to have many spurious links, many Web spam detection algorithms exploit the hyperlink structure between Web pages to detect spam pages. The Anti-TrustRank algorithm is a well-known link-based spam detection algorithm that follows the principle that spam pages are likely to be referenced by other spam pages. Since a real-world Web graph involves tens of billions of nodes, it is crucial to develop work-efficient Web spam detection algorithms. In this paper, we develop asynchronous Anti-TrustRank algorithms that significantly reduce the number of arithmetic operations compared to the traditional synchronous Anti-TrustRank algorithm, without degrading performance in detecting Web spam. We theoretically prove the convergence of the asynchronous Anti-TrustRank algorithms and conduct experiments on a real-world Web graph indexed by NAVER, the most popular search engine in Korea.
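
To make the synchronous/asynchronous distinction concrete, here is a minimal Python sketch. It is not the algorithm from the paper: it assumes a generic formulation in which distrust starts at known spam seeds and flows backward along hyperlinks (from a page to the pages that link to it) with a damping factor `alpha`, and it uses a standard residual "push" scheme as the asynchronous variant. The function names, the `in_links` dictionary layout, and the parameters `alpha`, `iters`, and `eps` are all illustrative assumptions.

```python
from collections import deque


def anti_trustrank_sync(in_links, spam_seeds, alpha=0.85, iters=200):
    """Synchronous sketch: every node is updated in every sweep.

    in_links[v] lists the pages that link to v (assumed layout).
    Solves x = (1 - alpha) * s + alpha * W x by fixed-point iteration,
    where W spreads the score of v equally over the pages linking to v.
    """
    seed = {v: 0.0 for v in in_links}
    for v in spam_seeds:
        seed[v] = 1.0 / len(spam_seeds)
    x = {v: 0.0 for v in in_links}
    for _ in range(iters):
        new = {v: (1.0 - alpha) * seed[v] for v in in_links}
        for v, sources in in_links.items():
            if sources:
                share = alpha * x[v] / len(sources)
                for u in sources:
                    new[u] += share
        x = new
    return x


def anti_trustrank_async(in_links, spam_seeds, alpha=0.85, eps=1e-10):
    """Asynchronous sketch: only nodes with a large residual do work.

    Each step moves one node's residual into its score and pushes a
    damped share of it to the pages that link to that node.
    """
    x = {v: 0.0 for v in in_links}
    r = {v: 0.0 for v in in_links}
    for v in spam_seeds:
        r[v] = (1.0 - alpha) / len(spam_seeds)
    queue = deque(spam_seeds)
    queued = set(spam_seeds)
    while queue:
        v = queue.popleft()
        queued.discard(v)
        res, r[v] = r[v], 0.0
        x[v] += res
        sources = in_links[v]
        if not sources:
            continue  # no page links to v; the pushed mass is dropped
        share = alpha * res / len(sources)
        for u in sources:
            r[u] += share
            if r[u] > eps and u not in queued:
                queue.append(u)
                queued.add(u)
    return x


# Toy graph (hypothetical): in_links[v] = pages that link to v.
toy_in_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": []}
sync_scores = anti_trustrank_sync(toy_in_links, ["a"])
async_scores = anti_trustrank_async(toy_in_links, ["a"])
```

Under these assumptions both functions approximate the same fixed point, but the push variant never touches a node such as `d` that receives no distrust, which illustrates where the savings in arithmetic operations can come from.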

Citation

  • Fast Asynchronous Anti-TrustRank for Web Spam Detection
    J. Whang, Y. Jung, I. Dhillon, S. Kang, J. Lee.
    WSDM workshop on Misinformation and Misbehavior Mining on the Web (MIS2), 2018.
