Scalable Data-driven PageRank: Algorithms, System Issues, and Lessons Learned

Joyce Whang, Andrew Lenharth, Inderjit Dhillon, Keshav Pingali

Abstract: Large-scale network and graph analysis has received considerable attention recently. Graph mining techniques often involve an iterative algorithm, which can be implemented in a variety of ways. Using PageRank as a model problem, we look at three algorithm design axes: work activation, data access pattern, and scheduling. We investigate the impact of different algorithm design choices. Using these design axes, we design and test a variety of PageRank implementations, finding that data-driven, push-based algorithms are able to achieve more than 28x the performance of standard PageRank implementations (e.g., those in GraphLab). The design choices affect both single-threaded performance and parallel scalability. The implementation lessons not only guide efficient implementations of many graph mining algorithms, but also provide a framework for designing new scalable algorithms.
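
To make the terms "data-driven" and "push-based" concrete, here is a minimal single-threaded Python sketch. It is an illustration only, not the authors' implementation: the function name `push_pagerank`, the dict-of-lists graph format, the FIFO worklist, and the defaults `alpha=0.85` and `eps=1e-10` are all assumptions made for this example, and it assumes every vertex has at least one out-edge. "Data-driven" here means only vertices whose residual is at least `eps` are revisited; "push-based" means an active vertex writes updates to its out-neighbors rather than reading from its in-neighbors.

```python
from collections import deque


def push_pagerank(out_edges, alpha=0.85, eps=1e-10):
    """Sketch of data-driven, push-based PageRank (illustrative only).

    out_edges: dict mapping each vertex to a list of its out-neighbors.
               Assumption: every vertex has at least one out-edge.
    Returns unnormalized scores x with x ~= alpha * P^T x + (1 - alpha) * e,
    where P is the row-normalized adjacency matrix.
    """
    x = {v: 1.0 - alpha for v in out_edges}   # current score estimate
    r = {v: 0.0 for v in out_edges}           # residual not yet applied

    # Initial residual: what each vertex's starting score sends downstream.
    for v, nbrs in out_edges.items():
        share = alpha * (1.0 - alpha) / len(nbrs)
        for w in nbrs:
            r[w] += share

    # Work activation: only vertices with a large enough residual are active.
    worklist = deque(v for v in out_edges if r[v] >= eps)
    queued = set(worklist)

    while worklist:
        v = worklist.popleft()
        queued.discard(v)

        # Absorb the residual into the score, then clear it.
        rv = r[v]
        r[v] = 0.0
        x[v] += rv

        # Push: distribute the damped residual to out-neighbors.
        share = alpha * rv / len(out_edges[v])
        for w in out_edges[v]:
            r[w] += share
            if r[w] >= eps and w not in queued:
                worklist.append(w)
                queued.add(w)

    return x


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}  # hypothetical toy graph
    print(push_pagerank(graph))
```

The FIFO worklist is only one possible scheduling policy; the abstract names scheduling as a separate design axis, and a priority-ordered worklist would slot into the same loop.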

Download: pdf

Citation

  • Scalable Data-driven PageRank: Algorithms, System Issues, and Lessons Learned (pdf, software)
    J. Whang, A. Lenharth, I. Dhillon, K. Pingali.
    In International European Conference on Parallel and Distributed Computing (Euro-Par), pp. 438–450, August 2015. (Oral)
