Abstract: Large-scale network and graph analysis has received considerable attention recently. Graph mining techniques often involve an iterative algorithm, which can be implemented in a variety of ways. Using PageRank as a model problem, we look at three algorithm design axes: work activation, data access pattern, and scheduling, and investigate the impact of different design choices along each. Using these design axes, we design and test a variety of PageRank implementations, finding that data-driven, push-based algorithms achieve more than 28x the performance of standard PageRank implementations (e.g., those in GraphLab). The design choices affect both single-threaded performance and parallel scalability. These implementation lessons not only guide efficient implementations of many graph mining algorithms, but also provide a framework for designing new scalable algorithms.
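To make "data-driven, push-based" concrete, here is a minimal single-threaded sketch in Python. It is not the authors' implementation (their software is linked in the citation), and it makes several assumptions the abstract does not state: a residual-based formulation of the linear system x = alpha * P^T x + (1 - alpha) * e, a FIFO worklist as the scheduling policy, and no special handling of vertices without out-edges. The names `push_pagerank`, `out_edges`, `alpha`, and `eps` are illustrative only.

```python
from collections import deque


def push_pagerank(out_edges, alpha=0.85, eps=1e-10):
    """Sketch of data-driven, push-based PageRank (illustrative, not the paper's code).

    out_edges maps every vertex to the list of its out-neighbors; every
    neighbor must itself appear as a key. Ranks are unnormalized.
    """
    rank = {v: 0.0 for v in out_edges}
    # Assumption: each vertex starts with residual (1 - alpha) and rank 0.
    residual = {v: 1.0 - alpha for v in out_edges}

    # Work activation is data-driven: only vertices on the worklist are processed.
    worklist = deque(out_edges)
    queued = set(out_edges)

    while worklist:
        v = worklist.popleft()  # FIFO scheduling (one possible policy)
        queued.discard(v)

        r = residual[v]
        residual[v] = 0.0
        rank[v] += r  # absorb the pending residual into the rank

        targets = out_edges[v]
        if not targets:
            continue  # assumption: no redistribution for dangling vertices

        # Push-based access: v writes its contribution to its out-neighbors.
        share = alpha * r / len(targets)
        for w in targets:
            residual[w] += share
            # Activate a neighbor only when its residual is large enough.
            if residual[w] >= eps and w not in queued:
                queued.add(w)
                worklist.append(w)

    return rank
```

For example, `push_pagerank({"a": ["b"], "b": ["c"], "c": ["a"]})` processes only vertices whose residual is at least `eps`, rather than sweeping every vertex in every iteration as a topology-driven implementation would. A pull-based variant would instead have each vertex read from its in-neighbors, and a priority-based scheduler would replace the FIFO queue.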
Download: pdf
Citation
- Scalable Data-driven PageRank: Algorithms, System Issues, and Lessons Learned (pdf, software)
J. Whang, A. Lenharth, I. Dhillon, K. Pingali.
In International European Conference on Parallel and Distributed Computing (Euro-Par), pp. 438–450, August 2015. (Oral)