Clustered Embedding of Massive Social Networks

Han Song, Berkant Savas, Tae Cho, Vacha Dave, Zhengdong Lu, Inderjit Dhillon, Yin Zhang, Lili Qiu

Abstract:   The explosive growth of social networks has created numerous exciting research opportunities. A central concept in the analysis of social networks is a proximity measure, which captures the closeness or similarity between nodes in the network. Despite much research on proximity measures, there is a lack of techniques to efciently and accurately compute proximity measures for largescale social networks. In this paper, we embed the original massive social graph into a much smaller graph, using a novel dimensionality reduction technique termed Clustered Spectral Graph Embedding. We showthat the embedded graph captures the essential clustering and spectral structure of the original graph and allow a wide range of analysis to be performed on massive social graphs. Applying the clustered embedding to proximity measurement of social networks, we develop accurate, scalable, and exible solutions to three important social network analysis tasks: proximity estimation, missing link inference, and link prediction. We demonstrate the effectiveness of our solutions to the tasks in the context of large real-world social network datasets: Flickr, LiveJournal, and My- Space with up to 2 million nodes and 90 million links.

Download: pdf


  • Clustered Embedding of Massive Social Networks (pdf, software)
    H. Song, B. Savas, T. Cho, V. Dave, Z. Lu, I. Dhillon, Y. Zhang, L. Qui.
    In ACM SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS/Performance), pp. 331-342, June 2012.