The PageRank Citation Ranking:

Bringing Order to the Web


Sergey Brin and Larry Page


Stanford Digital Libraries Working Paper

January 29, 1998


http://ilpubs.stanford.edu:8090/422/1/1999-66.pdf





PageRank is a link analysis algorithm, named after Larry Page and used by the Google web search engine, that assigns a numerical weighting to each element of a hyperlinked set of documents, such as the World Wide Web, with the purpose of "measuring" its relative importance within the set.



Definition

Let E(u) be some vector over the Web pages that corresponds to a source of rank. Then, the PageRank of a set of Web pages is an assignment, R', to the Web pages which satisfies



such that c is maximized and ||R'||1 = 1.

(||R'||1 denotes the L1 norm of R').


In matrix notation, we have



Since ||R'||1 = 1, we can rewrite this as



where 1 is the vector consisting of all ones. So, R' is an eigenvector of (A + E × 1) with eigenvalue c. In fact, we want the dominant eigenvector of (A + E × 1). It may be computed by repeatedly applying (A + E × 1) to any non-degenerate start vector R'0.



Random Surfer Model

The definition of PageRank above has another intuitive basis in random walks on graphs. The simplified version corresponds to the standing probability distribution of a random walk on the graph of the Web. Intuitively, this can be thought of as modeling the behavior of a "random surfer". The "random surfer" simply keeps clicking on successive links at random.


Random Walk with Restart (RWR)

(ref. http://pike.psu.edu/publications/jcdl07-dbrank.pdf)



  • Random Walk with probability 1-α
     - Moves one web-page to another through the outedges.
  • Restart with probability α
     - Used as a source of rank to make up for the rank sinks such as cycles with no outedges.



Computing PageRank

Note that the d factor increases the rate of convergence and maintains ||R||1. An alternative normalization is to multiply R by the appropriate factor. The use of d may have a small impact on the influence of E.

(Because A is a huge size of sparse matrix, the L1-norm of Ri+1 become smaller than the L1-norm of the prior vector, Ri, after the matrix calculation. So, d > 0.)




Convergence of PageRank calculation

Defined in Markov Chain



Personalized PageRank

Concept of recommendation



References