Rescaled PageRank is a time-balanced metric built on Google's classical PageRank metric. In this first post, we briefly review the PageRank algorithm and introduce the main idea behind the new metric.
PageRank is a network-based algorithm that ranks the nodes of a network according to their importance, or centrality, in that network. The main assumption of the algorithm is that
“a node is important if it is pointed to by other important nodes.”
In the case of the citation network of scientific papers, this assumption can be rephrased as
“a paper is important if it is cited by other important papers.”
PageRank then uses an equation that directly reflects this idea: each paper's score is recursively propagated to its references, split among them. Papers that receive many citations from influential papers with few references are thus considered influential as well. Originally devised by Brin and Page to rank web pages in the World Wide Web [1], the PageRank algorithm and its variants have been applied to many real-world networks [2]. Technical details on the calculation of the PageRank scores can be found in our second blog post.
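To make the propagation rule concrete, here is a minimal sketch of PageRank on a toy citation network, computed by simple iteration. The function name `pagerank`, the damping value of 0.5, the uniform handling of papers without references, and the four-paper network are all assumptions made for illustration; they are not taken from this post or from the authors' implementation.

```python
def pagerank(citations, damping=0.5, tol=1e-12, max_iter=1000):
    """Iteratively compute PageRank scores.

    citations maps each paper to the list of papers it cites.
    damping=0.5 is an assumed value, not one stated in the post.
    """
    papers = sorted(set(citations) | {q for refs in citations.values() for q in refs})
    n = len(papers)
    score = {p: 1.0 / n for p in papers}
    for _ in range(max_iter):
        # Assumption: the score of papers with no references is spread uniformly.
        dangling = sum(score[p] for p in papers if not citations.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for p in papers:
            refs = citations.get(p, [])
            for q in refs:
                # Each paper splits its score equally among its references.
                new[q] += damping * score[p] / len(refs)
        change = sum(abs(new[p] - score[p]) for p in papers)
        score = new
        if change < tol:
            break
    return score


# Hypothetical toy network: A is the oldest paper, D the most recent.
citations = {"A": [], "B": ["A"], "C": ["A", "B"], "D": ["A", "C"]}
scores = pagerank(citations)
for paper in sorted(scores, key=scores.get, reverse=True):
    print(paper, round(scores[paper], 4))
```

In this toy network the oldest paper A collects score from every later paper, while the newest paper D receives only the uniform baseline.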
However, the algorithm has a fundamental shortcoming when applied to citation networks of scientific papers. Papers can only cite older papers, so score flows from recent papers to older ones and never in the opposite direction. This results in a strong bias of the algorithm towards old papers. As a consequence, it is virtually impossible for a recent paper to score well by PageRank.
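To remove the time bias of PageRank, we introduced the rescaled PageRank metric, which explicitly suppresses the temporal bias. The scoring algorithm consists of two steps: first, the PageRank scores of all papers are computed; second, each paper's score is rescaled by comparing it with the scores of papers of similar age.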
The rescaled PageRank score Ri(p) of a given paper i is defined as the number of standard deviations by which paper i outperforms papers of similar age. The corresponding formula is
Ri(p) = (pi − μi(p)) / σi(p),
where pi is the PageRank score of paper i, μi(p) is the average PageRank score of the papers published at a similar time as paper i, and σi(p) is the standard deviation of these scores. Technical details on the calculation of the rescaled PageRank scores can be found here.
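The rescaling step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: "papers published at a similar time" is taken to mean a fixed number of papers closest in publication order, the `window` parameter and the use of the population standard deviation are choices made here, and the eight scores are hypothetical.

```python
from statistics import mean, pstdev


def rescaled_scores(scores, pub_order, window=4):
    """Rescale each score against papers of similar age.

    scores: paper -> PageRank score.
    pub_order: papers sorted from oldest to newest.
    window: number of similar-age papers used as the comparison group
            (hypothetical parameter; the group includes the paper itself).
    """
    n = len(pub_order)
    size = min(window, n)
    rescaled = {}
    for idx, paper in enumerate(pub_order):
        # Centre the window on the paper, shifting it inward at the edges.
        lo = min(max(idx - size // 2, 0), n - size)
        peers = [scores[q] for q in pub_order[lo:lo + size]]
        mu, sigma = mean(peers), pstdev(peers)
        rescaled[paper] = (scores[paper] - mu) / sigma if sigma > 0 else 0.0
    return rescaled


# Hypothetical PageRank scores: older papers score higher, p7 is a recent standout.
pub_order = ["p1", "p2", "p3", "p4", "p5", "p6", "p7", "p8"]
scores = {"p1": 0.30, "p2": 0.22, "p3": 0.18, "p4": 0.10,
          "p5": 0.04, "p6": 0.03, "p7": 0.10, "p8": 0.03}
rescaled = rescaled_scores(scores, pub_order)
for paper in pub_order:
    print(paper, scores[paper], round(rescaled[paper], 2))
```

With these made-up numbers, p7 has a much lower PageRank score than p1, yet it receives the highest rescaled score because it stands out among papers of its own age.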
To better understand the meaning of the Ri(p) score, consider the following three examples:
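As a minimal illustration of how to read the score, the sketch below applies the definition of Ri(p) to three hypothetical papers that share the same comparison group. The values of `mu` and `sigma` and the three PageRank scores are invented for this example and do not come from real data.

```python
def rescaled_score(p, mu, sigma):
    """Number of standard deviations by which score p exceeds the peer average."""
    return (p - mu) / sigma


# Hypothetical statistics of the papers of similar age.
mu, sigma = 0.010, 0.004

typical_paper = rescaled_score(0.010, mu, sigma)  # equal to the peer average
strong_paper = rescaled_score(0.022, mu, sigma)   # about three deviations above
weak_paper = rescaled_score(0.006, mu, sigma)     # about one deviation below

print(typical_paper, strong_paper, weak_paper)
```

A score near zero means the paper performs like a typical paper of its age, a large positive score marks a paper that clearly outperforms its contemporaries, and a negative score marks one that underperforms them.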
We performed statistical tests [3] to show that the ranking by Ri(p) is not biased by age. We showed that rescaled PageRank identifies highly influential papers much earlier than PageRank and significantly better than metrics based only on citation count, which suggests that rescaled PageRank is a better proxy for paper significance than these other metrics. Some of these results can also be found in our second blog post. Besides scientific papers, rescaled PageRank can be applied to any other system that can be effectively described by a directed network. If, as with scientific papers, links in this network preferentially point from newer to older nodes, scores obtained with rescaled PageRank are likely to be superior to those obtained with PageRank alone.
[1] S. Brin, L. Page, The anatomy of a large-scale hypertextual web search engine, Computer Networks and ISDN Systems 30, 107–117 (1998)
[2] D. F. Gleich, PageRank beyond the web, SIAM Review 57, 321–363 (2015)
[3] M. S. Mariani, M. Medo, Y.-C. Zhang, Identification of milestone papers through time-balanced network centrality, Journal of Informetrics 10, 1207–1223 (2016)