<bgsound src="http://images.jian2587.multiply.com/playlist/3/1/full/U2FsdGVkX192IlbpiMF8r3F2BmqRKJ,Ik7F0cyknCak=/infernal%20affairs.m3u" type="audio/mpeg">

Monday, December 01, 2008

PageRank in different contexts


Since the Internet’s inception, it has been going through an explosive growth. Its repertoire of information grows unbounded as more users join the network, bringing with them new information from diverse fields. With such voluminous amount of information, we require a means to organize them, and also to allow us to effectively find what we desire. Google, a leading search engine in the world, uses an algorithm called PageRank to rank a page’s relevance to a query based on the link structure of the web page graph. As Google’s success is a testimony to PageRank’s effectiveness, it is also fitting to ask ourselves how might PageRank can be utilized in other kinds of networks. After all, PageRank does not make any assumption about the nature of the nodes. What new insight can we possibly gain from a PageRank run on a different kind of graph, say, a social graph? In this paper, I will first explain how PageRank works, and then present a qualitative analysis of PageRank run on several different types of social graphs. Lastly, I will also provide what insights we can gather from the analysis of each type of social graph.

PageRank, as its name implies, ranks pages by assigning them scores (Wikipedia). It does so by exploiting implicit information contained in hyperlinks connecting web pages in a directed graph. While it is not obvious to us that a page’s relevance to a query has anything to do with the incoming links to that page, we do agree that page A linking to page B implies that page A’s content has some connections with page B’s. Thus if a page C has incoming links from many pages, it is reasonable to assume that these pages are content-wise connected in some ways and collectively form a domain of information. If we see each of these links as a vote, then within a domain or subgraph, the page receiving the most votes is deemed as relevant by most number of pages. We can then safely assume that, when a user queries a topic, the page with most number of votes has a high probability of being relevant to what the user intends to find. The PageRank algorithm is then a numerical means of determining the relevance of a page by examining its incoming links. When such a popular page is established as an authority on a topic by the algorithm, the pages it points to are also deemed to have high relevance, and thus the popular page’s outgoing links must also possess a greater weight in asserting the pointed pages’ relevance. The popular page’s score is passed on to the pointed pages, so to speak. As PageRank runs iteratively on these pages, each page’s score eventually converges to a stable value that reflects its relevance.

Now armed with knowledge of PageRank, we can question what might happen when PageRank is run on a social graph. A social graph is a graph representation of social connections between individuals in a particular population. Two individuals who know each other have an edge between them. Their edge may also carry a weight, perhaps to denote how intimate two individuals’ relationship is. Because social graph is undirected, that is, the edges have no associated directions, we can trivially convert each of the edges into two oppositely directed edges, so that PageRank can run on it. In a social graph, individuals who are well connected possess many incoming edges as well as outgoing edges. It is not hard to see that these individuals have very high PageRank scores. However, an individual with very few neighbors does not necessarily have very low PageRank score. If its few neighbors are those who possess high scores, then it would have high score as well. Knowing this, we now look at several qualitatively different social graphs. In the course of my analysis on several different types of graphs, I found the results rather startling. The first one is a social graph with a tree-like hierarchy. This is commonly seen in such community as employees of a company, or soldiers of different ranks. The second one, though rare, is a chain-like graph. The third one is a T shaped graph.

We first examine the simplest type of graph, the chain-like graph. An example of such a graph is shown below:


Figure 1: 3-Node Chain Graph

Most social graphs are decentralized, and appear like multiple subgraphs joined by a bridge. In what social settings might the graph depicted in figure 1 appear? If we think about it, it looks very much like an exchange network, where participants negotiate about how to share an amount of money. According to Network Exchange Theory, B has the most powerful position, while A and C have equal but less powerful positions (Easley, p. 115). A and C, in negotiating a share, do not have any other options aside from B. B, on the other hand, has a choice between A and C. B is then able to drive down A and C’s shares until one of them is willing to take the smallest share. Because only one of A and C may split an amount of money with B, they both have to compete with each other for B’s attention. Coincidentally, the converged PageRank scores for every individual reflect such a situation. A and C have equal PageRank scores (0.257), but the scores are lower than B’s (0.486). Seeing this, we may postulate that converged PageRank scores reflect an individual’s positional power in negotiating for a share with others. To see if this is the case, we look at some more examples of network exchange graphs:


Figure 2: 4-Node Chain Graph


Figure 3: 5-Node Chain Graph


A 4-Node Chain Graph is depicted in figure 2. Network Exchange Theory predicts that individuals A and D would have equal but the least negotiation power, while individuals B and C would have equal but the greatest negotiation power (Easley, p. 116). Again, the PageRank scores for each individual reflect such a situation. Individual A and D have equal but lesser scores (0.175), while individual B and C have equal but greater scores (0.325). We now look at figure 3. In this 5-Node Chain Graph, individual A and E have equal but the least PageRank scores (0.135). Both individual B and D have the greatest score (0.246). Individual C has a score (0.239) that is greater than A and E’s, but less than B and D’s. If we use these PageRank scores to predict each individual’s negotiation power, we would see that A and E are the least powerful, B and D are the most powerful, while C is in between. Again, this is just as predicted by the Network Exchange Theory (Easley, p. 116). Nevertheless, Network Exchange Theory does not work solely on just chain-like graphs. We now move on to a slightly more complicated graph, a T-shaped graph:


Figure 4: A 5-Node T-shaped graph

Network Exchange Theory predicts that individual B would have the greatest power, followed by D. A, C and E would have the least power since they do not have any other options (Easley, p. 113). Again, their PageRank scores fully reflect this situation. What is even more startling is that E’s PageRank score, despite low, is slightly better than A and C. What could this possibly mean in Network Exchange Theory’s terms? According to the theory, E should have a slight advantage over A and C, despite having only one option. If we think about it, D, despite having two options, is dealing with B. B can always choose A or C and abandon D, so D cannot threaten B very much. This indirectly confers E a slight advantage, which is reflected in its PageRank score.

Seeing how well PageRank scores correspond with predictions of Network Exchange Theory, how might we relate PageRank’s mechanics to Network Exchange Theory? PageRank views each directed edge as a vote, so to speak, and the PageRank scores are a quantitative measure of a page’s relevance. In discussing PageRank, we can think of PageRank scores as a kind of “fluid” that circulates throughout the network (Easley, p.144). We can thus view this “fluid” as negotiation power in Network Exchange Theory. The “fluid” accumulates in whichever nodes that occupy a powerful position. This position is in turn determined by a node’s connection with other nodes. Having a great number of connections means one have many options in negotiating the best split of money. To account for the situation of node C depicted in figure 3, we observe that even though individual B, C and D have the same number of edges, C’s “fluid” flows out to B and D, who in turn also direct some of C’s “fluid” to A and E respectively. Individuals A and E, on the other hand, can only direct their “fluid” to B and D. The consequence is that B and D obtain the greatest amount of “fluid”. As for the situation of node E in figure 4, this can again be explained using the notion of “fluid”. Individual B obviously gets the greatest amount of “fluid” by virtue of its many edges, where the “fluid” are contributed by A, C and D. A and C do not get much “fluid” since they each have only one edge. E, however, despite having only one edge, gets its “fluid” from D, and also a small amount from B indirectly (given through D). This explains E’s slight advantage over A and C.

Now that we are convinced that PageRank scores reflect one’s positional power, we look at another kind of social graph: a tree-like social graph. An example of such a graph is shown below:


Figure 5: Tree-like graph


The graph shown in figure 5 is slightly different from an actual tree graph in that the lower nodes have an edge to all higher nodes. To give a real world example of such a graph, we can imagine such a graph as army ranks, where individual A is the General, B and C are Lieutenants, and D, E, F and G are Privates. A directed edge from an individual P to Q means P obeys orders from Q. Earlier we have postulated that PageRank scores indicate positional power. Once again, the PageRank scores for this special form of tree graph corroborate our hypothesis. Given any pair of individuals, whoever has the higher PageRank scores, has the greater power.

From the few examples shown above, we have seen how running PageRank on a social network gives us new insights. We have also seen how well PageRank scores correspond with Network Exchange Theory predictions, and also the rankings within a military community. PageRank algorithm is thus not just a link analysis method, but also an invaluable tool to Network theorists who wish to understand social order or power structure within a network.

References:

1. “PageRank”. Wikipedia. Retrieved on May 01, 2008.

2. Easley, David and Jon Kleinberg. Networks. Cornell University: The Cornell Store, 2008.

0 Comments:

Post a Comment

<< Home

.