Strong Correlation between Facebook Likes and PageRank
First, let me say that everyone should take this study with a huge grain of salt. While I believe the data is intriguing, it does not imply anything specific. So, here goes.
I have long guffawed at the social graph and, in particular, its relationship to search engine optimization. I am quick to argue against anything that would imply that Google search results are meaningfully influenced by social activities. One of my most common points is that on the majority of open social websites, the social graph is closely mirrored by the link graph.
Take Digg for example. If you submit a story on Digg, it gets a link from your profile. If someone votes on that story, it receives a link from their vote history page. As more and more votes are tallied, more and more links arrive. As the story moves up the upcoming section, it gets more powerful links from pages that are closer to the homepage. If it hits the homepage, it gets that coveted high PR link. On a site like Digg, there is a clear correlation between social events and link creation.
The same tends to be true on closed social networks like Facebook. Liking a Facebook Fan page may trigger a wall post or a link from a list of those organizations you like. However, because these events are behind a login, the link graph is severed. Ostensibly, all that Googlebot would be able to access would be the Facebook login page and the common endpoints of Fan Pages, People Pages, and the site directory. While the site directory would allow link juice to flow through to Fan pages, because it lists every page, you would expect it not to flow juice in a pattern that is influenced by the social activity on the site.
I was interested, however, in whether or not there was a correlation between Facebook Likes and mozRank. Potentially, well-liked pages would attract external links or links from the “recently liked” section of People pages. I spidered 1000 Facebook Fan pages and compared the number of “Likes” to the mozRank, external mozRank, internal mozRank and the number of unique inbound linking domains. Unsurprisingly, there was little to no correlation between Likes and any of the others. But then my tin-foil hat took over. (One point worth noting: when comparing to mozRank, external mozRank and internal mozRank, I took the log of the Likes number. I also removed PR0 pages from the analysis. It is impossible to distinguish a page that is PR0 because it has too few links from one that just hasn’t been updated by Google yet. Since all pages start as PR0 in the toolbar, there is too much noise at the PR0 level to comfortably analyze.)
Google has long been suspected of using alternative tactics to get the data it wants without using Googlebot. Speculation constantly surrounds questions of whether Google Chrome, the Google Toolbar, Google Analytics or other Google tools are used to build out the link graph without necessarily indexing or displaying content. So I decided to determine if PageRank correlated with the log of Facebook Fan Likes.
A simple linear regression in Excel reveals what appears to be a direct, positive correlation between the log of Facebook Fan Likes and the PageRank of those fan pages, while SEOMoz mozRank, in comparison, shows a scattershot (what we would expect). Of course, linear regression is not the appropriate model, and the R-squared measurement wouldn’t give us a reasonable statistic, because stair-step toolbar PageRank is discrete rather than continuous data.
After consulting some individuals in the statistics community who know far more about this stuff than I do, I was pointed towards Spearman’s rank correlation coefficient. The correlation coefficient for mozRank vs the log of Facebook Fan Likes is -.103, which is both in the wrong direction and decidedly weak. With PageRank the correlation coefficient was .53, with both the one-tailed and two-tailed P values at less than .00001!
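For anyone who wants to try this at home, here is a rough sketch of the calculation in Python. To be clear, this is not the spreadsheet I used: the `pages` list is made-up illustration data, and the helper names (`average_ranks`, `pearson`, `spearman`) are my own. It drops the PR0 pages, takes the log of Likes, and then computes Spearman's coefficient as the Pearson correlation of the ranks, with ties sharing their average rank.

```python
import math

def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Hypothetical (likes, toolbar PageRank) pairs, NOT data from the study.
pages = [(120, 0), (450, 2), (900, 3), (3000, 3),
         (15000, 4), (80000, 5), (250000, 4), (1200000, 6)]

# Drop PR0 pages, as described above, then take the log of Likes.
rated = [(likes, pr) for likes, pr in pages if pr > 0]
log_likes = [math.log10(likes) for likes, _ in rated]
page_ranks = [pr for _, pr in rated]

rho = spearman(log_likes, page_ranks)
print(f"Spearman rho: {rho:.3f}")
```

One side note: because Spearman only looks at rank order, taking the log of Likes does not change the coefficient at all. The log matters for the scatter plot and the linear regression, not for this statistic.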
So What Does this Mean?
There are a few potential reasons for this strong correlation.
- Random Happenstance: The limited spiderable link graph on facebook.com sufficiently explains the correlation with PageRank. But why then does mozRank stray so drastically? How does a fan page like Peak Fan Club get a PR6 with only a handful of internal links and a mozRank of .75? And, more importantly, how does this happen over and over again?
- Google Gets Special Crawling Privileges: Rand Fishkin from SEOMoz pointed out this likely scenario. Perhaps Facebook allows Google to spider-but-not-index behind the login.
- Google Gets Link Data from Google Toolbar, Chrome, or ISP data: One could expect the link graph generated by Facebook Likes to line up quite nicely with PageRank behind-the-scenes.
- Google Uses Like Data as a PR Proxy: I think this is unlikely, but it could be a cheap short-cut.
It is a tough pill for me to swallow, but ladies and gentlemen, I am going to say it here and in writing for the first time. It may, OH GOD IT PAINS ME TO SAY IT, be beneficial, OH THE MINDLESS SUFFERING, for SEO purposes, MUST.KEEP.CONTROL, to create a Facebook fan page, put a link on it, and attract Facebook likes. There, I said it.