SEOMoz was Right on Relevancy – Judging the Role of LDA in Increasing Organic Traffic
Disclaimer: We have no conclusive evidence that LDA or nTopic are actual Google ranking factors, and the nTopic site clearly states this. However, we have statistically significant evidence that recommendations based on nTopic increase organic traffic. We are committed to the accurate, scientific representation of our findings and apologize for any misrepresentations.
On September 6th, 2010, Rand Fishkin and Ben Hendrickson dropped a bomb on the search community: a new on-page ranking factor had potentially been discovered. The statistical method known as Latent Dirichlet Allocation (LDA), a common method of topic modeling, seemed to correlate quite well with organic search rankings in Google. In layman's terms, the relevancy of a document to a particular keyword, as measured by this statistical method, appeared to matter an awful lot, even more than unique linking domains.
Unfortunately, great discoveries tend to attract a lot of criticism (rightfully so), and the flaws of a discovery can easily become the story rather than the discovery itself. Such was the case with LDA. SEOMoz’s early reporting contained a calculation error; once it was corrected, LDA correlated at 0.17 rather than 0.32. SEOMoz detractors immediately blasted them for their prior “sensationalism”, and LDA seemed to fade from what had once looked like a bright future.
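For readers who want to see what a figure like 0.17 or 0.32 measures, here is a minimal sketch of a rank correlation between a relevancy score and search position. It assumes Spearman's rank correlation; this post does not say which statistic SEOMoz used, and the scores and positions below are invented purely for illustration.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    # Hypothetical data: a relevancy score per result and its position (1 = top).
    relevancy_scores = [0.81, 0.64, 0.70, 0.42, 0.55, 0.30]
    positions = [1, 2, 3, 4, 5, 6]
    # Negate positions so that "higher score, better position" is positive.
    print(spearman(relevancy_scores, [-p for p in positions]))
```

A value near 1 would mean the score orders results almost exactly as the search engine does, and a value near 0 would mean the score says little about position.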
I was immediately interested in LDA when SEOMoz announced their findings. In fact, I immediately used their free tool (no longer available) to compare long-tail and short-tail keywords. I had long felt that our best bet for determining Google’s ranking factors was to look to statistics, but I lacked the background myself to really know where to begin. However, while introducing myself at a newcomers’ dinner party at the church my wife and I had just recently begun to attend, I mentioned this new “Latent Diriklet…” (embarrassingly, I did not know how to pronounce the word) “thing that had just come up”. The newcomer sitting next to me corrected my pronunciation. His name was Andrew Cron, a PhD candidate in Statistics at Duke University.
Build, refine, study, rinse, repeat.
Over the next year and a half, Virante and I worked with Andrew to build our own LDA model of the English language using a data set similar to that of SEOMoz: a random selection of 1,000,000 English-language Wikipedia articles. We had a working API within a few months, which we then tied into our own free tools to replace the one SEOMoz was no longer updating. Using Andrew’s acumen, we were able to create an incredible architecture to accurately measure topical relevancy. However, the exciting stuff came months later.
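To give a feel for what "measuring topical relevancy" with LDA can look like, here is a toy sketch. It is not the nTopic implementation, whose internals this post does not describe. It assumes a tiny invented corpus in place of Wikipedia, fits LDA by collapsed Gibbs sampling, and scores a keyword against a document by the cosine similarity of their topic distributions. The function names, hyperparameters, and scoring rule are all assumptions made for illustration.

```python
import math
import random

# Invented toy corpus standing in for the Wikipedia articles.
CORPUS = [
    "guitar chord strings music band".split(),
    "music band concert guitar song".split(),
    "song chord music strings concert".split(),
    "soccer goal team match league".split(),
    "team league match goal player".split(),
    "player soccer team goal match".split(),
]


def train_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA with collapsed Gibbs sampling; docs is a list of token lists."""
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        row = []
        for word in doc:
            k = rng.randrange(num_topics)
            row.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(row)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                w, k = word_id[word], assignments[d][i]
                # Remove the token, then resample its topic given all others.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return {"word_id": word_id, "doc_topic": doc_topic, "topic_word": topic_word,
            "topic_total": topic_total, "alpha": alpha, "beta": beta}


def doc_topic_dist(model, d):
    """Smoothed topic distribution of training document d."""
    counts, alpha = model["doc_topic"][d], model["alpha"]
    total = sum(counts) + len(counts) * alpha
    return [(c + alpha) / total for c in counts]


def word_topic_dist(model, word):
    """Topic distribution for one keyword (assumes a uniform topic prior)."""
    w, beta = model["word_id"][word], model["beta"]  # KeyError if unseen word
    vocab_size = len(model["word_id"])
    scores = [(row[w] + beta) / (total + vocab_size * beta)
              for row, total in zip(model["topic_word"], model["topic_total"])]
    norm = sum(scores)
    return [s / norm for s in scores]


def relevance(model, d, keyword):
    """Cosine similarity between a document's and a keyword's topic mix."""
    a, b = doc_topic_dist(model, d), word_topic_dist(model, keyword)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


if __name__ == "__main__":
    model = train_lda(CORPUS, num_topics=2)
    for d in range(len(CORPUS)):
        print(d, round(relevance(model, d, "guitar"), 3))
```

A production model trained on a million articles would need a far more efficient sampler or variational inference, many more topics, and a way to score pages and keywords that were not in the training set.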
Not Correlation, Causation.
I get it. Correlation does not imply causation. But correlation IS an important tool in the scientist’s belt. It helps us know what to look at, what to experiment with, and where we might find new answers. So we did just that: we took SEOMoz’s correlative findings, created our own model, and began experimenting.
I am proud to say that we can make the following statement without any reservations: Content Blindly Optimized with Topic Modeling Suggestions can Increase Organic Search Traffic. From this study came nTopic, a freemium API for determining the topical relevancy of a page to a keyword.
Our study [PDF] conclusively reveals that using our nTopic process to improve content relevancy of a page will, over time, increase organic traffic to the modified page. You can visit the study page or read the PDF for complete details.
“Relevance is the New PR”
James Norquay recently released an incredible interview with a former search quality team member. While I can’t say I wholly agree with his statement that “Relevance is the new PR”, I can no longer ignore that Google is looking at more and more intelligent statistical methods of determining the quality of content. I believe that topical relevance is one of those methods, and nTopic is the way to measure it.