SEOMoz was Right on Relevancy – Judging the Role of LDA in Increasing Organic Traffic

Disclaimer: We have no conclusive evidence that LDA or nTopic are actual Google ranking factors, and the nTopic site clearly states this. However, we have statistically significant evidence that recommendations based on nTopic increase organic traffic. We are committed to the accurate, scientific representation of our findings and apologize for any misrepresentations.

On September 6th, 2010, Rand Fishkin and Ben Hendrickson dropped a bomb on the search community: a new on-page ranking factor had potentially been discovered. The statistical method known as Latent Dirichlet Allocation (LDA), a common method of topic modeling, seemed to correlate quite well with organic search rankings in Google. In layman's terms, the relevancy of a document to a particular word, as measured by this statistical method, appeared to matter an awful lot, even more than unique linking domains.

Unfortunately, great discoveries tend to attract a lot of criticism (rightfully so), and a discovery's flaws can easily become the story rather than the discovery itself. Such was the case with LDA. It emerged that a calculation error in SEOMoz’s early reporting had inflated the figure: LDA correlated at 0.17 rather than 0.32. SEOMoz detractors immediately blasted them for their prior “sensationalism”, and LDA faded from what had once seemed to be a bright future.

Divine Providence

I was immediately interested in LDA when SEOMoz announced their findings; in fact, I immediately used their free tool (no longer available) to compare long-tail to short-tail keywords. I had long felt that our best bet for determining Google’s ranking factors was to look to statistics, but I lacked the background to really know where to begin. However, while introducing myself at a newcomers’ dinner party at the church my wife and I had just recently begun to attend, I mentioned this new “Latent Diriklet…” (embarrassingly, I did not know how to pronounce the word) “thing that had just come up”. The newcomer sitting next to me corrected my pronunciation. His name was Andrew Cron, a PhD candidate in Statistics at Duke University.

Build, refine, study, rinse, repeat.

Over the next year and a half, I and Virante worked with Andrew to build our own LDA model of the English language using a similar data set to that of SEOMoz – a random selection of 1,000,000 English-language Wikipedia articles. We had a working API within a few months which we then tied into our own free tools to replace that which SEOMoz was no longer updating. Using Andrew’s acumen, we were able to create an incredible architecture to accurately measure topical relevancy. However, the exciting stuff came months later.
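To make "topical relevancy" a little more concrete, here is a minimal sketch of one common way to score it once a topic model exists. It assumes an LDA model has already turned a keyword and a page into topic mixtures (vectors of probabilities over the same topics) and compares them with cosine similarity. The function name, the four-topic vectors, and the choice of cosine similarity are all assumptions made for illustration; this post does not describe how nTopic actually computes its scores.

```python
import math


def cosine_similarity(p, q):
    """Cosine of the angle between two equal-length topic vectors."""
    if len(p) != len(q):
        raise ValueError("topic vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_p * norm_q)


# Hypothetical topic mixtures over 4 topics (invented numbers, not real data).
keyword_topics = [0.70, 0.20, 0.05, 0.05]
page_on_topic = [0.60, 0.25, 0.10, 0.05]
page_off_topic = [0.05, 0.05, 0.20, 0.70]

on_score = cosine_similarity(keyword_topics, page_on_topic)
off_score = cosine_similarity(keyword_topics, page_off_topic)

print(f"on-topic page:  {on_score:.3f}")
print(f"off-topic page: {off_score:.3f}")
```

A page whose topic mixture looks like the keyword's scores close to 1, while a page about something else scores close to 0. In a real system the mixtures would come from a model trained on a large corpus rather than being typed in by hand.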

Not Correlation, Causation.

I get it. Correlation does not imply causation. But correlation IS an important tool in the scientist’s belt. It helps us know what to look at, what to experiment with, and where we might find new answers. So we did just that: we took SEOMoz’s correlative findings, created our own model, and began experimenting.

I am proud to say that we can make the following statement without any reservations. Content Blindly Optimized with Topic Modeling Suggestions can Increase Organic Search Traffic. From this study came the birth of nTopic – a freemium API for determining the topical relevancy of a page to a keyword.

Our study [PDF] conclusively reveals that using our nTopic process to improve content relevancy of a page will, over time, increase organic traffic to the modified page. You can visit the study page or read the PDF for complete details.

“Relevance is the New PR”

James Norquay recently released an incredible interview with an ex-search quality member. While I can’t say I wholly agree with his statement that “Relevance is the new PR”, I can’t ignore any longer that Google is looking at more and more intelligent statistical methods of determining the quality of content. I believe that topical relevance is one of those methods, and nTopic is the way to measure it.


4 Comments

  1. Steve Floyd
    Oct 19, 2012

    We are very interested in integrating nTopic into the metrics of our app. Thank you very much for doing this type of research and sharing it with the community. It is genuinely appreciated.

  2. Mike Bayes
    Oct 19, 2012

    Great logic and follow through! I can not wait to try it out. Like most things, it’s just logical that Google would be employing LDA, and we are just now figuring it out.

    It certainly explains why we see so many SEO failures when relevancy isn’t taken into account.

  3. Sean Golliher
    Oct 19, 2012

    The title of this post doesn’t match the conclusion you can make from the work presented in the paper. The person doing the research was modest, like most scientists, and correctly concluded: “We cannot conclude that topic modeling or content relevancy is used in Google’s algorithm”. We already know that Google uses advanced statistical techniques to calculate relevancy, machine learning, etc. They write plenty of papers on these topics. So nothing new was discovered with regard to algorithms by SEOs. SEOMoz did a poor analysis and applied statistical techniques incorrectly. That was what the arguments were about. It was pointed out they couldn’t conclude anything from their data. Not sure how SEOMoz is relevant to the work done here other than for attention. We also know that Google works on projects like this: http://code.google.com/p/plda/.

    In one experiment ~40% of the randomly inserted search phrases caused increases in organic search traffic. In the other cohort where “ntopic” was applied 40% of the pages had negative or no results. If one applies this to a particular page you aren’t able to predict the outcome. The approach of using a Poisson distribution to model visits is an interesting approach and so is the modeling. The modeling that was done was based on 200 pages. You have a probability to calculate.

    What we need less of is big titles from marketing companies making claims that their research, done by competent scientists, doesn’t support.

  4. admin
    Oct 20, 2012

    You are correct. My intent was to introduce it as a potential ranking factor. The studies conclude that using LDA to identify keyword potentials and increase relevance can increase organic search traffic beyond a control. This is statistically significant and backed by the research. I am updating the title accordingly. Thanks again for your comments.

Trackbacks/Pingbacks

  1. SearchCap: The Day In Search, October 19, 2012 - [...] SEOMoz was Right on Relevancy – The Birth of nTopic, the LDA Google Search Ranking Factor, The Google Cache …
  2. SearchCap: The Day In Search, October 19, 2012 | Search Engine Marketing & Website Optimization - [...] SEOMoz was Right on Relevancy – The Birth of nTopic, the LDA Google Search Ranking Factor, The Google Cache …
  3. Thoughts on LDA Studies & Selling Out Science | SeanGolliher.com - [...] title of this blog post, pointed out to me a few days ago, is referring to data that was …
  4. Forget Everything You Think You Know About SEO : @ProBlogger - [...] not to say that Google hasn’t developed more advanced algorithms to analyse content on a page, but certainly the …
  5. Forget Everything You Think You Know About SEO | iblogwp.net - [...] not to say that Google hasn’t developed more advanced algorithms to analyse content on a page, but certainly the …
  6. Forget Everything You Think You Know About SEO | RMKz Affiliates - [...] not to say that Google hasn’t developed more advanced algorithms to analyse content on a page, but certainly the …
  7. Forget Everything You Think You Know About SEO - Paper Chase - [...] not to say that Google hasn’t developed more advanced algorithms to analyse content on a page, but certainly the …
  8. Forget Everything You Think You Know About SEO - [...] not to say that Google hasn’t developed more advanced algorithms to analyse content on a page, but certainly the …
