Relevancy Modified MozRank: A Smarter Metric for Rank Analysis
We have been working for a while now on our own internal correlation study in partnership with Trident Marketing and Fuzzy Logix. In working on this project, time and time again it shocked me how crude our current ranking metrics are. We finally have good raw data from sources like SEOMoz and Majestic SEO but we have only begun to scratch the surface of how Google uses these types of data to organize and create search rankings.
In the same way that SEOMoz identified a simple statistical modeling technique known as Latent Dirichlet Allocation as a likely candidate for how Google models topic relevancy, we have been looking for similar statistical techniques that Google is likely to use in turning raw link data into metrics more suitable for ranking pages. The easiest way to do this has been to look at the language of SEOs and try to translate what we intuitively believe into statistical algorithms.
Anchor Text Relevancy
SEOs generally believe that exact match anchor text links are among the most important ranking metrics. SEOMoz’s correlation study seems to bear this out. However, many SEOs will go on to explain that “relevant” anchor text matters as well, and SEOMoz’s study similarly tries to back this up with “partial match” anchor text (e.g., baseball card would be a partial match to either baseball team or birthday card).
Google’s anchor text relevancy algorithm is almost certainly more sophisticated than a string-in-string search, so we looked to statistics for string comparison tools with which to modify the mozRank passed by a link so that more relevant anchor text receives extra value. We call this Relevancy Modified MozRank.
The Contenders
There are several statistical methods we can use to model anchor text relevancy, three of which we discuss here.
- Levenshtein Distance: The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character (via Wikipedia). For example, if we were to compare the anchor text “baseball” to “baseball card”, the Levenshtein Distance would be 5 (adding a space and the 4 letters c, a, r and d). The Levenshtein Distance between “Baseball” and “Baseball” would be 0. This is also called the Edit Distance.
- Jaro Winkler Distance: The higher the Jaro–Winkler distance for two strings is, the more similar the strings are. The Jaro–Winkler distance metric is designed and best suited for short strings such as person names. The score is normalized such that 0 equates to no similarity and 1 is an exact match (via Wikipedia).
- Smith Waterman Algorithm: The Smith–Waterman algorithm is a well-known algorithm for performing local sequence alignment; that is, for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith–Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure (via Wikipedia). We found this to be a unique way to find relationships between parts of strings such as word stems, tense, etc.
- Notably Not Included: Latent Semantic Analysis – working on this one, it is a little bit more complicated 😉
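To make the three measures concrete, here is a minimal Python sketch of each, applied to a ranking keyword and an anchor text. These are textbook versions written for illustration, not the code used in our study. The function names, the Jaro Winkler prefix weight of 0.1, and the Smith Waterman scoring values (match 2, mismatch -1, gap -1, compared character by character) are all assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def jaro(a: str, b: str) -> float:
    """Jaro similarity: 0 means no similarity, 1 means an exact match."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit = [False] * len(a)
    b_hit = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_seq = [c for c, hit in zip(a, a_hit) if hit]
    b_seq = [c for c, hit in zip(b, b_hit) if hit]
    transpositions = sum(x != y for x, y in zip(a_seq, b_seq)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, prefix_weight: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    score = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return score + prefix * prefix_weight * (1 - score)


def smith_waterman(a: str, b: str, match: int = 2,
                   mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between any two substrings (assumed scoring)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(levenshtein("baseball", "baseball card"))  # 5, as in the example above
print(jaro_winkler("baseball", "baseball card"))
print(smith_waterman("baseball", "baseball card"))
```

Note that the Smith Waterman score is not normalized: it grows with the length of the matching region, so how it is scaled before being applied to mozRank is a separate choice.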
Computing Relevancy Modified MozRank
Since the three measurements above render different scores on different scales, we have to compute them differently. First, we compute the Levenshtein Distance, Jaro Winkler Distance, or Smith Waterman score for the ranking keyword and the anchor text used.
- Levenshtein Distance modified MozRank (LDmmR): We use a simple measurement of raw mozRank divided by Levenshtein Distance +1 (rmR/(LD+1))
- Jaro Winkler Distance modified MozRank (JWDmmR): We use a simple measurement of raw mozRank multiplied by the Jaro Winkler Distance (rmR*JWD)
- Smith Waterman modified MozRank (SWAmmR): We use a simple measurement of raw mozRank multiplied by the Smith Waterman score (rmR*SWA)
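The three formulas above can be written out as a short Python sketch. The function names and the sample values are hypothetical and chosen only for illustration; the relevancy scores are assumed to have been computed already for the ranking keyword and the link's anchor text.

```python
def ld_mmr(raw_mozrank: float, levenshtein_distance: int) -> float:
    """LDmmR = rmR / (LD + 1): an exact match (LD = 0) keeps the full raw mozRank."""
    return raw_mozrank / (levenshtein_distance + 1)


def jwd_mmr(raw_mozrank: float, jaro_winkler_score: float) -> float:
    """JWDmmR = rmR * JWD: the score runs from 0 (no similarity) to 1 (exact match)."""
    return raw_mozrank * jaro_winkler_score


def swa_mmr(raw_mozrank: float, smith_waterman_score: float) -> float:
    """SWAmmR = rmR * SWA: the scale depends on the alignment scoring used."""
    return raw_mozrank * smith_waterman_score


# Hypothetical link: raw mozRank of 4.0, anchor text 5 edits from the keyword.
print(ld_mmr(4.0, 5))
print(jwd_mmr(4.0, 0.5))
print(swa_mmr(4.0, 16))
```

Because Levenshtein Distance grows as strings become less alike, it is the only one of the three that divides rather than multiplies, and the +1 avoids dividing by zero on an exact match.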
In the above picture you can see the LD, JWD, and SWA modified mozRanks of various pages on the right-hand side. Notice that we have no external exact match anchor text to work with in the left columns, but Google has plenty of relevancy data to work with by using these kinds of anchor text relevance measurements.
Takeaways
We have long thought that Google is using the relevancy of anchor text in determining how much link juice to pass. You don’t need the exact anchor text to look relevant to Google – in fact you don’t need any at all. This does not mean exact match links don’t help; it merely means that your strategy shouldn’t rely solely upon them. We will keep you all updated as we find more sophisticated measures and start to compare the correlation of these types of modified mozRanks to actual rankings.
6 Comments
An excellent study. Can you foresee these types of modified MozRank results being the new standard and being adopted by SEOMoz in their tools and APIs? That would be very useful to see.
I’d love to see correlation data for these different approaches, plus perhaps max(), min() and avg() of the stats you’re generating. The math is way beyond me, though!
Excellent! So are you going to make this tool available to use?
🙂
Great work. I’ve been a bit leery of delving too deep into the more serious side of how we build analytic assumptions, but my formal BS/MS degrees are social science related, so I need to dig back into what I learned in the days of textbooks.
Read this on a Droid X walking in the pitch dark with reading glasses, so it wasn’t an optimal environment for thinking through takeaways.
Hey Russ,
Really interesting experiment! Hoping you can clarify one point for me…
You mention “In order to compute the Relevancy Modified MozRank of a page, we merely find all of the backlinks, get the mozRank passed of those backlinks, pass the anchor text for each link through a relevancy measurement tool, and then add them all together.”
It sounds from that as though you are adding MozRank values (from different links) to get a final (adjusted) MozRank score for a URL. However, MozRank isn’t linear (I believe the log base is ~8.5), so you can’t add values. Did you account for that? If not, it probably threw off your results, in which case I’d love to see a repeat of the experiment!
Thanks again for the good read. 🙂
Great post and I love the concept of factoring in a relevancy score. It really backs my stance that relevancy is a KEY ranking metric right now. The bottom line is that you need links from relevant sites. It is no longer enough to just get links from authority sites; they have to also be relevant!