Relevancy Modified MozRank: A Smarter Metric for Rank Analysis
We have been working for a while now on our own internal correlation study in partnership with Trident Marketing and Fuzzy Logix. In working on this project, time and time again it shocked me how crude our current ranking metrics are. We finally have good raw data from sources like SEOMoz and Majestic SEO but we have only begun to scratch the surface of how Google uses these types of data to organize and create search rankings.
In the same way that SEOMoz identified a statistical modeling technique known as Latent Dirichlet Allocation as a likely candidate for how Google models topic relevancy, we have been looking for similar statistical techniques that Google is likely to use in turning raw link data into metrics more suitable for ranking pages. The easiest way to do this has been to look at the language of SEOs and try to translate what we intuitively believe into statistical algorithms.
Anchor Text Relevancy
It is generally believed by SEOs that exact match anchor text links are one of the most important ranking metrics. SEOMoz’s correlation study seems to bear this out. However, many SEOs will go on to explain that “relevant” anchor text matters as well, and SEOMoz’s study similarly tries to back this up with “partial match” anchor text (e.g., “baseball card” would be a partial match to either “baseball team” or “birthday card”).
Google’s anchor text relevancy algorithm is almost certainly more sophisticated than a string-in-string search, so we decided to look to statistics for string comparison tools with which we can modify the mozRank passed by a link so that more relevant anchor text passes more value. We call this Relevancy Modified MozRank.
There are several statistical methods we can use to model anchor text relevancy, three of which we discuss here.
- Levenshtein Distance: The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. (via Wikipedia) For example, if we were to compare the anchor text “baseball” to “baseball card”, the Levenshtein Distance would be 5 (adding a space and the 4 letters c, a, r and d). The Levenshtein Distance between “Baseball” and “Baseball” would be 0. This is also called the Edit Distance.
- Jaro Winkler Distance: The higher the Jaro–Winkler distance for two strings is, the more similar the strings are. The Jaro–Winkler distance metric is designed and best suited for short strings such as person names. The score is normalized such that 0 equates to no similarity and 1 is an exact match. (via Wikipedia)
- Smith Waterman Algorithm: The Smith–Waterman algorithm is a well-known algorithm for performing local sequence alignment; that is, for determining similar regions between two nucleotide or protein sequences. Instead of looking at the total sequence, the Smith–Waterman algorithm compares segments of all possible lengths and optimizes the similarity measure. (via Wikipedia) We found this to be a unique way to find relationships between parts of strings such as word stems, tense, etc.
- Notably Not Included: Latent Semantic Analysis – working on this one, it is a little bit more complicated
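To make the three measures concrete, here is a minimal Python sketch of each. It is not the code used in our study. The function names are made up for illustration, the Smith Waterman scoring values (match = 2, mismatch = -1, gap = -1) are assumptions chosen only to show the mechanics, and the Jaro Winkler prefix settings (up to 4 characters, scaling factor 0.1) are the commonly cited defaults rather than anything we tuned.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: 0.0 means no similarity, 1.0 means an exact match."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i, ch in enumerate(s1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if ch != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, scale: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:max_prefix], s2[:max_prefix]):
        if c1 != c2:
            break
        prefix += 1
    return base + prefix * scale * (1 - base)


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between any substring of a and any substring of b."""
    best = 0
    previous = [0] * (len(b) + 1)
    for ca in a:
        current = [0]
        for j, cb in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


print(levenshtein("baseball", "baseball card"))  # 5, as in the example above
print(jaro_winkler("baseball", "baseball card"))
print(smith_waterman("baseball", "baseball card"))
```

Note that the Smith Waterman score here is unnormalized, so its scale depends entirely on the scoring values you pick, while Jaro Winkler always lands between 0 and 1.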
Computing Relevancy Modified MozRank
Since the three measurements above render different scores on different scales, we have to compute them differently. First, we compute the Levenshtein Distance, Jaro Winkler Distance, or Smith Waterman score for the ranking keyword and the anchor text used.
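- Levenshtein Distance modified MozRank (LDmmR): We use a simple measurement of raw mozRank divided by Levenshtein Distance +1 (rmR/(LD+1)). Adding 1 keeps an exact match (LD = 0) from dividing by zero, so an exact match passes the full raw mozRank.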
- Jaro Winkler Distance modified MozRank (JWDmmR): We use a simple measurement of raw mozRank multiplied by the Jaro Winkler Distance (rmR*JWD)
- Smith Waterman modified MozRank (SWAmmR): We use a simple measurement of raw mozRank multiplied by the Smith Waterman score (rmR*SWA)
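The three formulas above are simple enough to sketch directly. In this rough Python version the function names and every input number are hypothetical, chosen only to show the arithmetic; the similarity scores are passed in as plain numbers, so any implementation of the three measures can feed them.

```python
def ld_modified_mozrank(raw_mozrank: float, levenshtein_distance: int) -> float:
    """LDmmR = rmR / (LD + 1). An exact match (LD = 0) keeps the full raw mozRank."""
    return raw_mozrank / (levenshtein_distance + 1)


def jwd_modified_mozrank(raw_mozrank: float, jaro_winkler_score: float) -> float:
    """JWDmmR = rmR * JWD, where JWD runs from 0 (no similarity) to 1 (exact match)."""
    return raw_mozrank * jaro_winkler_score


def swa_modified_mozrank(raw_mozrank: float, smith_waterman_score: float) -> float:
    """SWAmmR = rmR * SWA, where SWA is the local alignment score."""
    return raw_mozrank * smith_waterman_score


# Hypothetical link: raw mozRank of 5.0, ranking keyword "baseball card",
# anchor text "baseball" (Levenshtein Distance of 5 between the two).
raw = 5.0
print(ld_modified_mozrank(raw, 5))     # 5.0 / 6
print(jwd_modified_mozrank(raw, 0.9))  # 0.9 is a made-up JWD value
print(swa_modified_mozrank(raw, 16))   # 16 is a made-up SWA value
```

Because the three scores live on different scales, the three modified mozRanks are not directly comparable to one another; each is only meaningful when compared across links scored the same way.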
In the above picture you can see the LD, JWD, and SWA modified mozRanks of various pages on the right-hand side. Notice that we have no external exact match anchor text to work with in the left columns, but Google has plenty of relevancy data to work with by using these kinds of anchor text relevance measurements.
We have long thought that Google is using the relevancy of anchor text in determining how much link juice to pass. You don’t need the exact anchor text to look relevant to Google – in fact you don’t need any at all. This does not mean exact match links don’t help; it merely means that your strategy shouldn’t rely solely on them. We will keep you all updated as we find more sophisticated measures and start to compare the correlation of these types of modified mozRanks to actual rankings.