W3C HTML Validation and Search Engine Optimization

It has been a while since I have posted some of Virante’s research to the blog, and a good friend and former COO Bob Misita called me out on it. I figured I would release some of the data from a recent study we did on the relationship of W3C HTML Validation and web page rankings. Because validation is quite complex, we chose to take a macro-look rather than our traditional methodology of getting individual sites into the SERPs via sitemaps and then tweaking individual independent variables.

In particular, we looked at W3C validation for the search results of approximately 100 separate keywords in Google, Yahoo, MSN Live and Ask. For each keyword, we extracted the top 10 ranking sites, measured the number of errors via a W3C validation check, and used multiple statistical models to determine whether the individual rankings of the sites could be associated with the number of validation errors.

The more rudimentary statistics were all we needed to dismiss, fairly easily, the assumption that validated content will perform better in the search engines, that is, in Google, Yahoo, MSN Live or Ask.

[Figure: Graph of W3C validation and search rankings]

The erratic nature of the average number of validation errors compared to the ranking position is fairly evident from the graph above. But, rather than assume that the averages across all 100 keyword searches told the whole story, we decided to look at the least squares regression for each and every keyword on each engine (400 different result sets).
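For readers who want to see what one of those per-keyword fits looks like, here is a minimal sketch in Python. It is not the code used in the study; the function name and the sample numbers are made up for illustration. It assumes ranking position is regressed on the number of validation errors, which matches how the slopes are read later in the post.

```python
def least_squares_slope(xs, ys):
    """Slope b of the ordinary least squares line y = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of squared deviations of x; zero means every x is the same.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    # Sum of cross-deviations of x and y.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical result set for ONE keyword on ONE engine (made-up numbers):
# validation error counts for the pages ranked 1 through 4.
errors = [0, 100, 200, 300]
positions = [1, 2, 3, 4]

slope = least_squares_slope(errors, positions)
print(slope)  # positions lost per additional validation error
```

In this invented example the slope is 0.01, meaning 100 extra errors would go with a drop of one position. The study repeated a fit of this kind for each of the 400 result sets and averaged the slopes per engine.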

Engine      Avg Errors   Avg Slope
Google      155          1.61369625672E-19
Yahoo       146          0.00325581395349
MSN Live    111          0.00418604651163
Ask         102          0.000714285714286

As you can see, the slope of the least squares regression line is barely positive for every engine. Yahoo’s is roughly 3/1000, and the largest, MSN Live’s, is only about 4/1000. If the confidence levels were high, you could assume from the Yahoo figure that for every 333 validation errors removed from your page, your ranking would improve by about one position. However, the confidence levels were not sufficient and, perhaps most glaring, fewer than 2% of the sites tested had more than 333 validation errors (meaning the vast majority of sites could not benefit from such a change).
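The 333 figure is simply the reciprocal of the slope. The short Python sketch below redoes that arithmetic with the slopes from the table above. The variable names are my own, and the result is only meaningful if the slope is statistically reliable, which this data does not support.

```python
# Average regression slopes as published in the table above
# (ranking positions per additional validation error).
avg_slope = {
    "Google": 1.61369625672e-19,
    "Yahoo": 0.00325581395349,
    "MSN Live": 0.00418604651163,
    "Ask": 0.000714285714286,
}

# Errors that would have to be removed to move up one position,
# IF the slope could be trusted (the confidence levels say it cannot).
errors_per_position = {
    engine: 1.0 / slope for engine, slope in avg_slope.items()
}

# The post rounds Yahoo's slope to 3/1000 before inverting it.
rounded_yahoo = 1.0 / 0.003

for engine, n in errors_per_position.items():
    print(f"{engine}: about {n:,.0f} errors per position")
print(f"Yahoo with the slope rounded to 3/1000: about {rounded_yahoo:.0f}")
```

Using the unrounded slopes, Yahoo works out to about 307 errors per position and MSN Live to about 239, while Google's slope is so close to zero that the number is astronomically large. Either way, the conclusion is the same: almost no page has enough errors for this to matter.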

Average number of validation errors by ranking position (1 to 10):

Engine      1    2    3    4    5    6    7    8    9    10
Google      103  74   118  190  86   127  60   180  145  146
Yahoo       97   95   78   134  121  91   126  145  133  118
MSN Live    54   102  78   59   122  79   76   100  128  88
Ask         98   99   81   94   63   112  105  82   43   89

Even though sites with fewer validation errors appear to do better in Live and Ask than in Google and Yahoo, we can quickly counter this by looking at the aforementioned regression slopes. It is possible that W3C validation plays a role in being indexed (although I think this is unlikely). Importantly, we saw similar variation in the sites the four search engines allowed to rank, meaning that there appears to be no threshold score required to rank in any of these search engines.
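To make the comparison between engines concrete, the following Python sketch summarizes the per-position table above. It assumes each row holds the average number of validation errors at ranking positions 1 through 10, which is my reading of the table rather than something the data labels explicitly. The variable names are illustrative.

```python
from statistics import mean

# Rows copied from the per-position table above
# (assumed: average validation errors at positions 1..10).
errors_by_position = {
    "Google": [103, 74, 118, 190, 86, 127, 60, 180, 145, 146],
    "Yahoo": [97, 95, 78, 134, 121, 91, 126, 145, 133, 118],
    "MSN Live": [54, 102, 78, 59, 122, 79, 76, 100, 128, 88],
    "Ask": [98, 99, 81, 94, 63, 112, 105, 82, 43, 89],
}

# Mean across the ten positions: how error-laden the top 10 is overall.
row_mean = {engine: mean(v) for engine, v in errors_by_position.items()}

# Spread between the best and worst position: how erratic each row is.
row_range = {engine: max(v) - min(v) for engine, v in errors_by_position.items()}

for engine in errors_by_position:
    print(f"{engine}: mean {row_mean[engine]:.1f}, range {row_range[engine]}")
```

The means come out lower for MSN Live and Ask than for Google and Yahoo, which is the pattern described above, but every engine's row swings widely from one position to the next. That spread is why the per-keyword slopes, not these averages, are the better test.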

So, there you have it. One less thing to worry about. While I still think HTML validation is a worthy cause in and of itself, one would be hard-pressed to prove that it is directly, positively correlated with one’s search rankings, much less a cause of them.

11 Comments

  1. Anup
    Aug 1, 2008

    Good post. I agree, as much as I support the idea of web standards, when it comes to ranking in search engines I find it has little to no effect whatsoever. It is very useful for other things, but search ranking: I also very much doubt it. As a technical thing search engines are good at working around markup mistakes, and instead concentrate on “social” factors to determine ranking (e.g. number of inbound quality links etc).

    I think when “web standards” were being pushed a few years ago, people kept saying that it would help with search engine ranking. It *might* help with indexing, however, but I find that it only helps insofar as a page that is not broken is easier for a search engine to parse.

  2. John H. Gohde
    Aug 2, 2008

    One only has to look at the 1,000+ validation errors that Amazon.com has and compare that to how much money they are making with their absolute disaster of a site by any W3C Validation standard.

  3. Maqsood Ahmed
    Aug 3, 2008

    Yes, you have one more piece of evidence that there is little if any correlation between W3C validation and your search engine rankings. We have observed countless times that you need to do the basic on-page optimization, develop useful content and build lots of quality inbound links to be able to rank well in the major search engines.

  4. David S Foreman
    Aug 3, 2008

    Also interesting to think about is the possibility that sites that have many errors would tend to be sites that are more poorly developed overall, with less internal linking, poorer onsite SEO factors, and more violations of SEO guidelines such as pop-unders, hidden text, banned MX servers, malware, etc.
    Another interesting test would be page load time and its effect on rankings. Clearly Google thinks this is a quality issue for web pages and has begun working this factor into AdWords landing pages.

  5. Tom Hughes-Croucher
    Aug 3, 2008

    Validation might be a quantifiable indicator or “smell” of other aspects of code quality, such as clean markup.

    While page rank is obviously an important factor, cleaner code can dramatically increase search engine relevance by increasing the search term to page weight ratio.

  6. Gab Goldenberg
    Aug 3, 2008

    Nice to have you burst the bubble on all the often-arrogant, know-it-all, anal-retentive validation junkies.

  7. Mack
    Aug 3, 2008

    I have said it before and I will say it again: validation matters most for compatibility across different browsers (human related) rather than for SEO purposes. It never hurts to validate anything of course, but it has very little to do with your crawlability.

  8. Reilly
    Aug 4, 2008

    great post, i’ve watched this topic discussed, but no one has used any data to support their reasoning

    thanks for really putting some solid evidence out there of why W3C validation doesn’t affect rankings

  9. Aldo Giammusso
    Aug 13, 2008

    I disagree. Even though your evidence and mapping support your case, you’re only taking a few web sites and plotting them on a graph for certain key terms. I recently had an issue with not getting indexed by MSN or Live.com. I contacted MSN directly and they told me to validate and resubmit, and that it would improve.

  10. Glenn
    Aug 20, 2008

    “We are sorry, but StopScraper has determined that your IP address is associated with a scraper.”

    Ehh, ok? Normal RSS-feed usage. Default setting in app to check every hour. My IP is probably available in the admin interface of this blog. I have now changed it to check every 12 hours. Please remove the block, since I find your stuff interesting. Thank you!

  11. Chat Man
    Aug 26, 2008

    I like that this post, even with measurably relevant data, still leaves the discussion open, but more focused.

    However, I consider that if a page looks bad to a visitor, then it will look bad to a bot; and vice versa. Mack asserts this point quite effectively in regards to browser compliance. If you have great rankings, but surfing your site in Opera means only using shortcut keys to navigate – the odds are that you’ll lose a sale or exposure. That’s just how it goes.

    So, validation doesn’t matter, eh? Because HTML validation was not ‘perfect’ for these sites, I’m curious to see how the script and CSS coding of these same sites perform against validation? Identical ratios would convince me that it’s not a *ranking* factor, but a *user experience* factor (and aren’t the two equally relevant to organic search?).

    Thanks for recharging the debate!

    Chat Man
