Digg.com Duplicate Content Issue…

While webmasters around the world remain frustrated at Google’s inability to understand that www.site.com and site.com are, by and large, identical, it appears that Digg.com never got the memo. Luckily, this helps put into perspective exactly how out of hand the problem can get.

Right now, digg.com is showing about 1.4 million duplicate content pages because it has not set up a site-wide redirect from www.digg.com/{anything} to digg.com/{anything}.

Google cached for digg.com:
6.69 million
http://www.google.com/search?q=site%3Adigg.com

Google cached for www.digg.com:
1.42 million
http://www.google.com/search?q=site%3Awww.digg.com

Duplication Example:

Digg’s Privacy Page
http://www.google.com/search?q=site:digg.com+inurl:digg.com/privacy&hl=en&lr=&filter=0

While I still hold that this is ultimately Google’s problem, not Digg.com’s, the fix is fairly simple using mod_rewrite and an .htaccess file…


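# Turn on mod_rewrite for this directory
RewriteEngine On
RewriteBase /
# Match any request whose Host header is not exactly digg.com (case-insensitive)
RewriteCond %{HTTP_HOST} !^digg\.com$ [NC]
# Send a 301 (permanent) redirect to the same path on digg.com
RewriteRule ^(.*)$ http://digg.com/$1 [L,R=301]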

This will cause all website visitors, including bots, to be 301 (permanently) redirected to the correct, canonical form of your URL. Of course, you will need to replace digg.com in the example above with your own domain name for it to work on your site.
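
If you would rather not hard-code a domain name at all, a variant along the following lines may work. Treat it as a sketch rather than a drop-in fix: it assumes mod_rewrite is enabled, that the .htaccess file sits in your document root, and that your host allows rewrite directives there. Unlike the rule above, it only strips a leading "www." and leaves any other hostname alone.

```apache
# Sketch only: assumes mod_rewrite is available and this file is in the document root
RewriteEngine On
RewriteBase /

# Capture everything after a leading "www." in the Host header (case-insensitive)
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]

# %1 is the hostname captured by the RewriteCond above, $1 is the requested path
RewriteRule ^(.*)$ http://%1/$1 [L,R=301]
```

Because the target hostname comes from the request itself, the same file can be copied between sites without editing. The trade-off is that it will not fold other aliases or parked domains into the canonical name, which the hard-coded rule does.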

Unfortunately, until Google clears up this problem, adding this type of mod_rewrite rule will remain a standard SEO (search engine optimization) practice for years to come.



3 Comments

  1. Sam
    Aug 28, 2006

    “www” has become so common on the web that it’s almost invisible. No one calls it the World Wide Web anymore, not since 1994 anyway.

    I prefer this .htaccess solution for removal of www:

    RewriteEngine on
    RewriteBase /

    RewriteCond %{HTTP_HOST} ^www\.digg\.com$
    RewriteRule (.*) http://digg.com/$1 [R=Permanent]

  2. LGR
    Mar 12, 2007

    Whether Digg does a redirect or not is really not a concern. The bigger problem IMHO is that there are that many pages of little to no information clogging up real search results. Digg stories give little to no real information about the topic, and the comments are generally uninspiring and rarely worth reading. What the search results should be returning are the sources Digg points to, with the Digg information buried back on page 100 of the SERPs.

  3. yza
    Sep 3, 2007

    Thank you for sharing all this wonderful information 🙂
