<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Google Cache: Search Engine Marketing, SEO &#38; PPC</title>
	<atom:link href="http://www.thegooglecache.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.thegooglecache.com</link>
	<description>SEO Research &#38; Ramblings</description>
	<lastBuildDate>Mon, 06 May 2013 17:03:37 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
		<item>
		<title>You&#8217;re Still Penalized and It&#8217;s Your Own Damn Fault</title>
		<link>http://www.thegooglecache.com/white-hat-seo/common-mistakes-in-getting-out-of-penaltie/</link>
		<comments>http://www.thegooglecache.com/white-hat-seo/common-mistakes-in-getting-out-of-penaltie/#comments</comments>
		<pubDate>Mon, 06 May 2013 17:03:37 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[White Hat]]></category>

		<guid isPermaLink="false">http://www.thegooglecache.com/?p=789</guid>
		<description><![CDATA[Because Virante owns Remove &#8216;em, a blended tool and service that helps webmasters remove bad links pointing to their websites, we have had a unique vantage point in the post-Penguin world. We actually just celebrated our 1,000,000th link removal. Our perspective has allowed us to see hundreds of webmasters struggle through the process of reconsideration [...]]]></description>
			<content:encoded><![CDATA[<p>Because <a href='http://www.virante.org'>Virante</a> owns <a href='http://www.removeem.com'>Remove &#8216;em</a>, a blended tool and service that helps webmasters remove bad links pointing to their websites, we have had a unique vantage point in the post-Penguin world. We actually just celebrated our <a href='http://millions.removeem.com'>1,000,000th link removal</a>. Our perspective has allowed us to see hundreds of webmasters struggle through the process of reconsideration requests and penalty removals &#8211; some taking as long as 2 years &#8211; and there are a couple of things we have noticed time and time again. <span id="more-789"></span></p>
<p>By and large, the biggest obstacle to getting out of a penalty tends to be the website owner him or herself. I do not mean to trivialize what webmasters go through (I often think it is incredibly unfair), but we all fall victim to a human tendency to prefer <a href='http://en.wikipedia.org/wiki/Loss_aversion'>avoiding loss</a> to acquiring gain. This &#8220;loss aversion&#8221; has set webmasters up to fail time and time again once a penalty has set in. So here are a few of the biggest mistakes webmasters make&#8230;</p>
<h2>Guessing Google&#8217;s Link Qualification Algorithm</h2>
<p>First, let me say that there are some excellent tools out there for link qualification. Cemper&#8217;s <a href='http://linkdetox.com'>Link Detox</a>, the new <a href='http://linkrisk.com'>Link Risk</a> and Remove &#8216;em&#8217;s internal link qualification system all do decent jobs of identifying which links are most concerning. In fact, I can say in no uncertain terms that Remove &#8216;em&#8217;s system is the least &#8220;specific&#8221; of these 3 in terms of identifying exactly which links are the worst, and instead errs on the side of removals. The reason is simple&#8230;</p>
<p><b>Missing 1 bad link can mean another round of reconsideration requests</b> </p>
<p>Webmasters regularly make the error of trying to remove the fewest links possible to get back into search results. The logic behind this is sound. Webmasters don&#8217;t want to get rid of any links that still pass value. However, just as a doctor chooses to surgically remove a generous portion of tissue around a melanoma, so should you be prepared to remove a generous number of links around those you are certain are toxic, to limit the number of necessary reconsideration requests.</p>
<p>More importantly, because the removal rate percentage varies greatly, it is better to reach out to a large number of potential removal opportunities early on rather than find out your first pass only produced a 5% success rate. The last thing you want to do is go through 3 or 4 reconsideration requests because each time you haven&#8217;t quite removed enough.</p>
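<p>To make the outreach math concrete, here is a minimal sketch of how many webmasters you would need to contact for a given removal success rate. The function name and the target of 40 removals are hypothetical; only the 5% rate comes from the paragraph above.</p>

```python
def contacts_needed(target_removals, success_pct):
    """Webmasters to contact so that, at success_pct percent, the expected
    number of removals reaches target_removals (rounded up)."""
    if not 0 < success_pct <= 100:
        raise ValueError("success_pct must be between 1 and 100")
    # Integer ceiling division avoids floating-point rounding surprises.
    return (target_removals * 100 + success_pct - 1) // success_pct


# Hypothetical campaign: 40 links need to come down.
for pct in (5, 25, 50):
    print(f"{pct}% success rate: contact {contacts_needed(40, pct)} webmasters")
```

<p>At a 5% success rate the list has to be twenty times larger than the number of removals you need, which is the argument for casting a wide net on the first pass.</p>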
<h2>Pay the Piper</h2>
<p>So many in the search industry have indicated that you should not pay webmasters to remove bad links, and I could not disagree with them more. There are several excellent reasons to pay &#8220;<b>bounties</b>&#8221; for link removals&#8230;</p>
<ol>
<li>removing links takes actual effort, and a webmaster should be compensated for it</li>
<li>if you spammed the link, you should cover both their effort and their hardship</li>
<li>paying to remove links speeds up the process. Nothing motivates a removal faster than a quick buck.</li>
</ol>
<p>Just remember that each link you leave up is a potential extra 2 weeks waiting for that next reconsideration request to come in. Can your business survive that more easily than paying another $15 bounty?</p>
<h2>Sitting on Your Hands</h2>
<p>This one is incredibly common and, frankly, infuriating. I can&#8217;t begin to count the number of times I have heard these words: &#8220;We filed a reconsideration request and now we are just waiting.&#8221; Are you crazy? If your boat is sinking, do you stop trying to empty the water out because you put in a call to the Coast Guard? Get back to work! If you used the Disavow tool, you have more work to do. Keep working on that list so that the day they respond, if the answer is negative, you can follow up with evidence that &#8211; in good faith &#8211; you continued to work and now have a new set of link removals to show them. If you aren&#8217;t prepared with a quick response, it is evidence that you are trying to get by doing as little as possible. </p>
<h2>So, time to fight.</h2>
<p>There is a light at the end of the tunnel, but you aren&#8217;t going to get there by complaining or waiting. Grow up, accept the things you have done or that have been done in your name, own up to them, clean up the mess and get back to business. It is hard as hell, no doubt, but don&#8217;t think for a second you have no control. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.thegooglecache.com/white-hat-seo/common-mistakes-in-getting-out-of-penaltie/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>I am an SEO.</title>
		<link>http://www.thegooglecache.com/white-hat-seo/i-am-an-seo/</link>
		<comments>http://www.thegooglecache.com/white-hat-seo/i-am-an-seo/#comments</comments>
		<pubDate>Fri, 03 May 2013 18:04:43 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Rants & Raves]]></category>
		<category><![CDATA[White Hat]]></category>

		<guid isPermaLink="false">http://www.thegooglecache.com/?p=786</guid>
		<description><![CDATA[However, what I do mean to say, with great emphasis, is that SEO as a specialty has, does, will and should continue to exist. I felt like I should chime in regarding Rand&#8217;s excellent Whiteboard Friday today. 1. SEO is Bigger than SEO: SEO or &#8220;Search Engine Optimization&#8221; is a statement of purpose, not [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>However, what I do mean to say, with great emphasis, is that SEO as a specialty has, does, will and should continue to exist.</p></blockquote>
<p>I felt like I should chime in regarding Rand&#8217;s excellent <a href='http://www.seomoz.org/blog/why-we-cant-just-be-seos-anymore-whiteboard-friday'>Whiteboard Friday</a> today.</p>
<p><strong>1. SEO is Bigger than SEO:</strong><br />
SEO or &#8220;Search Engine Optimization&#8221; is a statement of purpose, not a statement of methods. Carl Sagan once said, &#8220;If you wish to make an apple pie from scratch, you must first invent the universe.&#8221; I suppose we could call bakers Gods if this were the case, but I believe most of us would agree that to be generally false. Except for the creators of the <a href='http://www.gigiscupcakesusa.com/briercreeknorthcarolina'>bourbon pecan cupcake</a>. If your goal is to bake foods, you are a baker, regardless of the tactics necessary to bake those foods. Listing off CRO and UX and Branding as necessary components of a successful SEO campaign does not change the intent or purpose of that campaign &#8211; to increase qualified organic traffic from search engines. While the responsibilities have grown, the goal has not. If the goal has shifted, then yes, you should change your title. If tomorrow, Google changes all the necessary tactics to rank, I will still be an SEO. But would I still be an Inbound Marketer?<span id="more-786"></span></p>
<p><strong>2. Perception is Bad:</strong><br />
I agree that the perception is generally bad, but this obstacle does not change what we do, nor should we surrender our true identities to a new facade because we can&#8217;t shake the vestiges of our past. Using terms like &#8220;Inbound Marketing&#8221; and &#8220;Online Marketer&#8221; merely obfuscates the reality of our primary goals in the hope that we aren&#8217;t lumped together with people&#8217;s misguided perceptions. </p>
<p><strong>3. We are Selling Ourselves Short:</strong><br />
Rand&#8217;s arguments here are what I find most disconcerting. SEO is getting harder and Google intends to make it harder and harder. When we begin to call ourselves inbound marketers and begin to devote more and more of our time to ancillary marketing tactics that have tangential impacts on organic search, we have taken our eyes off the ball. We have lost focus. Now, I will admit that focus is a luxury only some businesses can afford &#8211; those that can afford a team of marketers working on a cohesive strategy, with individuals focusing on different aspects. </p>
<p><strong>Final thoughts.</strong><br />
I do not mean to say that people who are inbound marketers out there should not call themselves inbound marketers. They absolutely should. I also do not mean to say that a group of SEOs are shifting away from SEO to inbound marketing. They absolutely are. However, what I do mean to say, with great emphasis, is that SEO as a specialty has, does, will and should continue to exist. That specialty will likely become more difficult to master, more costly to perform, and will require more skills than before &#8211; but I will gladly be a part of it. </p>
<p>I have a purpose. It is to increase qualified organic traffic from search engines. I embody that purpose. <b>I am an SEO</b>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.thegooglecache.com/white-hat-seo/i-am-an-seo/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Biomagnification, Redirects and Back Link Penalties</title>
		<link>http://www.thegooglecache.com/white-hat-seo/biomagnification-and-back-link-penalties/</link>
		<comments>http://www.thegooglecache.com/white-hat-seo/biomagnification-and-back-link-penalties/#comments</comments>
		<pubDate>Wed, 03 Apr 2013 13:27:02 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Advanced]]></category>
		<category><![CDATA[White Hat]]></category>

		<guid isPermaLink="false">http://www.thegooglecache.com/?p=778</guid>
		<description><![CDATA[I often find that the best sources of analysis in SEO, which is still a nascent industry, come from other academic pursuits. While these are regularly computer sciences (like latent dirichlet allocation) or mathematics (like volatility analysis), we sometimes find interesting lessons outside of those usual suspects &#8211; in this case, biology. Biomagnification is a [...]]]></description>
			<content:encoded><![CDATA[<p>I often find that the best sources of analysis in SEO, which is still a nascent industry, come from other academic pursuits. While these are regularly computer sciences (like <a href='http://www.thegooglecache.com/white-hat-seo/latent-dirichlet-allocation-lda-correlations-clarified/'>latent dirichlet allocation</a>) or mathematics (like <a href='http://www.thegooglecache.com/white-hat-seo/exponential-moving-average-volatility-pockets-improved-correlation-studies/'>volatility analysis</a>), we sometimes find interesting lessons outside of those usual suspects &#8211; in this case, biology.<span id="more-778"></span></p>
<p><img src='http://toxics.usgs.gov/photo_gallery/photos/benthic_flux/biomagnification_lg.gif' style='float:left;margin:8px'><a href='http://toxics.usgs.gov/definitions/biomagnification.html'>Biomagnification</a> is a fairly simple principle: through a series of prey-predator relationships, toxic substances tend to accumulate in higher percentages among organisms higher in the food chain. You can see a visualization of this in the image to the left. As mercury accumulates in various organisms, predators consume those organisms and absorb those toxins. Unless the organism has a way of disposing of those toxins, they can and will magnify in their accumulation. </p>
<p>The same is true when merging sites with one another. If you have many micro-sites or subdomains that have received less-than-kosher links in the past, you may have been able to dodge some algorithm updates like Penguin simply because the sites&#8217; link profiles were too small. Your rankings might not be that great on these sites either, but you have no evidence of a link penalty. However, redirecting multiple sites to another presents a potential for biomagnification because the combined link profiles may be enough to trigger penalties, filters or algorithmic devaluations.</p>
<p>The most interesting implication is this: <b>even if your main site and all the sites/subdomains you intend to redirect to your main site appear to have no link penalties (manual or algorithmic), the combination of those sites could trigger one.</b>  </p>
<p>Of course, the simple solution is to be thoughtful when joining sites together. Determine what new link thresholds will be once you combine anchor text, root link domains, etc. to make sure you aren&#8217;t stepping over any lines. And, of course, if you are &#8211; remove the concerning links before doing the redirect to avoid the wrath of search engines.</p>
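<p>As a rough illustration of that pre-merge check, the sketch below adds up per-anchor link counts across the sites you plan to redirect and compares the total against a limit. Everything in it is an assumption made for the example: the site data, the list of suspect anchors and, above all, the limit of 100, which is a placeholder and not a known search engine threshold.</p>

```python
from collections import Counter

# Hypothetical limit: how many links with suspect anchor text we are willing
# to carry on one domain. A placeholder for illustration only.
SUSPECT_LINK_LIMIT = 100


def suspect_links(profile, suspect_anchors):
    """Count the links in a profile whose anchor text is on the suspect list."""
    return sum(count for anchor, count in profile.items() if anchor in suspect_anchors)


def merge_profiles(*profiles):
    """Add the anchor-text counts of several sites, as a redirect would."""
    merged = Counter()
    for profile in profiles:
        merged.update(profile)
    return merged


# Hypothetical anchor-text counts for a main site and two micro-sites.
main_site = Counter({"Acme": 200, "cheap widgets": 30})
micro_a = Counter({"cheap widgets": 40, "widgets blog": 10})
micro_b = Counter({"cheap widgets": 45})
suspect = {"cheap widgets"}

for name, profile in [("main", main_site), ("micro_a", micro_a), ("micro_b", micro_b)]:
    print(name, suspect_links(profile, suspect) > SUSPECT_LINK_LIMIT)

combined = merge_profiles(main_site, micro_a, micro_b)
print("combined", suspect_links(combined, suspect) > SUSPECT_LINK_LIMIT)
```

<p>Each site stays under the limit on its own, but the merged profile does not, which is the biomagnification effect in miniature. The same tally could be run on root linking domains or any other metric you track.</p>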
]]></content:encoded>
			<wfw:commentRss>http://www.thegooglecache.com/white-hat-seo/biomagnification-and-back-link-penalties/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>9,786,010 Reasons to Do Broken Link Building</title>
		<link>http://www.thegooglecache.com/white-hat-seo/9786010-reasons-to-do-broken-link-building/</link>
		<comments>http://www.thegooglecache.com/white-hat-seo/9786010-reasons-to-do-broken-link-building/#comments</comments>
		<pubDate>Fri, 01 Feb 2013 16:46:54 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Rants & Raves]]></category>
		<category><![CDATA[White Hat]]></category>

		<guid isPermaLink="false">http://www.thegooglecache.com/?p=766</guid>
		<description><![CDATA[I have been pretty open about my excitement regarding scalable broken link building, and many of you have read my piece The Broken Link Building Bible over at SEOMoz recently, but I wanted to share some statistics with you that I was able to draw out of Garrett French and his tool at brokenlinkbuilding.com. They [...]]]></description>
			<content:encoded><![CDATA[<p>I have been pretty open about my excitement regarding scalable broken link building, and many of you have read my piece <a href="http://www.seomoz.org/blog/the-broken-link-building-bible">The Broken Link Building Bible</a> over at SEOMoz recently, but I wanted to share some statistics with you that I was able to draw out of Garrett French and his tool at <a href="http://www.brokenlinkbuilding.com">brokenlinkbuilding.com</a>. They are pretty exciting.<span id="more-766"></span></p>
<ol>
<li>Since its inception back in October of 2012, <strong>brokenlinkbuilding.com has discovered nearly 10,000,000 broken link opportunities</strong>. This is pretty amazing. The tool, which is still not nearly as popular as it should be, has managed to discover millions of straightforward link building opportunities.<br />&nbsp;</li>
<li>8,269,448 of those backlink opportunities come from 404 pages with over 100 backlinks. This is huge in terms of conversion opportunities, because it means that for each single resource you recreate, you have an opportunity to go after hundreds of links.<br />&nbsp;</li>
<li><strong>There are over 14,000 404 pages in the system</strong> with more than 100 unique linking domains.<br />&nbsp;</li>
<li>There are 142,088 backlink opportunities with 100+ links and an onTopic score of B, and 29,651 with a score of A &#8211; meaning 99% relevant, which means&#8230;<br />&nbsp;</li>
<li>On average, <strong>every single keyword returns at least 1 highly relevant, 100+ unique linking domain opportunity</strong>&#8230;<br />&nbsp;</li>
<li>Which means that, even if you have a meager 2% conversion rate on your campaigns, you just got 2 permanent, natural links for $6.70.<br />&nbsp;</li>
</ol>
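<p>The arithmetic behind that last point can be spelled out in a few lines. The figures are the ones quoted in the list; treating $6.70 as the cost of finding one such opportunity is my reading of the sentence, so take it as an assumption.</p>

```python
linking_domains = 100     # a 404 page with 100+ unique linking domains
conversion_rate_pct = 2   # the "meager 2%" conversion rate
cost_usd = 6.70           # assumed here to be the cost of one opportunity

# 2% of 100 outreach targets converting gives the number of links won.
links_won = linking_domains * conversion_rate_pct // 100
cost_per_link = cost_usd / links_won

print(f"{links_won} links at ${cost_per_link:.2f} each")
```

<p>That works out to 2 links at roughly $3.35 apiece, before counting the time spent recreating the resource and doing outreach.</p>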
<p>These numbers are astounding for a product that is, frankly, still brand new. The reason why the numbers are so shocking is that we have nearly 2 decades of broken resources built up. This is a veritable link building gold rush.</p>
<h2>So What Does This Mean for Me</h2>
<p>Honestly, I think everyone has 2 choices.</p>
<ol>
<li>Start broken link building on your own, either following the steps in the <a href="http://www.seomoz.org/blog/the-broken-link-building-bible">BLBB</a> or using a tool like Garrett&#8217;s</li>
<li>Or hire an agency to do it for you. Any agency worth its salt should be in on this method; <a href='http://www.virante.org'>Virante</a> surely is.</li>
</ol>
<p>What isn&#8217;t an option is not doing broken link building &#8211; and here is why. Broken link building is only slightly a renewable resource. The best opportunities can and will be taken by someone in your niche. Time is of the essence. </p>
]]></content:encoded>
			<wfw:commentRss>http://www.thegooglecache.com/white-hat-seo/9786010-reasons-to-do-broken-link-building/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Piwik Plug-in for SEO Tools Begun&#8230; Already tying in SEOMoz, Majestic &amp; AHrefs</title>
		<link>http://www.thegooglecache.com/white-hat-seo/piwik-plug-in-for-seo-tools-begun-already-tying-in-seomoz-majestic-ahrefs/</link>
		<comments>http://www.thegooglecache.com/white-hat-seo/piwik-plug-in-for-seo-tools-begun-already-tying-in-seomoz-majestic-ahrefs/#comments</comments>
		<pubDate>Thu, 03 Jan 2013 20:00:00 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Advanced]]></category>
		<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Piwik SEO Tools]]></category>
		<category><![CDATA[Rants & Raves]]></category>
		<category><![CDATA[White Hat]]></category>

		<guid isPermaLink="false">http://www.thegooglecache.com/?p=757</guid>
		<description><![CDATA[So, I wanted to keep a progress report going so I don&#8217;t lose sight of my goals. It was pretty awesome to start working with Piwik plug-ins. I used the out-of-the-box Piwik SEO plug-in to create this first widget&#8230; If you are familiar with the SEOMoz, Majestic and AHRefs APIs, it literally takes only an [...]]]></description>
			<content:encoded><![CDATA[<p>So, I wanted to keep a progress report going so I don&#8217;t lose sight of my goals. It was pretty awesome to start working with <a href="http://www.piwik.org">Piwik</a> plug-ins. I used the out-of-the-box Piwik SEO plug-in to create this first widget&#8230;</p>
<p><a href='http://i.eho.st/pjtoiv7e.jpg'><img src='http://i.eho.st/pjrny668.jpg' border='3'></a></p>
<p>If you are familiar with the SEOMoz, Majestic and AHRefs APIs, it literally takes only an hour or so to create your own widget that pulls the data directly into your Piwik analytics install.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.thegooglecache.com/white-hat-seo/piwik-plug-in-for-seo-tools-begun-already-tying-in-seomoz-majestic-ahrefs/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Why I am Dropping Google Analytics in 2013 &#8211; Piwik, Here I Come!</title>
		<link>http://www.thegooglecache.com/white-hat-seo/why-i-am-dropping-google-analytics-in-2013-piwik-here-i-come/</link>
		<comments>http://www.thegooglecache.com/white-hat-seo/why-i-am-dropping-google-analytics-in-2013-piwik-here-i-come/#comments</comments>
		<pubDate>Wed, 26 Dec 2012 20:22:12 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Analytics]]></category>
		<category><![CDATA[Rants & Raves]]></category>
		<category><![CDATA[White Hat]]></category>

		<guid isPermaLink="false">http://www.thegooglecache.com/?p=751</guid>
		<description><![CDATA[So, for 2013, I have officially dropped Google Analytics off of The Google Cache. There are a couple of reasons behind the decision, so I wanted to go ahead and drop them here&#8230; 1. It is not my data to give away. It took me a while to come to this conclusion. I always looked [...]]]></description>
			<content:encoded><![CDATA[<p>So, for 2013, I have officially dropped Google Analytics off of The Google Cache. There are a couple of reasons behind the decision, so I wanted to go ahead and drop them here&#8230;<span id="more-751"></span></p>
<p><strong>1. It is not my data to give away.</strong> It took me a while to come to this conclusion. I always looked at Google Analytics as a trade off. I will let Google know about my site in exchange for a better way to visualize the usage on my site. But something kept nagging &#8211; and that was the reality that Google is not in the Analytics game for your site&#8217;s data. They are in the game for your users&#8217; data. Chances are, your users have no idea that Google is collecting their usage data as they move through your site, and they definitely don&#8217;t understand the consequences of it. I think I deserve to know what my users are doing on my site, but do I deserve to be able to sell their data to a third party (in exchange for software)? That seems like a stretch.</p>
<p><strong>2. It might help your competitor&#8217;s ad campaigns.</strong> If Google knows a user has been on your site, what is to keep them from retargeting advertisements for your competitors in the future based on that data point? I haven&#8217;t looked into this possibility at all, but I have a nagging suspicion it is one of the reasons why Google offers such a powerful tool set openly and freely.</p>
<p><strong>3. Google Analytics doesn&#8217;t expose the raw data.</strong> I hate having to go back to raw logs to try and understand exactly what is going on in a campaign. </p>
<p><strong>4. Google Analytics is a closed environment.</strong> If I want to tie rankings or links data into my analytics, I have to export the analytics data into either 3rd party software or a spreadsheet. I&#8217;d prefer to be able to improve the analytics program directly.</p>
<p><strong>5. Information Parity.</strong> As petty as this sounds, if Google isn&#8217;t going to give me the keywords from their site, why should I give them my entire site&#8217;s usage data? Why should I trust Google with my data when they won&#8217;t trust me with such a small amount of theirs? Why does Google think their users need to be protected from me, but for some reason I think my users don&#8217;t need to be protected from Google?</p>
<h2>So, what is the replacement? Piwik</h2>
<p><a href='http://piwik.org/'>Piwik</a> is a fairly robust open source analytics package. It is highly extensible, which means my team and I can mod it to our hearts&#8217; desire. Hopefully by this time next year we will have a unique, Piwik-powered SEO/Analytics hybrid which we can use on clients&#8217; sites.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.thegooglecache.com/white-hat-seo/why-i-am-dropping-google-analytics-in-2013-piwik-here-i-come/feed/</wfw:commentRss>
		<slash:comments>28</slash:comments>
		</item>
		<item>
		<title>Googlebot&#8217;s Javascript Interpreter: A Diagnostic</title>
		<link>http://www.thegooglecache.com/white-hat-seo/googlebots-javascript-interpreter-a-diagnostic/</link>
		<comments>http://www.thegooglecache.com/white-hat-seo/googlebots-javascript-interpreter-a-diagnostic/#comments</comments>
		<pubDate>Thu, 29 Nov 2012 21:50:59 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Advanced]]></category>
		<category><![CDATA[White Hat]]></category>

		<guid isPermaLink="false">http://www.thegooglecache.com/?p=586</guid>
		<description><![CDATA[Warning: This is very old. I began writing this several months ago and just never published after some back and forth w/ Matt Cutts. Take with a grain of salt. Over the past two weeks multiple respected bloggers in the search community have commented on the increasing abilities of Googlebot, especially following Google&#8217;s announcement that [...]]]></description>
			<content:encoded><![CDATA[<div style='background:#ff0000;margin-bottom:10px;padding:10px;'><b>Warning:</b> This is very old. I began writing this several months ago and just never published after some back and forth w/ Matt Cutts. Take with a grain of salt.</div>
<p>Over the past two weeks multiple respected bloggers in the search community have <a href='http://www.seomoz.org/blog/just-how-smart-are-search-robots'>commented</a> <a href='http://ipullrank.com/googlebot-is-chrome/'>on</a> <a href='http://searchenginewatch.com/article/2122137/Googlebot-Learns-to-Read-AJAXJavaScript-Comments'>the</a> increasing abilities of Googlebot, especially following Google&#8217;s announcement that it can now handle some forms of AJAX. I have, admittedly, long believed that we over-estimate what Google and Googlebot are capable of, so I wanted to run a proper experiment to determine the exact capabilities of Googlebot in reading and interpreting Javascript.</p>
<h3>The Question</h3>
<p>How sophisticated is Googlebot&#8217;s Javascript interpretation and, more specifically, which Javascript functions can Google accurately interpret?</p>
<h3>The Functions and Features Tested</h3>
<ul>
<li><b>Simple Variables</b>: Can Google Understand Simple Variable Assignment such as &#8220;var foo = &#8216;test content&#8217;; document.write(foo);&#8221;</li>
<li><b>Simple Variable Concatenation</b>: Can Google Interpret &#8220;var foo = &#8216;test content&#8217;; foo += &#8216; more &#8216;; document.write(foo);&#8221;</li>
<li><b>Simple document.write();</b></li>
<li><b>Simple element.innerHTML assignment</b></li>
<li><b>Dummy Variables</b>: We added this test in to make sure Google only indexes data that is printed to the page, and not every string randomly stored in a variable.</li>
</ul>
<h3>The Methods Tested</h3>
<ul>
<li><b>Inline</b>: We tested javascript stored on the page</li>
<li><b>Included</b>: We tested javascript in a simple include</li>
<li><b>Included behind Robots.txt</b>: We tested javascript in an include blocked by Robots.txt</li>
</ul>
<h3>The Results</h3>
<p><center></p>
<table border='0' cellpadding='10' cellspacing='0' style='text-align:center;border-width:1px;border-color:#000;border-style:solid;font-family:Verdana;font-size:12px;'>
<tr style='background:#000;color:#fff;font-weight:bold;'>
<td></td>
<td>Inline</td>
<td>Include</td>
<td>Blocked</td>
</tr>
<tr>
<td style='text-align:left;background:#eee;'>Variables</td>
<td>Yes</td>
<td>Yes</td>
<td style='background:#ff0000;font-weight:bold;'>Yes?</td>
</tr>
<tr>
<td style='text-align:left;background:#eee;'>Concatenation</td>
<td>Yes</td>
<td>Yes</td>
<td style='background:#ff0000;font-weight:bold;'>Yes?</td>
</tr>
<tr>
<td style='text-align:left;background:#eee;'>document.write()</td>
<td>Yes</td>
<td>Yes</td>
<td style='background:#ff0000;font-weight:bold;'>Yes?</td>
</tr>
<tr>
<td style='text-align:left;background:#eee;'>element.innerHTML</td>
<td>Yes</td>
<td>Yes</td>
<td style='background:#ff0000;font-weight:bold;'>Yes?</td>
</tr>
<tr>
<td style='text-align:left;background:#eee;'>Dummy</td>
<td>No</td>
<td>No</td>
<td style=''>No</td>
</tr>
<tr style='text-align:left;background:#000;color:#fff;font-weight:bold;'>
<td>Total</td>
<td>5/5</td>
<td>5/5</td>
<td style=''>5/5</td>
</tr>
</table>
<p></center></p>
<h3>Hold Your Breath</h3>
<p>Everyone now is probably staring at the Red highlighted column that indicates Googlebot can and will interpret Javascript hidden behind a robots.txt exclusion. I took the time to verify this and check it with multiple sources before finally reaching out to the <a href='http://www.mattcutts.com/blog/'>source of truth</a>. <b>Was Googlebot really ignoring robots.txt when considering Javascript includes?</b> </p>
<p>In short, no.</p>
<h3>Best Practices for Robots.txt and Javascript</h3>
<p>First, let me state that it is likely that Google will at some point (if they don&#8217;t already) use blocked .JS and .CSS as a negative signal. While there are legitimate reasons for blocking these files, there is no easy way for Google to verify that the contents of a page using these tactics are not greatly modified by the blocked files. So, be careful.</p>
<p>That being said, Matt was kind enough to respond in great detail to my findings, and pointed out several things one should consider when blocking .JS files which ultimately resulted in false positives in my analysis:</p>
<ol>
<li><b>Give Your Robots.txt a Head Start</b>: This makes a lot of sense, but most webmasters (myself included) handle the new content and robots.txt at the same time.<br />
<blockquote>&#8220;In an ideal world, you&#8217;d wait 12 hours just to be completely safe. Essentially, any time you make a new directory and block it at the same time, there&#8217;s a race condition where it&#8217;s possible we would fetch the test.js before we saw it was blocked in robots.txt. That&#8217;s what happened here.&#8221; &#8211; <b>Matt Cutts</b></p></blockquote>
<p> It is certainly untenable for Googlebot to check the Robots.txt with every new file downloaded on your site, so giving that head start can make a big difference.</li>
<li><b>User-Agent Directives can Override One Another</b>: This one was new to me, but it does make sense. If you begin with a generic &#8220;User-Agent: *&#8221; directive and follow up with a specific directive, &#8220;User-Agent: Googlebot&#8221;, the latter overrides the former as far as Googlebot is concerned; <b>it does not append to it</b>.<br />
<blockquote> If you disallow user-agent: * and then have a disallow user-agent: Googlebot, the more specific Googlebot section overrides the more general section&#8211;it doesn&#8217;t supplement it. &#8211; <b>Matt Cutts</b></p></blockquote>
</li>
<li><b>Robots.txt is only Respected Up to 500,000 Characters</b>: I know this is a pretty big number, but if you have a lot of unique URLs to block, it can get messy. This is particularly frustrating with the Google Webmaster Tools Robots.txt checker, which only analyzes the first 100,000.</li>
<li><b>To Be Certain, Use the X-Robots-Tag</b>: There is a <a href='http://code.google.com/web/controlcrawlindex/docs/robots_meta_tag.html'>great writeup here</a> on how to use the HTTP Header X-Robots-Tag to indicate to Google that any file and filetype should not be indexed. Because this header is sent along with the file, Googlebot can respect it in real-time.</li>
<li><b>.JS Files can Be Slow to Clear from Index</b>: As is the case with any lower-priority crawled document, .JS files can take a while to clear Google&#8217;s index if for some reason Google finds the blocked .JS.<br />
<blockquote>The crawl team said that once a .js file has been fetched, it can be cached in our indexing process for a while. &#8211; <b>Matt Cutts</b></p></blockquote>
<p> This is certainly not an overstatement. The .JS indexed 2 weeks ago is still present on pages that were indexed before Googlebot realized the exclusion. I believe, though, that you can always use the emergency removal tool if this happens.</li>
</ol>
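<p>The user-agent override rule in the second point is easy to trip over, so here is a minimal sketch of it. It uses Python&#8217;s standard-library <code>urllib.robotparser</code>, which is not Googlebot&#8217;s own parser but applies the same group-selection rule. The robots.txt contents, paths, and the &#8220;SomeOtherBot&#8221; name are made up for illustration.</p>

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a generic group plus a Googlebot-specific group.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /scripts/
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Googlebot matches its own group, so only /scripts/ is blocked for it.
googlebot_private = parser.can_fetch("Googlebot", "/private/page.html")
googlebot_scripts = parser.can_fetch("Googlebot", "/scripts/test.js")

# Any other crawler falls back to the generic "*" group.
other_private = parser.can_fetch("SomeOtherBot", "/private/page.html")

print(googlebot_private, googlebot_scripts, other_private)  # True False False
```

<p>Note that /private/ is open to Googlebot in this sketch. If you want a rule to apply to Googlebot as well, you have to repeat it inside the Googlebot group.</p>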
<h3>Re-Running the Test</h3>
<p>Of course, after hearing back from Matt, I needed to re-run the blocked .JS test to confirm. Sure enough, now that the .JS file was behind a previously-established blocked directory, Googlebot respected the disallow. (Also, just to be careful, I tested it on a separate domain with which Matt was not familiar, so I can assure you there was no trickery involved.)</p>
<h3>Takeaways</h3>
<ol>
<li><b>On JavaScript</b>: Google is actually interpreting the JavaScript it spiders. It is not merely trying to extract strings of text, and it does appear to be nuanced enough to know what text is and is not added to the Document Object Model. This is impressive.</li>
<li><b>On Experimenting</b>: Confirm, retest, ask, retest, confirm, confirm, write, confirm, revise, confirm, publish.</li>
<li><b>On SEO</b>: Learn new shit every day.</li>
</ol>
]]></content:encoded>
			<wfw:commentRss>http://www.thegooglecache.com/white-hat-seo/googlebots-javascript-interpreter-a-diagnostic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Laughing in the Face of the FTC &#8211; Raven, SEOMoz, and the AdWords API</title>
		<link>http://www.thegooglecache.com/rants-and-raves/google-laughing-in-the-face-of-the-ftc-raven-seomoz-and-the-adwords-api/</link>
		<comments>http://www.thegooglecache.com/rants-and-raves/google-laughing-in-the-face-of-the-ftc-raven-seomoz-and-the-adwords-api/#comments</comments>
		<pubDate>Tue, 27 Nov 2012 15:23:43 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Rants & Raves]]></category>

		<guid isPermaLink="false">http://www.thegooglecache.com/?p=738</guid>
		<description><![CDATA[Update2: As expected, Google&#8217;s &#8220;leniency&#8221; with Raven came with a huge price &#8211; drop the SERPs. Yep, exactly as I said, Google doesn&#8217;t want you to ever have the ability to compare the value of Adwords to SEO. Update: Google has reopened the API for Raven Tools! Let&#8217;s keep a watch and see if they [...]]]></description>
			<content:encoded><![CDATA[<div style='background:#fff;margin-bottom:10px;border-width:1px;border-color:#ccc;border-style:solid;padding:10px;'>
<b>Update2:</b> As expected, Google&#8217;s &#8220;leniency&#8221; with Raven came with a huge price &#8211; drop the SERPs. Yep, exactly as I said, Google doesn&#8217;t want you to ever have the ability to compare the value of Adwords to SEO.</p>
<p><b>Update:</b> Google has reopened the API for Raven Tools! Let&#8217;s keep a watch and see if they do the same thing for SEOMoz. Keep up the pressure!</div>
<h2>What is Going Down?</h2>
<p>If you are in the search marketing community, you are likely familiar with <a href='http://www.raventools.com'>Raven Tools</a> and <a href='http://www.seomoz.org'>SEOMoz</a>. These two SEO tool leaders provide incredible tool sets for webmasters looking to improve their search performance &#8211; in both paid and organic. They aggregate data from numerous sources including the Google AdWords API. Back in September, SearchEngineLand.com reported that Google had begun revoking a large number of <a href='http://searchengineland.com/google-angers-adwords-api-developers-by-revoking-access-then-begins-to-provisionally-allow-access-again-93170'>AdWords users&#8217; API access</a> and then provisionally allowing access again. This explained why SEOMoz saw intermittent access to the API over the last several months. We have now learned that both SEOMoz and Raven Tools have lost their AdWords access.<span id="more-738"></span></p>
<h2>Why is Google Doing This?</h2>
<p>From Google&#8217;s perspective, the AdWords API is to be used specifically by AdWords customers to help improve their own campaigns. In general, it is not to be used as part of an exposed public or private tool, even if the express use is to improve AdWords campaign performance. In reality, the AdWords API is one of Google&#8217;s biggest liabilities for numerous reasons. Here are just a few&#8230;</p>
<ol>
<li>Traffic volume data provides competitors with information on target markets</li>
<li>Bid prices allow advertisers to automatically compare prices and, potentially, determine better advertising opportunities</li>
<li>Bid prices allow competitor ad networks to price their products appropriately</li>
</ol>
<p>Google&#8217;s Adwords API, despite being paid, is kept under incredibly strict usage guidelines that are enforced regularly and without much forgiveness. Why would Google create a paid service and then restrict the number of its customers and the volume of its usage so carefully? <b>To protect industry dominance</b>.</p>
<h2>What Does the FTC Have to Do With This?</h2>
<p>The FTC has been investigating Google for monopolistic practices for quite some time. While we can quibble back and forth about whether Google is genuinely a monopoly, there is little to no doubt that Google is <b>trying to behave like one</b>. The <a href='http://www.nytimes.com/2012/10/13/technology/ftc-staff-prepares-antitrust-case-against-google-over-search.html?pagewanted=all&#038;_r=0'>New York Times</a> points out that &#8220;The investigators are&#8230; looking into whether Google’s automated advertising marketplace, AdWords, discriminates against advertisers from competing online commerce services like comparison shopping sites and consumer review Web sites.&#8221; </p>
<p>The tight reins on the AdWords API do just this. You cannot know the cost of using AdWords without actually being a logged-in customer of AdWords. The prices are private and restricted. It is a violation of their Terms and Conditions to extract data in any other way than those expressly permitted by Google (<a href='https://adwords.google.com/select/tsandcsfinder'>T&#038;C Section 4(B)</a>). This would be akin to WalMart refusing to publish their prices in any public fashion, then refusing to allow you to log in and &#8220;extract&#8221; their prices in any way other than what they expressly tell you is OK so you know if you are getting a good deal &#8211; <b>AND THEN ENFORCING IT</b> by booting people out of the store who were caught writing down prices or seeing how many items were still in stock. </p>
<p>This kind of activity is alright when there are a bunch of other stores in the area, but when Google is by far the biggest advertising game in town, it starts to look very suspicious.</p>
<h2>Hubris</h2>
<p>The big word that comes to me here is hubris. Google is over-confident that the FTC is not willing to take them to court. Winning monopoly cases is expensive and right now the Federal Government doesn&#8217;t exactly have a huge budget. But this kind of activity simply laughs in the face of the FTC right as they are planning to release their findings. If I were Google, I would be more careful. But, alas, I am not. I&#8217;m just a guy trying to run a business. </p>
<h2>Addendums&#8230;</h2>
<p>Great feedback is coming in from the search community. <a href='http://andrewdumont.me/'>Andrew Dumont</a> mentioned that, by not allowing users to purchase ads from within their apps, these tools are in violation of the Terms &#038; Conditions of the API. If I wasn&#8217;t clear above, I don&#8217;t mean to state that people are wrongfully having their API access revoked. The T&#038;C are being fairly enforced. My intent was to note that these rigorously enforced T&#038;C serve a very specific purpose of <b>intentionally hiding both their pricing and their inventory to make it difficult for consumers to compare costs and opportunities with both paid and free alternatives</b>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.thegooglecache.com/rants-and-raves/google-laughing-in-the-face-of-the-ftc-raven-seomoz-and-the-adwords-api/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Exponential Moving Average, Volatility Pockets &amp; Improved Correlation Studies</title>
		<link>http://www.thegooglecache.com/white-hat-seo/exponential-moving-average-volatility-pockets-improved-correlation-studies/</link>
		<comments>http://www.thegooglecache.com/white-hat-seo/exponential-moving-average-volatility-pockets-improved-correlation-studies/#comments</comments>
		<pubDate>Mon, 12 Nov 2012 16:05:07 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Advanced]]></category>
		<category><![CDATA[Rants & Raves]]></category>
		<category><![CDATA[White Hat]]></category>

		<guid isPermaLink="false">http://www.thegooglecache.com/?p=721</guid>
		<description><![CDATA[I apologize in advance for those of you who aren&#8217;t quite interested in the math, but I have had this stuck in my head for almost a week now and can&#8217;t get rid of it. To be honest, I probably shouldn&#8217;t write about this because it is such an integral part of so many future [...]]]></description>
			<content:encoded><![CDATA[<p>I apologize in advance for those of you who aren&#8217;t quite interested in the math, but I have had this stuck in my head for almost a week now and can&#8217;t get rid of it. To be honest, I probably shouldn&#8217;t write about this because it is such an integral part of so many future Virante projects, but here goes&#8230; </p>
<p>
<h2>The Deception of Integer Rankings</h2>
<p>A key problem with much of SEO is that rank order is deceptive. We perceive sites as positions 1 through 10, each with a fixed integer position when, in reality, their proximity to one another in terms of actual relationship-to-the-query scores can vary greatly. It is this hidden, underlying score that is key to many SEO functions &#8211; keyword competitiveness and ranking factor determination to name two. Below, I discuss a simple, algorithm-agnostic method to determine relative rank gaps and then explain a few key uses of this information.<span id="more-721"></span></p>
<p><center><img src="http://www.thegooglecache.com/wp-content/uploads/2012/11/rank-order-deception.jpg" alt="" title="rank-order-deception" width="450" height="300" border='2' /></center></p>
<p>
<h2>Exponentially Weighted Moving Average</h2>
<p>While search results are ordered 1-10 and have a single integer value each day, over time this position can and will fluctuate as ranking factors are updated and competitors make changes. This fluctuation is represented in the aggregate in systems like <a href='http://mozcast.com'>SEOMoz MozCast</a> or <a href='http://serpmetrics.com/flux/'>SerpMetrics Flux</a>. Sometimes we will look at an individual keyword to determine a specific SERP&#8217;s volatility but in this case, we will instead look at individual URLs within each keyword SERP over time to determine an average ranking.</p>
<p>As you can imagine, it would be silly to just average the last 30 days of rankings for an individual URL on any given keyword. Doing so would ignore the trajectory of the URL (tending upwards or downwards) and, if that time period included a major update, could have remarkably high variance. Instead, we look to a simple technique used in stock market volatility forecasting called <strong>Exponentially Weighted Moving Average</strong> (EWMA).</p>
<p>I&#8217;ll give a terse explanation here, but please take some time to <a href='http://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average'>read up a little more</a>. When you average something over time, you can often assume that the most recent data is more valuable than the oldest data. Yesterday&#8217;s rankings are probably a better predictor than your rankings 30 days ago of how your site will perform tomorrow. Instead of averaging all days of rankings data together, we weight them based on recency. The simplest method to accomplish this is a rolling average where each day you find the mean of today&#8217;s value and the previous day&#8217;s rolling average. </p>
<p>Let&#8217;s say you rank #1 today and yesterday you ranked #2. Using EWMA, we would average today and yesterday and come up with 1.5. Another day elapses and you are still at #1. Instead of averaging #1, #1, and #2, you would simply average today&#8217;s rank (#1) with yesterday&#8217;s average (#1.5). Your new EWMA is #1.25. In this regard, each new day is worth 50% of the weight in the average, and all previous days are worth a combined 50%. </p>
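<p>As a minimal sketch of that arithmetic (the function name and the <code>alpha</code> parameter are my own labels, not part of any particular tool), the rolling calculation looks something like this:</p>

```python
def ewma(ranks, alpha=0.5):
    """Exponentially weighted moving average of daily ranks, oldest first.

    alpha is the weight given to the newest day. The default of 0.5
    reproduces the "average today with yesterday's average" rule.
    """
    if not ranks:
        raise ValueError("need at least one day of rankings")
    average = float(ranks[0])
    for rank in ranks[1:]:
        # Blend the new day's rank with everything that came before it.
        average = alpha * rank + (1 - alpha) * average
    return average


print(ewma([2, 1]))     # 1.5
print(ewma([2, 1, 1]))  # 1.25
```

<p>Passing a different <code>alpha</code>, say <code>ewma(ranks, alpha=1/3)</code>, makes each new day count for less and the history count for more.</p>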
<p>The magic, of course, is in choosing that weight. Maybe a 50% weight is too strong, and today&#8217;s ranking should only count for 1/3. Or maybe we should vary that weight based on a second rolling metric, like the degree of change or a separate weighted variance (or, most importantly, choose a weight that preserves rank order). While that is worth thinking about, let&#8217;s go ahead and jump to the good stuff.</p>
<h2>Finding Volatility Pockets for Keyword Competitiveness</h2>
<p>Let&#8217;s say we have now used an EWMA to determine an average rank rather than displayed rank. In the image above, you can see the overlaps that this can create. The first ranking URL might hold #1 on 6 out of 7 days. However, approximately once a week, the #2 spot overtakes it. URLs ranking #3 and #4 almost never transpose with one another, and #4 never cedes its position to #5. </p>
<p>However, #5, #6, and #7 quite regularly interchange with one another. And your site currently ranks #8. You haven&#8217;t dropped back to #9 in over a month, but you rarely shuffle up to #7. What we have done here is use the EWMA as a proxy for competitors&#8217; relative rank scores. In this example, because #5, #6, and #7 are all similarly scored, there is a pocket of volatility surrounding them. Because our example site is sitting directly behind this pocket, we have reason to believe that an incremental improvement in value could precipitate a 3-position jump in the SERPs. Of course, <strong>we would need to further this analysis with a look at exactly what kind of investment that might entail</strong>, but we have a clean, easy-to-calculate opportunity from which to begin our analysis. </p>
<p><center><img src="http://www.thegooglecache.com/wp-content/uploads/2012/11/volatility-pockets.jpg" alt="" title="emwa rank gap volatility clusters seo" width="450" height="300" border='2' /></center></p>
<p>Finding volatility pockets like this that sit around positions #3, #4, and #5 can be particularly valuable because a jump from below the fold to above the fold can yield strong CTR increases from the search results. </p>
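<p>One rough way to spot these pockets programmatically is to sort URLs by their EWMA rank and group neighbors that sit close together. This is only a sketch under my own assumptions: the <code>max_gap</code> threshold of 0.5, the function name, and every number in the example are invented for illustration.</p>

```python
def find_volatility_pockets(ewma_ranks, max_gap=0.5):
    """Group URLs whose EWMA ranks sit within max_gap of their neighbor.

    ewma_ranks maps URL -> EWMA rank. Returns a list of pockets, each a
    list of URLs ordered from best to worst average rank.
    """
    ordered = sorted(ewma_ranks.items(), key=lambda item: item[1])
    pockets, current = [], []
    for url, rank in ordered:
        # A large gap to the previous URL closes the current group.
        if current and rank - current[-1][1] > max_gap:
            if len(current) > 1:
                pockets.append([u for u, _ in current])
            current = []
        current.append((url, rank))
    if len(current) > 1:
        pockets.append([u for u, _ in current])
    return pockets


# Hypothetical EWMA ranks mirroring the example in the text.
example = {
    "site-1.example": 1.1,
    "site-2.example": 1.9,
    "site-3.example": 3.0,
    "site-4.example": 4.0,
    "site-5.example": 5.7,
    "site-6.example": 6.0,
    "site-7.example": 6.3,
    "your-site.example": 7.9,
}

pockets = find_volatility_pockets(example)
print(pockets)
```

<p>With these made-up numbers the only pocket returned is the #5, #6, #7 cluster, and your-site.example at 7.9 sits directly behind it. The threshold would need tuning against real ranking histories.</p>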
<p>
<h2>Improving Correlation Studies</h2>
<p>This one really gets my blood going because of the opportunity it presents for better aggregate studies of rank factors. Some of you might know that a few years back with the help of many including <a href='http://www.authoritylabs.com'>Authority Labs</a>, <a href='http://www.majesticseo.com'>Majestic SEO</a>, and <a href='http://www.seomoz.org'>SEOMoz</a>, we ran a correlation study with over 1,000,000 keywords. This produced many innovations at Virante including one we still use regularly today, <a href='http://www.thegooglecache.com/white-hat-seo/relevancy-modified-mozrank-a-smarter-metric-for-rank-analysis/'>relevancy modified MozRank</a>. </p>
<p>One of the biggest issues we run into with the correlation study is that the dependent variable has been tampered with by Google. Instead of giving us the actual rank score, we see the 1 through 10 integer rank. #5 might literally be 100x more relevant to the query than #6, and #6 is only .000000001x better than #7. If we take a snapshot of rankings on just 1 day, we can&#8217;t model this out. We are stuck with a fixed, stair-step dependent variable.</p>
<p>But, if we use an exponentially weighted moving average, we can help solve several issues&#8230;</p>
<ol>
<li>We can handle the issue of lagging independent variables (like slightly out-dated link data)</li>
<li>We can, to some degree, transition rank from a discrete to a continuous variable</li>
<li>We can establish a cohort of solid, unchanging rankings to analyze</li>
</ol>
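<p>To make the second point concrete, here is a small sketch of swapping the dependent variable in a correlation from a one-day integer rank to an EWMA rank. I am using a plain Pearson coefficient here as a stand-in, not necessarily the statistic used in our study, and all of the data below is invented for illustration.</p>

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of 2+ values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up illustration data for five URLs in one SERP.
snapshot_ranks = [1, 2, 3, 4, 5]        # one day's integer positions
ewma_ranks = [1.1, 1.9, 3.6, 3.8, 4.0]  # smoothed positions over time
link_metric = [90, 80, 42, 40, 39]      # hypothetical ranking factor

r_snapshot = pearson(link_metric, snapshot_ranks)
r_ewma = pearson(link_metric, ewma_ranks)
print(round(r_snapshot, 3), round(r_ewma, 3))
```

<p>The stair-step snapshot forces equal spacing between positions, while the EWMA column lets near-ties like #3, #4, and #5 sit close together. Whether that actually tightens a correlation on real data is something to test, not assume.</p>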
<p>
<h2>More with Math</h2>
<p>The Art vs. Science debate continues on in the SEO community. I don&#8217;t think that it is an unhealthy argument, but I do want to make sure that we always stay grounded in math and science. These tell us what matters and to what degree, and they help us determine measurable goals. Art is the method we use to meet those goals.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.thegooglecache.com/white-hat-seo/exponential-moving-average-volatility-pockets-improved-correlation-studies/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Citation Labs&#8217; Broken Link Building Tool First to Integrate nTopic</title>
		<link>http://www.thegooglecache.com/white-hat-seo/citation-labs-broken-link-building-tool-first-to-integrate-ntopic/</link>
		<comments>http://www.thegooglecache.com/white-hat-seo/citation-labs-broken-link-building-tool-first-to-integrate-ntopic/#comments</comments>
		<pubDate>Wed, 07 Nov 2012 16:36:01 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Rants & Raves]]></category>
		<category><![CDATA[White Hat]]></category>

		<guid isPermaLink="false">http://www.thegooglecache.com/?p=717</guid>
		<description><![CDATA[We are excited today to announce that Garrett French&#8217;s Broken Link Builder is the first 3rd party application to integrate nTopic, the content relevancy score. The integration is actually incredibly useful. In making link prospecting scalable, it is important that you be able to filter prospects based on relevancy. This is the case for nearly [...]]]></description>
			<content:encoded><![CDATA[<p>We are excited today to announce that Garrett French&#8217;s <a href='http://www.brokenlinkbuilding.com'>Broken Link Builder</a> is the first 3rd party application to integrate <a href='http://www.ntopic.org'>nTopic</a>, the content relevancy score.</p>
<p>The integration is actually incredibly useful. In making link prospecting scalable, it is important that you be able to filter prospects based on relevancy. This is the case for nearly any outreach effort, but is particularly difficult in the Broken Link Building strategy because the targets no longer exist. In the past, marketers would need to click through to archive.org to try and find a record of the previous site, or use the URL to guess what the content may have been about. Historically, this process has been difficult to automate. You could use proxies for relevancy, like a keyword in the title or URL, but neither would have been very useful if, for example, you were judging this page for relevancy to the term SEO, despite the fact that its nTopic relevancy is over 98%. <span id="more-717"></span></p>
<p><img src='http://www.thegooglecache.com/wp-content/uploads/2012/11/ntopic-broken.jpg' border='2'></p>
<p>The Broken Link Building tool takes each broken link opportunity, scrapes the Archive.org entry and any anchor text in links pointing to it, and passes those along with the prospecting keywords over to the nTopic API. In the campaign example above, a 404 page was found on autism-society.org because an SEO blogger had randomly linked to it as part of a charity event. These types of unrelated links show up regularly across the web, but nTopic allows Broken Link Building users to automatically filter them out because their relevancy grades come back quite low (in this case, a grade F). Instead, the user can focus on the MarketLeap 404s, or CivicSEO.com&#8217;s old WordPress SEO plugins page, which are more relevant. </p>
<p>This kind of nTopic integration will continue to spread to search engine optimization services and tools across the web where determining relevancy matters &#8211; and in SEO, it matters both on the link building and content creating sides of the coin.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.thegooglecache.com/white-hat-seo/citation-labs-broken-link-building-tool-first-to-integrate-ntopic/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
