Keyword Research on Regular Expressions Steroids in Grepwords

There really hasn’t been much innovation in the keyword research space for a while and for good reason – the largest problem of getting good data has long been answered by top providers like SEMRush, Trellian KeywordDiscovery, WordStream and others like KeywordSpy. The data they provide is wonderfully useful, but the one thing that always felt limiting was the way we could get at their data. While they might provide accurate estimates for Google traffic, or useful data on large numbers of keywords, getting at the data required clumsy querying techniques no better than exact, phrase and broad match. As a developer, I found this cumbersome. Recently, though, I have found a better solution – Regular Expressions. At Virante we have long had access...
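As a rough illustration of why regular expressions beat exact, phrase and broad match, here is a minimal Python sketch that filters a keyword list by pattern. The keyword list and the `filter_keywords` helper are hypothetical, and Python's `re` module stands in for whatever regex dialect Grepwords actually supports, which this excerpt does not specify.

```python
import re

# Hypothetical keyword list for illustration; not real provider data.
KEYWORDS = [
    "buy running shoes",
    "running shoes for women",
    "best trail running shoes 2012",
    "shoe repair near me",
    "how to clean running shoes",
]


def filter_keywords(pattern, keywords):
    """Return the keywords that the regular expression matches anywhere."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [kw for kw in keywords if compiled.search(kw)]


# Keywords phrased as questions: must start with a question word.
questions = filter_keywords(r"^(how|what|why|where)\b", KEYWORDS)

# "running shoes" followed somewhere later by a four-digit number (a year).
with_year = filter_keywords(r"running shoes.*\b\d{4}\b", KEYWORDS)

print(questions)
print(with_year)
```

Neither query can be expressed with exact, phrase or broad match alone: the first anchors on position in the keyword, the second on a class of characters rather than a fixed word.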

Open Penguin Data Project – Calling for Submissions

Many of you may have seen the launch of my new project Open Penguin Data. The description of the project isn’t quite clear so I thought I would explain a little further. What is the Open Penguin Data Project? I want to crowdsource potential variables that might be used by Google to determine which pages are caught by Penguin. I have created a CSV of URLs that are marked as either (1) hit by Penguin or (2) not hit by Penguin for a series of keywords. I need the SEO community to provide variables and their values for each one of the URLs in the dataset. For example: Let’s say you believe that having links from blog comments might be a variable Google uses as part of Penguin. You would download the CSV of URLs and mark each one as either having or not...
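The contribution step described above could be sketched in Python roughly as follows. The column names (`url`, `keyword`, `hit_by_penguin`), the sample rows, and the `add_variable` helper are all assumptions made for illustration; the real project CSV may be laid out differently.

```python
import csv
import io

# Hypothetical stand-in for the project CSV; columns and rows are assumptions.
SAMPLE = """url,keyword,hit_by_penguin
http://example.com/a,blue widgets,1
http://example.org/b,blue widgets,0
"""


def add_variable(csv_text, name, value_for_url):
    """Append one column holding a contributor-supplied value for each URL."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames + [name]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[name] = value_for_url(row["url"])
        writer.writerow(row)
    return out.getvalue()


# Hypothetical hand-checked findings: 1 = has blog comment links, 0 = does not.
has_comment_links = {"http://example.com/a": 1, "http://example.org/b": 0}

annotated = add_variable(
    SAMPLE,
    "has_blog_comment_links",
    lambda url: has_comment_links.get(url, ""),
)
print(annotated)
```

Keeping the original columns untouched and only appending a new one makes it easy to merge variables from many contributors by URL.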

Biomagnification, Redirects and Back Link Penalties

I often find that the best sources of analysis in SEO, which is still a nascent industry, come from other academic pursuits. While these are regularly computer science (like latent Dirichlet allocation) or mathematics (like volatility analysis), we sometimes find interesting lessons outside of those usual suspects – in this case, biology. Biomagnification is a fairly simple principle: through a series of predator-prey relationships, toxic substances tend to accumulate in higher concentrations among organisms higher in the food chain. You can see a visualization of this in the image to the left. As mercury accumulates in various organisms, predators consume those organisms and absorb those toxins. Unless the organism has a way of disposing of those toxins,...

Piwik Plug-in for SEO Tools Begun… Already tying in SEOmoz, Majestic & Ahrefs

So, I wanted to keep a progress report going so I don’t lose sight of my goals. It was pretty awesome to start working with Piwik plug-ins. I used the out-of-the-box Piwik SEO plug-in to create this first widget… If you are familiar with the SEOmoz, Majestic and Ahrefs APIs, it literally takes only an hour or so to create your own widget that pulls the data directly into your Piwik analytics install.

Googlebot’s JavaScript Interpreter: A Diagnostic

Warning: This is very old. I began writing this several months ago and just never published it after some back and forth with Matt Cutts. Take it with a grain of salt. Over the past two weeks multiple respected bloggers in the search community have commented on the increasing abilities of Googlebot, especially following Google’s announcement that it can now handle some forms of AJAX. I have, admittedly, long believed that we overestimate what Google and Googlebot are capable of, so I wanted to run a proper experiment to determine the exact capabilities of Googlebot in reading and interpreting JavaScript. The Question: How sophisticated is Googlebot’s JavaScript interpretation and, more specifically, which JavaScript functions can Google accurately interpret....

Exponential Moving Average, Volatility Pockets & Improved Correlation Studies

I apologize in advance to those of you who aren’t quite interested in the math, but I have had this stuck in my head for almost a week now and can’t get rid of it. To be honest, I probably shouldn’t write about this because it is such an integral part of so many future Virante projects, but here goes… The Deception of Integer Rankings: A key problem with much of SEO is that rank order is deceptive. We perceive sites as positions 1 through 10, each with a fixed integer position when, in reality, their proximity to one another in terms of actual relationship-to-the-query scores can vary greatly. It is this hidden, underlying score that is key to many SEO functions – keyword competitiveness and ranking factor determination to name two....
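For readers unfamiliar with the term in the title, here is a minimal Python sketch of the textbook exponential moving average, applied to a made-up series of daily ranking positions. How the post itself uses the EMA is not shown in this excerpt, so the data, the `alpha` value, and the choice to smooth rank positions are assumptions for illustration only.

```python
def exponential_moving_average(values, alpha):
    """Textbook EMA: ema[t] = alpha * x[t] + (1 - alpha) * ema[t - 1].

    The first observation seeds the average.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(float(value))
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical daily positions for one URL; day 3 is a one-off drop to 7.
daily_positions = [3, 3, 7, 2, 3]
ema = exponential_moving_average(daily_positions, alpha=0.5)
print(ema)
```

A higher `alpha` tracks the latest observation more closely, while a lower one damps one-day swings, which turns a sequence of integer positions into a continuous, less jumpy signal.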