Open Penguin Data Project – Calling for Submissions

Many of you may have seen the launch of my new project, Open Penguin Data. The project's description isn't as clear as it could be, so I thought I would explain a little further.

What is the Open Penguin Data Project?

I want to crowdsource potential variables that Google might use to determine which pages are caught by Penguin. I have created a CSV of URLs, each marked as either (1) hit by Penguin or (2) not hit by Penguin, for a series of keywords. I need the SEO community to provide variables, and the value of each variable for every URL in the dataset.

For Example: Let’s say you believe that having links from blog comments might be a variable Google uses as part of Penguin. You would download the CSV of URLs and mark each one as either having or not having links from blog comments. You would then submit that via the project's submission form, and I will republish the data. Ultimately, we can build a large dataset and run various statistical models on it.
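As a rough illustration of the blog-comment example, here is a minimal Python sketch that adds one 0/1 column to the CSV. The column names (keyword, url, penguin), the sample rows, and the variable name has_blog_comment_links are all assumptions made up for this sketch; check the real file's header before adapting it.

```python
import csv
import io

# Hypothetical layout and made-up rows: the real CSV's columns may differ.
SAMPLE_CSV = """keyword,url,penguin
blue widgets,http://example.com/a,1
blue widgets,http://example.org/b,0
red widgets,http://example.net/c,1
"""


def annotate(csv_text, variable_name, urls_with_feature):
    """Return CSV text with one extra 0/1 column for the submitted variable."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames) + [variable_name]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # 1 if you found the feature for this URL, 0 otherwise.
        row[variable_name] = 1 if row["url"] in urls_with_feature else 0
        writer.writerow(row)
    return out.getvalue()


# The set below stands in for whatever your own research turned up.
annotated = annotate(
    SAMPLE_CSV,
    "has_blog_comment_links",
    {"http://example.com/a"},
)
print(annotated)
```

Every original row and column is kept, so the annotated file can be lined up with other people's submissions by URL.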

Another Example: If your name is Dr. Pete, cough cough, and you work for Moz, maybe you want to identify every Penguin-flagged URL that had the highest mozRank among those in the same SERP. You may notice that the “HAM” URLs are the unaffected URLs from the same SERPs as the affected URLs, so these kinds of comparative metrics can be calculated.
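A comparative metric like the mozRank one could be computed along these lines. This is only a sketch: the row layout and the mozRank numbers are invented for illustration, and the keyword is assumed to identify the SERP.

```python
from collections import defaultdict

# Hypothetical rows: (keyword, url, penguin flag, mozRank). All values are made up.
ROWS = [
    ("blue widgets", "http://example.com/a", 1, 5.2),
    ("blue widgets", "http://example.org/b", 0, 4.1),
    ("red widgets", "http://example.net/c", 1, 3.0),
    ("red widgets", "http://example.edu/d", 0, 3.9),
]


def top_mozrank_flags(rows):
    """Map (keyword, url) to 1 if that URL has the highest mozRank in its SERP."""
    best = defaultdict(lambda: float("-inf"))
    for keyword, _url, _hit, mozrank in rows:
        best[keyword] = max(best[keyword], mozrank)
    return {
        (keyword, url): int(mozrank == best[keyword])
        for keyword, url, _hit, mozrank in rows
    }


flags = top_mozrank_flags(ROWS)

# Penguin-flagged URLs that also led their SERP on mozRank.
hit_and_top = [
    url for keyword, url, hit, _mozrank in ROWS
    if hit == 1 and flags[(keyword, url)] == 1
]
print(hit_and_top)
```

Ties are all marked as 1 here; whether that is the right call is a judgment left to whoever submits the variable.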

How You Can Get Involved

The easiest way is to submit data. If you have access to data but don’t know how to submit it, please reach out to me on Twitter at @rjonesx and we can discuss how to get it into the project.

