Top Searches Google Should Suppress
I am opposed to censorship, but I also support privacy rights. Google’s massive database of anything and everything, coupled with powerful search technologies, has utterly destroyed privacy as we once knew it. An entire community of Google-enabled hacking and mayhem has arisen around the search giant, including the popular johnny.ihackstuff.com. Below, I have compiled a list of the top 8 searches that Google should suppress to protect the privacy of millions of people across the internet.
I am sure some people will be upset by the information below. I haven’t let the cat out of the bag: this information has been available to nefarious characters for years. It just has not been talked about enough. So, in no meaningful order, here we go…
1. Credit Card Searches
It is disgustingly easy to find credit card information online.
The queries are actually quite simple to put together: “Visa*4635”, “Mastercard*5490”. In fact, just take a moment to go through your credit cards. Take the vendor (Visa, MasterCard, AMEX, etc.) and the first 4 digits, and place an asterisk between the two. Voila. Aside from research into how much bad information can be found online, I cannot imagine any good use for this type of search.
2. “Powered by” Searches
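“Powered by” searches are popular among guestbook, trackback, forum, wiki, and comment spammers. Many web applications print a “Powered by” footer naming the software that runs the site, so a single query lists the sites running a particular package. That makes it terrifically easy to exploit a single vulnerability across hundreds, if not thousands, of web sites.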
3. “ext:bak” Searches + (ssn, credit card vendors, password etc.)
Google’s powerful “ext:” and “filetype:” operators let you filter results by file type. Unfortunately, certain extensions expose code. A backup of a PHP file (filename.php.bak) left on a server, for instance, is typically served as plain text rather than executed, which allows easy access to MySQL passwords. For example: mysql_connect ext:bak -denied localhost
4. Site Statistics Searches
One of the growing problems in the fight against non-email spam is “Referral Spam”. Link-hungry search engine optimizers will send tons of fake traffic to websites whose site-statistics packages get indexed by Google. This allows their link to get onto the “top referrers” part of the site statistics, and thus get a quality backlink. How do they find sites to spam? Google. For example: webalizer intitle:“usage statistics”
5. Spam Searches “mailto*hotmail”
This one baffles me beyond belief. For years, wildcard searches like this one have been used solely to scrape massive numbers of email addresses. Even a script kiddie can scrape literally millions of email addresses from Google with just a handful of queries like the one above and an offline browser. Ridiculous. Honestly, can you think of one good, legal use of this query?
6. “ext:csv” Searches + (ssn, credit card vendors, password etc.)
The comma-separated values (CSV) file is a popular, easy-to-use, spreadsheet-like way of storing data. It also was never meant for the web. Just combine “ext:csv” with “accounts receivable” or “social security number” and you are sure to find implicitly private data that should never have been published. Once again, little to no good could come out of this search.
7. “ext:xls” Searches + (ssn, credit card vendors, password etc.)
Anyone noticing a pattern here? A lot of “ext:” searches are popping up. Regardless, Excel spreadsheets really ought not to be in the index at all (imho), and certainly not when coupled with terms like those above.
8. Number Ranges + (ssn, credit card vendors)
Google allows you to search for any number within a range. This makes it very easy to find social security numbers and credit card numbers. For example: ssn 111111111..999999999
The Best Solution
There is another option, though, and it is a very simple one. Google could take these queries, combine them with more sophisticated regular expressions, and run filters regularly to hide implicitly private data from searchers. How easy would that be? Instead of trying to suppress the millions of different ways that people could search for private data, just exclude implicitly private data altogether.
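To make the idea concrete, here is a minimal sketch in Python of what such a filter might look like. It is not how Google works: the two patterns, the function names (luhn_valid, redact_private_data), and the placeholder strings are all my own assumptions for illustration. It looks for runs of 13 to 16 digits, keeps only those that pass the Luhn checksum used by payment cards, and also masks numbers written in the common NNN-NN-NNNN social security format.

```python
import re

# Hypothetical patterns for illustration only; a real filter would need many more.
# 13 to 16 digits, optionally separated by single spaces or dashes.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")
# Numbers written in the common NNN-NN-NNNN social security format.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if a string of digits passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def redact_private_data(text: str) -> str:
    """Mask likely card numbers and SSN-formatted numbers in a piece of text."""

    def replace_card(match: re.Match) -> str:
        digits = re.sub(r"\D", "", match.group())
        # Leave digit runs that fail the checksum alone (order numbers, IDs, etc.).
        return "[REDACTED CARD]" if luhn_valid(digits) else match.group()

    text = CARD_RE.sub(replace_card, text)
    return SSN_RE.sub("[REDACTED SSN]", text)
```

Calling redact_private_data("Visa 4111 1111 1111 1111 exp") on the well-known Visa test number returns "Visa [REDACTED CARD] exp". The checksum step is there to cut down on false positives, since most random 16-digit strings are not valid card numbers; a pattern match alone would hide far too much.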
If Google is to assume that anything and everything online without a robots.txt or meta tag blocking it ought to be spidered, then the burden falls on Google to protect the privacy of those it threatens.