The Disadvantages of Speed: Finding Exact Match Domains in Drop Lists
I recently wrote a post on the advantages of speed, specifically as it relates to finding exact match domains. One disadvantage of speed is the classic hammer problem: if you have a hammer, everything looks like a nail. Because lookups were very fast, I assumed I could just pound away. Eventually, though, that approach ran into speed problems that looked insurmountable and would have forced more horizontal scaling.
Because the lookups were so fast, I assumed that the number of lookups could be egregiously large without greatly damaging performance. I. Was. Wrong. It hit me on New Year's Eve that I had been looking at the problem all wrong: the lookup data was structured in a way that required a massive number of lookups. Restructuring the keyword data (a process that itself took about 2 hours to complete) has let us move from ~120 domains per second to ~1600 domains per second, a better than 10x improvement (roughly 13x) from just a modest change to the data.
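To make the point about data layout concrete, here is a purely hypothetical Python sketch. This post does not describe the actual before and after structures, so everything here is an assumption for illustration: keyword phrases stored with spaces, domain labels without them, and the helper names `spaced_candidates`, `match_by_probing`, and `match_by_normalized_key`. With keys stored in a form that does not match the domain, each domain needs many probes; with keys normalized once up front, each domain needs one.

```python
from itertools import combinations


def spaced_candidates(label):
    """Yield every way of inserting spaces into label: 2**(len(label) - 1) strings."""
    gaps = range(1, len(label))
    for r in range(len(label)):
        for cuts in combinations(gaps, r):
            parts, prev = [], 0
            for cut in cuts:
                parts.append(label[prev:cut])
                prev = cut
            parts.append(label[prev:])
            yield " ".join(parts)


def match_by_probing(label, phrases):
    """Assumed 'before' layout: keys keep their spaces, so we probe every split."""
    lookups = 0
    for candidate in spaced_candidates(label):
        lookups += 1
        if candidate in phrases:
            return True, lookups
    return False, lookups


def match_by_normalized_key(label, normalized_phrases):
    """Assumed 'after' layout: keys were normalized once, so one probe suffices."""
    return label in normalized_phrases, 1


# Hypothetical keyword data.
phrases = {"car insurance", "cheap flights"}
normalized_phrases = {p.replace(" ", "") for p in phrases}

print(match_by_probing("zzzqqq", phrases))
print(match_by_normalized_key("zzzqqq", normalized_phrases))
```

Both functions answer the same question, but the first does exponentially more work on a miss. No matter how fast a single lookup is, a layout that multiplies the number of lookups per domain will eventually dominate.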
I am now adding a Bloom filter. About 89% of dropped domains are not exact matches, so a cheap membership check up front could potentially save a lot of lookups. We will see how much faster things get after that.
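For readers unfamiliar with the idea, here is a minimal Bloom filter sketch in Python. It is not the production code; the class, the `find_exact_matches` helper, the SHA-256 double-hashing scheme, and the 1% false-positive target are all assumptions chosen for illustration. A Bloom filter can return false positives but never false negatives, so a "no" lets us skip the real lookup entirely and a "maybe" still gets verified against the keyword store.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter: may give false positives, never false negatives."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        n = max(1, expected_items)
        self.num_bits = max(8, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two halves of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def find_exact_matches(dropped_names, keyword_store, bloom):
    """Run the expensive lookup only for names the filter cannot rule out."""
    matches, expensive_lookups = [], 0
    for name in dropped_names:
        if name not in bloom:
            continue  # definitely not a keyword: skip the real lookup
        expensive_lookups += 1
        if name in keyword_store:  # stand-in for the real keyword lookup
            matches.append(name)
    return matches, expensive_lookups


# Hypothetical data: a set stands in for the real keyword store.
keyword_store = {"carinsurance", "cheapflights"}
bloom = BloomFilter(expected_items=len(keyword_store))
for keyword in keyword_store:
    bloom.add(keyword)

dropped = ["carinsurance", "xkqzv", "cheapflights", "foo123"]
matches, expensive_lookups = find_exact_matches(dropped, keyword_store, bloom)
print(matches, expensive_lookups, "of", len(dropped))
```

If roughly 89% of dropped domains are not exact matches, most names should be rejected by the filter before any real lookup happens, with the false-positive rate setting how many non-matches slip through to be checked anyway.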