Last week, I took Bacon's Information to task for a Web crawler that wasn't behaving as nicely as I'd have liked. I'm happy to say that contrary to my bias, they've been very helpful and responsive. I think we're both better off for the exchange.
After I blogged about the problem, I got a very polite email from Chris Thilk, who is a Senior Digital Media Specialist at Bacon's Information. Chris originally thought I didn't want my site indexed. Actually, nothing could be further from the truth. I'm happy to have my site indexed so that the content is findable in multiple contexts. I just wanted it indexed more politely.
Chris pointed out that even though I had a robots.txt file, it wasn't blocking the primary culprit, /mt-search.cgi. This is my bad for not updating my robots.txt file as my site changed. This is a tough thing to keep track of since it requires anticipating problems that changes to your site might cause and then determining a simple way to block them.
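For anyone in the same boat, blocking the search script is a two-line addition to robots.txt. This is a sketch of what such an entry might look like (the `Disallow` path here matches the Movable Type search CGI mentioned above; adjust for your own site's layout):

```
User-agent: *
Disallow: /mt-search.cgi
```

Note that `Disallow` matches by prefix, so this also blocks any request with query parameters appended, like /mt-search.cgi?search=foo.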
I asked Chris to change their crawler to observe the rel="nofollow" attributes on hypertext anchors, and to consider interleaving requests to multiple sites so that a single site sees its requests spread out over time--the biggest problem I had was that there were hundreds of requests in just a few minutes. Chris reports that they've made some changes to how they index, and since then I haven't had a problem.
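The interleaving idea is simple to sketch: instead of draining one site's URL queue before moving to the next, the crawler takes one URL from each site per pass, round-robin style. This is a hypothetical illustration, not Bacon's actual implementation (which they didn't publish); the function name and data layout are my own invention.

```python
from collections import deque

def interleave(urls_by_site):
    """Round-robin across sites so no single host sees a long burst
    of back-to-back requests. `urls_by_site` maps a hostname to its
    list of URLs to fetch; returns one flat fetch order."""
    queues = {site: deque(urls) for site, urls in urls_by_site.items()}
    order = []
    while queues:
        # Take at most one URL from each site per pass.
        for site in list(queues):
            order.append(queues[site].popleft())
            if not queues[site]:
                del queues[site]  # this site's queue is drained
    return order

# With three URLs from one site and two from another, requests to
# each host are spread out rather than clustered:
schedule = interleave({
    "a.example": ["a1", "a2", "a3"],
    "b.example": ["b1", "b2"],
})
print(schedule)
```

A real crawler would also add a per-host delay between passes, but even plain interleaving means a site with a hundred pages sees its requests spaced across the whole crawl instead of hammered in a few minutes.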
My hat's off to Chris and Bacon's Information for being responsive and helpful.