Bringing home the bacon

Summary: Last week, I took Bacon's Information to task for a Web crawler that wasn't behaving as nicely as I'd have liked. I'm happy to say that, contrary to my bias, they've been very helpful and responsive.

Last week, I took Bacon's Information to task for a Web crawler that wasn't behaving as nicely as I'd have liked. I'm happy to say that, contrary to my bias, they've been very helpful and responsive. I think we're both better off for the exchange.

After I blogged about the problem, I got a very polite email from Chris Thilk, a Senior Digital Media Specialist at Bacon's Information. Chris originally thought I didn't want my site indexed. Actually, nothing could be further from the truth. I'm happy to have my site indexed so that the content is findable in multiple contexts; I just wanted it indexed more politely.

Chris pointed out that even though I had a robots.txt file, it wasn't blocking the primary culprit, /mt-search.cgi. That's my bad for not updating my robots.txt file as my site changed. It's a tough thing to keep on top of, since it means anticipating the problems that changes to your site might cause and then finding a simple way to block them.
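
In case it helps anyone else, the rule that would have covered that script is only a couple of lines. Here's a minimal sketch for my Movable Type setup; the path is the one from my site, so substitute whatever your own installation uses:

    User-agent: *
    Disallow: /mt-search.cgi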

I asked Chris to change their crawler to observe the rel="nofollow" attributes on hypertext anchors, and to consider interleaving requests to multiple sites so that a single site sees its requests spread out over time; the biggest problem I had was hundreds of requests in just a few minutes. Chris reports that they've made some changes to how they index, and since then I haven't had a problem.
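
For reference, the hint I'm talking about is just an attribute on the link markup. Something like this (the query string is made up for illustration) tells a well-behaved crawler not to follow the search link:

    <a href="/mt-search.cgi?search=crawler" rel="nofollow">search results</a>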

My hat's off to Chris and Bacon's Information for being responsive and helpful.
