This morning when I checked my mail, I had dozens of email messages from my cron daemon telling me: Bad ObjectDriver config: Connection error: Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (111)
The problem was that the MySQL daemon that powers my blog had died late last night, and the periodic tasks I run need the database. Looking at the error log in /var/lib/mysql, I saw that it had died at 21:59. I next looked in the Apache error logs for the same time and saw messages like this one repeated many times: spawning 8 children, there are 2 idle, and 23 total children.
Over the course of a few minutes I went from 16 httpd processes running to 75. My first thought was that I was the victim of a drive-by hack attempt, since these messages indicate that the server saw a sudden increase in load. Looking in the access log for the same time, however, I discovered that it was in fact a poorly written Web crawler from Bacon's MediaSource.
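A quick tally of requests per user agent is one way to spot this kind of thing in an access log. The sketch below is only an illustration: it assumes Apache's "combined" log format, and the sample lines, addresses, date, and the `ExampleCrawler/1.0` agent string are made up, not taken from my logs.

```python
import re
from collections import Counter

# Apache "combined" log format (an assumption; the real log is not shown).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def requests_per_agent(lines, time_prefix):
    """Count requests per user agent whose timestamp starts with time_prefix."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("time").startswith(time_prefix):
            counts[match.group("agent")] += 1
    return counts

# Made-up sample lines using documentation-only IP ranges.
sample = [
    '192.0.2.10 - - [10/Oct/2005:21:58:01 -0600] "GET /search?tag=a HTTP/1.1" 200 512 "-" "ExampleCrawler/1.0"',
    '192.0.2.10 - - [10/Oct/2005:21:58:01 -0600] "GET /search?tag=b HTTP/1.1" 200 512 "-" "ExampleCrawler/1.0"',
    '192.0.2.11 - - [10/Oct/2005:21:58:02 -0600] "GET /search?tag=c HTTP/1.1" 200 512 "-" "ExampleCrawler/1.0"',
    '198.51.100.7 - - [10/Oct/2005:21:58:05 -0600] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2005:22:10:00 -0600] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

# Look only at the ten minutes starting at 21:50.
for agent, count in requests_per_agent(sample, "10/Oct/2005:21:5").most_common():
    print(count, agent)
```

Counting by user agent rather than by IP address matters here, because a crawler may come from several addresses at once.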
The problem is that Bacon's Web crawler doesn't respect the robots.txt file or the rel="nofollow" attributes in hyperlink anchors. The crawler is following the tag links I've recently placed at the bottom of my pages. Since these links perform searches on my blog for any entry with that tag, the crawler was spawning hundreds of simultaneous searches. I guess I have a few options:
- Try to block them. I don't know how many different IP addresses they might be crawling from, so this seems like a losing proposition.
- Contact them and tell them they're breaking my blog when they crawl and that they need a better-behaved crawler. That seems unlikely to get much response. After all, how tech-savvy can a place be that requires IE to even look at their site?
- Try to beef up the server to withstand the load. The first task there would be to figure out why the load causes MySQL to die. I'm not a MySQL expert by any stretch, so I'd probably learn something. Maybe if I want to have tags lead to dynamic searches, I need to beef up things anyway.
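For reference, this is roughly the check a well-behaved crawler is expected to make before fetching one of those tag searches. It's a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `/search` path, the `example.com` URLs, and the `ExampleCrawler/1.0` name are all hypothetical stand-ins, since the real file isn't shown here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that puts the search URLs off limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
"""

def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# A polite crawler would skip the first URL and fetch only the second.
print(allowed(ROBOTS_TXT, "ExampleCrawler/1.0", "http://example.com/search?tag=mysql"))
print(allowed(ROBOTS_TXT, "ExampleCrawler/1.0", "http://example.com/archives/"))
```

Of course, robots.txt is only a request. Nothing on the server side enforces it, which is exactly why a crawler that ignores it can do this much damage.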
One of the downsides of a decentralized network like the Internet is that anyone can do something boneheaded and mess things up for someone else. Given that, maybe the last option really is the best one. Defending yourself with good engineering is the first step to surviving in a hostile environment.
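One simple form of that kind of defensive engineering is to cap how many expensive searches can run at once and turn the rest away quickly. The sketch below shows the idea only. `SearchLimiter` is a hypothetical helper, and it assumes the searches run as threads inside a single process. A setup with many separate httpd children would need a limit shared across processes instead.

```python
import threading

class SearchLimiter:
    """Allow at most max_concurrent searches to run at the same time."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def run(self, search, *args):
        """Run search(*args) if a slot is free; otherwise return None at once."""
        if not self._slots.acquire(blocking=False):
            return None  # the caller could answer with an HTTP 503 here
        try:
            return search(*args)
        finally:
            self._slots.release()

# Usage: with one slot, a search started while another is running is refused.
limiter = SearchLimiter(1)
nested_results = []

def outer_search():
    nested_results.append(limiter.run(lambda: "nested"))
    return "ok"

print(limiter.run(outer_search), nested_results)
```

Refusing the extra requests immediately, rather than queueing them, keeps a burst of crawler traffic from piling up work for the database.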