Open Source, Dtrace, and tuning

The bottom line lesson from a report on finding and fixing a performance problem in MySQL has nothing to do with locks or DTrace and everything to do with open source and how access to code and tools is making the extraordinary seem ordinary.

In the good old days being an Oracle performance tuning expert paid pretty good money - and it was easy: because most problems were consequences of tuning expertise, the right answer was usually to set things to their defaults, switch to raw devices, and grab as much memory as possible before running out the door with another job well done. Unfortunately for me, memory got to be cheaper than expertise - and now I see people running a few million rows in 32GB of RAM.

On the other hand you do still see people whose applications max out their computers and who're therefore facing a choice between committing some expertise to buying time by making the thing run better on the existing hardware or going back to the bosses for emergency hardware - and licensing- upgrades.

If you're in that position - and almost every production systems manager gets into it sooner or later - there's an important lesson to be learned from a Sun guy's blogging on work he and his team did to find and remediate a locking problem with MySQL.

The entire story is on Neelakanth Nadgir's blog - Here's his introduction followed by his action summary:

While comparing sysbench runs using two different versions of MySQL 5.1, I noticed a big difference in the system utilization. One version had much more idle time than the other. This difference was much more apparent with high thread counts on machines with lots of CPU. A look at the system calls showed a large number of lwp_park system calls. That is, threads were being put to sleep.

[lots of detail on what they did and how]

So the mysql server is requesting ha_innobase::info() to not hold a lock, and it is being ignored by ha_innobase::info()!.

I compared against MySQL 5.0 and saw that this particular lock was not held when ha_innobase::info() was called. Searching through the commit logs I found that this was introduced by Bug#32440.

Quickly hacking the code to revert to the old behavior gave a big boost in performance. Hence I filed Bug #38185 ha_innobase::info can hold locks even when called with HA_STATUS_NO_LOCK. Luckily its a very small change and a fix is already in progress.

Moral of the story? A simple contended lock can bring down your performance by quite a lot.

It's his story, but he gets the moral of the thing completely wrong: this isn't about how a lock affected performance, it's about how open source made it possible to identify and remediate a problem that in a proprietary context would probably have been best addressed by buying more hardware and/or more expensive licensing.

And there's a bottom line lesson too: open source coupled with expertise opens the door to increased performance and flexibility (customization) in meeting unusual requirements today in much the same way that cheaper memory did in the nineties.