Tune that database!!
I've just been asked to troubleshoot a web server that had started locking up for no apparent reason. Obviously the first thought was that it had been hacked (some of the sites had, but no big deal - they got fixed on the fly), so I logged in to have a look. Well, 30 minutes later I was logged in, because that's how long the login took to complete. It was a case of type some stuff and come back 10 minutes later. As the server was to all intents and purposes dead in the water (uptime eventually reported a load average of 170 on a quad-core machine), I used a one-liner to kill the web server...
ps -ef | grep httpd | awk '{print $2}' | xargs kill -9
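One wrinkle with a pipeline like that: `grep httpd` can also match its own entry in the process list, along with anything else that merely has "httpd" somewhere on its command line. A minimal sketch of a stricter filter, assuming a ps that accepts `-o pid= -o comm=`, is to match the command name exactly. The function name `httpd_pids` and the sample PIDs are made up for illustration.

```shell
# Print the PIDs whose command name is exactly "httpd".
# Expects "PID COMMAND" lines on stdin, e.g. from: ps -e -o pid= -o comm=
httpd_pids() {
    awk '$2 == "httpd" { print $1 }'
}

# Dry run on made-up sample data, so nothing actually gets killed here.
# The grep line is skipped because its command name is not "httpd".
printf '1201 httpd\n1342 httpd\n2977 grep\n' | httpd_pids
```

On the live box the same filter would be fed from ps and piped on to kill, along the lines of `ps -e -o pid= -o comm= | httpd_pids | xargs kill -9`.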
OK, look at the basics. Running through the standard checks on the hardware identified the problem - the CPU was spending 25% of its time in iowait - and that's with no sites running. Looking for the culprit led me to the MySQL configuration - it's the standard one delivered with the OS! A bit of tweaking and here's the difference it made. Note how far the number of blocks read (bread/s) has dropped, even though the number of blocks written (bwrtn/s) stays pretty constant - and this is the evil hours of the morning that I'm reporting stats for...
1. Two nights ago
time | tps | rtps | wtps | bread/s | bwrtn/s |
02:00:01 | 669.16 | 470.44 | 198.72 | 10597.54 | 4367.92 |
02:10:01 | 1061.96 | 554.77 | 507.19 | 12070.85 | 8397.63 |
02:20:01 | 667.65 | 489.46 | 178.19 | 13669.67 | 3997.29 |
02:30:01 | 493.00 | 328.03 | 164.97 | 12773.26 | 3585.85 |
02:40:01 | 659.46 | 470.01 | 189.45 | 22001.45 | 4203.37 |
02:50:01 | 390.18 | 202.65 | 187.54 | 5386.56 | 4061.29 |
2. Last night
time | tps | rtps | wtps | bread/s | bwrtn/s |
02:00:01 | 172.09 | 9.74 | 162.35 | 196.40 | 3496.43 |
02:10:01 | 464.45 | 14.52 | 449.92 | 227.87 | 7595.35 |
02:20:01 | 156.58 | 5.34 | 151.25 | 105.98 | 3364.32 |
02:30:01 | 152.03 | 3.86 | 148.17 | 51.24 | 3336.79 |
02:40:01 | 153.55 | 2.96 | 150.59 | 120.02 | 3332.31 |
02:50:01 | 139.33 | 1.85 | 137.48 | 31.31 | 3055.97 |
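To put a number on the difference, here is a rough sketch that averages the bread/s and bwrtn/s columns from the two tables above. The helper name `col_mean` is made up, and it assumes the rows are kept in the same pipe-separated layout, with the timestamp as column 1.

```shell
# Mean of one pipe-separated column; rows are read from stdin.
# $1 is the 1-based column number (5 = bread/s, 6 = bwrtn/s).
col_mean() {
    awk -F'|' -v c="$1" '{ sum += $c; n++ } END { if (n) printf "%.1f\n", sum / n }'
}

# Rows copied from the "two nights ago" table.
before='02:00:01 | 669.16 | 470.44 | 198.72 | 10597.54 | 4367.92 |
02:10:01 | 1061.96 | 554.77 | 507.19 | 12070.85 | 8397.63 |
02:20:01 | 667.65 | 489.46 | 178.19 | 13669.67 | 3997.29 |
02:30:01 | 493.00 | 328.03 | 164.97 | 12773.26 | 3585.85 |
02:40:01 | 659.46 | 470.01 | 189.45 | 22001.45 | 4203.37 |
02:50:01 | 390.18 | 202.65 | 187.54 | 5386.56 | 4061.29 |'

# Rows copied from the "last night" table.
after='02:00:01 | 172.09 | 9.74 | 162.35 | 196.40 | 3496.43 |
02:10:01 | 464.45 | 14.52 | 449.92 | 227.87 | 7595.35 |
02:20:01 | 156.58 | 5.34 | 151.25 | 105.98 | 3364.32 |
02:30:01 | 152.03 | 3.86 | 148.17 | 51.24 | 3336.79 |
02:40:01 | 153.55 | 2.96 | 150.59 | 120.02 | 3332.31 |
02:50:01 | 139.33 | 1.85 | 137.48 | 31.31 | 3055.97 |'

echo "bread/s before: $(printf '%s\n' "$before" | col_mean 5)"
echo "bread/s after:  $(printf '%s\n' "$after" | col_mean 5)"
echo "bwrtn/s before: $(printf '%s\n' "$before" | col_mean 6)"
echo "bwrtn/s after:  $(printf '%s\n' "$after" | col_mean 6)"
```

Over these six samples the average read rate falls from roughly 12,750 blocks per second to roughly 122, about a hundredfold, while the average write rate only moves from about 4,770 to about 4,030.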
And the load average is now comfortably below 1 again.