As some of you might know, a leap second was inserted last night at midnight UTC.
We’d expected Linux to handle that smoothly, just like the Y2K issue. Unfortunately, that turned out not to be the case.
The day before the leap second, ntpd reports to the kernel that a leap second event is going to happen. This report caused a complete lockup of some of our systems.
Of course we didn’t suspect the leap second to be the problem. We first suspected our hardware, then our virtualization software, then our firewall rules and eventually the phase of the moon. Then we got reports coming in from other people having the same issue. Old kernels, new kernels, it didn’t seem to matter.
The fix was simple. Stop ntpd everywhere, and remove the leap second event with a tool called adjtimex. We waited for the big-bang at midnight, but nothing happened.
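For the curious, here is a rough sketch of what "removing the leap second event" amounts to. The kernel keeps a clock status word, and a pending leap second shows up as a bit in it. The bit values below (STA_INS = 0x0010, STA_DEL = 0x0020) come from the Linux timex header; the helper names leap_pending and clear_leap_bits are made up for this illustration and are not part of any tool.

```shell
#!/bin/sh
# Leap-second bits in the kernel clock status word (Linux <sys/timex.h>).
STA_INS=16   # 0x0010: insert a leap second at the end of the UTC day
STA_DEL=32   # 0x0020: delete a leap second at the end of the UTC day

# Hypothetical helper: report which leap event a status word announces.
leap_pending() {
    status=$1
    if [ $(( $status & $STA_INS )) -ne 0 ]; then
        echo insert
    elif [ $(( $status & $STA_DEL )) -ne 0 ]; then
        echo delete
    else
        echo none
    fi
}

# Hypothetical helper: return the same status word with both leap bits
# cleared and every other bit left untouched.
clear_leap_bits() {
    echo $(( $1 & ~($STA_INS | $STA_DEL) ))
}

# Assumption: with the adjtimex(8) tool the status word is usually shown by
# "adjtimex --print" and written back with "adjtimex --status <value>".
# Check the man page on your own system before relying on those flags.
```

For example, leap_pending 8209 prints "insert", and clear_leap_bits 8209 prints 8193, which is the value you would write back. ntpd has to stay stopped while you do this, otherwise it will simply set the flag again.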
Except… there seemed to be another issue. The load on all our servers running MySQL shot through the roof after midnight.
Again, the fix is simple. Stop ntpd, set the system clock to its own current time, and start ntpd again:
sudo service ntpd stop; sudo date -s "$(date)"; sudo service ntpd start
So if you’re running MySQL on your VPS, and your load is very high (you can check this with ‘top’), run that command and you should see the results immediately.
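If you'd rather not stare at top, a small check like the one below can tell you whether a server looks affected. It is only a sketch: the function name load_verdict and the rule of thumb "load above twice the number of CPUs counts as high" are our own assumptions, so adjust the factor to whatever is normal for your machine.

```shell
#!/bin/sh
# Hypothetical helper: print "high" when the load average exceeds
# cpus * factor, and "ok" otherwise.
#   $1 = 1-minute load average (e.g. 12.5)
#   $2 = number of CPUs
#   $3 = factor, defaults to 2 (an arbitrary threshold, not an official one)
load_verdict() {
    awk -v load="$1" -v cpus="$2" -v factor="${3:-2}" 'BEGIN {
        verdict = (load > cpus * factor) ? "high" : "ok"
        print verdict
    }'
}

# Example for a Linux host (not run automatically):
#   load=$(cut -d " " -f 1 /proc/loadavg)
#   cpus=$(getconf _NPROCESSORS_ONLN)
#   load_verdict "$load" "$cpus"
```

If this prints "high" on a MySQL server shortly after the leap second, the clock reset above is worth trying.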
I’m glad we were able to limit the impact on our platform. Others were not so lucky. Foursquare, LinkedIn and even The Pirate Bay were affected, and I suspect we’ll see more reports in the coming days. The next leap second hasn’t been announced yet, but next time we’ll make sure to be prepared.