So I spent a while thinking that ‘high uptime’ was a good thing. I was annoyed, once upon a time, at a VPS provider that rebooted my system for security upgrades approximately monthly, because they were using Virtuozzo (container-based, so every guest shares the host's kernel) and needed to really, seriously, and truly block privilege escalations.
About monthly. As in 30-45 days…
I thought that was bad, but nowadays, the public-facing servers my employer runs live for less than a week. Maybe a whole week if they get abnormally old before Auto Scaling gets around to culling them. And I’m cool with this!
I try to rebuild an image monthly even if “nothing” has happened, and definitely whenever upstream releases a new base image, and sometimes just because I know “major” updates were made to our repos (e.g. I just did composer update everywhere) and it’ll save some time in the self-update scripts when the image relaunches.
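That rebuild rule boils down to three checks. Here is a minimal sketch in Python, assuming “monthly” means 30 days and that each image records the ID of the base image it was built from. The function and parameter names are hypothetical and not taken from any particular tool:

```python
from datetime import datetime, timedelta, timezone

# Assumption: "monthly" is taken to mean 30 days.
MAX_IMAGE_AGE = timedelta(days=30)

def should_rebuild_image(image_built_at, current_base_id, image_base_id,
                         major_repo_update=False, now=None):
    """Hypothetical helper: decide whether to rebuild a server image."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Upstream released a new base image: always rebuild.
    if image_base_id != current_base_id:
        return True
    # Rebuild monthly even if "nothing" has happened.
    if now - image_built_at >= MAX_IMAGE_AGE:
        return True
    # Optional rebuild after big repo updates, so the self-update
    # scripts have less to do when fresh instances launch.
    return major_repo_update
```

For example, an image built on the current base five days ago would only be rebuilt if `major_repo_update=True` is passed.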
It turns out that the ‘uptime’ the Slashdot crowd were so proud of was basically counterproductive. I do not want to trade security, agility, or anything else just to make that number larger. There is no benefit to it! Nothing improves solely because the kernel has been running longer, and if something does, then the kernel needs to be fixed to provide that improvement instantly.
And if the business is structured around one physical system that Must Stay Running, then whatever runs on that server is crucial enough to warrant redundancy, backups, failovers… and regular testing of the failover by switching onto fresh hardware with fresh uptimes.