We had several hours of downtime yesterday because of a “fun” PostgreSQL problem (see Beware of your next glibc upgrade for details). Short version: the rules for sorting text (collations) can change subtly when you upgrade your OS, and indexes built under the old rules can no longer be trusted, so some of your unique database keys may end up being not so unique. This resulted in quite a few Mastodon tables having duplicate rows, despite having unique indexes.
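For anyone wondering what "not so unique" looks like in practice, here is a minimal sketch of the kind of check that surfaces duplicates. It runs against an in-memory SQLite database so it is self-contained; the `accounts` table, the `username` column, and the `find_duplicate_keys` helper are made up for illustration and are not Mastodon's actual schema or tooling. On a real PostgreSQL server with a damaged index, the planner may also need to be steered away from that index for a check like this to be reliable.

```python
import sqlite3


def find_duplicate_keys(conn, table, column):
    """Return (value, count) pairs for values that occur more than once."""
    # Identifiers cannot be bound as query parameters, so only pass
    # trusted table and column names to this helper.
    query = (
        f"SELECT {column}, COUNT(*) FROM {table} "
        f"GROUP BY {column} HAVING COUNT(*) > 1 ORDER BY {column}"
    )
    return conn.execute(query).fetchall()


# Hypothetical demo data: a column that should be unique but is not.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO accounts (username) VALUES (?)",
    [("alice",), ("bob",), ("alice",)],
)

duplicates = find_duplicate_keys(conn, "accounts", "username")
print(duplicates)  # [('alice', 2)]
```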
Posts with the tag Outage:
Who accidentally ran sudo reboot because it was in their shell history? I would never do a thing like that. Ahem. Pesky gremlins.
I woke up to the terrible news that our good friends on another instance had lost their database during a software upgrade. Godspeed and good luck in bringing it back online. We’re pulling for you! The Free Radical site backs itself up hourly to a private S3 bucket, and keeps a month’s worth of these snapshots. It’s configured to upload all media files to S3 and serve them from there. In the event of a complete server failure, I could – assuming all goes well – re-deploy the software on a new server and restore from backup, losing nothing more than the users and posts created since the last hourly backup.
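The "keep a month's worth of hourly snapshots" rule can be sketched roughly as below. This is only an illustration of the retention logic, not the site's actual backup script: the `expired_snapshots` helper is hypothetical, and "a month" is assumed to mean 30 days.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "a month's worth" is treated as 30 days.
RETENTION = timedelta(days=30)


def expired_snapshots(snapshot_times, now):
    """Return the snapshot timestamps that fall outside the retention window."""
    cutoff = now - RETENTION
    return [taken_at for taken_at in snapshot_times if taken_at < cutoff]


# Hypothetical snapshot timestamps relative to a fixed "now".
now = datetime(2017, 7, 6, 12, 0, tzinfo=timezone.utc)
snapshots = [
    now - timedelta(hours=1),   # latest hourly backup: keep
    now - timedelta(days=29),   # still inside the window: keep
    now - timedelta(days=31),   # older than the window: safe to delete
]

to_delete = expired_snapshots(snapshots, now)
print(to_delete)
```

With hourly backups, the worst case after a restore is losing just under an hour of new users and posts.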
The server was down from about 11:30 AM PDT on July 5, 2017 to about 8:15 AM PDT on July 6. I fired off an apt-get update and didn’t check its results before rushing off to something else. Sorry for any inconvenience! I’ve set up monitoring with Uptime Robot to notify me about any future outages so you won’t be sitting in the dark.
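For the curious, the basic idea behind an uptime check is simple enough to sketch. The snippet below is not Uptime Robot's API or how that service is implemented; it is a hypothetical `is_up` helper showing the kind of probe such a monitor performs: request a URL and treat anything other than a 2xx answer as "down".

```python
import urllib.error
import urllib.request


def is_up(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if the URL answers with a 2xx status, False otherwise.

    `opener` is injectable so the check can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers DNS failures, refused connections, timeouts, and HTTP errors.
        return False
```

A real monitor would run a check like this on a schedule and send a notification when it starts failing.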