Outage

Posts with the tag Outage:

I fought PostgreSQL (and eventually won)

2020-05-15 operations postgresql outage

We had several hours of downtime yesterday because of a “fun” PostgreSQL problem (see Beware of your next glibc upgrade for details). Short version: because there are lots of ways to encode text, and they can change subtly when you upgrade your OS, some of your unique database keys may end up being not so unique. This resulted in quite a few Mastodon tables having duplicate rows, despite having unique indexes. It also meant that PostgreSQL was all like “indexes? lol never heard of ’em” and query performance was pretty awful.
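To make the "not so unique" part concrete, here is a toy sketch of how a sorted index can let a duplicate through once the sort rules change underneath it. It is not how PostgreSQL's B-tree is actually implemented; `old_collation`, `new_collation`, `find_slot`, and `insert_unique` are hypothetical names invented for this illustration, and the "ignore hyphens" rule is only an assumed stand-in for the kind of ordering change an OS upgrade can bring.

```python
def find_slot(index, value, sort_key):
    """Binary search for the position where `value` belongs under `sort_key`."""
    lo, hi = 0, len(index)
    target = sort_key(value)
    while lo < hi:
        mid = (lo + hi) // 2
        if sort_key(index[mid]) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo


def insert_unique(index, value, sort_key):
    """Insert `value` unless the search finds it already sitting at its slot."""
    pos = find_slot(index, value, sort_key)
    if pos < len(index) and index[pos] == value:
        return False  # uniqueness check caught the duplicate
    index.insert(pos, value)
    return True


def old_collation(s):
    # Assumed "before the upgrade" rule: plain code point order.
    return s


def new_collation(s):
    # Assumed "after the upgrade" rule: hyphens are ignored at the first level.
    return (s.replace("-", ""), s)


# Build the index while the old ordering is in force.
index = []
for name in ["ab", "a-c", "aa"]:
    insert_unique(index, name, old_collation)

# Under the old rules, the duplicate is rejected (tried on a copy).
rejected_before = not insert_unique(list(index), "a-c", old_collation)

# Under the new rules, the search looks in the wrong place and misses it.
duplicate_added = insert_unique(index, "a-c", new_collation)

print(rejected_before)  # True
print(duplicate_added)  # True
print(index)            # ['a-c', 'aa', 'ab', 'a-c']
```

The data never changed; only the comparison did. The same mismatch is why lookups stop finding rows that are really there, which fits the awful query performance described above.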

Unexpected downtime

2018-05-04 outage

Who accidentally ran sudo reboot because it was in their shell history? I would never do a thing like that. ahem Pesky gremlins.

How we backup

2018-03-16 devops outage s3

I woke up to the terrible news that our good friends on another instance had lost their database during a software upgrade. Godspeed and good luck in bringing it back online. We’re pulling for you!

The Free Radical site backs itself up hourly to a private S3 bucket, and keeps a month’s worth of these snapshots. It’s configured to upload all media files to S3 and serve them from there. In the event of a complete server failure, I could – assuming all goes well – re-deploy the software on a new server and restore from backup without losing anything more than the users and posts created since the last hourly backup.
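The retention and worst-case-loss arithmetic can be sketched in a few lines. This is only an illustration, not the site's actual backup script: the function names are hypothetical, "a month" is assumed to mean 30 days, and the snapshots are modeled as bare timestamps rather than real S3 objects.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # assumption: "a month's worth" = 30 days


def split_snapshots(snapshot_times, now, retention=RETENTION):
    """Return (keep, expire): snapshots inside and outside the retention window."""
    cutoff = now - retention
    keep = sorted(t for t in snapshot_times if t >= cutoff)
    expire = sorted(t for t in snapshot_times if t < cutoff)
    return keep, expire


def newest_restore_point(snapshot_times, failure_time):
    """Most recent snapshot taken at or before the failure, or None."""
    candidates = [t for t in snapshot_times if t <= failure_time]
    return max(candidates) if candidates else None


# Hypothetical scenario: hourly snapshots going back 40 days, failure at 12:30.
now = datetime(2018, 3, 16, 12, 30)
last_backup = now.replace(minute=0)
snapshots = [last_backup - timedelta(hours=h) for h in range(24 * 40)]

keep, expire = split_snapshots(snapshots, now)
restore_point = newest_restore_point(keep, now)
lost_window = now - restore_point

print(len(keep), len(expire))  # 720 240
print(restore_point)           # 2018-03-16 12:00:00
print(lost_window)             # 0:30:00
```

With hourly snapshots the lost window is always under an hour, which is the trade-off described above: everything older than the last snapshot comes back, and anything newer does not.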

Unexpected outage

2017-07-06 outage

The server was down from about July 5, 2017 11:30AM PDT to about July 6, 2017 8:15AM PDT. I rushed through an apt-get update and didn’t check its results before hurrying off to something else. Sorry for any inconvenience!

I’ve set up monitoring with Uptime Robot to notify me about any future outages so you won’t be sitting in the dark.