Posts with the tag Status:

2023-08-03 outage

At 2:53AM Pacific, our normally rock-solid Internet connection went down for 10 minutes. When it came back up at 3:03AM, that should’ve been the end of it. It wasn’t.

Short version: I had to reboot the firewall, then everything was fine.

Longer version:

  • Wake up at 7-something.
  • Yawn, stretch, pet the dog, look to see what happened online overnight.
  • Dang it, FRZ’s down.
  • Check email. See an email from a moderator from an hour earlier: hey, the site’s down!
  • Run downstairs. Instantly remember and regret that the coffee pot died yesterday and I have no caffeine, nor will I any time soon. Curse quietly.
  • SSH into FRZ. All looks OK from there, except that pgbouncer can’t connect to the database server.
  • Check the database server; it’s up, running, and twiddling its thumbs in boredom. That’s weird.
  • Use netcat from the FRZ server to verify that I can connect to the DB server. I can, but only with IPv6. IPv4 isn’t working. For ancient reasons, I had pgbouncer pinned to IPv4. Huh.
  • Speculation here:
    • I think that when the outage was over, the firewall found itself bombarded with frantic inbound connections from the FRZ server, and either temporarily blocked them or overloaded some kernel table or such.
    • There’s no easily visible evidence that either of those happened.
      • If it autoblocked the FRZ server – and it shouldn’t have, but here we are talking about it – it didn’t log it or notify about it.
      • If it was because a NAT table filled up or such, I didn’t get an alert on that, either.
  • Outbound IPv4 was just fine. I regret now that I didn’t check other inbound IPv4 ports on the firewall. I blame the lack of caffeine.
  • Lacking anything else to go on, I rebooted the firewall, and ta-da!, we’re back.
  • The Sidekiq queue has about 36,000 tasks (and growing) to chew through. It’ll be a little while until we’re 100% back up to speed.
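For anyone wanting to repeat the netcat check from the list above, here’s a minimal Python sketch of the same idea: try a TCP connection to the database port over IPv4 only, then over IPv6 only, and report each result separately. It’s an illustration, not what I actually ran. The function names `can_connect` and `probe` are made up for this example, and it assumes the DB server is PostgreSQL listening on its default port, 5432.

```python
import socket


def can_connect(host: str, port: int, family: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds using only the
    given address family (socket.AF_INET = IPv4, socket.AF_INET6 = IPv6)."""
    try:
        # Resolve the host to addresses of the requested family only.
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # No address of this family for the host (e.g. no AAAA record).
        return False
    for af, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(af, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            # Refused, timed out, or unreachable: try the next address, if any.
            continue
    return False


def probe(host: str, port: int) -> dict:
    """Check IPv4 and IPv6 independently, roughly like netcat's -4 and -6 flags."""
    return {
        "IPv4": can_connect(host, port, socket.AF_INET),
        "IPv6": can_connect(host, port, socket.AF_INET6),
    }
```

Calling something like `probe("db.example.internal", 5432)` (a hypothetical host name) during an outage like this one would have returned `{"IPv4": False, "IPv6": True}`, which is exactly the split that pointed at the firewall rather than the database.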

That was weird. I don’t know why that happened, and I don’t like that feeling. And as I write this, I remember that I’d pinned pgbouncer to IPv4 because one time IPv6 stopped working in a very similar way. Maybe the same thing happened then but in reverse?

Upgrading to 4.1.3

We’re updating the site to the new urgent security release v4.1.3. We may be offline or wonky for a few minutes at a time throughout the process. Be back soon!

Outage 2023-06-27

An urgent-ish software update has us down for maybe 20 minutes or so.

Look here for status updates

When Free Radical is down, I don’t have a great way to communicate that, because… the site’s down. In the future, I’ll post updates here, tagged with “status”. There are a few ways to access this:

I try not to have the site down for extended periods, but things happen. And when they do, here’s where you can find out about them.