Summary: Mastodon has few tools for automatedly fighting spam and abuse so I gave up and made one.

We’ve been swamped with a flood of spam for the last few days. Some loser is creating hundreds or thousands of accounts on undermoderated servers and pestering the whole fediverse with junk. Mastodon itself provides no mechanism for admins to reject statuses that contain certain strings, even though many people have begged for this over the years. And while I could learn enough Ruby on Rails to implement such a feature myself, I’m not confident that it would be accepted into the main project and I don’t want to maintain a fork.

What I do have is root-level access to my instance’s database, enough SQL knowledge to be dangerous, and a willingness to break things and see what happens. I put all that in a blender and some working code came out the other side.

What you’re looking at is a PostgreSQL check constraint that applies a function I wrote to every status insert into the database, and rejects ones that contain text I don’t ever want to store on my instance. If I try to post a toot that contains such text, I get a little “500” popup in the corner of my screen and it doesn’t get sent. I’m not sure what happens if another server tries to send us a toot with that text. I’m guessing the API returns a 500, too, and it fills up their outbound queue with retries. I honestly couldn’t care less. Don’t send us spam, yo.

Before you apply this on your own server, read the giant warning at the top. If you don’t, and you mess around with this without following the advice, you’re going to be a very sad camper next time you try to restore your database. Don’t panic, though. This uses normal, built-in PostgreSQL features in the normal, not-“clever” way they’re meant to be used. The risk isn’t to this database check specifically, but to all PostgreSQL check constraints that call user-defined functions. Like so many other database features, it’s something to learn, understand, and respect.

A little nervous? Good. You’re about to tell your database to reject otherwise valid data that the server tries to insert into it. Don’t do that unless you’re comfortable with the idea. Sure would be nice if there were an approved way to filter toots so we didn’t have to resort to drastic measures, but if you’ve read along this far, you’re probably as annoyed by all this as I am and ready to cry havoc and loose the hounds of YOLO.

And with that, let’s YOLO.

-- WARNING! WARNING! READ ME! I'M NOT KIDDING! This is a giant footgun that
-- could easily make it so that you can't restore your database backups _if you
-- ignore this advice_.
--
-- When you add a PostgreSQL constraint, that constraint is evaluated when you
-- add it and then again when you change a row, e.g. by inserting or updating
-- data. That's the only time. If the constraint calls a user-defined function
-- like ours does here, and you change that function's definition, then
-- PostgreSQL won't evaluate any existing rows again. Why's that a problem? Say
-- you modify the 'frz_text_is_abusive' function in-place without dropping the
-- constraint first. Now it's possible that there will already be existing
-- statuses rows that violate the new, updated constraint. Nothing will happen.
-- PostgreSQL won't re-check them. Now if you backup the database and try to
-- restore it, the constraint *will* be evaluated on each row as it gets
-- inserted, and it will fail when it gets to one of the demon rows, and you
-- will have a sad.
--
-- Don't do that. Any time you want to edit this function, drop the constraint,
-- make your edits, and re-add the constraint. If you have bad data, PostgreSQL
-- will complain about it then and won't let you add it until you delete the
-- abusive statuses. Yes, it's slower that way. Yes, you will be so happy when
-- your next database backup/restore succeeds.

\timing on

BEGIN;

-- Did I mention that you should always drop a constraint before modifying a
-- user-defined function that it references? I think I did! If you don't believe
-- me, go back and read that giant warning paragraph above that you skipped
-- over.
ALTER TABLE statuses DROP CONSTRAINT IF EXISTS frz_2024_10_10_01;

-- Create a function where you pass in a string and it tells you if that string
-- contains abusive text.
DROP FUNCTION IF EXISTS frz_text_is_abusive;
CREATE FUNCTION frz_text_is_abusive (status_text TEXT) RETURNS BOOLEAN AS $$
DECLARE
    lower_status_text TEXT = LOWER(status_text);
    abuse_text TEXT;
    -- This is the collection of strings that we do not wish to allow to exist
    -- in our statuses. BE CONSERVATIVE. If you add the spam string 'e',
    -- you're gonna have a really bad time. For speed and robustness, we
    -- compare these to the lowercase value of the passed-in status text. Be
    -- sure the strings you add here are lowercase, then, or they'll never
    -- match anything.
    abuse_text_strings TEXT ARRAY = ARRAY[
        '<a href="https://midokuriserver.github.io/minidon/'
    ];
BEGIN
    -- This algorithm isn't super fast and it only gets slower as we add more
    -- abuse strings. However, "slow" here is relative. At the moment I'm
    -- writing this, I have about 10M statuses in my database. A query to check
    -- every row takes about 1 minute, so it runs about 160,000 checks per
    -- second. If your instance receives more than 160,000 new statuses per
    -- second, contact me for information about my hourly consulting rates.
    -- You can afford it.
    FOREACH abuse_text IN ARRAY abuse_text_strings LOOP
        IF POSITION(abuse_text IN lower_status_text) > 0 THEN
            RETURN true;
        END IF;
    END LOOP;
    RETURN false;
END;
$$ LANGUAGE plpgsql;

COMMIT;

-- See if our queries work.
SELECT * FROM frz_text_is_abusive('hello, world!'); -- f
SELECT * FROM frz_text_is_abusive(
    'i am annoyed by <a href="https://midokuriserver.github.io/minidon/'
); -- t

-- See if any current rows are abusive. If so, this gives you a list of status
-- IDs to investigate. You'll need to delete all of them before you can apply
-- the constraint in the next step.
SELECT id FROM STATUSES WHERE frz_text_is_abusive(text);

BEGIN;
-- Now tell PostgreSQL never to insert values where that function returns
-- true.
ALTER TABLE statuses
ADD CONSTRAINT frz_2024_10_10_01 CHECK (NOT frz_text_is_abusive(text));
COMMIT;

\timing off

I’m not going to tell you how to run this code in your own database. If you’re not certain how to do that, you probably should not be making this change. I don’t mean to be an elitist jerk about it, I promise! It’s just that I don’t want to help anyone shoot themselves in the foot.