Honestly, whoever has an idea for a spam detection measure for Mastodon, and by that I do mean an implementation, get in touch with me, I'll pay for it.
I've been thinking about solutions for the past few days but the more I think about them the more they appear pointless.
Defining an account as suspicious when it has no local followers can be circumvented by just pre-following them, using account age can be circumvented with sleeper accounts, blacklisting URLs does nothing when the spam does not include URLs, checking for duplicate messages sent to different recipients can be circumvented by randomizing parts of the message...
E-mail deals with spam using Bayesian filters or machine learning. The more training data there is, the more accurate the results, a monolith like GMail benefits from this greatly. Mastodon's decentralization means everyone has separate training data, and starts from scratch, which means high inaccuracy. It also means someone spamming a username could potentially lead to any mention of that username be considered spam due to the low overall volume of data, unless you strip usernames
An idea for spam containing links
@Gargron when I was working on online advertising, some of the ads we had in our inventory came from other big platforms, like Tubemogul. To know which brand to invoice for the ad display, we used the clickthrough (the url where the user is sent on click on the ad) to determine the domain of the url, after all the redirections.
You can use a similar system to list which domains are often shared in toot and blacklist them if the number of toot containing it is increasing at a specific time. Then, instance admins would receive a notification and could whitelist them if they want / if it's not spam.
FR : Ceci est une instance queer, qui vise à être aussi confortable et safe que possible. Nouvelleaux élèves bienvenu'es !