Since February, apparently, my inbox has been under siege by spam promising free stuff from recognizable brands, except that it’s D1SGUI5ED for old-school spam filter evasion, and the domain names are always alphabet-soup randomness.
Part of the problem: some fool (past me) had set CRON=0 in /etc/default/spamassassin and also deactivated spamassassin-maintenance.timer, which meant the server hadn’t fetched new SpamAssassin rules in an extremely long time.
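Undoing both halves might look roughly like this. It's a minimal sketch that assumes a Debian-style layout, where /etc/default/spamassassin contains a bare `CRON=0` line and the timer unit has the name used above; the function name is made up for illustration. Amavis loads the SpamAssassin rules when it starts, so it may also need a restart after new rules arrive if your distribution doesn't take care of that.

```shell
# Hypothetical helper (Debian-style paths assumed): re-enable rule updates.
restore_rule_updates() {
    local conf=/etc/default/spamassassin rc=0
    # 1. Turn the rule-update cron setting back on.
    sudo sed -i 's/^CRON=0$/CRON=1/' "$conf" || return 1
    # 2. Re-enable the maintenance timer and start it right away.
    sudo systemctl enable --now spamassassin-maintenance.timer || return 1
    # 3. Fetch rules now instead of waiting for the timer.
    #    sa-update exits 0 if new rules were installed, 1 if none were
    #    available, and higher values on errors.
    sudo sa-update || rc=$?
    [ "$rc" -le 1 ]
}
```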
Restoring the timer did not help very much… because the other part of the problem is that Bayes auto-learning is on by default. Amavis runs every message through SpamAssassin, which automatically learns low-scoring messages as “ham,” so a spammer who can slip past a few rules has more luck with their later deliveries.
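To get a feel for how much spam had been auto-learned as ham, a hypothetical helper like this one can scan a folder such as Junk. It assumes the X-Spam-Status header on delivered mail carries an `autolearn=` field; whether it does depends on the amavis version and configuration, so check a raw message first.

```shell
# Hypothetical helper: print messages in a maildir folder whose headers say
# SpamAssassin auto-learned them as ham.
# ASSUMPTION: X-Spam-Status includes an "autolearn=ham" field.
list_autolearned_ham() {
    local dir="$1" f
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        # Look only at the header block (up to the first empty line), so a
        # body that merely mentions the string does not match.
        if sed '/^$/q' "$f" | grep -q 'autolearn=ham'; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `list_autolearned_ham ~/.maildir/.Junk/cur | wc -l` counts them.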
As a result, spam filter performance had degraded to a <50% block rate, and I was dealing with an overwhelming number of messages. I reset the Bayes data, moved ~250 emails to my Junk folder, and trained on my archives (ham) and Junk (spam). Since then, the block rate has been >89%, and the false negatives have been moved to Junk for training.
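The reset step can be sketched as follows. It runs sa-learn as the amavis user, since in an amavis setup that is the user whose Bayes database gets used. The function name and backup filename are hypothetical; `--backup`, `--clear`, and `--restore FILE` are standard sa-learn options, and taking a backup first means the old data can be loaded back if the reset turns out to be a mistake.

```shell
# Hypothetical helper: back up, then wipe, the Bayes database used by amavis.
reset_bayes() {
    local backup="${1:-bayes-backup-$(date +%Y%m%d).txt}"
    # Keep a copy first; `sa-learn --restore FILE` can load it back.
    sudo -u amavis sa-learn --backup > "$backup" || return 1
    # Wipe the database so training starts from zero.
    sudo -u amavis sa-learn --clear
}
```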
In my particular setup (Postfix smtpd → amavisd-new → Postfix for local delivery), SpamAssassin runs under the amavis system user, and by default its Bayes database lives in that user’s home directory. Hence, every sa-learn command must be run as that user, and the messages must be readable by it.
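$ cd "$(mktemp -d)"              # private staging dir (mode 700)
$ sudo find ~/.maildir/.Junk/cur \
    -maxdepth 1 -type f \
    -exec cp -t . '{}' +         # the copies end up owned by root
$ sudo chgrp -R amavis .
$ chmod 750 .                    # let the amavis group enter the dir
$ sudo chmod 440 ./*             # root-owned files, so chmod needs sudo too
$ sudo -u amavis sa-learn --spam .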
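The commands above can be wrapped into a reusable function that takes a folder plus `spam` or `ham` and removes the staged copies afterward. This is a sketch: the function name is hypothetical, and it assumes GNU cp (for `-t`) and an existing amavis group.

```shell
# Hypothetical wrapper: stage a maildir folder where the amavis group can
# read it, train as the amavis user, then remove the staged copies.
sa_train_folder() {
    local src="$1" kind="$2" stage rc
    case "$kind" in
        spam|ham) ;;
        *) echo "usage: sa_train_folder DIR spam|ham" >&2; return 2 ;;
    esac
    stage="$(mktemp -d)" || return 1          # private dir, mode 700
    # Copy the top-level regular files (the messages) into the staging dir.
    sudo find "$src" -maxdepth 1 -type f -exec cp -t "$stage" '{}' + || return 1
    # Readable by the amavis group only: dir 750, files 440.
    sudo chgrp -R amavis "$stage" && chmod 750 "$stage" &&
        sudo find "$stage" -type f -exec chmod 440 '{}' + || return 1
    sudo -u amavis sa-learn --"$kind" "$stage"
    rc=$?
    # The copies are root-owned, so clean up with sudo as well.
    sudo rm -rf -- "$stage"
    return "$rc"
}
```

Usage would be `sa_train_folder ~/.maildir/.Junk/cur spam`, or something like `sa_train_folder ~/.maildir/.Archive/cur ham` for ham (that archive path is a made-up example).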
Some last, unorganized notes: after a reset, the Bayes classifier stays inactive until at least 200 messages of each type have been learned. sa-learn remembers the IDs of messages it has already learned, so a mistake can be corrected by feeding the same message again with the opposite flag; this is how training on misclassified messages overrides what auto-learning got wrong. And finally, I showed training for spam above; training for ham is basically the same process, just with a different source folder and the --ham flag instead.
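To check whether training has crossed that threshold, the counters from `sa-learn --dump magic` can be compared against 200, SpamAssassin’s default for `bayes_min_spam_num` and `bayes_min_ham_num`. The helper below is hypothetical and assumes the usual dump layout, where the count is the third column of the `nspam` and `nham` lines.

```shell
# Hypothetical helper: read `sa-learn --dump magic` output on stdin, print the
# spam/ham counts, and succeed only if both reach the minimum (default 200).
bayes_ready() {
    awk -v min="${1:-200}" '
        BEGIN { min += 0 }
        /non-token data: nspam/ { spam = $3 }
        /non-token data: nham/  { ham = $3 }
        END {
            spam += 0; ham += 0
            printf "nspam=%d nham=%d\n", spam, ham
            exit !(spam >= min && ham >= min)
        }'
}
```

Usage: `sudo -u amavis sa-learn --dump magic | bayes_ready`.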
This had never been such a problem in the past, because campaigns that succeeded in reaching my inbox kept reusing fixed domain names, which I would configure to accept-but-drop at Postfix ingress. This kept the messages out of Amavis entirely and avoided signaling a rejection to spammers. Unfortunately, the randomized names defeat this old method.
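For reference, one way to express accept-but-drop in Postfix is the DISCARD action of an access(5) table consulted at SMTP time; this is an assumption about the mechanism, not necessarily the exact config used here. The helper is hypothetical and assumes main.cf already checks the table, e.g. `smtpd_sender_restrictions = check_sender_access hash:/etc/postfix/sender_access`.

```shell
# Hypothetical helper: add a DISCARD entry for a domain to a Postfix access
# table and rebuild the lookup table with postmap.
add_discard_domain() {
    local map="$1" domain="$2"
    # Skip exact duplicates so re-running is harmless.
    if ! grep -qixF -- "$domain DISCARD" "$map" 2>/dev/null; then
        printf '%s DISCARD\n' "$domain" >> "$map" || return 1
    fi
    postmap "hash:$map"
}
```

Run it from a root shell, since /etc/postfix is not writable by normal users.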