Abusing fail2ban to train rspamd

Some time ago I enabled more verbose logging in dovecot and noticed messages being moved between INBOX and Junk. That came as a surprise, because I always thought people simply deleted junk messages. It turns out they take the trouble to move them to the appropriate folder, believing the server will learn from this and act accordingly in the future. This is not the case with most smaller providers. Sites with a typical postfix, dovecot and roundcube setup may handle spam/ham marking when it is performed via the web interface (roundcube), but mostly fail when it is performed via an IMAP client like Thunderbird or Outlook.

Unless, of course, the spam filter is somehow notified of the user’s action and the message content is fed to the filter’s learning mechanism, which is what the user thought was happening anyway. That was the problem to solve here, and what follows is what it took.

A word about the infrastructure. It’s a small site, with fewer than 1000 users and about 50 domains, though there wouldn’t be any problem with more users and/or domains. The server uses dovecot and rspamd with the fuzzy_check module enabled and a local fuzzy database, in addition to the rspamd servers configured by default. With small changes the solution can be applied to a Bayes module or to spamassassin.

The first problem was how to detect that a user has moved a message out of the Junk folder and into INBOX, no matter how the move is performed. Normally this is not logged anywhere, unless a block like the following is added to dovecot’s configuration file (/etc/dovecot/dovecot.conf in this case):

plugin {
    mail_log_events = copy
    mail_log_fields = uid box msgid size
}

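These settings belong to the mail_log plugin, so it has to be loaded through mail_plugins (together with the notify plugin it depends on). After restarting dovecot, whenever a user moves a message to the Trash or, in our case, from Junk to INBOX, a line like the following will be logged: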

Nov 12 09:37:23 mailhost dovecot: imap(xxx@nhyui.com)<4708>: copy from Junk: box=INBOX, uid=131400, msgid=<921774d7b58b28e02d4b2bf62.f3e832bb98.20201111125929.6962bf5040.91721499@mail…, size=76041

This line says that user xxx@nhyui.com moved a message from the Junk folder to INBOX. The message’s UID is shown as uid=131400. What we need to do now is scan the log for lines like this, preferably in real time, and act upon them. This is where the word “abusing” in the title comes from.

Fail2ban is not designed for this purpose, but chances are it is already installed and running on most internet-facing servers, and if not, it is available in most, if not all, repositories. Its purpose in life is to follow log files and block offending IP addresses. Since it scans log files anyway, why not trace mailbox actions too? All it needs is a filter to find the ‘dovecot….copy’ lines and a rule to act upon them. To get the text of a specific message from dovecot we need to know the owner of the message, the message’s uid and the mailbox where the message resides. Let’s assume the user is dragging a message from Junk to INBOX and go from there (from INBOX to Junk is practically the same, with a few minor modifications).
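Before writing the fail2ban filter, it can help to see which parts of the log line carry those three pieces of information. The following Python snippet is only an illustrative sketch, not part of the setup: the names CopyEvent, COPY_RE and parse_copy_event are made up for this example, and the pattern is written against the single sample line shown earlier, so other dovecot versions or log formats may need adjustments.

```python
import re
from typing import NamedTuple, Optional


class CopyEvent(NamedTuple):
    """The three things doveadm needs later, plus the source folder."""
    user: str
    source: str
    dest: str
    uid: int


# Assumption: the log line has the shape of the sample shown earlier.
COPY_RE = re.compile(
    r"dovecot: imap\((?P<user>[^)]+)\)<\d+>: "
    r"copy from (?P<source>[^:]+): "
    r"box=(?P<dest>[^,]+), uid=(?P<uid>\d+),"
)


def parse_copy_event(line: str) -> Optional[CopyEvent]:
    """Return the copy event described by a log line, or None if it is not one."""
    m = COPY_RE.search(line)
    if m is None:
        return None
    return CopyEvent(m["user"], m["source"], m["dest"], int(m["uid"]))


# The sample line from this article, copied as-is (msgid truncated as shown).
sample = (
    "Nov 12 09:37:23 mailhost dovecot: imap(xxx@nhyui.com)<4708>: "
    "copy from Junk: box=INBOX, uid=131400, "
    "msgid=<921774d7b58b28e02d4b2bf62.f3e832bb98.20201111125929."
    "6962bf5040.91721499@mail…, size=76041"
)

print(parse_copy_event(sample))
```

The owner comes from the imap(...) part, the destination mailbox from box= and the UID from uid=; the source folder after “copy from” tells us in which direction the message was moved.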

In /etc/fail2ban/filter.d create a file hamfilter.conf and put the following in it:

[INCLUDES]
before = common.conf
[Definition]
_auth_worker = (?:dovecot: )?auth(?:-worker)?
_daemon = (?:dovecot(?:-auth)?|auth)
EMAIL = \w+(\.\w+)?@\w+(-\w+)?\.\w+

failregex = dovecot: imap\(<F-USER><EMAIL></F-USER>\).*: copy from Junk.* box=INBOX, uid=<F-ID>\d+</F-ID>, msgid=.*size=\d+

[Init]
journalmatch = _SYSTEMD_UNIT=dovecot.service
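The EMAIL definition in this filter is fairly strict, so it is worth checking it against the address forms used on your own server. The sketch below approximates the failregex in plain Python, with named groups standing in for the <F-USER> and <F-ID> tags. The test addresses and the msgid in the generated log line are placeholders, and WIDER_EMAIL is a hypothetical alternative that is not part of the original filter.

```python
import re
from typing import Optional

# EMAIL definition copied verbatim from hamfilter.conf.
EMAIL = r"\w+(\.\w+)?@\w+(-\w+)?\.\w+"

# Hypothetical wider alternative (NOT from the original filter); it still
# allows only word characters, dots, plus signs and hyphens.
WIDER_EMAIL = r"[\w.+-]+@[\w.-]+"


def build_failregex(email_pattern: str) -> "re.Pattern[str]":
    """Python approximation of the failregex for a given EMAIL pattern."""
    return re.compile(
        r"dovecot: imap\((?P<user>" + email_pattern + r")\).*: copy from Junk.* "
        r"box=INBOX, uid=(?P<id>\d+), msgid=.*size=\d+"
    )


def log_line(address: str) -> str:
    """Build a made-up log line in the shape of the sample shown earlier."""
    return (
        f"Nov 12 09:37:23 mailhost dovecot: imap({address})<4708>: "
        "copy from Junk: box=INBOX, uid=131400, "
        "msgid=<placeholder@example.com>, size=76041"
    )


def matched_user(email_pattern: str, address: str) -> Optional[str]:
    """Return the captured user, or None when the line would be ignored."""
    m = build_failregex(email_pattern).search(log_line(address))
    return m.group("user") if m else None


# Placeholder addresses, chosen only to exercise the pattern.
for address in (
    "xxx@nhyui.com",
    "first.last@my-domain.com",
    "alice@mail.example.com",   # more than one dot in the domain
    "alice+tag@example.com",    # plus sign in the local part
):
    strict = matched_user(EMAIL, address) is not None
    wider = matched_user(WIDER_EMAIL, address) is not None
    print(f"{address:28} strict={strict} wider={wider}")
```

With the original definition, addresses on a subdomain or with a plus sign in the local part are silently skipped, so messages moved by those users would never reach rspamd. If you loosen the pattern, keep it restrictive, because the captured value ends up on a command line. The fail2ban-regex tool that ships with fail2ban can run the real filter file against the real log.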

This filter captures two values, F-USER and F-ID. Now in /etc/fail2ban/action.d create a file named hamaction.conf, and put the following in it:

[Definition]
actionban = doveadm fetch -u <F-USER> text uid <F-ID> mailbox INBOX | rspamc -f 3 -w 15 fuzzy_add
actionunban = /bin/true
[Init]

In the rspamc call, -f selects the fuzzy flag and -w sets the weight of the stored hash. In this local fuzzy database, flag 3 is the L_FUZZY_WHITE tag, which marks good messages.
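To make it clearer what the actionban line does, here is the same pipeline written out as a Python sketch. It is not something fail2ban needs; build_commands and learn_message are hypothetical helper names, and the defaults simply mirror the values used in hamaction.conf. Passing the arguments as lists means no shell is involved in interpreting the user name or the UID.

```python
import shlex
import subprocess
from typing import List, Tuple


def build_commands(
    user: str, uid: int, mailbox: str = "INBOX", flag: int = 3, weight: int = 15
) -> Tuple[List[str], List[str]]:
    """Return the two argument lists that make up the actionban pipeline."""
    # Same fields and search query as in hamaction.conf.
    fetch = ["doveadm", "fetch", "-u", user, "text",
             "uid", str(uid), "mailbox", mailbox]
    # Same flag and weight as in hamaction.conf.
    learn = ["rspamc", "-f", str(flag), "-w", str(weight), "fuzzy_add"]
    return fetch, learn


def learn_message(user: str, uid: int, **options: object) -> int:
    """Fetch one message with doveadm and feed it to rspamc on stdin."""
    fetch, learn = build_commands(user, uid, **options)
    fetched = subprocess.run(fetch, check=True, capture_output=True)
    result = subprocess.run(learn, input=fetched.stdout, check=True)
    return result.returncode


def describe(user: str, uid: int) -> str:
    """Render the pipeline as a shell-style string, for inspection only."""
    fetch, learn = build_commands(user, uid)
    return f"{shlex.join(fetch)} | {shlex.join(learn)}"
```

Calling learn_message("xxx@nhyui.com", 131400) on the mail server would do what the action does for the sample log line, while describe() only prints the pipeline without running anything. For the opposite direction (INBOX to Junk), the mailbox and flag arguments would change; which flag stands for spam depends on your own fuzzy_check configuration.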

Lastly, edit /etc/fail2ban/jail.local and add the following lines at the end of the file:

[mark_ham]
enabled = true
maxretry = 1
findtime = 1m
filter = hamfilter
action = hamaction
logpath = /var/log/mail.log

If you log to systemd’s journal instead, drop the logpath and set the backend to systemd. The journalmatch line in the filter’s [Init] section then selects dovecot’s journal entries:

[mark_ham]
enabled = true
maxretry = 1
findtime = 1m
filter = hamfilter
action = hamaction
backend = systemd
On this system dovecot logs via the mail facility to /var/log/mail.log, but not all systems do the same, so adjust accordingly. Now just reload fail2ban and the mark_ham jail will start running. If you drag a message from Junk to INBOX, rspamd will learn about it almost instantly.

This method can also be tuned to train a Bayes filter, with rspamd keeping each user’s statistics separately.