The Mutt E-Mail Client : Configuration : Spam detection
Previous: Message Scoring
Next: Setting variables

3.24. Spam detection

Usage: spam pattern format
Usage: nospam pattern

Mutt has generalized support for external spam-scoring filters. By defining your spam patterns with the spam and nospam commands, you can limit, search, and sort your mail based on its spam attributes, as determined by the external filter. You can also display the spam attributes in your index display using the %H selector in the $index_format variable. (Tip: try %?H?[%H] ? to display spam tags only when they are defined for a given message.)
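
As a minimal sketch of the tip above, the conditional can be dropped into an otherwise ordinary $index_format. The surrounding expandos are only an illustration (they follow what I believe is the stock default layout); the only part taken from this section is %?H?[%H] ?.

```
# Show the spam tag in brackets, but only for messages that have one.
# Everything except "%?H?[%H] ?" is an illustrative layout; adapt to taste.
set index_format="%4C %Z %{%b %d} %-15.15L %?H?[%H] ?(%?l?%4l&%4c?) %s"
```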

Your first step is to define your external filter's spam patterns using the spam command. pattern should be a regular expression that matches a header in a mail message. If any message in the mailbox matches this regular expression, it will receive a ``spam tag'' or ``spam attribute'' (unless it also matches a nospam pattern -- see below). The appearance of this attribute is entirely up to you, and is governed by the format parameter. format can be any static text, but it can also include back-references from the pattern expression. (A regular expression ``back-reference'' refers to a sub-expression contained within parentheses.) %1 is replaced with the text matched by the first parenthesized sub-expression in the regex, %2 with the second, etc.

If you're using multiple spam filters, a message can have more than one spam-related header. You can define spam patterns for each filter you use. If a message matches two or more of these patterns, and the $spam_separator variable is set to a string, then the message's spam tag will consist of all the format strings joined together, with the value of $spam_separator separating them.

For example, suppose I use DCC, SpamAssassin, and PureMessage. I might define these spam settings:

spam "X-DCC-.*-Metrics:.*(....)=many"         "90+/DCC-%1"
spam "X-Spam-Status: Yes"                     "90+/SA"
spam "X-PerlMX-Spam: .*Probability=([0-9]+)%" "%1/PM"
set spam_separator=", "

If I then received a message that DCC registered with ``many'' hits under the ``Fuz2'' checksum, and that PureMessage registered with a 97% probability of being spam, that message's spam tag would read 90+/DCC-Fuz2, 97/PM. (The four characters before ``=many'' in a DCC report indicate the checksum used -- in this case, ``Fuz2''.)

If the $spam_separator variable is unset, then each spam pattern match supersedes the previous one. Instead of getting joined format strings, you'll get only the last one to match.

The spam tag is what will be displayed in the index when you use %H in the $index_format variable. It's also the string that the ~H pattern-matching expression matches against for search and limit functions. And it's what sorting by spam attribute will use as a sort key.
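
For instance, the following sketch uses the spam tag for limiting and highlighting. The key binding and the colors are arbitrary choices for illustration. ~H takes a regular expression that is matched against the tag, so ``.'' matches any message with a non-empty tag.

```
# Limit the index to messages carrying any spam tag (F8 is an arbitrary key).
macro index <F8> "<limit>~H .<enter>" "show only spam-tagged messages"

# Highlight messages whose tag starts with 9 and another digit,
# e.g. 90+/SA or 97/PM from the example above.
color index brightred default "~H ^9[0-9]"
```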

That's a pretty complicated example, and most people's actual environments will have only one spam filter. The simpler your configuration, the more effective mutt can be, especially when it comes to sorting.

Generally, when you sort by spam tag, mutt will sort lexically -- that is, by ordering strings alphanumerically. However, if a spam tag begins with a number, mutt will sort numerically first, and lexically only when two numbers are equal in value. (This is like UNIX's sort -n.) A message with no spam attributes at all -- that is, one that didn't match any of your spam patterns -- is sorted at lowest priority. Numbers are sorted next, beginning with 0 and ranging upward. Finally, non-numeric strings are sorted, with ``a'' taking lower priority than ``z''. Clearly, in general, sorting by spam tags is most effective when you can coerce your filter to give you a raw number. But in case you can't, mutt can still do something useful.
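
A single-filter setup that yields a raw number might look like the sketch below. It assumes SpamAssassin-style headers with a score= field; if your filter writes its score differently, the pattern will need adjusting.

```
# Assumes headers of the form
#   X-Spam-Status: Yes, score=7.2 required=5.0 ...
# The tag becomes the integer part of the score (here "7").
spam "X-Spam-Status: Yes.*score=([0-9]+)" "%1"

# Sort the index by spam tag; reverse-spam lists the highest numbers first.
set sort=spam
```

Because every tag then begins with a number, the numeric ordering described above applies to all tagged messages, and untagged messages sort at lowest priority.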

The nospam command can be used to write exceptions to spam patterns. If a header pattern matches something in a spam command, but you nonetheless do not want it to receive a spam tag, you can list a more precise pattern under a nospam command.

If the pattern given to nospam is exactly the same as the pattern on an existing spam list entry, the effect will be to remove the entry from the spam list, instead of adding an exception. Likewise, if the pattern for a spam command matches an entry on the nospam list, that nospam entry will be removed. If the pattern for nospam is ``*'', all entries on both lists will be removed. This might be the default action if you use spam and nospam in conjunction with a folder-hook.
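
One way to combine these commands with a folder-hook is sketched below. The folder name ``work'' is hypothetical, and the spam pattern is borrowed from the SpamAssassin example earlier in this section.

```
# Start from a clean slate on every folder change...
folder-hook . "nospam *"

# ...then define patterns only for a hypothetical "work" folder.
folder-hook work 'spam "X-Spam-Status: Yes" "90+/SA"'
```

Hooks run in the order they are defined, so the catch-all reset should come before the folder-specific definitions.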

You can have as many spam or nospam commands as you like. You can even do your own primitive spam detection within mutt -- for example, if you consider all mail from MAILER-DAEMON to be spam, you can use a spam command like this:

spam "^From: .*MAILER-DAEMON"       "999"
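
To carve an exception out of a rule like this one, a more precise nospam pattern can be added for the same header. The host name below is hypothetical and only illustrates the idea.

```
# Tag all bounces as spam...
spam "^From: .*MAILER-DAEMON" "999"

# ...except bounces from one (hypothetical) host, which stay untagged.
nospam "^From: .*MAILER-DAEMON@mail[.]example[.]org"
```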

