
                           Maildrop tips and tricks
                                       
   Last update: 10/11/1998.
   ______________________________________________________________________
   
   The include statement
   The $LINES and $SIZE environment variables
   Partial matches using the ! operator
   Short-circuit evaluation using logical operators
   The foreach statement
   The $HOME/.tmp directory (10/11/98)
   ______________________________________________________________________
   
   Here's a list of tips and tricks for writing efficient mail filters
   with maildrop. Maildrop - just like any programming language - does
   certain things better than others.
   
   
The include statement

   When maildrop is started, it reads $HOME/.mailfilter, or some other
   file containing mail filtering instructions. Before doing anything
   else, this entire file is read and semi-compiled. Any grammatical or
   syntax errors will result in maildrop terminating with a temporary
   error code, without doing anything with the message.
   
   The entire file must be "semi-compiled". The semi-compilation
   process consists of building a logical representation of the
   filtering statements in memory.
   
   If you have a long and complicated set of instructions, it will be
   semi-compiled whether or not it will actually be executed. For
   example, consider the following code:
   
if ( /^Subject: rosebud/ )
{
    ...
    some
    really
    long
    set
    of
    instructions
    ...
}

   If the contents of the "if" construct are large, maildrop may spend
   a considerable amount of time and memory semi-compiling filtering
   instructions that may never be used. Consider putting all these
   instructions into a separate file, and using the include statement to
   execute them.
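
   As a sketch, the example above could be rewritten as follows. The
   file name $HOME/.mailfilters/rosebud is hypothetical; it stands for
   wherever you keep the instructions moved out of the "if" construct:

```
if ( /^Subject: rosebud/ )
{
    # Hypothetical path. This file is read and semi-compiled only when
    # the Subject: header matches.
    include "$HOME/.mailfilters/rosebud"
}
```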
   
   The include statement is processed only when maildrop actually
   executes it, and that has a side effect. Semi-compilation normally
   weeds out errors in the filtering recipe file up front: maildrop
   holds all mail and does nothing until the errors are fixed. Errors
   in a file referenced by an include statement, however, go unnoticed
   until maildrop executes that statement. When maildrop detects the
   error, it terminates with a temporary error code, holding all mail,
   but any filtering instructions before the include statement will have
   already been executed.
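
   For instance, in the following sketch (the mailbox and the file name
   are both hypothetical), the cc statement has already delivered a
   copy of the message by the time maildrop finds an error in the
   included file. If your mail system retries the held message later,
   that copy may well be delivered a second time:

```
# Runs first: a copy is delivered before the included file is read.
cc "mail/IN.copies"

if ( /^Subject: rosebud/ )
{
    # An error in this (hypothetical) file is detected only here, after
    # the cc statement above has already taken effect.
    include "$HOME/.mailfilters/rosebud"
}
```

   One way to limit the damage is to keep statements that deliver or
   alter mail after any include statement that might fail.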
   
The $LINES and $SIZE environment variables

   These variables are initialized to the number of lines in the message
   and the size of the message in bytes, respectively. They may not be
   exact, due to the presence of the From_ line or -A arguments on the
   command line, but they will always be a reasonable approximation.
   
   Scanning headers for patterns will take the same amount of time
   whether the body of the message has 100 or 1,000 lines. However, the
   "b" option causes the message body to be searched. If someone sends
   you a 5 megabyte binary file, each pattern matched against the message
   body will take a significant amount of time to complete.
   
   Whenever possible, the first thing you should do is check both
   variables to see if the message is excessively large. If so, divert
   the message to a separate mailbox, or reject it.
   
   Weighted scoring is especially sensitive to the size of the message
   being delivered. Normally, as soon as a pattern match is found,
   maildrop terminates the pattern scan. If weighted scoring is used,
   the pattern scan is instead restarted immediately after each match,
   until the end of the message is reached.
   
   A fixed overhead is required to start a pattern scan, so weighted
   scoring that matches a very short pattern, like /[:lower:]/:Dbw,1,1,
   will take a significant amount of CPU time to complete. Maildrop is
   simply not optimized for these kinds of situations.
   
   Nevertheless, on a Pentium 200, /[:lower:]/:Dbw,1,1 performs
   reasonably well for messages smaller than 60K.
   
   Here's an example of using $LINES and $SIZE to divert large messages
   to a separate mailbox:

    if ($LINES > 1000 || $SIZE > 60000)
    {
        to "mail/IN.large"
    }

    .
    .
    .

Partial matches using the ! operator

   Maildrop implements the ! operator in patterns by breaking up the
   pattern into two separate patterns, which are compiled and executed
   separately.
   
   Whenever the first pattern is successfully matched, maildrop runs the
   second pattern. If the second pattern does not match, maildrop resumes
   matching the first pattern until it matches again.
   
   Since starting a pattern match involves some fixed overhead,
   minimizing the number of times the first pattern will match will
   reduce the amount of time it takes to match the entire pattern.
   
   For example: you're trying to extract an IP address from the first
   Received header. Here's your first attempt at doing so:

    /^Received:.*![0-9\.]+/
    IP=$MATCH2

   This will not work at all. From the manual page for maildropfilter:

       When  there  is  more  than  one  way  to  match a string,
       maildrop favors matching as much  as  possible  initially.

   So this fragment will set IP to the last digit or period found in the
   first Received: header, because the first half of the pattern will
   match as much as possible.
   
   Here's your second attempt, then:

    /^Received:.*[^0-9\.]![0-9\.]+/
    IP=$MATCH2

   This will work, except the first half of the pattern will end up
   matching at every character in the Received: header that's not a
   digit or a period. Each time, maildrop will stop and try to run the
   second half of the pattern.
   
   Your mail server will probably put a left bracket before the IP
   address of the relay. Noting that, the following code picks up the IP
   address as efficiently as possible:

    /^Received:.*\[![0-9\.]+/
    IP=$MATCH2
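
   The fragments above assign IP whether or not the pattern matched. A
   slightly more defensive sketch tests the pattern first. The fallback
   value "unknown" is an arbitrary choice made for illustration only:

```
if ( /^Received:.*\[![0-9\.]+/ )
{
    # The second half of the pattern matched; MATCH2 holds its text.
    IP=$MATCH2
}
else
{
    # No Received: header has a left bracket followed by digits or
    # periods. "unknown" is a placeholder, not a maildrop convention.
    IP="unknown"
}
```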

Short-circuit evaluation using logical operators

   The logical operators || and && work in maildrop just like they work
   in C. Consider the following expression:

    if (/ ... some pattern ... / || / ... some other pattern ... /)
    {
        .
        .
        .
    }

   If the first pattern is found, maildrop will not execute the second
   pattern, since the logical expression will be true no matter what. You
   can use that to your advantage. If you have two patterns, and one of
   them takes more time to process, put the other first. Even if both
   patterns are relatively quick, if you expect one of them to be found
   almost every time, put that one first so that it will not be necessary
   to run the second pattern most of the time.
   
   The same concept works for the && operator:

    if (/ ... some pattern ... / && / ... some other pattern ... /)
    {
        .
        .
        .
    }

   If the first pattern is not found, maildrop will not execute the
   second pattern, since the logical expression will be false no matter
   what.
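
   The operands need not all be patterns. Here's a minimal sketch that
   borrows the 60000 byte threshold from the $SIZE example earlier; the
   pattern and the mailbox name are hypothetical. The cheap numeric
   test goes first, so the expensive body search never runs on a large
   message:

```
# The comparison costs almost nothing. The "b" pattern scans the whole
# body, and is skipped when the message is 60000 bytes or larger.
if ( $SIZE < 60000 && /rosebud/:b )
{
    to "mail/IN.rosebud"
}
```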
   
The foreach statement

   The foreach statement is implemented as follows. Maildrop will execute
   the regular expression, and compile a list of all the strings in the
   message matched by the regular expression. Afterwards, maildrop will
   execute the foreach statement, once for each matched string.
   
   It is important to note that maildrop will create a list in memory
   containing every string matched by the regular expression. If the
   regular expression is found many times in the message, this will
   require a lot of memory. For example, the following is a Very Bad
   Thing[tm]:

    foreach /[:alpha:]/
    {
        .
        .
        .
    }
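
   By contrast, a pattern that can match only a handful of times keeps
   that list short. The following is a sketch, assuming (per the
   maildropfilter manual page) that MATCH is set to the matched text on
   each iteration; the log file name is hypothetical:

```
# Hypothetical log file; the log statement below writes to it.
logfile "$HOME/.mailfilter.log"

# Headers only, and one match per Received: header, so the list that
# maildrop builds in memory stays small.
foreach /^Received:.*/
{
    log "Relay: $MATCH"
}
```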

   Maildrop does not include any built-in limits on resource usage,
   except for the watchdog timer. Therefore, as a system administrator,
   it is your responsibility to limit resources used by the maildrop
   process.
   
   This implementation of the foreach statement is the one that yields
   the fewest surprises. Observe that the contents of the
   foreach construct can modify the message using the xfilter statement.
   Furthermore, the regular expression may include variable substitution,
   and those variables could be modified by the foreach statement. If the
   foreach statement were to be executed "on the fly", these factors
   would yield unpredictable - and rather messy - results.
   
The $HOME/.tmp directory

   Under certain conditions, maildrop will need to use a temporary file.
   Temporary files are created in the $HOME/.tmp directory. Temporary
   files are used to store large messages (instead of storing them in
   memory). Temporary files may also be used by the xfilter command, or
   in other situations.
   
   Maildrop will automatically delete all temporary files before
   terminating. However, if the maildrop process is killed, or terminates
   abnormally, a temporary file may remain and take up disk space. To
   have those leftover files automatically cleaned up, put the following
   code in /etc/profile, or some other file that all users run after
   logging in:

    find "$HOME/.tmp" -depth -mtime +7 \( \( \
        -type d -exec rmdir {} \; \) -o -exec rm -f {} \; \) 2>/dev/null
