README for 'urlredir'
While this software is free, it is subject to the GNU General Public License,
see the copyright section at the bottom of this file.  I know I probably 
didn't need to worry about this, but I figure "what the hell"(tm) :)

Some parts of this program were extracted from other sources, such as the
search algorithm from "Algorithms in C" by Sedgewick.  As far as we can tell,
copyright has been observed in all cases.



INSTALLATION


1.  Once you have unpacked the tar file, configure the package for your
system by running './configure'.  If you're using `csh' on an old version 
of System V, you might need to type `sh configure' instead to prevent 
`csh' from trying to execute `configure' itself.

The `configure' shell script attempts to guess correct values for
various system-dependent variables used during compilation, and
creates the Makefile(s) (one in each subdirectory of the source
directory).  In some packages it creates a C header file containing
system-dependent definitions.  It also creates a file `config.status'
that you can run in the future to recreate the current configuration.

Running `configure' takes a minute or two.  While it is running, it
prints some messages that tell what it is doing.  If you don't want to
see the messages, run `configure' with its standard output redirected
to `/dev/null'; for example, `./configure >/dev/null'.

Once you have configured the package, you can simply type 'make install' to
install the program, config file and man pages.  By default the binary
is installed in /usr/local/bin, and the config file in /usr/local/etc.
You can specify an installation prefix other than /usr/local by giving
'configure' the option '--prefix=PATH'.  Alternatively, you can do so by
consistently giving a value for the 'prefix' variable when you run 'make',
e.g.,
   make prefix=/usr
   make prefix=/usr install

You can also specify independent locations for the binary file and the
config file, by giving 'configure' the options '--bindir=PATH' and/or
'--sysconfdir=PATH'.  If either is unspecified then the regular prefix is
used.  The location for the manpages can be given by '--mandir=PATH'.

In addition, the following options are available with configure:

  --enable-debug          enable debugging code 
                          (adds -DDEBUG to compile command line)
  --disable-regex         disable regular expression usage, use Boyer Moore
                          string searches instead.
                          (adds -DNOUSE_REGEX to compile command line)
  --disable-saveskip      disable saving the skip tables when using Boyer
                          Moore string searching.
                          (adds -DNOSAVESKIP to compile command line)


If your system requires unusual options for compilation or linking
that `configure' doesn't know about, you can give `configure' initial
values for some variables by setting them in the environment.  In
Bourne-compatible shells, you can do that on the command line like
this:
   CC='gcc -traditional' DEFS=-D_POSIX_SOURCE ./configure

The `make' variables that you might want to override with environment
variables when running `configure' are:
(For these variables, any value given in the environment overrides the
value that `configure' would choose:)
CC          C compiler program.
            Default is `cc', or `gcc' if `gcc' is in your PATH.
INSTALL     Program to use to install files.
            Default is `install' if you have it, `cp' otherwise.

(For these variables, any value given in the environment is added to
the value that `configure' chooses:)
DEFS        Configuration options, in the form `-Dfoo -Dbar ...'
            Do not use this variable in packages that create a
            configuration header file.
LIBS        Libraries to link with, in the form `-lfoo -lbar ...'



2.  Type `make' to compile the package.  If you want, you can override
the `make' variables CFLAGS and LDFLAGS like this:

   make CFLAGS=-O2 LDFLAGS=-s


3.  Type `make install' to install programs, data files and manpages.


4.  You can remove the program binaries and object files from the source 
directory by typing `make clean'.  To also remove the Makefile and all the 
files that `configure' created, type `make distclean'.


The file `configure.in' is used as a template to create `configure' by
a program called `autoconf'.  You will only need it if you want to
regenerate `configure' using a newer version of `autoconf'.



USAGE


As for actually using this program?  It is usually run as a redirector for
the squid proxy server.  As such it reads a line containing the url from
stdin, and writes back either a bare newline if the url is unmatched, or the
new url if it was redirected.  The redirection is controlled by the config
file, explained below.

The format of the input string (as given by squid) is:

     <url> <source address> <other stuff....>
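
As a rough illustration of that exchange, here is a sketch in Python.  It is
not the code urlredir itself uses.  The function names and the `lookup'
callback are made up for the example, and ending a redirected reply with a
newline is an assumption.

```python
import sys


def split_request(line):
    """Split one squid-style input line into (url, source, rest)."""
    fields = line.split()
    url = fields[0] if fields else ""
    source = fields[1] if len(fields) > 1 else ""
    return url, source, fields[2:]


def answer(line, lookup):
    """Build the reply for one input line.

    `lookup` is a hypothetical callback: it takes the url and returns the
    new url, or None when nothing matched.  An unmatched url is answered
    with a bare newline.
    """
    url, _source, _rest = split_request(line)
    new_url = lookup(url)
    return (new_url or "") + "\n"


def serve(lookup, infile=sys.stdin, outfile=sys.stdout):
    """Answer every line from infile, flushing after each reply."""
    for line in infile:
        outfile.write(answer(line, lookup))
        outfile.flush()
```

Calling serve() with a callback that always returns None gives a redirector
that leaves every request untouched.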



CONFIGURATION

This is the important section. 

The redirector uses a configuration file (normally /etc/urlredir.conf) to 
control the way it redirects proxy requests.  The configuration file is
structured in a C-style format: each command line either starts a subgroup
(or function, delimited by enclosing braces {}) or is terminated by a ";".
In the config file, there are two different command types:  search commands
and action commands.

First I'll define the search commands.  These are:

      contains <key>... {
      hostcontains <key>... {
      pathcontains <key>... {
      exempt <key>... ;
      hostexempt <key>... ;
      pathexempt <key>... ;


The "contains" commands search the urls for any of the keys (more than 
one can be specified per line, note the ... notation), and then enter the
subgroup (note the opening '{') if one is found.  The "exempt" commands also 
search for any of the keys, and if one is found exempt the url from any
further searching.

The prefixes "host" and "path" imply that only the host or path of the url
should be searched for the keys.
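
To make the three search spaces concrete, here is a small Python sketch.  It
only illustrates the idea and is not how urlredir is implemented.  The name
search_space is made up, and treating "path" as the url path without the
query string is an assumption.

```python
from urllib.parse import urlsplit


def search_space(url, prefix=""):
    """Return the part of the url that a search command would look at.

    prefix is "" for contains/exempt, "host" for hostcontains/hostexempt
    and "path" for pathcontains/pathexempt.
    """
    parts = urlsplit(url)
    if prefix == "host":
        return parts.hostname or ""
    if prefix == "path":
        return parts.path
    return url
```

With the url http://www.example.com/ads/banner.gif, the "host" space is
www.example.com and the "path" space is /ads/banner.gif.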


-- This section from versions pre 1.3, or when compiled with -DNOUSE_REGEX

A note on keys:  At the moment the only special characters that these 
support are a ^ at the start, or a $ at the end, which imply matching only
at the start or end of the search space respectively. 
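
The anchoring rules for these plain keys can be sketched in Python as
follows.  This is an illustration only: the real program uses Boyer Moore
searching, the function name is made up, and both the case-sensitive
comparison and the handling of a key carrying both ^ and $ are assumptions.

```python
def plain_key_matches(key, text):
    """Match a pre-1.3 style key against the search space `text`."""
    at_start = key.startswith("^")
    if at_start:
        key = key[1:]
    at_end = key.endswith("$")
    if at_end:
        key = key[:-1]
    if at_start and at_end:
        # Assumption: both anchors together mean an exact match.
        return text == key
    if at_start:
        return text.startswith(key)
    if at_end:
        return text.endswith(key)
    return key in text
```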

-- End pre 1.3


As of urlredir version 1.3, the keys are case-insensitive extended regular 
expressions, as used in egrep (see the regex(7) man page).
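
For a feel of how such keys behave, here is a Python sketch.  Python's re
module is close to, but not identical to, the POSIX extended syntax that
egrep uses, so treat this as an approximation.  The function name and the
sample keys are made up.

```python
import re


def matches_any(keys, text):
    """True if any key, taken as a case-insensitive regex, is found in text."""
    return any(re.search(key, text, re.IGNORECASE) for key in keys)
```

For example, the keys "ads?\.example" and "\.gif$" both leave
http://www.example.com/index.html unmatched, while the first one matches
http://ADS.example.com/x regardless of case.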

Also, if the command "file" is given instead of a key, then the following
key is interpreted as a file, from which keys are extracted, one per line.

      eg.   contains file "/var/lib/bann/banned.list" {


Now, onto the actions.  The current actions supported are as follows:

      redirect <url>... ;
      browser redirect <url>... ;
      logfile filename;
      randomness 0..100;


All these commands only act upon a subgroup, except for logfile, which not
only acts on a subgroup, but is inherited into all child subgroups (unless
a new logfile is defined).

We'll look at redirect first.  This specifies urls to redirect the request
to.  It is used inside a "contains" group, to redirect the request when there
is a matching key.  If more than one url is provided, a random one is chosen.
There can also be multiple redirect lines in a group, with the same effect
as a single line containing all the urls.
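
The choice among several urls can be sketched in Python like this.  It is an
illustration, not the program's own code; the names are made up.  The
`browser' flag models the "browser redirect" form from the list of actions,
which puts "301:" in front of the chosen url.

```python
import random


def redirect_reply(redirect_lines, browser=False, rng=random):
    """Pick the reply url for a matched group.

    redirect_lines is a list of lists: one inner list of urls per redirect
    line in the group.  All lines are pooled before choosing, so several
    lines behave like one line holding every url.
    """
    pool = [url for line in redirect_lines for url in line]
    target = rng.choice(pool)
    return "301:" + target if browser else target
```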

Browser redirect is a simple enhancement of redirect that sends squid
"301:<url>" instead, causing it to redirect the browser to that page.

The logfile entry specifies the file to log to when a redirect occurs.  A log 
entry of the following format is made:

      <date & time> <source of request> <url requested>

As mentioned above, the active logfile is inherited into any child
subgroups, unless the logfile is set within the child.  This means that
you could set a logfile at the top of the config file, that would be used
for all redirect actions.  This is possible as the entire config file is
treated as a "contains" group, with a key that always matches.
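
The inheritance rule amounts to walking up from a group to its parents until
a logfile is found.  Here is a Python sketch of that lookup.  The dictionary
layout, the function name and the file names are all made up for the example
and say nothing about urlredir's internal data structures.

```python
def active_logfile(group):
    """Return the logfile in effect for a group, or None if there is none.

    Each group is a dict with an optional "logfile" entry and a "parent"
    entry pointing at the enclosing group (None for the whole config file).
    """
    while group is not None:
        if group.get("logfile"):
            return group["logfile"]
        group = group.get("parent")
    return None
```

A subgroup with no logfile of its own thus logs to whatever its nearest
enclosing group uses.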

Lastly, the randomness command specifies the percentage likelihood that a
particular contains command applies, given that there is a matching key. 
This is useful for redirecting only part of the time.  The default is 100%.
If a command is not applied, then the url processing continues as if there
had not been a key match at all.
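
In other words, a matching group is entered only when a random draw falls
under the given percentage.  A Python sketch of that decision follows.  The
function name is made up, and urlredir may draw its random numbers
differently.

```python
import random


def group_applies(randomness=100, rng=random):
    """Decide whether a matched group is applied this time.

    randomness is the percentage from the config file (0..100).  The draw
    lies in [0, 100), so 100 always applies and 0 never does.
    """
    return rng.random() * 100 < randomness
```

With randomness set to 25, roughly one matching request in four would be
redirected and the rest would carry on as if no key had matched.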

Also, see the directory 'examples' in the source directory for a further 
example.



BUGS

This program was put together rather quickly, and as such possibly has a 
few bugs.  If you find any, or would like to suggest/supply additional
features, please mail one of the authors, or Chris Leishman at the email
address masklin@debian.org



COPYRIGHT

This utility has been put together by Chris Leishman and Trevor Cohn for the
Ormond College IT Department.  It is currently maintained by us.

   This utility is free software: you can redistribute it and/or modify it
   under the terms of the GNU General Public License as published by the 
   Free Software Foundation; version 2 dated June, 1991.

   This program is distributed in the hope that it will be useful, but 
   WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
   for more details.

   You should have received a copy of the GNU General Public License
   along with this program;  if not, write to the Free Software
   Foundation, Inc., 675 Mass Ave., Cambridge, MA 02139, USA.

On Debian GNU/Linux systems, the complete text of the GNU General
Public License can be found in `/usr/doc/copyright/GPL'.


Chris Leishman   <chris@ormond.unimelb.edu.au>
Trevor Cohn      <trev@ormond.unimelb.edu.au>
