# *********************************************************************
# jointable.txt: help text
#
# Copyright (c) 2001,2006 Carlo Strozzi
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; version 2 dated June, 1991.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
#
# *********************************************************************
# $Id: jointable.txt,v 1.8 2006/03/10 11:26:13 carlo Exp $

                      NoSQL operator: jointable

Joins two NoSQL tables on one or more common fields.

Usage: jointable [options] [table_1] table_2

Options:
    --column (-j) 'column'
      Join on 'column' from both files. 'column' can
      be a list of comma-separated column names, that
      is: 'column1,column2,...'.

    --help (-h)
      Display this help text.

    --all (-a)
      In addition to normal output, produce a line for each
      unpairable line in table_1 (so called 'outer join').

    --ignore-case (-i)
      Ignore differences in case when comparing fields.

    --numeric (-n)
      Input files are sorted numerically.

    --debug (-x)
      Display the join(1) command to STDERR.

Notes:

Unless option '-n' is specified, the two tables must be sorted
alphabetically on the join fields for the operation to function
correctly.
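
As an illustration of the sorting requirement, the following sketch
works directly with sort(1) and join(1) on two small header-less,
tab-separated files. The file names and sample rows are made up for
the example, and real NoSQL tables also carry a header line that is
not handled here.

```shell
# Hypothetical sample data: two header-less, tab-separated files
# whose first field is the join key.
tmp=$(mktemp -d)
tab=$(printf '\t')
printf 'c\t3\na\t1\nb\t2\n' > "$tmp/left"
printf 'b\tbeta\na\talpha\n' > "$tmp/right"

# Sort both inputs alphabetically on the join field, using the same
# collation (LC_ALL=C) for sort(1) and join(1).
LC_ALL=C sort -t "$tab" -k 1,1 "$tmp/left"  > "$tmp/left.sorted"
LC_ALL=C sort -t "$tab" -k 1,1 "$tmp/right" > "$tmp/right.sorted"

# Join on the first field of each file.
result=$(LC_ALL=C join -t "$tab" "$tmp/left.sorted" "$tmp/right.sorted")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Without the two sort steps join(1) may miss matching rows, which is
why 'jointable' expects its input tables to be sorted beforehand.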

Either input table, but not both, can be specified as '-', meaning
STDIN. See join(1) for more details on the meaning of each option.
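
For instance, the description of '--all' above matches that of the
'-a 1' option of join(1). Assuming that this is the underlying option
(this help text does not say so explicitly), the difference between a
normal join and an outer join can be sketched with join(1) alone. The
sample files below are hypothetical, header-less and already sorted.

```shell
# Hypothetical, already sorted, header-less tab-separated inputs.
tmp=$(mktemp -d)
tab=$(printf '\t')
printf 'a\t1\nb\t2\nc\t3\n' > "$tmp/t1"
printf 'a\talpha\nc\tgamma\n' > "$tmp/t2"

# Normal join: only keys present in both files are output.
inner=$(LC_ALL=C join -t "$tab" "$tmp/t1" "$tmp/t2")

# Outer join on the first file: unpairable lines of t1 are output too.
outer=$(LC_ALL=C join -t "$tab" -a 1 "$tmp/t1" "$tmp/t2")

printf 'inner:\n%s\nouter:\n%s\n' "$inner" "$outer"
rm -rf "$tmp"
```

Key 'b' has no partner in the second file, so it shows up only in the
outer result, and without the fields of the second file.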

In 'jointable' terms, 'table_1' is the primary join table while
'table_2' is the secondary one. If only one table is specified on
the command-line and it is different from "-", it will be used as
the secondary join table, while the primary one will default to STDIN.
This default behaviour is consistent with the most common use of
'jointable'.

The join may occur only between equally-named fields from the two
tables. If a requested join column is missing from either table, the
program exits with an error. There is a good reason for this
restriction: information should be consistent across the whole database.
Joining makes sense only between a primary key from one table and a
foreign key from another table. The two fields are actually the very
same piece of information, so they should always be given the same
name, and that name should be unique. Enforcing this concept leads to
consistent database design.

If for some reason the two tables contain equally-named fields, the one
on STDIN will have to be piped through 'rename -f' before being fed into
'jointable'.

If no join column is specified on the command-line, then 'jointable'
will try to infer the key column(s) from the name of whichever table
is not on STDIN. The file-name is expected to be structured like
this:

           somename.k.keycol1[.keycol2 ...][-suffix]

where 'keycol1' is one key column, 'keycol2' and possibly others are
the other key columns, where applicable, and "-suffix" is an optional
trailer that is always removed from the file name for the computation.
If the key column(s) cannot be derived from the file-name, the join
column will default to the leftmost field in both tables.
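
The naming convention above can be sketched as a small shell function.
This is not the code used by 'jointable': the function name
'infer_keys' is hypothetical, and the sketch assumes that the suffix
starts at the first '-' following the ".k." marker and that key
column names contain neither '.' nor '-'.

```shell
# Print the key columns encoded in a table file name as a
# comma-separated list, or return 1 if the name has no ".k." part.
infer_keys() {
    name=${1##*/}              # drop any leading directory part
    case $name in
        *.k.?*) ;;             # the name must contain ".k.<something>"
        *) return 1 ;;
    esac
    keys=${name#*.k.}          # keep what follows the first ".k."
    keys=${keys%%-*}           # drop the optional "-suffix" (assumption)
    [ -n "$keys" ] || return 1
    printf '%s\n' "$keys" | tr '.' ','
}

infer_keys orders.k.cust_id.order_id-old   # prints: cust_id,order_id
```

The comma-separated result has the same form as the argument accepted
by the '--column' option.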

Whatever the join column(s), it will have to exist in both tables
(although possibly not in the same column positions) or 'jointable'
will complain.

If either of the two input tables contains duplicated columns, i.e.
columns with the same name but possibly different values, only the
first (leftmost) one is taken into account. The output will still have
the duplicates, but this time with equal column values.

If multiple join columns are used, they must be listed in the same order
in which they occur in the input data. That is, join columns must occur
in the same order in both tables, although not necessarily at the very
same column positions.
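
The rules above (join columns must exist, the leftmost duplicate
wins, and multiple join columns must be listed in input order) can be
checked before calling 'jointable'. The sketch below assumes a
simplified header made of plain tab-separated column names, which may
differ from the actual NoSQL header format. The function name
'join_col_positions' is hypothetical.

```shell
# Print the 1-based position of each requested column in a header
# line. Fail if a column is missing or if the columns are not listed
# in the order in which they occur in the header.
join_col_positions() {
    header=$1
    cols=$2
    printf '%s\n' "$header" | awk -F '\t' -v cols="$cols" '
        {
            n = split(cols, want, ",")
            prev = 0
            out = ""
            for (i = 1; i <= n; i++) {
                pos = 0
                for (f = 1; f <= NF; f++)
                    if ($f == want[i]) { pos = f; break }  # leftmost wins
                if (pos == 0 || pos <= prev)
                    exit 1          # missing, or out of order
                out = out (i > 1 ? "," : "") pos
                prev = pos
            }
            print out
        }'
}

header=$(printf 'id\tname\tcity')
join_col_positions "$header" 'id,city'     # prints: 1,3
```

Running the same check on the headers of both tables shows whether a
given list of join columns satisfies the ordering rule in each table.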

Warning: sometimes, when 'jointable' output is redirected to a file, the
latter is not created right away, and an immediately following shell
command that tries to read the file will fail. This happens only when
the second command is run within a few milliseconds after 'jointable'
has exited, that is, in shell scripts. This is presumably due to timing
issues related to the background 'join' process spawned by 'jointable'.
Apparently it happens only with stdout redirection, not with pipelining,
so an easy work-around is the following:

		jointable table1 table2 | cat > file

Not a very elegant solution, perhaps, but it works :-) Fortunately, in
most practical cases this is hardly an issue, as jointable's output
needs to be piped through some other filters anyway.

The following names are reserved by the awk language and should not
be used as column names:

BEGIN, END, break, continue, else, exit, exp, for, getline, if, in,
index, int, length, log, next, print, printf, split, sprintf, sqrt,
substr, while, and possibly others. See mawk(1) for more on this.
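
A column name can be checked against this list with a small shell
helper. The function name 'is_reserved_awk_name' is hypothetical, and
the list covers only the names given above, so it is not exhaustive.

```shell
# Names taken from the list above; other awk keywords may exist.
reserved="BEGIN END break continue else exit exp for getline if in index"
reserved="$reserved int length log next print printf split sprintf sqrt"
reserved="$reserved substr while"

# Succeed (exit status 0) if "$1" is one of the listed names.
is_reserved_awk_name() {
    case " $reserved " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

if is_reserved_awk_name length; then
    echo "'length' should not be used as a column name"
fi
```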

