AMINDEX
=======


Amindex was a collection of code that added dump indexing and more
friendly recovery to amanda-2.3.0.  It has since been fully integrated
into amanda-2.3.0.X.  An index of the files in each dump image is
generated as part of the dump process and stored in a database.  A
tool, amrecover, is provided which lets users browse the database.
Once a set of files to recover has been assembled, amrecover
"automates" the restoration of these files.

This file describes how the index files are generated and how amrecover
is used.


Database Format
---------------

The database consists of a directory containing a large number of
files, one for each disk backup.  The file names are formed from the
host name, disk name, date, and dump level.  The files are ASCII text
files containing a list of the leaf nodes of the dump, one per line.
Each entry is the filename relative to the mount point, starting with a
/, e.g., /home/user1/data from the disk mounted on /home would generate
the entry /user1/data.  The index files are stored in compressed format
(e.g., gzip or compress).
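
As an illustration only, the following Python sketch shows how an entry
is derived from a path and its mount point, and how a compressed index
file could be read back.  The names index_entry and read_index are made
up for this example and are not part of amanda, and the sketch assumes
the index was compressed with gzip.

```python
import gzip
import posixpath


def index_entry(path, mount_point):
    """Return the index entry for path on the disk mounted at mount_point."""
    path = posixpath.normpath(path)
    mount_point = posixpath.normpath(mount_point)
    if mount_point == "/":
        # Root file system: the entry is the path itself.
        return path
    if path == mount_point:
        return "/"
    prefix = mount_point + "/"
    if not path.startswith(prefix):
        raise ValueError("%s is not under %s" % (path, mount_point))
    # Drop the mount point but keep the leading slash.
    return "/" + path[len(prefix):]


def read_index(filename):
    """Read a gzip-compressed index file into a list of entries."""
    with gzip.open(filename, "rt") as f:
        return [line.rstrip("\n") for line in f if line.strip()]
```

With these definitions, index_entry("/home/user1/data", "/home") gives
"/user1/data", matching the example above.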


Database Record Generation
--------------------------

This section walks through the operation of the system while backups
are being done, and highlights the changes to amanda-2.3.0.X needed.

First off, a new disk-type option "index" was added.  This required
changes to common-src/conffile.[ch] to parse such an option and add it
to the data structure for options, and changes to server-src/driverio.c
to add "index" to the reconstructed option string passed to clients.
A backup index is only generated for a disk whose disk type includes
this option.

On the client side, sendbackup-common.[ch] and sendbackup-dump.c have
been modified to handle the "index" option.  If this option is set,
sendbackup-dump adds createindex-dump to the chain of programs to be
run, between the dump and the compression (if present).
Createindex-dump is responsible for generating the index file.  It
pipes the dump output through "restore -t" to get the necessary
information.  The file is then compressed and stored in a temporary
file on the client.  The location of this file is configurable, e.g.,
/var/tmp, /usr/tmp, /tmp.
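
The position of createindex-dump in the chain can be pictured with the
Python sketch below.  It is not the amanda code (which is written in C):
tee_to_indexer is a hypothetical name, and the indexer command is left
to the caller rather than guessing the exact restore arguments.  The
point is only that the dump data passes through unchanged while a copy
is fed to a second program whose output becomes the index.

```python
import subprocess


def tee_to_indexer(src, dst, indexer_cmd, index_out, blocksize=65536):
    """Copy src to dst unchanged while feeding a copy to indexer_cmd.

    src and dst are binary file objects (the dump output and the next
    program in the chain).  index_out is a real file opened for
    writing; the indexer's standard output is sent there.  Returns the
    indexer's exit status.
    """
    proc = subprocess.Popen(indexer_cmd, stdin=subprocess.PIPE,
                            stdout=index_out)
    while True:
        block = src.read(blocksize)
        if not block:
            break
        dst.write(block)          # pass the dump through untouched
        proc.stdin.write(block)   # and give the indexer its own copy
    proc.stdin.close()            # end of dump: let the indexer finish
    return proc.wait()
```

A real implementation would also have to cope with the indexer exiting
early, which this sketch does not attempt.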

[ Yes, okay, this makes the file system containing the temp file active
during its dump, which is a bad thing, but if you have the DEBUG flags
set, amanda is also actively using files in /tmp. ]

Once all of the dumping operations of amdump have been completed,
amdump calls amgetidx, which determines for which file systems an index
was generated and copies those index files to the database server
(assumed to be the machine running amdump).  The source for this is in
server-src/amgetidx.c.  In early versions, amgetidx used rsh/rcp to
perform this, but now it uses amandad.


Database Browsing
-----------------

The client is called amrecover and is loosely based on the
functionality of the program "recover" from Backup Copilot.  A user
starts up amrecover.  This requires specifying the index server and the
amanda config name (defaults for both are compiled in as part of the
installation).  The user then specifies the host whose backups are to
be browsed, the disk name, and (optionally) the disk mount point.
Finally, a date needs to be specified.  Given all this, the user can
roam around a virtual file system using "ls" and "cd", much like in an
FTP client.  The file system contains all files backed
up on the specified date, or before that date, back to the last level 0
backup.  Only the most recent version of any file is shown.
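
The rule for what appears in the virtual file system can be sketched in
Python as follows.  This follows the description above literally and is
not taken from the amanda source; the function name visible_files, the
tuple layout, and the date strings are assumptions made for the example.

```python
def visible_files(dumps, date):
    """Return {entry: dump date} for the files visible on the given date.

    dumps is a list of (dump_date, level, entries) tuples, where
    dump_date is a string that sorts chronologically and entries is the
    list of names from that dump's index.
    """
    # Consider only dumps on or before the requested date, newest first.
    candidates = sorted((d for d in dumps if d[0] <= date),
                        key=lambda d: d[0], reverse=True)
    visible = {}
    for dump_date, level, entries in candidates:
        for entry in entries:
            # Keep the first (most recent) version seen of each file.
            visible.setdefault(entry, dump_date)
        if level == 0:
            # Stop once the last level 0 dump has been included.
            break
    return visible
```

For example, with a level 0 dump on one day and a level 1 dump on the
next, a file present in both indexes is reported from the level 1 dump,
and a file present only in the level 0 dump is still shown.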

As the file system is traversed, the user can add files to and delete
files from a "shopping list", and print the list out.


File Extraction
---------------

When a user has built up a list of files to extract, they can be
extracted by issuing the command "extract" within amrecover.

The following steps are carried out for each different tape needed.

As part of the installation, a "tape server" daemon amidxtaped is
installed on one or more designated hosts, which have an attached tape
drive.  This is used to read the tapes.  See the config files for the
options for specifying a default.

Amrecover contacts amidxtaped on the tape server host, specifying which
tape device to use and the host and disk for which files are needed.  On the
tape server host, amidxtaped exec's amrestore to get the dump image
file off the tape, strips the (amanda) header, and returns the data to
amrecover.

If the dumps are stored compressed, amrecover pipes the data through
the appropriate decompression program before piping it into restore,
which then extracts the required files from the dump image.

Note that, unless gnutar is used, a user can only extract files that
were dumped from a host running the same operating system as the
machine on which amrecover is being executed, since the native
dump/restore tools are used.


Protocol Between amindexd and amrecover
---------------------------------------

The protocol spoken between amindexd and amrecover is a simple ASCII
chat protocol based on that used in FTP.  Amrecover sends a 1 line
command, and amindexd replies with a 1 line or multi-line reply.  Each
line of the reply starts with a three digit code, starting with a '5'
if an error occurred.  For 1 line replies, and the last line of a
multi-line reply, the 4th character is a space.  For all but the last
line of a multi-line reply, the 4th character is a '-'.
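
A minimal Python sketch of reading one reply under these rules is given
below.  The function name read_reply is hypothetical, and the codes and
messages used to exercise it are invented; only the framing (three
digit code, '-' or space as the 4th character, leading '5' for errors)
comes from the description above.

```python
def read_reply(lines):
    """Read one reply from an iterator of lines.

    Returns (code, is_error, texts), where code is the three digit code
    of the final line as a string, is_error is True if that code starts
    with '5', and texts holds the text after the 4th character of every
    line of the reply.
    """
    texts = []
    for line in lines:
        line = line.rstrip("\r\n")
        if len(line) < 4 or not line[:3].isdigit() or line[3] not in " -":
            raise ValueError("malformed reply line: %r" % line)
        texts.append(line[4:])
        if line[3] == " ":
            # A space marks a 1 line reply or the last line of a
            # multi-line reply.
            return line[:3], line[0] == "5", texts
    raise ValueError("reply ended before a final line was seen")
```

Because the function stops at the first line whose 4th character is a
space, the same iterator can be passed in again to read the next reply.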

The commands and replies other than acknowledgments are:

QUIT - finish up and close connection

HOST <host> - set host to host

DISK <disk> - set disk to disk

SCNF <config> - set amanda configuration to config

DATE <date> - set date to date

DHST - return dump history of current disk

OISD <dir> - Opaque "is directory?" query.  Is the directory <dir>
	     present in the backups of the current disk back to and
	     including the last level 0 dump?

OLSD <dir> - Opaque list directory.  Give all filenames present in
	     <dir> in the backups of the current disk back to and
	     including the last level 0 dump.

ORLD <dir> - Opaque recursive list directory.  Give all filenames
	     present in <dir> and its subdirectories in the backups of
	     the current disk back to and including the last level 0
	     dump.

TAPE - return value of tapedev from amanda.conf if set.

DCMP - returns "YES" if dumps for disk are compressed, "NO" if dumps
	     aren't.
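
The command set above can be summarised as a small table recording
which commands take an argument.  The Python sketch below is an
illustration, not the amrecover source: format_command is a
hypothetical helper, and it returns the command line without any line
terminator because the line ending is not specified in this file.

```python
# Command name -> True if the command takes an argument.
COMMANDS = {
    "QUIT": False,
    "HOST": True,
    "DISK": True,
    "SCNF": True,
    "DATE": True,
    "DHST": False,
    "OISD": True,
    "OLSD": True,
    "ORLD": True,
    "TAPE": False,
    "DCMP": False,
}


def format_command(name, argument=None):
    """Return the 1 line command text for name, checking its argument."""
    if name not in COMMANDS:
        raise ValueError("unknown command: %r" % name)
    if COMMANDS[name] and argument is None:
        raise ValueError("%s needs an argument" % name)
    if not COMMANDS[name] and argument is not None:
        raise ValueError("%s takes no argument" % name)
    if argument is None:
        return name
    if "\n" in argument or "\r" in argument:
        # A command must fit on one line.
        raise ValueError("argument must not contain a line break")
    return "%s %s" % (name, argument)
```

For instance, format_command("HOST", "client1") yields "HOST client1",
where client1 is a made-up host name.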



INSTALLATION NOTES
------------------


1) Whether or not an index is created for a disk is controlled by a
disk configuration option "index".  So, in amanda.conf you need to
define a disktype with this option, e.g.,

define dumptype comp-user-index {
    comment "Non-root partitions on reasonably fast machines"
    compress client fast
    index yes
    priority medium
}

2) You need to define disks that you want to generate an index for to
be of one of the disktypes you defined which contain the index option.
This causes sendbackup-dump on the client machine to generate an index
file which is stored local to the client, for later recovery by
amgetidx (which is called by amdump).

3) Amanda saves all the index files under a directory specified by
"indexdir" in amanda.conf.  You need to create this directory by hand.
It needs to have read/write permissions set for the user you defined
to run Amanda.

If you are using the "text database" option you may set indexdir and
infofile to be the same directory.

4) The index browser, amrecover, currently gets installed as part of
the client software.  Its location may not be appropriate for your
system and you may need to move it to a more accessible place such as
/usr/local/bin.  See its man page for how to use it.


Note that amindexd, amgetidx, amidxtaped, and amtrmidx all write debug
files on the server in /tmp (unless this feature is disabled in the
source code), which are useful for diagnosing problems.  Amrecover
writes a debug file in /tmp on the machine on which it is invoked.


PERMISSIONS
-----------

The userid chosen to run the amanda client code must have permission to
run restore since this is used by createindex-dump to generate the
index files.

For a user to be able to restore files from within amrecover, that user
must have permission to run restore.


CHANGES FROM AMINDEX-1.0
------------------------

Get index directory from amanda.conf

Integration into amanda-2.3.0.4.

Rewriting of amgetidx to use amandad instead of using rsh/rcp.



CHANGES FROM AMINDEX-0.3
------------------------

Support for index generation using gnutar

Support for restoring files from within amrecover.

Bug fixes:

	* index/client/amrecover.c (guess_disk): 
	Removed inclusion of mntent.h and use of MAXMNTSTR since this
	was non-portable, as pointed out by Izzy Ergas
	<erga00@nbhd.org>.

	* index/client/display_commands.c (list_directory): 
	Removed point where list_directory() could sleep for ever
	waiting for input that wasn't going to come.

	* index/server/amindexd.c
	  index/client/uscan.l
	Installed patches from Les Gondor <les@trigraph.on.ca> to make
	amrecover handle spaces in file names.

	* server-src/amcontrol.sh: 
	As pointed out by Neal Becker <neal@ctd.comsat.com> there were
	still a few sh-style comments that needed conversion to
	c-style.




CHANGES FROM AMINDEX-0.2
------------------------

	* index/client/Makefile.in
	* index/client/help.c
	* index/client/amrecover.h
	* index/client/uparse.y
	* index/client/uscan.l
	Added a help command.

	* index/client/set_commands.c:
	set_disk() and set_host() now check for empty extract list.
	
	* index/client/extract_list.c:
	* index/client/amrecover.h:
	* index/client/uparse.y:
	* index/client/uscan.l:
	Added clear extract list command.
	
	* index/client/set_commands.c (set_disk): 
	Added code so working directory set to mount point.

	* index/client/extract_list.c:
	If the last item on a tape list is deleted, the tape list
	itself is now deleted from the extract list.

	* index/client/amrecover.c: 
	* index/server/amindex.c:
	If the server started up and found that the index dir doesn't
	exist, then it exited immediately and the client got an
	uninformative message.  Corrected this so it is obvious what is
	wrong to the user, since this is most likely to occur when
	somebody is setting up for the first time and needs all the
	help they can get.

	* server-src/amgetidx.c
	Added patch from Pete Geenhuizen
	(pete@gasbuggy.rockledge.fl.us) so that it works even when
	remote shell is csh.

	* server-src/amcontrol.sh
	* server-src/Makefile.in
	Amcontrol is now parameterized like other scripts and run
	through munge to generate installable version.

	* index/server/amindexd.c (main): 
	Added code to set userid if FORCE_USERID set.

	* index/server/amindexd.c
	Removed #define for full path of grep.  Assumed now to be on
	path.

	* client-src/createindex-dump.c
	* client-src/sendbackup-dump.c
	* man/Makefile.in
	Added patch from Philippe Charnier (charnier@lirmm.fr) so they
	work when things are installed with version numbers.  This was
	also reported by Neal Becker (neal@ctd.comsat.com).  Also patch
	to set installed man page modes and create directory if
	needed.

	* config/options.h-sunos4
	Corrected definition for flex library.

	* server-src/amtrmidx.c
	Added some pclose() commands, used remove() instead of
	system("rm ..").  Problems reported by Pete Geenhuizen
	(pete@gasbuggy.rockledge.fl.us) on a system with small ulimits
	set.

	* index/server/amindexd.[ch]
	* index/server/list_dir.c
	* index/client/amrecover.c
	* index/client/set_commands.c
	* index/client/uparse.y
	Changes developed with the help of Pete Geenhuizen
	(pete@gasbuggy.rockledge.fl.us) to support disks specified by
	logical names.  Also, now debug files generated by amrecover
	include PID so multiple users can use amrecover simultaneously
	and without file deletion permission problems.

	* config/config.h-hpux: 
	* config/config-common.h:
	* server-src/amgetidx.c:
	Changes from Neal Becker re remote shell, making it a
	configuration parameter.

	* config/options.h-sunos4
	Had -Lfl instead of -lfl
	


CHANGES FROM AMINDEX-0.1
------------------------

	* index/client/uscan.l: 
	added support for abbreviated date specs

	* index/client/amrecover.c (guess_disk): 
	guess_disk got disk_path wrong if mount point other than / (as
	subsequently pointed out by Eir Doutreleau <ed@cti.ecp.fr>)
	
	* server-src/amtrmidx:
	Added amtrmidx which removes old index files.

	* index/client:
	Added a pwd command

	* server-src/amgetidx.c (main): 
	Added use of CLIENT_LOGIN username on r commands.  (as pointed
	out by Eric Payan <Eric.Payan@ufrima.imag.fr>)

	* server-src/amgetidx.c:
	Bug: It was copying from all clients irrespective of whether
	the client was configured for indices.  A '}' in the wrong
	place.

	* server-src/amgetidx.c: 
	Removed user configuration section.  Instead include amindexd.h
	to get information.



CHANGES/ADDITIONS TO 2.3.0
--------------------------

common-src/conffile.[ch]

- added "index" as a valid option


server-src/driverio.c

- added code to optionstr() to write "index" into option string


client-src/sendbackup-dump.c

- added code to generate index if requested.

client-src/indexfilename.[ch]
client-src/createindex-dump.c

- code to generate index.

client-src/Makefile.in

- a new target.  Another file for sendbackup-dump

config/config-common.h

- added def of restore.




KNOWN BUGS
----------

- Empty directories don't get into the listing for a dump (at all dump
levels).

- When amrecover starts up, it tries to guess the disk and mount point
from the current directory of the working system.  This doesn't work
for disks specified by logical names, nor when an automounter is being
used, or a link is in the path.


Alan M. McIvor
11 March 1997
