
		The arch Revision Control System News


2002-04-20

		   ********************************
		  Upgrading to this release as soon
		 as possible is strongly recommended
		   ********************************


   This is a bug-fix release.  Most noticeably, it fixes the problems
   some people have had getting the latest revision of `shell-utils'
   from the regexps.com archive.
   
   The painful part is this:  the bug that was fixed _can_ cause
   project tree or even archive corruption.
   
   The good news is this: the conditions leading to project tree
   or archive corruption are quite specific and hopefully somewhat
   unusual.  I think there is a good chance that nobody has actually
   been bitten by this bug (other than having troubles fetching revisions
   from regexps.com).   The repository at regexps.com, by the way, 
   was _not_ corrupted by this bug.
   
   If you have a revision library for the regexps.com archive, just to
   play it safe, I recommend _deleting_ the `shell-utils' part of that
   library and rebuilding it.
   
   _If_ you have a local branch of `shell-utils', and you have committed
   your own revisions to that local branch, please write to me directly
   -- so that we can take steps to make sure your archive is ok.
   
   Here are a few questions to help you decide if your own
   arch-controlled projects are at risk.  Again, I think it is 
   not likely that many, if any, people were actually affected by 
   this bug.


	1) Do you have any projects that use the "names" tagging
	   method?

		No: You have not been bitten by the bug.

		Yes: Continue to question 2.


	2) For your project that uses the "names" tagging method:

	   Do _all_ revisions of your project have the file:

		{arch}/=tagging-method

	   Note that if, before the initial `import', you set the
	   tagging method with `larch tagging-method', and you
	   didn't subsequently delete the "=tagging-method" file by
	   hand, then all revisions of your project will have that 
	   file.

	   So, again, do all revisions of your "names" tagging method
	   project have an "{arch}/=tagging-method" file?

		Yes: You have not been bitten by the bug.

		No: Continue on to question 3.


	3) Do you _also_ have _other_ project trees that use the
	   "implicit" or "explicit" tagging method?


		No: You have not been bitten by the bug.

		Yes: Continue on to question 4.

	4) Have you _ever_ checked out your "names" method tree as a
	   _subdirectory_ (or nested subdirectory) of one of your
	   "implicit" or "explicit" method projects?

	   *OR*, have you _ever_ committed revisions of your "names"
	   method tree while it was a subdirectory (or nested
	   subdirectory) of one of your "implicit" or "explicit"
	   method trees?

		No: You have not been bitten by the bug.

		Yes: Ok, your repository and project trees are at
		     risk.
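
	The four questions above amount to a chain of checks.  As a
	rough sketch only (the function and parameter names are
	invented for illustration and are not part of arch), the
	decision can be written as:

```python
def at_risk(uses_names_method,
            all_revisions_have_tagging_file,
            has_implicit_or_explicit_trees,
            nested_checkout_or_commit):
    """Return True only when the answers to questions 1-4 all point to risk."""
    if not uses_names_method:                 # question 1: No -> safe
        return False
    if all_revisions_have_tagging_file:       # question 2: Yes -> safe
        return False
    if not has_implicit_or_explicit_trees:    # question 3: No -> safe
        return False
    return bool(nested_checkout_or_commit)    # question 4: Yes -> at risk


# A "names" tree lacking {arch}/=tagging-method in some revision, nested
# inside an "explicit" tree at checkout or commit time.
print(at_risk(True, False, True, True))
```

	Any single "safe" answer ends the chain, which is why the
	conditions for corruption are so specific.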


	If, after reading all of the above questions, you think your
	project trees or archive are at risk, PLEASE WRITE TO ME
	DIRECTLY, AT ONCE.  In the subject of your message, please
	put the string "ARCHIVE IN DANGER" and if I don't reply within
	two days, please write again to be sure I didn't miss your mail.

	If you aren't sure, and just want to play it safe, please feel
	free to write to me.




2002-04-28:
arch-1.0pre14

  Simple bug fix release.


2002-03-20:

* Many bug fixes.  Of these the most notable are:

	`finish-branch' now creates a category, branch, and version as
	needed.

	`star-merge' now works in the simple case of the first merge,
	though there is still a restriction that the working directory
	must be checked out from the branch tree, not the
	branched-from tree.  Previously this case only worked after at
	least one merge was already done using some other command.  In
	a future release, the working directory restriction will be
	removed.


	`merge-points' is now notably faster and gives better results
	even for versions that are a mixture of continuation revisions
	and regular revisions.  In particular, `merge-points' now
	includes the branched-from revision of continuation revisions
	in its output.  In a future release, it will additionally
	include all revisions merged into the branched-from version
	(and so on, recursively).



2002-02-28:

* The naming conventions used for file inventories are now
  configurable on a per-tree basis.  This is documented
  in the user manual chapter "arch Project Inventories".

* There is a new program, `inventory' which is roughly the same
  as the arch command `inventory', but slightly more general.
  `arch inventory' now uses this program to do most of its work
  and consequently, for several important uses, is noticeably faster.

* Numerous minor bug fixes -- too many to list here.



2002-02-19:

* A set of changes to support "mawk".

  If "awk" is "mawk", certain Posix regexps that were used by
  arch aren't handled, causing commands to fail.  These have
  been replaced with regexps that all awks should understand.
  (Patch from Jan Harkes.)

* LC_ (locale) environment variables are now set more carefully.
  (Reported by Federico Di Gregorio.)

* A new command, `make-sync-tree' has been added.
  See "http://www.regexps.com/src/docs.d/arch/html/sync.html".

* The "Standard-date:" field in log messages is now GMT
  and includes the time-of-day.
  (Suggested by Daniele Nicolodi.)

* Remaining (reported) Solaris portability problems fixed.
  (Patches from Jonathan Geisler.)

* CDPATH is unset in `larch'

  If CDPATH is set, with some shells, `cd' produces unwanted
  output.  Therefore, `larch' unsets CDPATH. (Reported by 
  John Ellson, diagnosed by Lele Gaifax.)

* Various minor bug fixes.  Among these are the configuration
  system changes requested by people who have tried building
  on cygwin, though I'm certain there's much more to be done
  before arch actually works on cygwin.




2002-02-14:

* Naming Conventions Tweaked

  Source files can now begin with "_" and must not end with ".a" or
  ".o".  The names "CVS.adm", "SCCS", and "RCSLOG" have been added to
  the list of "not a source file".  (reported by several people).

  Someone asked to also exclude some compiler intermediate
  files, such as "y.tab.c" -- I've not done that because it
  is common practice to distribute such files with programs
  to make bootstrapping easier.


* user id syntax liberalized

  "_" is now permitted in the unique id part of a user id.  This is
  really a partial fix -- `valid-id' should agree precisely with
  various standards on what is a legal email address and domain
  name, but the more extensive fix is being postponed until a 
  more complete review of all of the naming convention functions
  takes place.  (Reported by don_dayley.)


* valid-log-file error message clarified
  
   In response to user confusion, `valid-log-file' now reports
   errors like:

	missing (or empty) "Summary:" header

   instead of just


	missing "Summary:" header

   (Reported by several people.)
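
   As an illustration of the check behind the clarified message, here
   is a minimal sketch.  It is not the `valid-log-file' source: it
   assumes a log file is a block of "Name: value" headers ended by a
   blank line, and the function name is hypothetical.

```python
def check_summary(log_text):
    """Return None if a non-empty "Summary:" header exists, else an error string."""
    for line in log_text.splitlines():
        if line.strip() == "":
            break  # assumption: the first blank line ends the header block
        name, sep, value = line.partition(":")
        if sep and name.strip() == "Summary" and value.strip():
            return None
    return 'missing (or empty) "Summary:" header'


print(check_summary("Summary: fix dirent portability\nKeywords: \n\nbody text"))
print(check_summary("Summary:\nKeywords: \n\nbody text"))
```

   A header that is present but blank is treated the same as one that
   is absent, which is the case the reworded message calls out.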


* bugs in the hackerlab "tests/arrays-tests" and "tests/fs-tests" have
  been fixed (Reported and patched by Matthias Neeracher).

* an awk syntax error in `log-header-field' has been fixed
  (Reported and patched by Jan Harkes.)

* You can now use an alternative shell for configuration

  "config.status" files now start with "#!" lines and the 
  "src/build-tools/scripts" configuration scripts use an
  explicitly chosen shell to call each other.  The top-level
  configure script now accepts a "--config-shell SHELL" option.

  This is a small step towards cygwin portability, and eases
  some of the portability constraints on "configure" for systems
  where "/bin/sh" is not quite posix.

  (Reported by Jason Diamond and others.)

* added a --pull option to push-mirror

  The new option speeds up the case of pushing from a remote archive
  to a local mirror.

* avoid dirent portability problems

  Some portability problems in libhackerlab's use of dirent have been
  fixed.  (Reported by several people.)

* `star-merge --finish' was broken.
  Now it isn't.


2002-02-09:

Reports of egregious portability problems seem to be dying off and
reports of little-tested or untested uses of various commands are
picking up a little bit.  That's a good sign.  

Here's what's new:


* arch: command not found	-or-	i686

  The program "arch" is now installed as "larch" to avoid a naming
  conflict that several people have complained about.  (The project
  itself is still named "arch".)

  Here is the list of programs installed in "$prefix/bin" by 
  an arch distribution.  I think there are no more naming conflicts,
  but if I missed one, let me know:

	as-daemon
		run a program in background

	copy-file-list
		copy selected files and directories from a tree

	dangerous-rename
		invoke rename(2) directly without trying to be
		clever about it like mv(1)

	file-metadata
		report data such as file permissions.

	file-tag
		report a file's inventory tag

	ftp-push
		a shell script to update an ftp mirror

	arch--release-arch--1.0--*-cfg
	arch--release-arch-cfg
	arch-cfg
		configuration information reporters

	larch
		front-end program to the arch revision control
		system

	need-args
		an xargs helper

	read-link
		print the target of a symbolic link

	set-file-metadata
		set data such as file permissions

	unfold
		split words into multiple lines

	wftp-cd
	wftp-delete
	wftp-get
	wftp-home
	wftp-ls
	wftp-mkdir
	wftp-noop
	wftp-put
	wftp-pwd
	wftp-rename
	wftp-rmdir
	with-file-lock
	with-ftp
		a shell-scriptable ftp client



* a command for Linus

  I've added a new command in honor of Linus Torvalds who requested
  something close to its capability for Bitkeeper.  The command is:

	  % arch touched-files-prereqs REVISION
  
  That command looks at the patch set for REVISION and at all preceding
  patch sets in the same version (it searches your library rather than
  your repository for this purpose).  It reports the list of patches
  that touch overlapping sets of files and directories -- in other
  words, it tells you what patches can be applied independently of
  others.  The command has an option to exclude from consideration file
  names matching a certain pattern (e.g. "=README" or "ChangeLog").  
  It has an option to exclude from the output list patches which have
  already been applied to a given project tree.  It has an option to
  report the specific files which are overlapped.

  Linus actually asked for something slightly more precise: beyond
  looking at whether two patch sets modify the same files, he wants to
  know if they specifically modify overlapping sections of those files.  
  I haven't implemented that yet but it's certainly doable.
  (`touched-files-prereqs' is 297 lines long; delving into the patch
  details will likely more than double that.)

  (The real inspiration for this command is a combination of Linus'
   request for an equivalent feature with my need for the feature
   while thinking about geisler's branches.)
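
  The file-level overlap test described above can be sketched in a few
  lines.  This is an illustration under stated assumptions, not the
  `touched-files-prereqs' implementation: the function name, the
  in-memory data layout, and the use of a glob-style exclude pattern
  are all invented here, and only direct overlaps with REVISION are
  reported.

```python
import fnmatch


def touched_files_prereqs(patches, revision, exclude=None):
    """patches: list of (patch_name, set_of_paths), oldest first.

    Returns [(earlier_patch, sorted_overlapping_paths), ...] for every
    earlier patch that touches a path also touched by `revision`.
    """
    names = [name for name, _ in patches]
    idx = names.index(revision)

    def keep(path):
        # Assumption: the exclude pattern is a glob matched on the base name.
        base = path.rsplit("/", 1)[-1]
        return exclude is None or not fnmatch.fnmatchcase(base, exclude)

    target = {p for p in patches[idx][1] if keep(p)}
    result = []
    for name, paths in patches[:idx]:
        overlap = target & {p for p in paths if keep(p)}
        if overlap:
            result.append((name, sorted(overlap)))
    return result


history = [
    ("patch-1", {"src/main.c", "ChangeLog"}),
    ("patch-2", {"docs/manual.txt", "ChangeLog"}),
    ("patch-3", {"src/main.c", "src/util.c", "ChangeLog"}),
]
print(touched_files_prereqs(history, "patch-3", exclude="ChangeLog"))
```

  With "ChangeLog" excluded, only patch-1 overlaps patch-3 in this
  made-up history, so patch-2 could be applied independently.  Without
  the exclusion every patch would appear to depend on every other,
  which is why the exclude option matters.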

* star-merge wasn't handling the two-archive case properly

  When merging two branches in different archives, `star-merge'
  recursed indefinitely starting up FTP clients.  (Reported by
  Jonathan Geisler)


* the =INSTALL instructions now advise against parallel makes

  The `install' target for shell subcommands, at least, has 
  a (low priority) bug when used with parallel makes.
  (Reported by Andrew Morton.)


* various help messages fixed

  Some of the help messages had escape sequences that not all "printf"
  implementations like.  One had a shell quoting bug.  (Reported by
  Steve Murphy.)


* "printf: illegal number" from `commit' on some systems
  Fixed.
  (Reported by Ulrich Pfeifer, diagnosed by Jonathan Geisler.)


* bugs in `mkpatch' pertaining to binary files have been fixed
  (Reported by Bruce Stephens.)


* Additional build advice has been added to =README and pointed to in
  =INSTALL.

* The unnecessary `sb' program in `src/hackerlab' has been unplugged --
  it is no longer built or installed by default.  `sb' can act as a
  promiscuous inet server and is a name-clash on some systems.
  (Reported by Jan Harkes.)

* `readlink' (the program) is now called `read-link'.

  This fixes a naming conflict on some platforms. (Reported by Jan
  Harkes.)


* NOT DONE IN THIS RELEASE, BUT MUCH REQUESTED AND LIKELY TO HAPPEN

  The directory "{arch}" in project trees should, at least optionally,
  have a different name -- but this is slightly tricky to do in an 
  upward compatible way so I'm putting it off for now.





2002-02-06:

A number of portability and other minor bugs have been fixed.  

  
  * Fixed an alignment bug that affected SPARCstation LXes and
    probably other machines.  (Diagnosed by Jon Buller.)

  * Hopefully fixed a `broken pipe' error sometimes seen from `tag'.
    (reported by Jonathan Geisler -- please test this Jonathan)
  
  * configure didn't process "--prefix=" correctly
    (fix from Jonathan Geisler)
  
  * as-daemon could panic unreasonably if its cwd doesn't exist.
    Some uses of as-daemon in arch ran it from an about-to-be-deleted 
    directory. Both problems fixed.  (Diagnosis from Martin Waitz.)
  
  * The configure scripts didn't work with some versions of ksh.
    Apparently /bin/sh on some systems has a built in `test' which
    does not have `-e'.  `-e' use has been eliminated in those scripts,
    but I wonder if they need to allow the use of some shell other
    than "#!/bin/sh"?
  
  * The comparison to subversion FAQ has been fixed.
    Based on feedback from the Subversion team, and other sources,
    I've updated the comparison FAQs.
  
  * `valid-log-file' was hanging for some nearly empty log files
    (Reported by Mikael Hillerstrom)
  
  * `finish-branch' complained that commit needed an `--unchanged-ok' 
     flag.  (Reported by Nicholas Dille, diagnosed by Jonathan
     Geisler).
  
  * Fixed problems in libhackerlab on big-endian machines.
    (Reported by Graham Hughes et al.)
  
  * In the build scripts, avoid invoking `ar -rc' for an empty list
    of object files (reported by Matthias Neeracher)
  
  * Fixed a bad interaction with GCC 3.1
    GCC 3.1 emits some non-file-names in the line directives
    of its `-E' output.  Those don't belong in `.d' (make 
    dependency) files.  (reported by Chris Marston)

  * Fixed a bug in `arch.sh.in' that made the "alternative name"
    instructions in "docs/examples/README.000.first-steps"
    not quite right. (reported by Lele Gaifax)
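
  To illustrate the GCC 3.1 item above: the line directives in `-E'
  output can name pseudo-files such as "<built-in>", which are not
  real paths.  The sketch below shows one way to collect dependency
  names while skipping them.  It is not the arch build-script code,
  and the function name is hypothetical.

```python
import re

# Matches preprocessor line markers such as:  # 12 "foo.h" 1 3
LINE_MARKER = re.compile(r'^#(?:line)?\s+\d+\s+"([^"]*)"')


def dependency_files(preprocessed_text):
    """Return real file names from line markers, in first-seen order."""
    seen = []
    for line in preprocessed_text.splitlines():
        match = LINE_MARKER.match(line)
        if not match:
            continue
        name = match.group(1)
        if name.startswith("<") and name.endswith(">"):
            continue  # pseudo-file, e.g. "<built-in>"; keep it out of the .d file
        if name not in seen:
            seen.append(name)
    return seen


sample = '''# 1 "hello.c"
# 1 "<built-in>"
# 1 "<command line>"
# 1 "hello.c"
# 1 "/usr/include/stdio.h" 1 3
int main(void) { return 0; }
'''
print(dependency_files(sample))
```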




2002-02-05:

A number of portability and other minor bugs have been fixed.  

Instructions have been added for setting up your own branches of the
arch source code.  There are also some sample configuration files for
revision libraries and browsers (see "docs/examples").



2002-01-30:

`arch' is now:

	* more secure
	* much faster
	* equipped with a nice web interface
	* more convenient
	* more portable


Here's the scoop.  First the embarrassing item, then the really good
news.

* more secure

  There was a glaring hole in `with-ftp', now fixed.  Part of the
  reason for using FTP in `arch' was to avoid security headaches -- so
  there are very few places in the code where problems could possibly
  arise.  Nevertheless.  

  For the record, `with-ftp' now takes four steps to avoid exploits:

  	* with-ftp speaks to its clients only over file-protected,
	  unix domain sockets.

	* with-ftp passes a cookie to clients, via the environment,
	  which they must use to authenticate themselves to with-ftp.
	  The cookie is generated from the pid and several samples of
	  sec and usec timers (with intervening system calls), hashed
	  in various ways to produce a 32 byte sequence.

	* with-ftp clients now use only passive mode data transfer,
	  though active mode will be reenabled at a future date.

  In many places in the code, if `with-ftp' or any of its clients
  detect an error that suggests a security breach or attempted 
  security breach, they exit immediately after printing an all-caps
  error message.
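
  The cookie scheme can be illustrated with a short sketch.  This is
  not the `with-ftp' code.  SHA-256 stands in for the unspecified
  "hashed in various ways" step because it happens to yield 32 bytes,
  and the environment variable name and function names are
  hypothetical.

```python
import hashlib
import hmac
import os
import time


def make_cookie(samples=4):
    """Hash the pid and several sec/usec timer samples into 32 bytes."""
    h = hashlib.sha256()
    h.update(str(os.getpid()).encode("ascii"))
    for _ in range(samples):
        now = time.time()
        sec = int(now)
        usec = int((now - sec) * 1_000_000)
        h.update(f"{sec}:{usec}".encode("ascii"))
        os.getppid()  # an intervening system call between timer samples
    return h.digest()


def client_is_authentic(expected, presented_hex):
    """Compare the cookie a client presents with the one that was issued."""
    try:
        presented = bytes.fromhex(presented_hex)
    except ValueError:
        return False
    return hmac.compare_digest(expected, presented)


cookie = make_cookie()
# Hypothetical variable name; the text only says "via the environment".
child_env = dict(os.environ, WITH_FTP_COOKIE=cookie.hex())
print(len(cookie))
print(client_is_authentic(cookie, child_env["WITH_FTP_COOKIE"]))
```

  A pid and timer readings are fairly guessable inputs, so code
  written today would more likely draw the cookie from the operating
  system's random source (in Python, `secrets.token_bytes(32)').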
  
  There is still a (less serious) security "issue" with arch -- not
  quite a bug.  That is that RFC959 (FTP) authentication is pretty
  weak and RFC959 traffic is unencrypted.  There are FTP extensions,
  supported by some servers, that add authentication and encryption.
  `with-ftp' should support those extensions -- a task which, for now,
  I'm leaving as an itch for someone to scratch.



* more portable

  In addition to FreeBSD, `arch' is now getting a lot of use on 
  GNU/Linux.  Thus, early adopters should have much greater success
  on GNU/Linux platforms than with release 1.0pre2.



* Very Fast Access to Past Revisions

  arch can now be configured to maintain a (space efficient[1])
  "forest of all revision trees" -- all the past revisions of selected
  versions made available as ordinary project trees.

  That means, for example, that you can browse past revisions using 
  your favorite file manager.  You can index all revisions using
  a tool like "agrep".  You can retrieve any file from a past
  revision with "cp".  You can compare past revisions using "diff" (or
  "arch mkpatch").

  When you have one of these revision libraries on hand locally, "get"
  (the checkout command) happens at roughly 1x or 2x the cost of a
  recursive directory copy (once to make the new project tree, then
  again (optionally) to create a cached copy in the project tree
  itself).

  You aren't limited to building revision libraries only for those
  archives that you administer.  You can build a local library
  for any remote archive, too.  

  If a remote archive includes a revision library, and makes it
  available via FTP, you can retrieve arbitrary files from any
  revision using ordinary tools such as `wget', `ftp', or `GNU Emacs
  Dired' -- each file will have a fairly natural URL.


* web interface

  There is now a fairly extensive web interface for browsing a 
  repository.  You can view the patch logs in several different
  ways, examine files from any past revision, view the merge history
  of any branch, or examine the revision frontier of any particular
  distribution. (See "http://www.regexps.com/repo-browser.html" for
  an example of this interface).


* triggers

  You can now trigger events, such as sending notification email, on
  the occasion of changes to a repository, such as committing new
  revisions or creating new categories.  I've started a mailing list
  for the commits I make to arch and the related
  packages. (arch-arch@regexps.com, mailing list archive available on
  regexps.com.)



* read only mirrors

  There's a new command for updating a read-only mirror of any
  repository ("arch push-mirror").  Combined with triggers, you
  can fully automate the process of keeping mirror sites up-to-date.
  (One consequence is that I'll now be able to update the regexps.com
  archive, which is a mirror of my local archive, more frequently.)



* bug fixes and portability fixes

  A number of minor bugs and typos have been fixed.  A number of
  portability problems have been resolved (though a Solaris port is
  still "in progress").

  `arch' continues to get lots of use on FreeBSD.

  Recently, it has been getting lots of use on an old version of
  Linux (I did have to update to a more recent version of `patch').

  Some portability problems reported on other platforms have been
  fixed, but I don't have access to those systems to test for myself.

  

* successful testers

  Judging by my log files and email, a number of users have succeeded
  in checking out `arch' from the repository at:

	ftp:regexps.com/pub/{archive-2002}

  and some have started creating their own archives as well.



* probable name change

  `arch' is almost certainly going to change names.  Unfortunately,
  some versions of GNU/Linux have a program, `/bin/arch' which is an
  alias for `uname -a'.  Sigh.



[1] On the Space Efficiency of Revision Libraries

  The storage costs of a new revision in a repository are quite low: 
  each (ordinary) revision is stored as a compressed tar file of just
  the patch set from the previous revision.   That format is ideal
  for long term storage, for mirroring, and for transactions that are
  efficient in space, time, and bandwidth usage -- but it is slow for
  some other tasks -- like checking out a revision separated from its
  baseline by many deltas.  So that's where "revision libraries" come
  in.

  A revision library is a complete collection of full source trees for
  many sequential revisions.  These trees share common files in order
  to save space.  So, just how large is a revision library?  Is this
  really a practical approach?

  The storage cost of a new revision in a revision library (the
  "forest of all revision trees") is:

	D + S + Ch

  where

	D is the total size of the directory structure of the tree
	S is the total size of symlinks in the tree
	Ch is the total size of the files modified in the revision

  I did some calculations and figured out that, even though I've been
  coding many hours a day since I started on `arch', it would take me
  more than a year to fill up even 1G.  Measuring only the cost of the
  disk, that's only a few bucks per year.

  What if you are working on a very large project?  In that case, "D"
  will be larger, but "D" is small (relative to 1G) to begin with.
  "S" tends to be close to 0 in every project I've ever seen.  "Ch"
  will, in general, be _roughly_ the same for large projects as for
  small.  Why is that?  A formula for "Ch" is:

	Ch = N * F

  where
  
  	N is the average number of files that change per revision
	F is the average size of a file

  Those occasional revisions that make a global change, affecting
  every file, will be costly -- but most revisions will have a cost
  bound by the rate at which a programmer can make changes to just a
  few files at a time.  In other words, I'm assuming that, (1) except
  for global changes, N and F are pretty much the same for all
  projects, (2) global changes are infrequent.
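
  As a worked example of the two formulas, here is a small
  calculation.  Every figure in it is hypothetical, chosen only to
  show the arithmetic; none of them are measurements of arch.

```python
def revision_cost(d, s, n, f):
    """Bytes added by one revision: D + S + Ch, where Ch = N * F."""
    ch = n * f
    return d + s + ch


# Hypothetical figures, in bytes.
D = 50_000   # directory structure of the tree
S = 0        # symlinks in the tree
N = 5        # average number of files changed per revision
F = 10_000   # average size of a file

per_revision = revision_cost(D, S, N, F)
revisions_per_gigabyte = 10**9 // per_revision

print(per_revision)            # 100000
print(revisions_per_gigabyte)  # 10000
```

  Under these assumed figures, a programmer committing twenty
  revisions a day would need 500 days to fill 1G, which is in line
  with the "more than a year" estimate above.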

  What if you wanted to deploy "arch" for a site with many, many
  repositories -- something like "SourceForge"?  In that case,
  optimizing the size of revision libraries *might* become important
  enough to take the next step, which is a portable file system
  implemented in user space that stores revisions in a library at a
  cost of:

	L + De

  where

	L is (roughly) the size of a recursive file listing, with
  	  tags, of the project tree

	De is (roughly) the size of the patch set for the revision

  That's certainly doable (with only a few thousand lines of new
  code).  For ordinary users, though, I doubt that optimization is
  needed.

  Finally, for occasions where "D" is large, or the cost of building
  a link tree too large, on systems that support shadow directories,
  we might go that route, too.




2002-01-17:

This is a pre-release version of 1.0.  (For a precise release id, see
the top-level file "=RELEASE-ID".)  The purpose of this release is to
distribute bug-fixes.

All compiler warnings generated by `-Wall' have been fixed.  This
includes some problems that (seem to have) caused failures on 64-bit
systems.

Several users have reported success building and testing on GNU/Linux
while others have encountered problems.  The reported GNU/Linux
related problems have been fixed.

A successful build and `make test' has been reported for Debian GNU/Linux
(unstable).

Problems have been reported on Solaris.  Some of these have been
fixed, others are pending.

A number of small typos and minor bugs have been fixed.

Several GNU/Linux users have complained about a naming conflict with
the program /bin/arch, which is an alias for `uname'.  `arch' may have
to change names.

There are now some mailing lists for arch:

	arch-announce@regexps.com
	arch-users@regexps.com
	bug-arch@regexps.com

Add "-request" to subscribe.

Port reports and patches related to porting are especially welcome
contributions.

People interested in working on arch should be able to check out
sources from regexps.com and create their own archives -- let me know
if you have troubles with that.  If you do create branches in your own
archive, be sure to use `archive-cache-revision' to speed up access to
those branches.

The manual recently underwent a fast and furious edit to work around a
possible trademark infringement (I have my doubts that the
infringement was actual, but stepping around the problem this way was
the simplest solution at this juncture).  In general, because it was
written quickly, the manual is not entirely consistent in its use of
terminology.  Input from proof-readers would be another welcome
contribution.



2002-01-16:

This is a pre-release version of 1.0.  (For a precise release id, 
see the top-level file "=RELEASE-ID".)

This is the first public release of `arch'.

At this time, much of `arch' is known to work well on the author's
FreeBSD system.  It is known to compile and pass the most rudimentary
of tests on an old GNU/Linux box.

Port reports and patches related to porting are welcome contributions.


# tag: Tom Lord Wed Jan  9 08:32:14 2002 (arch/=NEWS)
#
