Forwarded: Thu, 19 Feb 1998 02:06:53 -0500
Forwarded: "mills@udel.edu "
Replied: Thu, 19 Feb 1998 02:05:18 -0500
Replied: "Bruce Bartram 303-497-6217 <bwb@etl.noaa.gov> "
Received: from mail.eecis.udel.edu by whimsy.udel.edu id aa25684;
          17 Feb 1998 10:16 EST
Received: from mickey by netsrv (SMI-8.6/SMI-SVR4)
	id IAA09303; Tue, 17 Feb 1998 08:16:17 -0700
Received: by mickey (SMI-8.6/SMI-SVR4)
	id IAA13843; Tue, 17 Feb 1998 08:16:26 -0700
Date: Tue, 17 Feb 1998 08:16:26 -0700
From: Bruce Bartram 303-497-6217 <bwb@etl.noaa.gov>
Message-Id: <199802171516.IAA13843@mickey>
To: stenn@whimsy.udel.edu
Subject: proposed patch to xntp3-5.9x

Howdy,

I earlier sent this to Dave Mills, and I've attached his reply.
I notice that the proposed patch didn't appear in xntp3-5.92, so
I'm sending it to you for consideration in case it fell
through the cracks.  The issue has come up again in a side discussion
related to the "local ATOM" thread on the newsgroup.

The patch is UNTESTED.  I haven't tried to simulate the conditions
that cause this error.  I've only heard of a couple of troubles
that might be related in the newsgroup.  Stephen L Moshier
<moshier@mediaone.net> sent me a demonstration of this strange
behavior in a test setup with local prefer and outside sources
at the same stratum.  He didn't send me output showing the patch
made things better.

The patch is very small and I believe the effect is very limited.

The problem:

In normal operation, the xntpd control loop drives the host's
clock to reduce the system variable "sys_clock_offset" towards zero
by making adjtime() (or kernel PLL) calls.

When ntp.conf has "server 127.127.1.0 prefer" and this is the
sync peer, xntpd is locked out of making host time changes,
because it is presumed that some external protocol is controlling
the host's clock, so the global variable "sys_clock_offset" can't be
driven towards zero.

In most cases with local prefer, I think sys_clock_offset should be
identically 0, but it can become non-zero if the host clock is
jerked at interrupt time (hypothetical, but a case like this was
the original report that led me to consider this trouble) or if
another peer is the sync peer, an offset is driving the loop filter,
and this peer then loses the selection back to the local refclock.

If either of these happens, sys_clock_offset can become non-zero
and it won't ever change.

The daemon continues to run at an offset to the local host clock.
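
The lock-out described above can be illustrated with a toy model.  This is
only a sketch under stated assumptions: struct toy_daemon, toy_tick(),
toy_run() and the 0.5 gain are all made up for illustration and are not
xntpd code.  It shows why a residual never decays once the control loop
is locked out.

```c
#include <stdbool.h>

/* Toy model, NOT xntpd code: every name here is hypothetical. */
struct toy_daemon {
    double sys_clock_offset;   /* residual correction, in seconds */
    bool   local_prefer_sync;  /* true when 127.127.1.0 prefer is the sync peer */
};

/* One pass of the once-per-second adjustment. */
static void toy_tick(struct toy_daemon *d)
{
    if (d->local_prefer_sync)
        return;                    /* locked out: residual is left untouched */
    d->sys_clock_offset *= 0.5;    /* assumed gain: halve the residual each second */
}

/* Run the loop for a number of seconds and return the remaining residual. */
static double toy_run(struct toy_daemon *d, int seconds)
{
    for (int i = 0; i < seconds; i++)
        toy_tick(d);
    return d->sys_clock_offset;
}
```

With local_prefer_sync set, toy_run() hands back the starting residual no
matter how many seconds pass; with it clear, the residual shrinks each tick.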

The solution:

Since clearing a system variable takes very little time, my proposed
patch simply whacks sys_clock_offset to zero every second in the
ntp_loopfilter.c section that notices that the sync peer is local prefer.
The clear could instead occur only when the local prefer becomes sync peer,
but that wouldn't fix the trouble caused by external disciplines jerking
the clock.  Only regular clearing will fix the jerky case.
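
The difference between the two clearing policies can be sketched as a
small simulation.  The enum, the function name and the event model (one
step of the clock at a given second) are assumptions made for this
illustration, not anything taken from ntp_loopfilter.c.

```c
/* Toy comparison, NOT xntpd code: names and the event model are assumptions. */
enum clear_policy { CLEAR_ON_TRANSITION, CLEAR_EVERY_SECOND };

/*
 * Simulate `seconds` ticks with the local refclock as preferred sync peer.
 * `initial` is the residual left behind by the previous sync peer; at
 * second `jerk_at` an external discipline steps the clock, which shows up
 * as an extra residual of `jerk`.  Returns the residual at the end.
 */
static double residual_after(enum clear_policy policy, double initial,
                             int jerk_at, double jerk, int seconds)
{
    double sys_clock_offset = initial;

    for (int t = 0; t < seconds; t++) {
        if (t == jerk_at)
            sys_clock_offset += jerk;      /* clock jerked at interrupt time */
        if (policy == CLEAR_EVERY_SECOND || t == 0)
            sys_clock_offset = 0.0;        /* the proposed clear */
    }
    return sys_clock_offset;
}
```

In this model a clear done only at the transition (t == 0) removes the
initial residual but leaves a later jerk in place, while the per-second
clear removes both.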

Things the patch may break:

I think that "fudge 127.127.1.0 time1 <offset>" might be able to
create an intentional sys_clock_offset.  This patch would zap it back
to zero.  A compile option could skip the clear in this special case.
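
Such a compile option might look roughly like the sketch below.
KEEP_LOCAL_FUDGE is a hypothetical macro name, not an existing xntpd
build flag, and the l_fp type and L_CLR macro here are simplified
stand-ins for the real definitions in the xntp headers.

```c
#include <stdint.h>

/* Simplified stand-ins for xntpd's l_fp and L_CLR (assumption: the real
 * definitions live in the xntp headers and differ in detail). */
typedef struct { uint32_t l_ui; uint32_t l_uf; } l_fp;
#define L_CLR(v) ((v)->l_ui = (v)->l_uf = 0)

l_fp sys_clock_offset;

/*
 * KEEP_LOCAL_FUDGE is a hypothetical build option.  When it is defined,
 * the per-second clear is skipped so that an offset created on purpose
 * with "fudge 127.127.1.0 time1" survives.
 */
static void local_prefer_clear(void)
{
#ifndef KEEP_LOCAL_FUDGE
    L_CLR(&sys_clock_offset);
#endif
}
```

Built without the macro, local_prefer_clear() zeroes both halves of the
offset; built with -DKEEP_LOCAL_FUDGE it does nothing.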

ntp.4:

ntp.4 doesn't seem to have this variable at all.  I don't think this
trouble can happen there.  This also supports making the patch mainstream.

Bruce Bartram     bbartram@etl.noaa.gov    just another chimehead


----- possible patch to ntp_loopfilter.c 3-5.91   UNTESTED -----
   I think it also applies to 3-5.9x, or more

*** xntpd/ntp_loopfilter.c	Tue Nov 11 15:59:45 1997
--- xntpd/ntp_loopfilter.c.proposed	Tue Nov 11 16:04:49 1997
***************
*** 87,92 ****
--- 87,93 ----
  int fdpps = -1;			/* pps file descriptor */
  int pps_enable;			/* pps disabled by default */
  char cutout;			/* override for max capture range */
+ extern	l_fp sys_clock_offset;	/* correction for current system time */
  
  /*
   * Imported from the ntp_proto module
***************
*** 586,591 ****
--- 587,601 ----
  	if (sys_peer) {
  		if (sys_peer->refclktype == REFCLK_LOCALCLOCK &&
  		    sys_peer->flags & FLAG_PREFER)
+ 		{
+ 			/* I think that sys_clock_offset might be jammed
+ 			 * to exactly zero now.  It might have had a
+ 			 * small residual before things switched to the
+ 			 * local refclock prefer at lower stratum, or a
+ 			 * glitch might have happened during interrupts
+ 			 * when the external control jumped the time */
+ 			L_CLR(&sys_clock_offset);
  			return;
+ 		}
   	}
  	L_CLR(&offset);

----- email exchange with Dave Mills -----

To: mills@udel.edu, stenn@whimsy.udel.edu
Subject: NTP 3-5.91 patch proposed

Howdy,

I've been trying to help Wernke zur Borg <wzb@anitesystems.de>
with a newbie's set of questions (at least I think he is new to
NTP) on how to set up a "local refclock prefer" configuration with
a custom driver that reads some hardware timecode box.  I don't
know what version of xntp he is using on his Solaris 2.5 system.

He has sent me an email that shows:
   sce250{stc}: ntptrace
   localhost: stratum 4, offset 31.882483, synch distance 0.01021
   127.127.1.0:    *Timeout*
while ntpq shows the local refclock sitting at offset 0.
   remote      refid    st t when poll reach   delay   offset    disp
=====================================================================
*LOCAL(0)   LOCAL(0)     3 l   10   64  377     0.00    0.000   10.01

I can't think of any reason this could be possible without
a crazy sys_clock_offset, possibly due to his alternate protocol
jumping the system time by about that amount.

I think that the patch below should be considered to absolutely
force the offset to zero when the local refclock prefer is the
sync peer.  The patch is two lines grep'ped from ntp_unixclock.c
dealing with sys_clock_offset and a bunch of comments as to why
the offset might need to be cleared.  The source was xntp3-5.91.

I took a quick look at ntp-4.0.70a source, and I don't even find
where the offset is added to the time in get_systime.c, so I feel
very lost.  I'm not much of a programmer and it takes me a while
to absorb enough to find my way.  Sorry I'm not smart enough to
see my way to helping with the new version.

If you think the patch might be the correct way to handle this
trouble, please email me and I'll forward the patch to him.  I'm
reluctant to send a patch without your advice.

Thanks for all the great efforts and wonderful code !

Bruce Bartram    bbartram@etl.noaa.gov     just another chimehead

----- possible patch to ntp_loopfilter.c 3-5.91   UNTESTED -----

[SNIP duplicate copy of patch]

----- Dave Mills' response, and followups

Date:     Tue, 11 Nov 1997 20:27:13 EST
From: Dave Mills <mills@huey.udel.edu>
To: Bruce Bartram 303-497-6217 <bwb@etl.noaa.gov>
cc: mills@udel.edu, stenn@whimsy.udel.edu
Subject:  Re:  NTP 3-5.91 patch proposed

Bruce,

You did some good thinking there; however, NTP v3 is now on hold while
we shake the bugs from NTP v4. The new version has been cleansed of
years of accumulated dust and dirt and ntp_unixclock.c has gone to
the dumpster. The sys_clock_offset thing was a bad idea from the
beginning (not mine, fortunately) and has also gone to the dump.

Dave

To: mills@huey.udel.edu
Subject: RE: NTP 3-5.91 patch proposal

Howdy,

Thanks for the quick reply.  I've sent the proposed patch to Wernke
and hope it helps him.

[SNIP quoted section]

I have an alternate idea on how to use sys_clock_offset.  It might
be a useful "trick" if an ntp daemon was going to run without privilege
on a host with a smooth, but undisciplined clock, such as a firewall router.
Instead of doing the adjtime()s, they would be simulated and things would
work so long as the host's time was smooth.

Such a trick could also be useful for testing a daemon on a host.
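
A minimal sketch of that trick, assuming nothing about how xntpd would
actually structure it: soft_offset, soft_adjtime() and soft_time() are
hypothetical names.  The correction is remembered rather than applied,
and the daemon's idea of "now" is the raw host time plus that correction.

```c
/* Sketch only, NOT xntpd code: every name below is hypothetical. */
static double soft_offset;   /* seconds to add to the raw host clock */

/* Stand-in for adjtime(): remember the correction instead of applying it,
 * so no privilege is needed. */
static void soft_adjtime(double delta_seconds)
{
    soft_offset += delta_seconds;
}

/* The daemon's view of "now": raw host time plus the accumulated
 * correction.  The caller supplies the raw reading, for example from
 * gettimeofday() converted to seconds. */
static double soft_time(double raw_host_seconds)
{
    return raw_host_seconds + soft_offset;
}
```

This only holds up while the host clock runs smoothly; any step in the
raw clock goes straight into soft_time() unnoticed.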

In a normally operating daemon, I think all the filtering details
should be simple and I suspect sys_clock_offset can get combined into
the filtering details, ignored or dropped.

Bruce Bartram    bbartram@etl.noaa.gov    just another chimehead

Date:     Wed, 12 Nov 1997 22:22:46 EST
From: Dave Mills <mills@huey.udel.edu>
To: Bruce Bartram 303-497-6217 <bwb@etl.noaa.gov>
cc: mills@huey.udel.edu
Subject:  Re:  NTP 3-5.91 patch proposal

Bruce,

Your suggestion occurred to a few folks a while back and was the motivation
for the variable in the first place. However, it resulted in a rather
large cruft of dangerous code which I eventually threw out.

Dave

