================
APEX Boot Loader
     NOTES
================

TODO:

  o RedBoot and APEX

    In debugging an update to APEX, I found that the startup copy
    commands would tend to fail.  The copy would have some garbage
    in the middle of the destination: 0x632b0, 0x63f00, 0x638c0 when
    copying the kernel to 0x8000.

    Changing the kernel command line to *not* use variables for the
    copy source and destination appeared to fix the problem, but that
    wasn't a very satisfying solution.  Looking at those memory
    locations from RedBoot revealed that the contents of memory at those
    locations appeared to come from the contents that were there when
    RedBoot was running.  It was easy to detect the error by
    performing a checksum.  Oddly, a second copy of the data was
    always successful.  I suspected the cache.

    Revising the cache flush code to more closely match that which the
    kernel implemented didn't seem to fix the problem.  What did
    seem to fix the problem was disconnecting the Ethernet cable from
    the Slug while booting.  Thus, I suspect that the problem is
    caused by the NPE still thinking it can write to memory.

    Really, this is probably never going to be a problem since RedBoot
    ought to normally disable the MMU and caches before executing the
    second stage boot loader (or kernel) and I suspect that this
    oddity won't occur under normal use.

  o Support for non-ARM

    o The biggest hurdle to supporting non-ARM architectures will be
      the implicit dependence on stack-free procedure calls.
    o Some platforms, such as PowerPC, may have SRAM that can be used
      for a stack until SDRAM is initialized.
    o It may be feasible to rework the initialize_platform code to run
      inline instead of using calls.  Really, there isn't much that
      must be done from the standpoint of execution.
    o x86 is the real bear, but I don't think we really care about
      that.

  o Environment/alias variations.  The default state is to use the
    simply named version of each variable, aliases before environment
    variables.  If 'variation' is defined among the aliases, it is
    appended to the variable name and that one is checked first.  If
    that one is found, it is used.  Otherwise, the default is used.

    For example, when variation is "-1", "boot-1" would be used to
    boot the kernel.

    This new feature requires support in two ways.  First, we need
    another standard environment variable to hold the boot preparation
    commands, "setup".  So, "startup" is called first.  It may set
    "variation".  Then APEX calls "setup".  Setting the variation
    can change *all* variable references including driver regions.
    So, for example, the variation can be used to set the underlying
    partition for booting simply by changing the base device.  This
    can also be used to change some other environment variable used as
    a reference elsewhere.

    Now, we need a command, platform specific, that determines the
    variation and sets the appropriate alias.  It would be put into
    the startup command: "check-variation", "query-variation",
    "select", "survey".  On nslu2, for example, it would check whether
    or not the reset button is pressed.  If so, it would set variation
    to '-alt'.  Perhaps the suffix is part of the command.  At this
    point, the boot would proceed normally.
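
    A minimal sketch of the lookup order described above.  The tables
    and the names find, lookup_plain, and lookup_variable are
    hypothetical stand-ins; the real alias and environment storage is
    not shown in these notes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical key/value tables standing in for aliases and
   environment variables. */
struct kv { const char* key; const char* value; };

#define C_ENTRIES 8
static struct kv aliases[C_ENTRIES];
static struct kv environment[C_ENTRIES];

static const char* find (const struct kv* table, const char* key)
{
  for (size_t i = 0; i < C_ENTRIES; ++i)
    if (table[i].key && strcmp (table[i].key, key) == 0)
      return table[i].value;
  return NULL;
}

/* Aliases are consulted before environment variables. */
static const char* lookup_plain (const char* name)
{
  const char* value = find (aliases, name);
  return value ? value : find (environment, name);
}

/* Look up NAME.  If 'variation' is defined among the aliases, the
   suffixed name is tried first; otherwise, or if the suffixed name
   is not found, the simply named version is used. */
const char* lookup_variable (const char* name)
{
  const char* variation = find (aliases, "variation");
  if (variation) {
    char sz[64];
    int cb = snprintf (sz, sizeof (sz), "%s%s", name, variation);
    if (cb > 0 && (size_t) cb < sizeof (sz)) {
      const char* value = lookup_plain (sz);
      if (value)
        return value;
    }
  }
  return lookup_plain (name);
}
```

    With variation set to "-1", lookup_variable ("boot") would return
    the value of "boot-1" when it exists and the value of "boot"
    otherwise.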

  o Very slow environment region means that an empty environment makes
    the boot-time very slow.  So, we should check it once and if it is
    empty we can assume it is still empty when the first word is
    0xffffffff.
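
    One way this check-once idea might look.  The names env_is_empty
    and env_mark_dirty are hypothetical, and the sketch assumes the
    environment region is memory mapped so that its first word can be
    read directly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cached result of the first-word test: -1 unknown, 0 has data,
   1 empty. */
static int env_empty_cached = -1;

/* Returns true if the environment region looks erased.  The slow
   region is read only on the first call; later calls reuse the
   cached answer. */
bool env_is_empty (const volatile uint32_t* env_base)
{
  if (env_empty_cached < 0)
    env_empty_cached = (*env_base == 0xffffffffu) ? 1 : 0;
  return env_empty_cached == 1;
}

/* Must be called after any write or erase of the environment so
   that the next query reads the region again. */
void env_mark_dirty (void)
{
  env_empty_cached = -1;
}
```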

  o Add option to change the write throughput spinner for the NOR
    driver.  The lh7a404 is much faster than the lh7952x and the
    spinner should reflect this.

  o Alignment trap handler would be a good thing.
    o We could catch errors instead of failing completely.
    o Not worth too much code.

  o descriptor_d.width
    o annotates a descriptor, telling the driver what width the caller
      would like used.  
    o Force 1 byte width for CF interfaces.
    o Force 2 byte width for 16 bit only access
    o Force 4 byte width for unforgiving CPU registers, e.g. IXP
    o This needs to propagate to dump and fill so that we can control
      the access to some regions.
    o We could annotate memory, for a given target, so that the memory
      driver does the right thing.
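
    A sketch of what honoring a requested width might look like in a
    memory driver.  The function name copy_with_width is hypothetical;
    only the idea of a width field on the descriptor comes from the
    notes above.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy CB bytes using accesses of exactly WIDTH bytes (1, 2, or 4).
   Returns 0 on success, or -1 if WIDTH is unsupported or if either
   address or the length is not a multiple of WIDTH. */
int copy_with_width (volatile void* dst, const volatile void* src,
                     size_t cb, int width)
{
  if (width != 1 && width != 2 && width != 4)
    return -1;
  if (((uintptr_t) dst | (uintptr_t) src | (uintptr_t) cb)
      & (uintptr_t) (width - 1))
    return -1;

  size_t c = cb / (size_t) width;   /* number of accesses */
  switch (width) {
  case 1: {
    volatile uint8_t* d = dst;
    const volatile uint8_t* s = src;
    while (c--)
      *d++ = *s++;
    break;
  }
  case 2: {
    volatile uint16_t* d = dst;
    const volatile uint16_t* s = src;
    while (c--)
      *d++ = *s++;
    break;
  }
  default: {
    volatile uint32_t* d = dst;
    const volatile uint32_t* s = src;
    while (c--)
      *d++ = *s++;
    break;
  }
  }
  return 0;
}
```

    The volatile qualifiers are there to keep the compiler from
    merging or splitting the accesses, which is the point of forcing a
    width in the first place.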

  o Need to fix the envmagic script because it is a significant
    portion of the time it takes to build on the nslu2. 

  o Driver enhancements

    o Query infrastructure.  There presently isn't a way to ask a
      driver anything.  A query infrastructure could help, but it must
      be very light weight.
    o Query for the base address.  This would be used to convert FIS
      regions into nor: regions.  At the moment, FIS regions are
      mapped to mem: regions which means two things.  First, it is
      possible that the flash is not in READARRAY state.   This turns
      out not to be true, but it *could* be.  Second, we cannot write
      or erase the FIS partition which would be a convenience.  Having
      a way to query the start of flash would mean that we could
      return a valid nor: region.
    o Query for block size.  This would allow us to detect erase
      operations that don't request erasure of the whole block.  This
      would allow a safe environment erase function, or perhaps a flag
      to the erase function to handle this case.
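
    Given a block size from such a query, the partial-block test is
    simple arithmetic.  A sketch, with hypothetical function names and
    the assumption of a uniform block size across the region:

```c
#include <stdbool.h>

/* True when [start, start + length) covers only whole erase blocks. */
bool erase_is_whole_blocks (unsigned long start, unsigned long length,
                            unsigned long block_size)
{
  if (block_size == 0 || length == 0)
    return false;
  return start % block_size == 0 && length % block_size == 0;
}

/* Compute the range the flash will really erase: the request
   expanded outward to block boundaries.  Returns false if
   BLOCK_SIZE is zero. */
bool erase_enclosing_blocks (unsigned long start, unsigned long length,
                             unsigned long block_size,
                             unsigned long* erased_start,
                             unsigned long* erased_length)
{
  if (block_size == 0)
    return false;
  unsigned long first = start - start % block_size;
  unsigned long end = start + length;
  unsigned long last = (end + block_size - 1) / block_size * block_size;
  *erased_start = first;
  *erased_length = last - first;
  return true;
}
```

    A request for 120KiB at offset 0 on a device with 128KiB blocks is
    not block aligned, and the enclosing range is the whole first
    block.  A safe erase could refuse such a request, or report the
    enclosing range and ask for confirmation.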

  o Must implement safe flash region erase.  Must must must. (see above)

  o We've implemented alias/environment variables for the block
    drivers for the filesystem and partition drivers.  This may mean
    that we need to move these drivers such that the init's follow the
    environment initialization.  At the moment, this isn't the case
    since none of them do anything at init time. 

  o Implement uniform driver basis in environment.  There is no reason
    not to use an environment variable for the default driver for each
    of the higher level drivers.  This means we can, for now,
    circumvent the need for layering drivers with regions.  Also, this
    will make it more obvious to the user how to change the underlying
    driver for each of those systems.  *And* we can then create a DOS
    partition driver....maybe.
    o Environment/alias part is done.

  o Document that the initrd= command line option can be used to setup
    the ramdisk location.  Unfortunately, the parser may not handle
    hexadecimal.  Need to check on that.  It's 'initrd=start,size'.

  o SD/MMC driver doesn't have timer_fetch() function because it
    hasn't been initialized and the functions aren't in .bootstrap.
    It might be worthwhile making this work.
  o Add a 'show' function so that we can 'show drivers' and 'show
    services' and maybe other things.

  o Support for arrow keys
    o I believe that the only hitch is that we need to cope with the
      multi-byte-ness of the codes.
    o Perhaps a simple state machine will do. 
      ESC [ A
    o The [ will be suppressed as will the next character.
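
    A sketch of the state machine, assuming the common ESC [ A/B/C/D
    sequences for up, down, right, and left.  The KEY_* codes and the
    key_decode name are hypothetical.

```c
/* Codes above the byte range so they cannot collide with input. */
enum { KEY_NONE = 0, KEY_UP = 0x100, KEY_DOWN, KEY_RIGHT, KEY_LEFT };

struct key_decoder {
  int state;   /* 0 idle, 1 saw ESC, 2 saw ESC [ */
};

/* Feed one input byte.  Returns KEY_NONE while a sequence is in
   progress or when the sequence is not recognized, a KEY_* code for
   an arrow key, or the byte itself for ordinary input.  A lone ESC
   is swallowed; the byte that follows it is passed through. */
int key_decode (struct key_decoder* d, unsigned char ch)
{
  switch (d->state) {
  case 0:
    if (ch == 0x1b) {
      d->state = 1;
      return KEY_NONE;
    }
    return ch;
  case 1:
    if (ch == '[') {
      d->state = 2;
      return KEY_NONE;
    }
    d->state = 0;
    return ch;
  default:
    d->state = 0;
    switch (ch) {
    case 'A': return KEY_UP;
    case 'B': return KEY_DOWN;
    case 'C': return KEY_RIGHT;
    case 'D': return KEY_LEFT;
    }
    return KEY_NONE;
  }
}
```

    This version makes no attempt to time out after a bare ESC, so it
    cannot tell a user pressing ESC from the start of a sequence.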

  o If the loader is small, the relocation code performs adequately.
    I haven't measured the performance, but it seems to be reasonably
    quick to start.  

    However, if the loader were substantially larger because, say, it
    included a splash image, we would probably want to do something
    about the relocation routine.

    The easiest thing to do is to allow for the instruction cache to
    be enabled early, before there are MMU tables.  This is OK for the
    ARM7/ARM9 AFAICT because it assumes linear memory mapping *and*
    the relocate inner loop is a simple four (or so) instruction
    fragment.

    There is more, though.  It may be desirable to enable the
    instruction cache while uncompressing the kernel.  I suspect that
    the gzip decompression would be noticeably faster with the
    instruction cache.  However, we don't know if the kernel will
    tolerate the presence of the I-cache once it boots.  

    o There is now code to do so, but it doesn't appear to make a
      difference on the ARM9.

  o Copying from TFTP doesn't reliably finish.  The transfer tends to
    halt, either after no blocks, or one block.  We need to diagnose
    the nature of the failure and either implement a retry, or come up
    with a method to restart the operation.  For example, it could be
    that we restart transparently so that we can seek to the place
    where we are attempting to read.

  o The envmagic is weak.  If a variable isn't defined for a given
    configuration, but it exists elsewhere in the code, the magic
    number will be the same when that variable is added to the
    configuration.  So, either we insert the string into the
    environment, or ... I don't have another plan except to ignore the
    problem.

  o arp_resolve returns a string.  It should be able to return an
    error condition.  This means that a ^C will terminate the ARP, but
    it won't tell the user that that is the reason.

  o Command history.

    o Should be easy to implement
    o Ring buffer of strings.
    o Say, 4k
    o Each line is entered as a null terminated line of text followed
      by a word aligned length.
    o There is a pointer to the next available position in the buffer.
    o Previous command is found by back tracing until the length of
      the previous command is zero or the length bypasses the pointer
      to the next available space in the buffer.
    + Implemented
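
    For reference, a sketch of the layout described above.  This is
    not the implemented code; the names history_add and history_get
    are hypothetical, and the stored length is assumed to be the
    padded size of the text that precedes it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HISTORY_SIZE 4096u              /* multiple of four */

static uint8_t history[HISTORY_SIZE];   /* zeroed: a zero length ends a walk */
static uint32_t history_next;           /* next free offset, word aligned */

/* Append a line: text, NUL, padding to a word boundary, then a
   word holding the padded length. */
int history_add (const char* sz)
{
  uint32_t cb = (uint32_t) strlen (sz) + 1;
  uint32_t cb_padded = (cb + 3u) & ~3u;
  if (cb == 1 || cb_padded + 4 > HISTORY_SIZE / 2)
    return -1;
  for (uint32_t i = 0; i < cb_padded; ++i)
    history[(history_next + i) % HISTORY_SIZE] = i < cb ? (uint8_t) sz[i] : 0;
  history_next = (history_next + cb_padded) % HISTORY_SIZE;
  memcpy (&history[history_next], &cb_padded, 4);
  history_next = (history_next + 4) % HISTORY_SIZE;
  return 0;
}

/* Fetch the Nth most recent line (0 is the newest) by walking
   backward over the length words.  The walk stops at a zero length
   or when it would pass the next-free pointer, which means the
   entry has been overwritten. */
int history_get (unsigned n, char* out, size_t cb_out)
{
  uint32_t end = history_next;
  uint32_t total = 0;
  if (cb_out == 0)
    return -1;
  for (;;) {
    uint32_t cb_padded;
    memcpy (&cb_padded, &history[(end + HISTORY_SIZE - 4) % HISTORY_SIZE], 4);
    if (cb_padded == 0 || cb_padded > HISTORY_SIZE
        || cb_padded + 4 > HISTORY_SIZE - total)
      return -1;
    total += cb_padded + 4;
    uint32_t start = (end + 2 * HISTORY_SIZE - 4 - cb_padded) % HISTORY_SIZE;
    if (n-- == 0) {
      size_t i;
      for (i = 0; i + 1 < cb_out; ++i) {
        out[i] = (char) history[(start + i) % HISTORY_SIZE];
        if (out[i] == 0)
          return 0;
      }
      out[i] = 0;
      return 0;
    }
    end = start;
  }
}
```

    Keeping every record a multiple of four bytes means a length word
    never straddles the wrap point; only the text needs the modulo.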

  o There is an error in the dump code.  It doesn't dump the CLCD
    registers correctly.  This is not excusable. 

  o Improving network responsiveness

    o Since there is a limited number of receive buffers, we may need
      to change the structure of the network protocol drivers
    o Need a flush function to flush receive buffers.
    o RARP is a good example.  The response from the server is nearly
      immediate.  If we don't clear the buffer, it may take a timeout
      and retransmission to see the reply
    o Still, need to inspect the network traffic to make sure this is
      the reason for the delay on rarp.

    o May need to wait for the link to come up before sending the
      network configuration packets.  Don't know, but we've added the
      flush before sending the rarp packets and it still takes a while
      for rarp to respond.

  o Interrupt key

    o There are places, especially in the network code, where console
      activity will cancel the operation.  Trouble is that we don't
      want it to be overly sensitive.  It would be good to have some
      defined keystrokes for break and cancel.  ESC isn't a good
      choice because we'd like to be able to use arrows at some
      point.  ^C is better. 
      + Implemented in the console code.
    o We can probably get away with a Kconfig option and no UI to
      change it.

  o emac-lh79524
    o Looks like there is a problem transmitting when packets are
      received.
    o Need to get a better picture of the network traffic.  ethereal
      shows that tftp is fine in a full-duplex switch situation. 
    o In half-duplex as I have, there are lots of problems 
      transmitting when there is traffic on the wire.  Need to
      instrument.
    o BTW, the SMC91x has no problem with this.  In fact, as far as I
      can tell it always handles retransmits flawlessly.
    o Actually, the problem appears to be that the emac is easily
      overwhelmed by packets when in promiscuous mode.  By eliminating
      the CPYFRM bit, it seems to be stable with tftp receives.

  o Network traffic, continued
    o I've created an ethernet_service function with a user supplied
      termination function. 
    o The user can let the receives continue until an arbitrary set of
      conditions are met.
    o The only piece missing is a way to hook the receive so that the
      user's function can gain access to inbound packets.
    o Simplicity.  Needs to work for internal features as well as
      external ones.  For example, rarp packets should not be in the
      core since the user may never care about them.
    o We can simplify by eliminating the interface, for now.
    o register_ethernet_frame_receiver(receiver, priority) Either a
      static array (which I like) of limited length, or a linked list.
      Latter requires some sort of memory management, ew.
    o int frame_receiver (frame), returns zero if the packet should
      continue to be checked, non-zero if the packet has been
      received and there is no reason to continue processing.
    o There can be receivers for icmp, rarp, and arp built in.  We can
      hook dhcp.
    o The receivers must agree to be efficient.
    o We may want to engineer the drivers to be able to cache a couple
      of packets.
    o Here's what we need to do.  Ethernet receive should do some
      cursory checks, verify that the MAC address is interesting and
      the IP address is relevant when there is an IP header.  After
      that, the receivers can be used to handle the rest of the
      verification.
    o We need a port allocator with a somewhat random initial value.
      If that isn't possible, we should at least be able to count up.

  o Note that none of this really helps with multiple consoles.  We
    still need to eliminate statics and there must be a way to
    efficiently execute more than one thread.  Remember that the
    SSEM did this, but it has select.  Perhaps we should do the
    same.  Hmm.

  o Network traffic state machine
    o We've been able to keep this application simple since there has
      been only one thread of execution and one task
    o Network changes all of that
    o telnet sessions will also make it necessary to be more flexible
    o Need to have threads of execution
    o Network packets can be queued to a thread
    o I think we can have a function that handles network traffic in a
      polled fashion, dispatching ICMP and ARP, if we choose, and has
      a couple of terminating conditions: receipt of a relevant packet
      or a timeout.
    o Trouble is that we may receive a packet and be busy elsewhere.
    o We can service the network connection in some fashion while we
      do other work, but we may need to queue packets for processing
      since the network interface has limited capability for buffering.
    o Perhaps this packet queue can be limited to the network, for
      now.
    o It would be good to think ahead here.  Can we add logic to the
      network layer so that we can queue packets intelligently and
      process them when the time comes.  For instance, we register
      discriminator functions for each task that can deal with network
      traffic.  Discriminators perform the queuing themselves.  There
      comes a time, later, when the discriminator can do what it likes
      with the packets.  It may even be the case that we do the
      network logic immediately.  For example, the tftp driver
      registers a discriminator that can receive data packets, but it
      will automatically stop when its queue is filled.  The core
      routine that is using the data will call the service function
      when it can.  If the tftp_read call is entered and there is no
      data available, it will block until there is some.  Essentially,
      it will enter a special form of the service loop that terminates
      on a timer instead of returning immediately when there is
      nothing to do.
    o *** How can this be generalized to handle multiple console
      streams?  Can it be?  Can we do a round-robin for console
      streams such that the separate threads of execution each get a
      chance to handle command, process input, and so on?  This seems
      like it may be overkill for a simple monitor, but we will need
      something if we start to be able to handle telnet sessions as
      well as a USB console.

  o tftp write requires that we know if a descriptor is for reading or
    for writing.  There is nothing in the interfaces to support this at
    the moment.  Either we cache the file-open data, or we let it all
    be lazy and perform the open when the first read or write call is
    made.  
    + Reading works fine at the moment.

  o descriptors really need to allow drivers to define a
    context.  tftp is probably the most compelling case for
    this...especially once we've opened the connection and we need to
    keep track of the open file.  Alternatively, we can continue to do
    as we've pleased which is to use a driver global.  Ho hum.

  o The filesystem drivers don't say enough about where they get their
    source devices.  Perhaps it would be best to use aliases so that
    we can control it at runtime.

  o The adc test driver needs to be able to check for touch events.  

  o Do we want to be able to show all variables, env and aliases in
    the same command?

  o ipconf command
    o ipconf rarp
    o ipconf dhcp
    o ipconf boot
    o ipconf 192.168.8.203/24
    o ipconf clear

    This will make it clearer to users how to clobber the
    configuration.  Also, we may be able to enforce reconfiguration
    this way.

  * There are two copies of the adc driver, one in the 7952x directory
    and another in the 7a40x directory.  I'd like to reduce the
    redundancy, but that would require moving the driver to be outside
    of mach-*/.  Either make a mach-lh/ or put them in drivers/.  The
    trouble is that I need a way to make sure that...Perhaps I should
    have a drivers-lh/ for shared drivers specific to Sharp products.
    That way, I can make it clear.
    * Done and it's nice.

  o We've added RARP which can configure the IP address.  There is a
    problem, tho.  The user can get the IP address from the alias
    list.  If the user clobbers it, the host_ip_address isn't
    clobbered.  The user might think that erasing the alias removes
    the IP address setting.  We need to either detect the loss of the
    alias, or we need to not store the IP address in an array. 

  o Need a default command line setting in the config file.  It should
    completely override the platform default, when present.

  o Pipes should work for streaming between drivers.  [g2] wants to be
    able to specify a list of drivers/regions to try.  This may mean
    I'll have to combine the parse and the open methods so that the
    open can continue until it succeeds.  
      E.g. nor:128k+1m|(ext2,vfat):/boot/zImage
    Trouble ensues when the options are more complex, NxM.
  * Aliases can be simple text substitutions.
     alias flpart nor:128k+1m
     alias kernel /boot/zImage
     flpart|(ext2,vfat):kernel
     * Done.  Substitution is with $.
  o The splashscreen should be able to be read from a filesystem
     copy ext2:/boot/splash fb:
    This will also require that we be able to store an image in the
    loader and copy it out.  Aliases make this possible
      alias splashimage ...
    + We have a splash command that reads from a region but doesn't
      write to one.  In other words, we *could* do as suggested, but
      there isn't much reason for it from the POV of simplicity.
  o Aliases mean we can have references to 
      alias env nor:128k+32k
      alias apex mem:...
    aliases must match exactly.  Always.  I'd like to make the
    environment alias be environment, but partial alias matches might be
    too much of a strain.  Perhaps near matches are OK if there are
    no other near matches.
  o It looks like there is a conflict with using the name ./config for
    the configuration file and all of the config targets.  I believe
    this to be due to hidden dependencies within make.  Bad make.  So,
    we're going to go back to the .config files for now.  I'd use
    conf, but that is the name of the configuration program.  It may
    take some time to sort this out and make it neat.

  o There is a need to be able to control the access width in the
    memory driver.  Sometimes, for the purpose of debugging, it is
    helpful to be able to read different widths.  The memcpy command
    is probably too coarse in some cases as is evidenced by the fact
    that dumping from the CF flash region on the 79520 does the wrong
    thing.
    o I believe the problem really has to do with the chip select
      issue.  Without reads from system memory, the chip select
      doesn't toggle and the CF fails to return a different memory
      location.  Thus the real problem is that the memcpy performs too
      many transfers at one time.    
    o Finally, I don't know what the right solution to the problem
      should be.
    o Dump could be made to do the right thing with a switch to read
      only one byte (or word) at a time.  This is probably desirable
      anyway since we could use a mode to display words or
      half-words instead of just bytes.

  o It might be helpful to make the envmagic script more repeatable.
    Need to look into this.  Seems like it changes when there are no
    changes to the environment variables.

  o An erase confirmation might be helpful, but it requires that the
    driver participate.  It would be helpful to tell the user exactly
    what will be erased.  Note that BDI doesn't, and I don't think
    that other bootloaders do.  Still, it would be helpful to let the
    user know what the erase is going to do.

  o Separate the string functions so that the linker can include only
    those that are needed.
    + Done.

  o I think that the partition logic needs to move to the CF driver
    (and other block-level drivers) so that the filesystem code can be
    cleaner.  It allows us to read from unpartitioned devices as well
    as any other random-access, byte stream.
    o This change will require that the spec strings are filled out to
      handle chaining.  We would do fatfs->dospart->cf.  Might seem
      heavy-weight, but it means reasonable flexibility.  It ought not
      be much code and the calls can be handled with simple
      pass-through in the case of read().

  o Looks like the ADC test driver may interfere with the kernel
    booting properly.  First I've seen of it.  It's OK, though, since
    we don't need it.

  o Might be good to have a paranoid setting that lets us require
    confirmation before clobbering flash or eeproms or whatever.  This
    could be a list of important regions.  The region list could be
    bothersome since there is aliasing unless it is specific to the
    driver that can clobber it.  Then again, not.  Since writing has
    to go through the driver that understands the true addressing.  It
    means that we have to protect nor:#+# for flash and cannot depend
    on mem:#+# to be recognized as being the same.

  o Buried region strings must be revealed.  The emac driver has one
    for the MAC address EEPROM.  The fatfs driver has one for the
    underlying block device.  The latter may be best revealed with an
    environment variable...or not.  Don't know about the emac one.
    I intend that the user specify the fatfs support driver via the
    specification string.  fatfs:cf: or something.

  o Need to rectify the vpen_enable code in the flash drivers.  It
    ought to be something defined in the hardware file s.t. it is not
    compiled unless needed.  IIRC, it was smaller to put the vpen code
    into functions than to put it inline.  However, it is easier to
    code if it can be inlined.  Hmm.

  o Would be nice if ^C could break into long-running functions.
    Could be dangerous, though, e.g. writing to flash.  More thought
    needed.
    + Supported and implemented in several places.

  o I am suspicious that the nor-cfi erase code isn't quite right.
    When asked to erase nor:0+120k, I think it erased the block at
    nor:120k.

  o Checksum function doesn't appear to work over FAT read files.  It
    needs to be clever about summing in the case that the descriptor
    itself knows the length.
    + I believe this is fixed.

  o Wonder if I need to do a shutdown before performing reset s.t. I
    can put the NOR flash in the right state.  This might be why reset
    sometimes fails.

  o Add a -v option to copy to make sure the copy is successful.

  o There ought to be a target to release that increments the version
    number, makes an SVN copy, and pushes the code to the ftp server.
    o Part of this is done.

  o I'd like to move the CROSS_COMPILER assignment to the
    configuration file *and* allow the user to override it.  The ?=
    operator isn't working for some reason.  I'll leave it for now.

  o Again, need to add an info function.  It can be used to get info
    from a driver or to look at a directory listing from FAT.  This
    idea isn't quite complete.  It would be good to combine the driver
    info stuff with this function in a way that the command
    'describes' a descriptor.  Thus, a descriptor for an exact file
    shows the file info.  A descriptor for a directory lists the
    directory.  A descriptor for a driver describes that driver.
    + Implemented the ability to read directory contents...except for
      fatfs.

  o The default length for descriptors needs to be thought out.  If we
    set the default length to the greatest length available, then we
    run into trouble with dump.  We could communicate something about
    the use to the open call such that the driver knows if it needs to
    set a default length.  Or we could let the user do something
    interesting such as giving a @+. to indicate that the maximum
    length should be used.  The problem is one of use.  In files, the
    default length is usually going to be the whole file.  In memory
    regions, the default length is OK where it is.  I think it is best
    to add a more mode to dump and let lengths be maximal on files.
    Also, we might need a way to cancel some commands that run awry
    because of the (possibly) excessive length.

  o It would be good to add a checksum function that can be compared
    to a command line tool.  MD5 or SHA1.  We want something that can
    be small, though.  Grrr.  We could also write a utility that can
    do the same checksum on the UNIX command line.
    o Yipee!  The cksfv program does the right thing.
    o Add to the documentation that the cksfv program can be used to
      compare checksums.
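
    For comparison, a small bitwise CRC-32, on the assumption that the
    checksum in question is the standard CRC-32 that cksfv reports for
    SFV files.  The name crc32_update is hypothetical.  It trades
    speed for size by not using a lookup table.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 using the reflected polynomial 0xEDB88320.  Pass 0
   as CRC for the first chunk and the previous return value for each
   following chunk. */
uint32_t crc32_update (uint32_t crc, const void* pv, size_t cb)
{
  const uint8_t* pb = pv;
  crc = ~crc;
  while (cb--) {
    crc ^= *pb++;
    for (int i = 0; i < 8; ++i)
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
  }
  return ~crc;
}
```

    The standard check value for this CRC is 0xCBF43926 over the nine
    ASCII bytes "123456789".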

  o Need a function like ls or info that can list the contents of a
    directory or give data on a device.  Perhaps we can overload the
    info function to allow a complete descriptor instead of just a
    driver.  Not a bad idea.  This then becomes a function of a
    driver to allow a describe, info, catalog, or whatever on a
    descriptor.
    + Implemented as info

  o It would be nice if the fat/cf interface could be smart about
    detecting device changes.  Trouble is, CIS info doesn't appear to
    be enough.  It may be possible to use the hardware to detect a
    device change and propagate that upward.
    o This would probably require an extra hook in the driver.
    o And, it isn't necessary since we read all of the device
      structures every time we perform IO.
    o Although an interrupt timer could make this possible.

  o APEX prints an error when the startup command doesn't have a trailing ';'

  o We can write an erase function that can cope with erasing a
    portion of a flash block.  Actually, not really too hard.
    o This could be used to erase the environment when it follows the
      loader in the same flash block.
    o This might be dangerous and, therefore, require some
      precautionary words for the user.

  o Dump command needs a MORE mode.  Or, we need to limit the extent
    of the dump. 

  o Let's add a driver alias function.
    o once we get into the thick of driver-land, we may have some
      cumbersome driver specs
      o ls fat.mmc.spi:/
      o copy tftp.emac-1:vmlinuz mem:0x2000
    o The cascade of drivers will be necessary for the sake of
      interoperation, but it might make things difficult for the user.
    o So, alias fat.mmc.spi mc
    o Then  ls mc:/  will do the right thing.
    o Cake   
    o Also, we may need to pass parameters to a driver.
    o ls fat(1).mmc.spi:/ might be a reference to the first partition
    o ls mc(1):/ would be equivalent
    o I can imagine that some aliases might be generated at runtime as
      defaults.  I don't have specifics right now.
    o This also suggests that region spec's will have a qualifier
      about the compatible formats.  Filespecs and memory region specs
      may not be compatible.  Or not. :-)
    o This aliasing could be used to refer to regions.  
      o alias apex: is the region of memory where apex is executing
      o alias env: is the region where the environment is stored.

  o The MASK_AND_SET macro I'm using used to be commented out.  Well,
    it was similar.  I changed it a little and put it back because the
    two step version makes the ADC_PC register setup break.  Don't know
    why.  Need to make sure that all of the other code is still
    working.

  o driver info functions, especially memory
    o I think this should be done with a service hook instead of a
      driver hook so that the environment can describe its
      descriptor.
    o started.
    o hook removed, switched to a service entry point 
    o Completed, except that it displays through the version command.
  o look for more unused symbols
  o Doc on XIP kernel?  We *can* run Linux completely from flash
  o command history

  o lh7a40x
    o buffered NOR write...oh boy is this needed
    o Should be little else since there is only one memory device
    o Not much incentive to be small with 80k of SRAM
  o CONFIG_ENV_SAFE_ERASE to allow user to erase whole
    environment...there is already a configuration option for this. 

  o Exception vectors at the start of the loader probably don't make
    sense.  
    o If we ever enable interrupts, we won't be running this loader
      from the start of addressable memory.  I suppose we could, but
      that hasn't been the way.  The kernel needs to load at
      0xc0008000 or 32K from the start, and the parameters load at
      0xc0000100 which is even worse.  Just not a good plan with
      linux.
    o Instead, we'll map some kind of RAM at 0, tell the CPU to map
      exceptions to zero, and we'll write vectors there.
    o This way, we can have interrupts in the loader without the MMU.
  o Simple interrupts
    o 32 entry function table
    o irq disabled if the function is empty.
    o no priorities
    o partially implemented, but non-functional 
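
    A sketch of the table-driven scheme outlined above.  All names are
    hypothetical, and the enable mask is a plain variable standing in
    for whatever enable register the interrupt controller provides.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*irq_handler_t) (void);

static irq_handler_t irq_handlers[32];   /* one entry per source */
static uint32_t irq_enable_mask;         /* stand-in for the enable register */

/* Install or remove a handler.  A source with no handler is left
   disabled. */
void irq_set_handler (unsigned irq, irq_handler_t handler)
{
  if (irq >= 32)
    return;
  irq_handlers[irq] = handler;
  if (handler)
    irq_enable_mask |= UINT32_C (1) << irq;
  else
    irq_enable_mask &= ~(UINT32_C (1) << irq);
}

/* Called from the IRQ vector with the pending status bits.  There
   are no priorities; sources are serviced lowest number first. */
void irq_dispatch (uint32_t pending)
{
  pending &= irq_enable_mask;
  for (unsigned irq = 0; pending; ++irq, pending >>= 1)
    if ((pending & 1) && irq_handlers[irq])
      irq_handlers[irq] ();
}

/* Example handler used only to illustrate registration. */
static unsigned example_ticks;
static void example_tick (void)
{
  ++example_ticks;
}
```

    Registering example_tick on a source and then calling
    irq_dispatch with that source's bit set would bump example_ticks
    once; bits for sources without handlers are ignored.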

  o Need to regularize the sdram and nor flash config parameters.  The
    arch determines some of these parameters.  Config should only
    enable.  For example, CONFIG_SDRAM_BANK0=y and the rest comes from
    the platform.  Perhaps the config should specify info on the SDRAM
    chip(s).
    o The SDRAM part has been done.  
    o Problem with flash is that we probably need to continue letting
      the config specify the base address.  The arch does know where
      flash can be, and there is one bank that is likely to always be
      there if any are there (boot flash), the flash organization may
      still require that we know the base addresses.  Lengths really
      aren't needed for CFI compliant banks.

Goals

  o Simple structure
  o Memory image may be directly written to flash
    o This may be challenged by structures that have defaults and are
      later modified.  If we can avoid doing this, then all will be
      well.
  o Use of tables of pointers to handle features, commands, drivers
  o Driver self-discovery when relevant
  o Small image, less than 16K for basic loader
    o This will be hard when using the kernel's printf
    o CONFIG_SMALL helps 
  o Reasonably configurable at ./configure time
  o Good command line support
  o Code sharing among targets is *not* of the highest priority
  o No assembler files
    o There is some hand assembly in the start-up, but that's probably
      unavoidable.
    o Still, no .s files 
  o No external dependencies aside from a tool chain
  o Staged execution (see below)
  o Support for over-the-wire and NAND booting--teeny tiny bootstrap
    o relocate function
    o Presently, there is only one relocate supported.  We may want to
      let it be flexible so that the loader can be common for several
      configurations.
  o Multiple targets
  o May execute from FLASH, XIP
  o Stack storage will be required
  o Automatically relocating
    o This is done in the relocate_apex() routine.  It's position
      independent up to the point that this function executes.
      Afterward, absolute symbols may be used.  Note that weak
      symbols, at the moment, cannot be used until after relocation.

  o IO descriptor (region): "device:path"
    o "nor:@0x44000000#0x20000"
    o "nand:@0#0x20000"
    o "serial:xmodem", "serial:binary"
    o "eeprom:"
    o allows drivers to handle details of transfers
      o copy "nor:@0x44000000#0x20000" to "mem:@0x20000000"
      o copy "tftp://192.168.8.1/zImage" to "mem:@0x20000000"
      o copy "http://192.168.8.1/zImage" to "nor:@0"
      o erase "nor:@0#0x200000"
    o Should support suffixes, m and k
    o Relaxed the @ requirement.  The memory driver is the default,
      which means that 1m#1k specifies 1KiB from the 1MiB boundary
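
  A rough sketch of parsing the numeric region forms shown above.
  The struct region layout and the names parse_size and parse_region
  are assumptions made for illustration.  Path-style descriptors such
  as "serial:xmodem" or the tftp:// form are not handled here and are
  simply rejected.

```c
#include <stdlib.h>
#include <string.h>

struct region {
  char driver[16];          /* driver name, "mem" when omitted */
  unsigned long start;      /* byte offset within the device */
  unsigned long length;     /* byte count, 0 when omitted */
};

/* Parse a number with an optional k (KiB) or m (MiB) suffix.
   *end is left equal to s when no digits were found. */
static unsigned long parse_size (const char *s, const char **end)
{
  char *e;
  unsigned long v = strtoul (s, &e, 0);
  if (e != s) {
    if (*e == 'k' || *e == 'K') { v <<= 10; ++e; }
    else if (*e == 'm' || *e == 'M') { v <<= 20; ++e; }
  }
  *end = e;
  return v;
}

/* Parse "driver:@start#length".  The driver name, the '@' and the
   '#length' part are each optional.  Returns 0 on success. */
int parse_region (const char *s, struct region *r)
{
  const char *colon = strchr (s, ':');
  const char *end;

  memset (r, 0, sizeof (*r));
  if (colon) {
    size_t n = (size_t) (colon - s);
    if (n == 0 || n >= sizeof (r->driver))
      return -1;
    memcpy (r->driver, s, n);
    s = colon + 1;
  }
  else
    strcpy (r->driver, "mem");  /* memory driver is the default */

  if (*s == '@')
    ++s;
  if (*s && *s != '#') {
    r->start = parse_size (s, &end);
    if (end == s)
      return -1;
    s = end;
  }
  if (*s == '#') {
    ++s;
    r->length = parse_size (s, &end);
    if (end == s)
      return -1;
    s = end;
  }
  return *s ? -1 : 0;
}
```

  With this sketch, "1m#1k" yields driver "mem", start 0x100000 and
  length 0x400, matching the relaxed form described above.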

  o devices are self descriptive to the extent that they can be in
    terms of erase block sizes and so on

Staged Execution

  o Essential initialization, SDRAM, memory controller, IO multiplexing
  o Move loader into storage for execution, if necessary.  May involve
    transfer from secondary medium: serial, NAND flash, serial eeprom
  o Finish hardware initialization
  o Setup stack
  o Clear BSS
  o Command loop

Configuration parameters

  o Initial base address (turns out not to be necessary on ARM)
  o Execution base address
  o Base RAM address for 4K stack, data, BSS
  o Layout of SDRAM memory

Parameters

  o APEX_STACK - Linker calculated top of stack
  o APEX_VMA_START - Execution (virtual) memory address for image
  o APEX_VMA_END   - Linker calculated end of the execution memory image
  o APEX_BSS_START - Linker calculated start of BSS region
  o APEX_BSS_END   - Linker calculated end of BSS region

Symbols

  o init () - driver initialization and then loader exec () 
  o reset () - initial entry point
  o exception_error () - function called for unused exceptions
  o relocate_apex () - move loader into RAM from...wherever
  o initialize_bootstrap () - pre-relocate initialization
  o initialize_target () - completion of target (non-driver) initialization
  o setup_c () - prepare C execution environment
  o after setup_c() we can use traditional C code for the rest of
    initialization
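
  One part of preparing the C environment is clearing BSS.  The
  following is a sketch only.  The function name clear_bss is
  hypothetical, and the bounds are passed as arguments so that the
  routine can run on a host.  In the loader they would presumably
  come from the linker-calculated APEX_BSS_START and APEX_BSS_END.

```c
#include <stddef.h>
#include <stdint.h>

/* In the loader the bounds might be declared as linker symbols, e.g.
     extern char APEX_BSS_START[], APEX_BSS_END[];
   That declaration style is an assumption.  A plain byte loop is used
   so the sketch depends on no library routine. */
void clear_bss (void *bss_start, void *bss_end)
{
  volatile uint8_t *p = bss_start;
  volatile uint8_t *e = bss_end;

  while (p < e)
    *p++ = 0;
}
```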

Drivers

  o read/write, block/byte
  o erase
  o status/info
  o init/detect -> probe()
  o exit -> release()
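
  The list above maps naturally onto a structure of function
  pointers.  This is a sketch under assumed names and signatures, not
  the loader's real driver structure.  The demo driver is backed by a
  small array so it can be run anywhere.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical driver descriptor.  Entries may be NULL when a driver
   does not support the operation. */
struct driver {
  const char *name;
  int  (*probe)   (void);                       /* init/detect */
  void (*release) (void);                       /* exit */
  long (*read)    (unsigned long offset, void *buf, size_t cb);
  long (*write)   (unsigned long offset, const void *buf, size_t cb);
  int  (*erase)   (unsigned long offset, size_t cb);
  void (*info)    (void);                       /* status/info */
};

static unsigned char demo_store[64];

static int demo_probe (void)
{
  memset (demo_store, 0xff, sizeof (demo_store));
  return 0;
}

/* Both transfers clamp at the end of the store and return the number
   of bytes actually moved, or -1 for an offset out of range. */
static long demo_read (unsigned long offset, void *buf, size_t cb)
{
  if (offset >= sizeof (demo_store))
    return -1;
  if (cb > sizeof (demo_store) - offset)
    cb = sizeof (demo_store) - offset;
  memcpy (buf, demo_store + offset, cb);
  return (long) cb;
}

static long demo_write (unsigned long offset, const void *buf, size_t cb)
{
  if (offset >= sizeof (demo_store))
    return -1;
  if (cb > sizeof (demo_store) - offset)
    cb = sizeof (demo_store) - offset;
  memcpy (demo_store + offset, buf, cb);
  return (long) cb;
}

const struct driver demo_driver = {
  .name  = "demo",
  .probe = demo_probe,
  .read  = demo_read,
  .write = demo_write,
  /* release, erase and info left NULL: unsupported in this demo */
};
```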

Basic commands

  o display
  o verify
  o checksum
  o copy
  o boot (linux kernel)
  o go (arbitrary execution)
  o printenv/setenv
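
  A sketch of command lookup through a table of pointers, in the
  spirit of the goals listed earlier.  The table here is an ordinary
  array rather than something collected in a linker section, and the
  type and function names are hypothetical.  The command bodies are
  placeholders that only report that they ran.

```c
#include <stddef.h>
#include <string.h>

typedef int (*cmd_fn_t) (int argc, const char **argv);

struct command {
  const char *name;
  cmd_fn_t fn;
};

/* Placeholder bodies; each returns argc so a caller can see it ran. */
static int cmd_copy     (int argc, const char **argv) { (void) argv; return argc; }
static int cmd_checksum (int argc, const char **argv) { (void) argv; return argc; }
static int cmd_boot     (int argc, const char **argv) { (void) argv; return argc; }

static const struct command commands[] = {
  { "copy",     cmd_copy },
  { "checksum", cmd_checksum },
  { "boot",     cmd_boot },
};

/* Linear search is adequate for a handful of commands. */
const struct command *find_command (const char *name)
{
  for (size_t i = 0; i < sizeof (commands) / sizeof (commands[0]); ++i)
    if (strcmp (commands[i].name, name) == 0)
      return &commands[i];
  return NULL;
}

/* Dispatch on argv[0].  Returns -1 when the command is unknown. */
int run_command (int argc, const char **argv)
{
  const struct command *c = argc > 0 ? find_command (argv[0]) : NULL;
  return c ? c->fn (argc, argv) : -1;
}
```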

Sections

  o .entry - guaranteed first section.  Should contain only one symbol
  o .bootstrap - guaranteed to follow .entry.  Smallest possible
    section of code to handle bootstrap into RAM, if necessary
  o .text - rest of loader code
  o .init - initialization hooks, especially drivers
  o .env - environment hooks
  o .exit - hooks to call before leaving loader
  o .cmd - command functions
  o .bss - BSS
  o .data - initialized data

Configuration

  CONFIG_ARM - ARM target
  CONFIG_LH7952X
  CONFIG_LH79520
  CONFIG_LH79524
  CONFIG_LPD79520
  CONFIG_LPD79524
  CONFIG_LH7A40X
  CONFIG_LH7A400
  CONFIG_LH7A404
  CONFIG_LPD7A400
  CONFIG_LPD7A404
  CONFIG_CONSOLE_DEVICE - Defines which serial (or other) device is console

Kernel Build Scripts

 o The directory Makefiles may be invoked in two different situations.
   Actually, only the primary one, for the architecture, is like that.
   It may be included in the top-level Makefile in which case it is
   used to configure how the kernel will be built.  It may then be
   invoked as a Makefile used to build targets in that directory.  This
   can be confusing when it needs to define targets that must not
   override the default for the top-level Makefile.  The kernel doesn't
   need to worry about this, I believe, because it does not depend on
   the arch/$(ARCH) directory to build a target necessary for the
   vmlinux file.  Instead, it only calls arch/$(ARCH)/Makefile to build
   boot targets.

Weak Symbols

 o Using weak symbols in the entry code would make it easy to replace
   startup features with a platform specific implementation.  Trouble
   is, weak symbols cannot be called with a simple "bl" because the
   compiler and linker are not going to know where the code is and may
   not be confident that it will be nearby.  I wonder if I can hint
   that the routine must be close?
 o Libraries solve this problem as can be seen with the
   relocate_apex () function (NAND version).
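
 For reference, a weak default looks like this with the GNU
 toolchain.  The function names are hypothetical.  The sketch shows
 only the replacement mechanism and does not address whether the call
 can be reached with a simple "bl" before relocation.

```c
/* Default implementation, marked weak so that a platform file may
   define an ordinary (strong) function of the same name, which the
   linker then uses instead. */
__attribute__ ((weak)) int platform_early_init (void)
{
  return 0;                     /* generic: nothing to do */
}

/* The startup path calls through the symbol without knowing which
   definition was linked. */
int run_early_init (void)
{
  return platform_early_init ();
}
```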

Code Size

 o The printf code is really big.  We could probably use a smaller
   printf function and save some code.  It's about 4K right now.
   There is a smaller version online.  I'd like to review uClibc
   before depending on code found on the net.

Toolchain

 o The 3.3 compiler, which has worked so well for so long, seems to
   have a problem with the printf() function.  The call

     printf ("read done\r\n");

   is automatically translated as puts which should be OK, except that
   it doesn't copy the string correctly.  The final \n is truncated
   from the constant pool.  Adding -fno-builtin-printf fixes the
   problem.
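
   One possible reading, offered as an assumption rather than a
   diagnosis: GCC rewrites a printf of a constant string ending in \n
   into a puts call with that \n removed, because the C standard
   requires puts to append a newline itself.  If the loader supplies
   its own puts that does not append one, the output would look
   exactly like a truncated string.  The sketch below shows a
   puts-style routine that keeps the contract.  The names loader_puts
   and console_putc are hypothetical, and output is captured in a
   buffer instead of going to a UART.

```c
#include <stddef.h>

static char console_buf[64];
static size_t console_len;

/* Hypothetical console hook; a real loader would write to the UART. */
static void console_putc (char c)
{
  if (console_len < sizeof (console_buf) - 1)
    console_buf[console_len++] = c;
}

/* puts-compatible output: emit the string, then the newline that the
   C standard requires puts to append. */
int loader_puts (const char *s)
{
  while (*s)
    console_putc (*s++);
  console_putc ('\n');
  return 0;
}
```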

Glitches

 o Every now and then, the NOR flash gets wedged.  Power cycling is
   not enough.  In this most recent case, it became corrupt and would
   not accept erase commands.  I sent a couple of commands to the
   device, switched to the other bank, and finally unlock/erase
   worked.  I think it might be wise to operate the VPEN line in the
   flash driver.
   o I think that this might be part of a problem with writing to
     flash when VPEN is enabled as we don't intend to do so.

NOR

 o Driver is kinda big and there isn't even a write function.
 o Can we aggregate some of the common driver logic, descriptors and all?
   o some of this has been done with good effect
 o Available and start address calc can be handled by a function for
   all cases...maybe.  There is a difference between blocking and
   banking.  Perhaps it would be good to limit all operations to
   blocks so that the code can be identical in all cases.

Environment and Startup

 o There is a single environment variable that contains a startup
   command.  This will be executed when the loader starts.
   o  done
 o There is a timed delay function that will wait the specified number
   of tenths of a second before continuing to execute the script
   command line.

     copy nor:0x400000#0x100000 0x20008000 ; \
     wait 50 Automatic boot in 5 seconds ; \
     boot

 o Pressing a key will discontinue the script execution
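
 A sketch of how such a startup string could be walked: split on ';',
 trim blanks, check for a key press before each command, and hand
 each piece to the command executor.  All names and the 128 byte
 command limit are assumptions, and the executor and key check are
 passed in as callbacks so the logic can be tried on a host.

```c
#include <stddef.h>
#include <string.h>

typedef int (*exec_fn_t) (const char *command);
typedef int (*key_fn_t)  (void);

/* Run each ';'-separated command of script in order.  Stops early if
   key_pressed () returns nonzero, if a command returns nonzero, or if
   a command does not fit the buffer.  key_pressed may be NULL.
   Returns the number of commands handed to exec. */
int run_startup (const char *script, exec_fn_t exec, key_fn_t key_pressed)
{
  char cmd[128];
  int count = 0;

  while (*script) {
    size_t n = strcspn (script, ";");
    const char *next = script + n + (script[n] == ';');

    /* Trim leading and trailing blanks from this command. */
    while (n && (*script == ' ' || *script == '\t')) { ++script; --n; }
    while (n && (script[n - 1] == ' ' || script[n - 1] == '\t')) --n;

    if (n >= sizeof (cmd))
      break;
    if (n) {
      if (key_pressed && key_pressed ())
        break;
      memcpy (cmd, script, n);
      cmd[n] = '\0';
      ++count;
      if (exec (cmd))
        break;
    }
    script = next;
  }
  return count;
}

/* Demonstration callbacks. */
char demo_last[128];
int demo_exec (const char *command)
{
  strncpy (demo_last, command, sizeof (demo_last) - 1);
  demo_last[sizeof (demo_last) - 1] = '\0';
  return 0;
}
int demo_key_down (void) { return 1; }
```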

Spinner/Progress

 o would be nice to have a standard UI call to emit progress
   feedback.  Erasing, for example, takes quite a bit of time.
   o done
 o The performance of each driver varies.  It might be handy to have a
   stepping value for each driver when writing so that we can properly
   scale them.  Or, we could add code to insert the current byte count
   into the spinner code and let it determine when to write.
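
 The second idea, passing the byte count and letting the spinner
 decide when to draw, might look like the following.  The struct and
 function names are hypothetical, and the per-driver step value is
 whatever the caller chooses.

```c
#include <stddef.h>

struct spinner {
  unsigned long step;   /* bytes of progress per spinner advance */
  unsigned long next;   /* byte count at which to advance next */
  unsigned phase;       /* index of the next glyph */
};

void spinner_init (struct spinner *s, unsigned long step)
{
  s->step  = step ? step : 1;
  s->next  = 0;
  s->phase = 0;
}

/* Returns the character to draw, or 0 when nothing should be written
   for this byte count. */
char spinner_update (struct spinner *s, unsigned long bytes_done)
{
  static const char glyphs[] = "|/-\\";

  if (bytes_done < s->next)
    return 0;
  /* Next advance is at the following multiple of step. */
  s->next = bytes_done - bytes_done % s->step + s->step;
  return glyphs[s->phase++ & 3];
}
```

 A slow device would use a small step and a fast one a large step, so
 that the spinner turns at a similar rate for both.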

NAND Flash Drivers

  o There is a need to support the LPD method of controlling the NAND
    flash as well as the Sharp method.
  o The LPD method is implemented and tested.
  o The Sharp method should be OK as long as we don't want to use
    CompactFlash or large NOR flash.  CF would be OK as long as the
    implementation didn't require the A23 line as Logic does.
    Refer to CONFIG_NAND_LPD.
  o I am unable to erase the second flash block.  I don't know if this
    is a new problem, or if this has always been the case.  The
    datasheet gives me the impression that this is a problem with the
    chip enable.
    o This was a problem with the math 
  o relocate_apex for NAND flash 
    o This is finally working reliably.  It really won't work without
      the CPLD, but it is close.
  o At this point, the relocate_nand function (should) work regardless
    of whether or not the board is running with a Logic style CPLD.
    Kurt is not coming to my way of thinking which is that there is a
    boot-up mode that mimics the Sharp method and a CPLD configurable
    mode that is faster.
  o Latest CPLD from Kurt works correctly in both modes.  We can now
    perform IO in the Sharp way and in the Logic way.  Relocator
    tested in both as is the drv-nand.  Kernel driver is likely to be
    Logic mode only for now.

Errors

  o It might be handy to have some form of debug output available.  We
    have situations where the system cannot get to the prompt, but it
    might be helpful to know why.  For example, a system is booted
    from NAND, but there is no NAND-aware relocation function.
    Generally, I'm OK with little error handling since it is usually
    wasted space.
    o DEBUG_LL

Relocation

  o The relocation code now checks to see if it is already executing
    from the target location. If so, it will return immediately.
  o The relocation code, too, will not restart the loader, but must
    always continue with a proper return.  If we only have the NAND
    relocator, then we have no way of moving the loader into SDRAM if it
    has been put in SRAM or SDRAM at some random location.  This may
    be a shortcoming.
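
  The behavior described above reduces to the following shape.  This
  is a host-side illustration only.  The name relocate_image is
  hypothetical, and the real relocate_apex runs before the C
  environment is ready, so it could not call memmove like this.

```c
#include <stddef.h>
#include <string.h>

/* Returns 0 if the image is already at dst (nothing to do), or 1
   after copying it there.  Either way it returns to the caller and
   does not restart anything. */
int relocate_image (const void *src, void *dst, size_t cb)
{
  if (src == dst)
    return 0;
  memmove (dst, src, cb);
  return 1;
}
```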

Performance

  o The load time from flash to SDRAM is partially governed by the
    EMC controller setup.  It may be desirable to make sure that the
    EMC is optimally configured before relocate_apex.

UART Initialization

  o It is OK to defer critical UART initializations to the serial
    driver.
  o However, the kernel usually expects that the UART is available
    during startup and it may be possible to disable the serial driver
    in APEX in order to save space.

Interrupt Handling

  o It should be possible to catch interrupts in the boot loader. 
  o Most of the code is implemented (for one platform) and enabled
    with CONFIG_INTERRUPTS.
  o The key piece missing is that the interrupt vectors must either be
    written to flash, or the MMU must be enabled so that the page with
    the vectors is available at address zero.
  o Untrue, we can use the RCPC to put SDRAM (or SRAM) at 0x00000000. 
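
  Once RAM is visible at zero, the vectors themselves are small.  The
  sketch below fills a table using the common ARM idiom of eight
  "ldr pc, [pc, #0x18]" instructions followed by eight handler
  addresses.  The function name write_vectors is hypothetical, the
  RCPC remap is not shown, and any cache or write buffer maintenance
  is omitted.  The base pointer is an argument so that the routine
  can be checked against an ordinary array.

```c
#include <stdint.h>

#define ARM_VECTOR_COUNT 8

/* ARM encoding of "ldr pc, [pc, #0x18]".  Since pc reads as the
   instruction address plus 8, the vector at offset 4*n loads its
   target from offset 0x20 + 4*n. */
#define ARM_LDR_PC_VIA_TABLE UINT32_C (0xe59ff018)

/* Fill base[0..15]: 8 vector instructions, then 8 handler addresses. */
void write_vectors (volatile uint32_t *base,
                    const uint32_t handlers[ARM_VECTOR_COUNT])
{
  for (int i = 0; i < ARM_VECTOR_COUNT; ++i) {
    base[i] = ARM_LDR_PC_VIA_TABLE;
    base[ARM_VECTOR_COUNT + i] = handlers[i];
  }
}
```

  Unused entries could all point at exception_error ().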

Console

  o Document how to set up the console on the LCD device as well as
    the serial:

      console=tty0 console=ttyAM0,115200

Ethernet

  o There is an underlying driver interface
    o send/receive ... poll? 
    o prefer not to have to implement interrupts for the sake of
      complexity
  o Protocol handler will have a discriminator?  Or will there be only
    one available at a time?
  o Need to support multiple streams since we will eventually support
    telnet.  
  o Select style interface will be best to handle console as well as
    network streams.
  o Handler will bind with some sort of discriminator.  Could be a
    port or protocol or perhaps a frame interpretation function.
  o UI will need to support ip command
    o Model after ip route
    o ip addr ...
    o ip link for mac addresses and replace emac?   Hmm.
  o Need to support DHCP at least.  RARP is desirable.  Bootp should
    come with dhcp.
  o Really, this should be simple.
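
  A sketch of the binding idea, using a frame interpretation function
  as the discriminator.  Every name here is hypothetical.  A polling
  loop would call eth_dispatch for each frame the driver returns.
  The demo matcher looks for the ARP EtherType (0x0806) at bytes 12
  and 13 of the frame.

```c
#include <stddef.h>
#include <stdint.h>

typedef int  (*match_fn_t)   (const uint8_t *frame, size_t cb);
typedef void (*receive_fn_t) (const uint8_t *frame, size_t cb);

struct handler {
  match_fn_t   match;     /* discriminator: nonzero if frame is ours */
  receive_fn_t receive;
};

#define MAX_HANDLERS 4
static struct handler handlers[MAX_HANDLERS];
static int handler_count;

/* Register a handler.  Returns 0 on success, -1 if the table is full. */
int eth_bind (match_fn_t match, receive_fn_t receive)
{
  if (handler_count >= MAX_HANDLERS)
    return -1;
  handlers[handler_count].match   = match;
  handlers[handler_count].receive = receive;
  ++handler_count;
  return 0;
}

/* Offer one received frame to the handlers in bind order; the first
   match wins.  Returns 1 if a handler took the frame, else 0. */
int eth_dispatch (const uint8_t *frame, size_t cb)
{
  for (int i = 0; i < handler_count; ++i)
    if (handlers[i].match (frame, cb)) {
      handlers[i].receive (frame, cb);
      return 1;
    }
  return 0;
}

/* Demonstration handler: counts ARP frames. */
int demo_frames;
int demo_match_arp (const uint8_t *frame, size_t cb)
{
  return cb >= 14 && frame[12] == 0x08 && frame[13] == 0x06;
}
void demo_receive (const uint8_t *frame, size_t cb)
{
  (void) frame;
  (void) cb;
  ++demo_frames;
}
```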

  o The challenge is making something sufficiently flexible to
    integrate well into APEX and to give reasonable operability.

    o Telnet server
    o TFTP download
    o TFTP server for accessing system data, would accept descriptors
    o Descriptor flow, tftp receive to a descriptor
    o HTTP/FTP server should also be possible to retrieve information
      about the running system.
    o TCP state engine is more difficult and may be too much

  o I prefer a very simple dispatch so that there is little admin.
    -> slow call structure for packet dispatch
    -> polling for packets

