vlmc                  package:VLMC                  R Documentation

_F_i_t _a _V_a_r_i_a_b_l_e _L_e_n_g_t_h _M_a_r_k_o_v _C_h_a_i_n (_V_L_M_C)

_D_e_s_c_r_i_p_t_i_o_n:

     Fit a Variable Length Markov Chain (VLMC) to a discrete time
     series, in basically two steps:
      First, a large Markov chain is generated, containing (all, if
     'threshold.gen = 1') the context states of the time series.  In
     the second step, many states of that chain are collapsed by
     _pruning_ the corresponding context tree.

_U_s_a_g_e:

     vlmc(dts,
          cutoff.prune = qchisq(alpha.c, df=max(.1,alpha.len-1),lower.tail=FALSE)/2,
          alpha.c = 0.05,
          threshold.gen = 2,
          code1char = TRUE, y = TRUE, debug = FALSE, quiet = FALSE,
          dump = 0, ctl.dump = c(width.ct = 1+log10(n), nmax.set = -1) )

     is.vlmc(x)
     ## S3 method for class 'vlmc':
     print(x, digits = max(3, getOption("digits") - 3), ...)

_A_r_g_u_m_e_n_t_s:

     dts: a discrete ``time series''; can be numeric, character, or
          factor.

cutoff.prune: non-negative number; the cutoff used for pruning;
          defaults to half the upper alpha-quantile of a chi-squared
          distribution, where alpha = 'alpha.c', the following
          argument:

 alpha.c: number in (0,1) used to specify 'cutoff.prune' on the more
          intuitive chi^2 quantile scale; defaults to 5%.
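
     The default 'cutoff.prune' can be reproduced in base R from the
     formula in the Usage section; the snippet below is only an
     illustration, assuming a binary alphabet ('alpha.len = 2'):

```r
## Default cutoff: half the upper 'alpha.c'-quantile of a chi-squared
## distribution with df = max(0.1, alpha.len - 1).
## Assumed for illustration: alpha.len = 2 (binary alphabet).
alpha.c   <- 0.05
alpha.len <- 2
cutoff <- qchisq(alpha.c, df = max(.1, alpha.len - 1),
                 lower.tail = FALSE) / 2
cutoff  # about 1.92 for df = 1
```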

threshold.gen: integer '>= 1' (usually left at 2).  When _generating_
          the initial large tree, only generate nodes with 'count >=
          threshold.gen'.

code1char: logical; if true (default), the data 'dts' will be coded in
          one-character codes (the letters of 'alpha').

       y: logical; if true (default), the data 'dts' will be returned.
          This ensures that residuals ('residuals.vlmc') and ``k-step
          ahead'' predictions can be computed from the result.

   debug: logical; should debugging info be printed to stderr.

   quiet: logical; if true, don't print some warnings.

    dump: integer in '0:2'.  If positive, the pruned tree is dumped to
          stderr; if 2, the initial *un*pruned tree is dumped as well.

ctl.dump: integer vector of length 2, say 'ctl[1:2]', controlling the
          above dump when 'dump > 0'.  'ctl[1]' is the width (number of
          characters) for the ``counts''; 'ctl[2]' is the maximal
          number of set elements printed per node; when the latter is
          not positive (the default), currently 'max(6, 15 - log10(n))'
          is used.

       x: a fitted '"vlmc"' object.

  digits: integer giving the number of significant digits for printing
          numbers.

     ...: potentially further arguments [Generic].

_V_a_l_u_e:

     A '"vlmc"' object, basically a list with components 

       n: length of data series when fit.

threshold.gen, cutoff.prune: the arguments (or their defaults).

alpha.len: the alphabet size.

   alpha: the alphabet used, as one string.

    size: a named integer vector of length (>=) 4, giving
          characteristic sizes of the fitted VLMC.  Its named
          components are

          "ord.MC" the (maximal) order of the Markov chain,

          "context" the ``context tree size'', i.e., the number of
               leaves plus the number of ``hidden nodes'',

          "nr.leaves" the number of leaves, and

          "total" the number of integers needed to encode the VLMC
               tree, i.e., 'length(vlmc.vec)' (see below).

vlmc.vec: integer vector, containing (an encoding of) the fitted VLMC
          tree.

       y: if 'y = TRUE', the data 'dts', as 'character', using the
          letters from 'alpha'.

    call: the 'call' 'vlmc(..)' used.
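
     The relations among these components can be checked on a small
     fitted object; the following sketch assumes the VLMC package is
     installed and uses a made-up binary series:

```r
## Illustrative only: verify the documented relations among the
## components of a fitted "vlmc" object (requires the VLMC package).
if (requireNamespace("VLMC", quietly = TRUE)) {
  dt1 <- rep(c(1, 0, 0, 0, 1, 0, 1, 0), 4)   # assumed toy data
  fit <- VLMC::vlmc(dt1)
  stopifnot(
    fit$n == length(dt1),                        # 'n': data length
    nchar(fit$alpha) == fit$alpha.len,           # 'alpha' has alpha.len letters
    fit$size[["total"]] == length(fit$vlmc.vec)) # 'total' = encoding length
}
```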

_N_o_t_e:

     Set 'cutoff.prune = 0, threshold.gen = 1' to get a ``perfect
     fit'', i.e., a VLMC which perfectly re-predicts the data (apart
     from the first observation).  Note that even with 'cutoff.prune =
     0' some pruning may happen, namely for all (terminal) nodes with
     delta = 0.
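
     A hedged sketch of that setting, again assuming the VLMC package
     is installed and using made-up data; the unpruned fit is at least
     as large as the default one:

```r
## Sketch of the "perfect fit" settings from the Note (requires VLMC).
if (requireNamespace("VLMC", quietly = TRUE)) {
  dt1 <- rep(c(1, 0, 0, 0, 1, 1, 0, 0), 3)   # assumed toy data
  fit.def  <- VLMC::vlmc(dt1)                               # default pruning
  fit.full <- VLMC::vlmc(dt1, cutoff.prune = 0, threshold.gen = 1)
  ## pruning can only shrink the context tree:
  stopifnot(fit.def$size[["context"]] <= fit.full$size[["context"]])
}
```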

_A_u_t_h_o_r(_s):

     Martin Maechler

_R_e_f_e_r_e_n_c_e_s:

     Bühlmann P. and Wyner A. (1999) Variable Length Markov Chains.
     _Annals of Statistics_ *27*, 480-513.

     Mächler M. and Bühlmann P. (2003) Variable Length Markov Chains:
     Methodology, Computing and Software.  Accepted for publication in
     _J. Computational and Graphical Statistics_.

     Mächler M. (2003) VLMC - Implementation and R interface; working
     paper.

_S_e_e _A_l_s_o:

     'draw.vlmc', 'entropy', 'simulate.vlmc' for ``VLMC
     bootstrapping''.

_E_x_a_m_p_l_e_s:

     f1 <- c(1,0,0,0)
     f2 <- rep(1:0,2)
     (dt1 <- c(f1,f1,f2,f1,f2,f2,f1))

     (vlmc.dt1  <- vlmc(dt1))
      vlmc(dt1, dump = 1,
           ctl.dump = c(wid = 3, nmax = 20), debug = TRUE)
     (vlmc.dt1c01 <- vlmc(dts = dt1, cutoff.prune = .1, dump=1))

     data(presidents)
     dpres <- cut(presidents, c(0,45,70, 100)) # three values + NA
     table(dpres <- factor(dpres, exclude = NULL)) # NA as 4th level
     vlmc.pres <- vlmc(dpres, debug = TRUE)
     vlmc.pres

     ## alphabet and its length:
     vlmc.pres$alpha
     stopifnot(
       length(print(strsplit(vlmc.pres$alpha, NULL)[[1]])) ==
       vlmc.pres$alpha.len
     )

