running                package:gtools                R Documentation

_A_p_p_l_y _a _F_u_n_c_t_i_o_n _O_v_e_r _A_d_j_a_c_e_n_t _S_u_b_s_e_t_s _o_f _a _V_e_c_t_o_r

_D_e_s_c_r_i_p_t_i_o_n:

     Applies a function over subsets of the vector(s) formed by taking
     a fixed number of previous points.

_U_s_a_g_e:

     running(X, Y=NULL, fun=mean, width=min(length(X), 20),
             allow.fewer=FALSE, pad=FALSE, align=c("right", "center","left"),
             simplify=TRUE, by, ...)

_A_r_g_u_m_e_n_t_s:

       X: data vector 

       Y: data vector (optional) 

     fun: Function to apply. Default is 'mean'

   width: Integer giving the number of vector elements to include in
          the subsets.  Defaults to the lesser of the length of the
          data and 20 elements.

allow.fewer: Boolean indicating whether the function should be computed
          for subsets with fewer than 'width' points.  Defaults to
          FALSE.

     pad: Boolean indicating whether the returned results should be
          'padded' with NAs corresponding to sets with fewer than
          'width' elements.  This only applies when 'allow.fewer' is
          FALSE.

   align: One of "right", "center", or "left". This controls the
          relative location of `short' subsets with fewer than 'width'
          elements: "right" allows short subsets only at the beginning
          of the sequence so that all of the complete subsets are at
          the end of the sequence (i.e. `right aligned'), "left" allows
          short subsets only at the end of the data so that the
          complete subsets are `left aligned', and "center" allows
          short subsets at both ends of the data so that complete
          subsets are `centered'. 

simplify: Boolean.  If FALSE the returned object will be a list
          containing one element per evaluation.  If TRUE, the returned
          object will be coerced into a vector (if the computation
          returns a scalar) or a matrix (if the computation returns
          multiple values). Defaults to TRUE.

      by: Integer separation between groups. Setting 'by=width' gives
          non-overlapping windows. Default is missing, in which case
          groups will start at each value in the X/Y range.

     ...: parameters to be passed to 'fun' 
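
     As a rough illustration of 'align', the sketch below computes the
     first and last index of the window attached to each position and
     flags the `short' ones.  The helpers 'window_bounds' and
     'is_short' are hypothetical (they are not part of gtools), use
     base R only, and assume an odd 'width' for "center"; the package
     itself may enumerate windows differently.

```r
# Hypothetical helper, not part of gtools: first and last index of the
# window attached to each position 1..n (assumes odd 'width' for "center").
window_bounds <- function(n, width, align = c("right", "center", "left")) {
  align <- match.arg(align)
  i <- seq_len(n)
  half <- (width - 1) %/% 2
  switch(align,
    right  = data.frame(from = pmax(i - width + 1, 1), to = i),
    left   = data.frame(from = i, to = pmin(i + width - 1, n)),
    center = data.frame(from = pmax(i - half, 1), to = pmin(i + half, n))
  )
}

# Flag `short' windows, i.e. those holding fewer than 'width' elements
is_short <- function(b, width) (b$to - b$from + 1) < width

is_short(window_bounds(6, 3, "right"),  3)  # short only at the start
is_short(window_bounds(6, 3, "left"),   3)  # short only at the end
is_short(window_bounds(6, 3, "center"), 3)  # short at both ends
```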

_D_e_t_a_i_l_s:

     'running' applies the specified function to sequential windows on
     'X' and (optionally) 'Y'.  If 'Y' is specified the function must
     be bivariate, i.e. accept the 'X' and 'Y' subsets of each window.
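
     The idea can be sketched in base R as follows.  'running_sketch'
     is a hypothetical stand-in, not the gtools source: it assumes
     right-aligned windows, keeps complete windows only (as with
     'allow.fewer=FALSE' and 'pad=FALSE'), and treats 'by' as the step
     between successive windows.

```r
# Hypothetical base-R sketch (not the gtools implementation):
# right-aligned running windows, complete windows only.
running_sketch <- function(X, Y = NULL, fun = mean, width = 3, by = 1, ...) {
  n <- length(X)
  stopifnot(n >= width)
  ends <- seq(from = width, to = n, by = by)  # last index of each window
  sapply(ends, function(e) {
    idx <- (e - width + 1):e                  # the 'width' most recent points
    if (is.null(Y)) fun(X[idx], ...) else fun(X[idx], Y[idx], ...)
  })
}

running_sketch(1:10, width = 3)           # means of 1:3, 2:4, ..., 8:10
running_sketch(1:10, width = 5, by = 5)   # non-overlapping: 1:5 and 6:10

# Bivariate case: 'fun' receives the X and Y subsets of each window
x <- c(1, 2, 4, 7, 11)
y <- 2 * x
running_sketch(x, y, fun = cor, width = 3)
```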

_V_a_l_u_e:

     List (if 'simplify==FALSE'), vector, or matrix containing the
     results of applying the function 'fun' to the subsets of 'X', or
     of 'X' and 'Y' when 'Y' is supplied.

     Note that this function will create a vector or matrix even for
     objects which are not simplified by 'sapply'.
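
     The three result shapes can be pictured with plain 'sapply' and
     'lapply' calls.  This is a base-R illustration only: 'windows' is
     a hypothetical list of index sets, and the sketch assumes that
     simplification follows the usual 'sapply' conventions.

```r
# Base-R illustration only; 'windows' is a hypothetical list of index sets.
X <- 1:6
windows <- list(1:3, 2:4, 3:5, 4:6)

# One value per window: simplifies to a vector of length 4
scalar <- sapply(windows, function(w) mean(X[w]))

# Two values per window: simplifies to a 2 x 4 matrix, one column per window
multi <- sapply(windows, function(w) range(X[w]))

# No simplification: a list with one element per window
as_list <- lapply(windows, function(w) range(X[w]))
```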

_A_u_t_h_o_r(_s):

     Gregory R. Warnes gregory.r.warnes@pfizer.com, with contributions
     by Nitin Jain nitin.jain@pfizer.com.

_S_e_e _A_l_s_o:

     'wapply' to apply a function over an x-y window centered at each x
     point, 'sapply', 'lapply'

_E_x_a_m_p_l_e_s:

     # show effect of pad
     running(1:20, width=5)
     running(1:20, width=5, pad=TRUE)

     # show effect of align
     running(1:20, width=5, align="left", pad=TRUE)
     running(1:20, width=5, align="center", pad=TRUE)
     running(1:20, width=5, align="right", pad=TRUE)

     # show effect of simplify
     running(1:20, width=5, fun=function(x) x )  # matrix
     running(1:20, width=5, fun=function(x) x, simplify=FALSE) # list

     # show effect of by
     running(1:20, width=5)       # normal
     running(1:20, width=5, by=5) # non-overlapping
     running(1:20, width=5, by=2) # starting every 2nd

     # Use 'pad' to ensure correct length of vector, also show the effect
     # of allow.fewer.
     par(mfrow=c(2,1))
     plot(1:20, running(1:20, width=5, allow.fewer=FALSE, pad=TRUE), type="b")
     plot(1:20, running(1:20, width=5, allow.fewer=TRUE,  pad=TRUE), type="b")
     par(mfrow=c(1,1))

     # plot running mean and the band of +/- 1 standard deviation
     # around it, estimated from the *last* 51 observations
     dat <- rnorm(500, sd=1 + (1:500)/500 )
     plot(dat)
     sdfun <- function(x, sign=1) mean(x) + sign * sd(x)
     lines(running(dat, width=51, pad=TRUE, fun=mean), col="blue")
     lines(running(dat, width=51, pad=TRUE, fun=sdfun, sign=-1), col="red")
     lines(running(dat, width=51, pad=TRUE, fun=sdfun, sign= 1), col="red")

     # plot running correlation estimated from the last 20 observations
     # (red) against the true local correlation (blue)
     sd.Y <- seq(0, 1, length.out=500)

     X <- rnorm(500, sd=1)
     Y <- rnorm(500, sd=sd.Y)

     plot(running(X,X+Y,width=20,fun=cor,pad=TRUE),col="red",type="s")

     r <- 1 / sqrt(1 + sd.Y^2) # true cor of (X,X+Y)
     lines(r,type="l",col="blue")

