reShape                package:Hmisc                R Documentation

_R_e_s_h_a_p_e _M_a_t_r_i_c_e_s _a_n_d _S_e_r_i_a_l _D_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     If the first argument is a matrix, 'reShape' strings out its
     values and creates row and column vectors specifying the row and
     column each element came from.  This is useful for sending
     matrices to Trellis functions, for analyzing or plotting results
     of 'table' or 'crosstabs', or for reformatting serial data stored
     in a matrix (with rows representing multiple time points) into
     vectors.  The number of observations in the new variables will be
     the product of the number of rows and number of columns in the
     input matrix.  If the first argument is a vector, the 'id' and
     'colvar' variables are used to restructure it into a matrix, with
     NAs for elements that corresponded to combinations of 'id' and
     'colvar' values that did not exist in the data.  When more than
     one vector is given, multiple matrices are created.  This is
     useful for restructuring irregular serial data into regular
     matrices.  It is also useful for converting data produced by
     'expand.grid' into a matrix (see the 'expand.grid' example).  The number of
     rows of the new matrices equals the number of unique values of
     'id', and the number of columns equals the number of unique values
     of 'colvar'.
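
     The two directions described above can be illustrated with a
     minimal base R sketch.  It is not 'reShape's own code; the object
     names ('m', 'long', 'wide', 'chol') are made up for illustration,
     and it only mimics the bookkeeping the text describes.

```r
# Illustration only: base R bookkeeping, not reShape()'s implementation.
m <- matrix(1:6, nrow = 2,
            dimnames = list(id = c("a", "b"), month = c("1", "2", "3")))

# Matrix -> vectors: one element per cell, taken in column-major order,
# so each vector has nrow(m) * ncol(m) elements.
long <- list(id    = rownames(m)[row(m)],
             month = as.numeric(colnames(m)[col(m)]),
             value = as.vector(m))

# Vectors -> matrix: one row per unique id, one column per unique
# colvar value; cells for combinations absent from the data stay NA.
id     <- c("a", "a", "b", "b", "b", "d")
month  <- c(1, 2, 1, 2, 3, 2)
chol   <- c(225, 226, 320, 319, 318, 270)
ids    <- sort(unique(id))
months <- sort(unique(month))
wide <- matrix(NA_real_, nrow = length(ids), ncol = length(months),
               dimnames = list(ids, months))
wide[cbind(match(id, ids), match(month, months))] <- chol
wide
```

     Here subjects 'a' and 'd' have no month 3 value, so those cells of
     'wide' remain NA.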

     A different behavior of 'reShape' is achieved when 'base' and
     'reps' are specified.  In that case 'x' must be a list or data
     frame, and those data are assumed to contain one or more
     non-repeating measurements (e.g., baseline measurements) and one
     or more repeated measurements represented by variables named by
     pasting together the character strings in the vector 'base' with
     the integers 1, 2, ..., 'reps'.  The input data are rearranged by
     repeating each value of the baseline variables 'reps' times and by
     transposing each observation's values for each set of repeated
     measurements into 'reps' observations under the variable whose
     name does not have an integer pasted to the end.  If 'x' has
     a 'row.names' attribute, those observation identifiers are each
     repeated 'reps' times in the output object.  See the last example.
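
     The rearrangement performed when 'base' and 'reps' are given can
     be sketched in base R as follows.  This is an illustration under
     simplifying assumptions (one baseline variable, one repeated
     measurement), not the function's implementation; the data and the
     'subject' column are hypothetical.

```r
# Illustration only: a wide-to-long rearrangement in base R.
w <- data.frame(age  = c(40, 50),
                sbp1 = c(120, 130),
                sbp2 = c(121, 131),
                sbp3 = c(122, 132),
                row.names = c("a", "b"))
base <- "sbp"
reps <- 3
n    <- nrow(w)

repeated <- as.matrix(w[paste0(base, seq_len(reps))])  # n x reps

long <- data.frame(
  seqno = rep(seq_len(reps), times = n),   # 1, 2, 3 within each subject
  age   = rep(w$age, each = reps),         # baseline value repeated reps times
  sbp   = as.vector(t(repeated))           # each subject's row strung out
)
# Observation identifiers are likewise repeated reps times each.
long$subject <- rep(row.names(w), each = reps)
long
```

     Each subject contributes 'reps' consecutive rows, with the
     repeated measurement stored under the base name 'sbp'.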

_U_s_a_g_e:

     reShape(x, ..., id, colvar, base, reps, times=1:reps,
             timevar='seqno')

_A_r_g_u_m_e_n_t_s:

       x: a matrix or vector, or, when 'base' is specified, a list or
          data frame 

     ...: other optional vectors, if 'x' is a vector 

      id: A numeric, character, category, or factor variable containing
          subject identifiers.  Required if 'x' is a vector, ignored
          otherwise. 

  colvar: A numeric, character, category, or factor variable containing
          column identifiers.  'colvar' is usually a "time of data
          collection" variable. Required if 'x' is a vector, ignored
          otherwise. 

    base: vector of character strings containing base names of repeated
          measurements 

    reps: number of times variables named in 'base' are repeated.  This
          must be a constant. 

   times: when 'base' is given, 'times' is the vector of times to
          create if you do not want to use consecutive integers
          beginning with 1. 

 timevar: specifies the name of the time variable to create when
          'times' is given, if you do not want to use the default,
          'seqno' 

_D_e_t_a_i_l_s:

     In converting 'dimnames' to vectors, the resulting variables are
     numeric if all elements of the matrix dimnames can be converted to
     numeric, otherwise the corresponding row or column variable
     remains character.  When the 'dimnames' of 'x' have a 'names'
     attribute, those two names become the new variable names.  If 'x'
     is a vector and another vector is also given (in '...'), the
     matrices in the resulting list are named the same as the input
     vector calling arguments.  You can specify customized names for
     these on-the-fly by using e.g. 'reShape(X=x, Y=y, id= , colvar=
     )'.  The new names will then be 'X' and 'Y' instead of 'x' and
     'y'.  When 'base' is specified, a new variable named 'seqno' is
     also added to the resulting object.  'seqno' indicates the sequential repeated
     measurement number.  When 'base' and 'times' are specified, this
     new  variable is named the character value of 'timevar' and the
     values are given by a table lookup into the vector 'times'.
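
     Two of the rules above can be sketched in base R.  The helper
     'to_numeric_if_possible' is hypothetical and only approximates the
     conversion rule described here; it is not the routine 'reShape'
     uses internally.

```r
# Hypothetical helper: keep labels as character unless every one of
# them converts cleanly to a number.
to_numeric_if_possible <- function(labels) {
  num <- suppressWarnings(as.numeric(labels))
  if (anyNA(num)) labels else num
}

to_numeric_if_possible(c("1", "2", "3"))   # numeric vector
to_numeric_if_possible(c("S", "M", "L"))   # stays character

# Table lookup into 'times': the sequence number indexes the vector.
times <- c(0, 3, 12)
seqno <- rep(1:3, times = 2)   # two subjects, three repeats each
week  <- times[seqno]
week
```

     With 'timevar="week"' and 'times=c(0,3,12)', the time variable
     would therefore hold 0, 3, 12 for each subject rather than 1, 2, 3.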

_V_a_l_u_e:

     If 'x' is a matrix, returns a list containing the row variable,
     the column variable, and the 'as.vector(x)' vector, which is named
     after the calling argument supplied for 'x'.  If 'x' is a vector
     and no other vectors were specified as '...', the result is a
     matrix.  If at least one vector was given to '...', the result is
     a list containing 'k' matrices, where 'k' is one plus the number
     of vectors in '...'.  If 'x' is a list or data frame, the same
     type of object is returned.

_A_u_t_h_o_r(_s):

     Frank Harrell 
      Department of Biostatistics 
      Vanderbilt University School of Medicine 
      f.harrell@vanderbilt.edu

_S_e_e _A_l_s_o:

     'as.vector', 'matrix', 'dimnames', 'outer', 'table'

_E_x_a_m_p_l_e_s:

     if(.R.) {
       set.seed(1)
       Solder  <- factor(sample(c('Thin','Thick'),200,TRUE),c('Thin','Thick'))
       Opening <- factor(sample(c('S','M','L'),  200,TRUE),c('S','M','L'))
     } else attach(solder[solder$skips > 10, ])
     tab <- table(Opening, Solder)
     tab
     reShape(tab)
     # attach(tab)  # do further processing

     if(!.R.) {
      g <- crosstabs( ~ Solder + Opening, data = solder, subset = skips > 10)
      rowpct <- 100*attr(g,'marginals')$"N/RowTotal"   # compute row pcts
      rowpct

      r <- reShape(rowpct)
      # note names "Solder" and "Opening" came originally from formula
      # given to crosstabs
      r    
      dotplot(Solder ~ rowpct, groups=Opening, panel=panel.superpose, data=r)
     }

     # An example where a matrix is created from irregular vectors
     follow <- data.frame(id=c('a','a','b','b','b','d'),
                          month=c(1, 2,  1,  2,  3,  2),
                          cholesterol=c(225,226, 320,319,318, 270))
     follow
     attach(follow)
     reShape(cholesterol, id=id, colvar=month)
     detach('follow')
     # Could have done :
     # reShape(cholesterol, triglyceride=trig, id=id, colvar=month)

     # Get predictions from a regression model for 2 systematically
     # varying predictors.  Convert the predictions into a matrix, with
     # rows corresponding to the predictor having the most values, and
     # columns corresponding to the other predictor
     # d <- expand.grid(x2=0:1, x1=1:100)
     # pred <- predict(fit, d)
     # reShape(pred, id=d$x1, colvar=d$x2)  # makes 100 x 2 matrix

     # Reshape a wide data frame containing multiple variables representing
     # repeated measurements (3 repeats on 2 variables; 4 subjects)
     set.seed(33)
     n <- 4
     w <- data.frame(age=rnorm(n, 40, 10),
                     sex=sample(c('female','male'), n,TRUE),
                     sbp1=rnorm(n, 120, 15),
                     sbp2=rnorm(n, 120, 15),
                     sbp3=rnorm(n, 120, 15),
                     dbp1=rnorm(n,  80, 15),
                     dbp2=rnorm(n,  80, 15),
                     dbp3=rnorm(n,  80, 15), row.names=letters[1:n])
     options(digits=3)
     w

     u <- reShape(w, base=c('sbp','dbp'), reps=3)
     u
     reShape(w, base=c('sbp','dbp'), reps=3, timevar='week', times=c(0,3,12))

