bystats                package:Hmisc                R Documentation

_S_t_a_t_i_s_t_i_c_s _b_y _C_a_t_e_g_o_r_i_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     For any number of cross-classification variables, 'bystats'
     returns a matrix with the sample size, the number of missing 'y'
     values, and 'fun(non-missing y)', with one row per
     cross-classification. Cross-classifications are produced with
     Harrell's modification of the 'interaction' function.  The default
     'fun' is 'mean'; if 'y' is binary, the mean is labeled
     'Fraction'.  There are 'print' and 'latex' methods for objects
     created by 'bystats'. 'bystats2' handles the special case of
     exactly 2 classification variables, placing the first one in rows
     and the second in columns.  The 'print' method for 'bystats2' uses
     the S-Plus 'print.char.matrix' function to organize the statistics
     for each cell into boxes.
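
     The computation described above can be sketched in base R. This is
     an illustrative sketch, not the Hmisc source: the function name
     'bystats_sketch', the 'ALL' label for the overall row, and counting
     'N' as the number of non-missing observations are assumptions made
     for the example.  It uses the base 'interaction' function (with
     'drop=TRUE') rather than Harrell's modification, and it omits the
     'Fraction' label and the 'nmiss' and 'subset' arguments.

```r
# Base-R sketch of the bystats() computation; NOT the Hmisc implementation.
# Hypothetical names: bystats_sketch, and "ALL" for the overall row.
bystats_sketch <- function(y, ..., fun = mean) {
  y <- as.matrix(y)
  # One level per observed combination of the classification variables
  g <- interaction(..., drop = TRUE, sep = " ")
  # A row is missing if any column of y is NA (matters for matrix y)
  miss <- !complete.cases(y)
  cell_stats <- function(keep) {
    yy <- y[keep & !miss, , drop = FALSE]
    if (ncol(yy) == 1L) yy <- yy[, 1]  # give fun a plain vector for single-column y
    # Assumption: N counts the non-missing observations in the cell
    c(N = sum(keep & !miss), Missing = sum(keep & miss), fun(yy))
  }
  rows <- lapply(levels(g), function(lev) cell_stats(!is.na(g) & g == lev))
  overall <- cell_stats(rep(TRUE, nrow(y)))  # all observations combined
  out <- do.call(rbind, c(rows, list(overall)))
  rownames(out) <- c(levels(g), "ALL")
  out
}

# Usage with hypothetical data
sex <- c("f", "f", "m", "m", "m")
y   <- c(1, NA, 3, 5, 7)
out <- bystats_sketch(y, sex)
out
```

     Each row holds 'N', 'Missing', and the value of 'fun' for one
     cell, and the last row repeats the summary for all observations
     combined, mirroring the layout described under Value.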

_U_s_a_g_e:

     bystats(y, ..., fun, nmiss, subset)
     ## S3 method for class 'bystats':
     print(x, ...)
     ## S3 method for class 'bystats':
     latex(object, title, caption, rowlabel, ...)
     bystats2(y, v, h, fun, nmiss, subset)
     ## S3 method for class 'bystats2':
     print(x, abbreviate.dimnames=FALSE,
        prefix.width=max(nchar(dimnames(x)[[1]])), ...)
     ## S3 method for class 'bystats2':
     latex(object, title, caption, rowlabel, ...)

_A_r_g_u_m_e_n_t_s:

       y: a binary, logical, or continuous variable, or a matrix or data
          frame of such variables.  A data frame is converted to a
          matrix. If 'y' is a data frame or matrix, computations are
          done on subsets of the rows of 'y', so you should specify a
          'fun' that operates on a matrix.  For matrix 'y', a missing
          value in any column causes the entire row to be considered
          missing, and the row is not passed to 'fun'.

     ...: For 'bystats', one or more classification variables separated
          by commas. For 'print.bystats', options passed to
          'print.default', such as 'digits'. For 'latex.bystats' and
          'latex.bystats2', options passed to 'latex.default', such as
          'digits'. If you pass 'cdec' to 'latex.default', keep in mind
          that the first one or two positions (depending on 'nmiss')
          should be zero, since these correspond to frequency counts.

       v: vertical variable for 'bystats2'.  Will be converted to
          'factor'. 

       h: horizontal variable for 'bystats2'.  Will be converted to
          'factor'. 

     fun: a function to compute on the non-missing 'y' for a given
          subset. You must specify 'fun=' in front of the function name
          or definition. 'fun' may return a single number or a vector
          or matrix of any length. Matrix results are rolled out into a
          vector, with names preserved. When 'y' is a matrix, a common
          'fun' is 'function(y) apply(y, 2, ff)' where 'ff' is the name
          of a function which operates on one column of 'y'. 

   nmiss: A column containing a count of missing values is included if
          'nmiss=TRUE' or if there is at least one missing value. 

  subset: a vector of subscripts or logical values indicating the
          subset of data to analyze 

abbreviate.dimnames: set to 'TRUE' to abbreviate 'dimnames' in output

prefix.width: see 'print.char.matrix' if using S-Plus

   title: 'title' to pass to 'latex.default'.  Default is the first
          word of the character string version of the first calling
          argument. 

 caption: caption to pass to 'latex.default'.  Default is the 'heading'
          attribute from the object produced by 'bystats'. 

rowlabel: 'rowlabel' to pass to 'latex.default'.  Default is the
          'byvarnames' attribute from the object produced by 'bystats'.
          For 'bystats2' the default is '""'.

       x: an object created by 'bystats' or 'bystats2'

  object: an object created by 'bystats' or 'bystats2'
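
     The rules for a matrix 'y' and for 'fun' can be illustrated with
     base R alone.  In this sketch the data values are hypothetical,
     'complete.cases' stands in for the rule that a missing value in
     any column makes the whole row missing, and 'fun' has the common
     'function(y) apply(y, 2, ff)' form with 'ff' set to 'median'.

```r
# Hypothetical data: two measurements per subject, with some NAs
pressure    <- c(120, 135, NA, 150, 142)
cholesterol <- c(200, 210, 190, NA, 230)
y <- cbind(pressure, cholesterol)

# A missing value in any column makes the whole row missing;
# such rows are counted but not passed to fun
ok <- complete.cases(y)
n_missing <- sum(!ok)

# Common form of fun for matrix y: apply a one-column function to each column
fun <- function(y) apply(y, 2, median)
stats <- fun(y[ok, , drop = FALSE])
stats  # named vector: one median per column of y
```

     Here rows 3 and 4 are dropped, so the medians are computed from
     the 3 complete rows only, and the column names of 'y' become the
     names of the result.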

_V_a_l_u_e:

     For 'bystats', a matrix with row names equal to the classification
     labels and column names 'N', 'Missing', and 'funlab', where
     'funlab' is determined from 'fun'. A final row is added containing
     the summary statistics computed on all observations combined.  The
     class of this matrix is 'bystats'. For 'bystats2', a 3-dimensional
     array whose last dimension corresponds to the statistics being
     computed.  The class of the array is 'bystats2'.
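
     The layout of a 'bystats2' result (rows from 'v', columns from 'h',
     statistics along the third dimension) can be mimicked with a base-R
     array.  This is a sketch with hypothetical data and hypothetical
     statistic names; it does not reproduce Hmisc's exact labels or any
     marginal rows or columns the real function may add.

```r
# Hypothetical data: binary y cross-classified by v (rows) and h (columns)
y <- c(0, 1, 1, 0, 1, NA)
v <- factor(c("a", "a", "b", "b", "b", "a"))
h <- factor(c("x", "y", "x", "y", "y", "x"))

stat_names <- c("N", "Missing", "Mean")
out <- array(NA_real_,
             dim = c(nlevels(v), nlevels(h), length(stat_names)),
             dimnames = list(levels(v), levels(h), stat_names))

# Fill each (v, h) cell; the last dimension holds the statistics
for (i in levels(v)) {
  for (j in levels(h)) {
    cell <- y[v == i & h == j]
    out[i, j, ] <- c(sum(!is.na(cell)), sum(is.na(cell)),
                     mean(cell, na.rm = TRUE))
  }
}
out[, , "Mean"]  # v-by-h table of cell means
```

     Slicing along the third dimension gives one 'v'-by-'h' table per
     statistic, which is the organization the 'print' method for
     'bystats2' arranges into boxes.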

_S_i_d_e _E_f_f_e_c_t_s:

     'latex' produces a '.tex' file.

_A_u_t_h_o_r(_s):

     Frank Harrell 
      Department of Biostatistics 
      Vanderbilt University 
      f.harrell@vanderbilt.edu

_S_e_e _A_l_s_o:

     'interaction', 'cut', 'cut2', 'latex', 'print.char.matrix',
     'translate'

_E_x_a_m_p_l_e_s:

     ## Not run: 
     bystats(sex==2, county, city)
     bystats(death, race)
     bystats(death, cut2(age,g=5), race)
     bystats(cholesterol, cut2(age,g=4), sex, fun=median)
     bystats(cholesterol, sex, fun=quantile)
     bystats(cholesterol, sex, fun=function(x)c(Mean=mean(x),Median=median(x)))
     latex(bystats(death,race,nmiss=FALSE,subset=sex=="female"), digits=2)
     f <- function(y) c(Hazard=sum(y[,2])/sum(y[,1]))
     # f() estimates the exponential hazard (events / total follow-up time) for right-censored data.
     bystats(cbind(d.time, death), race, sex, fun=f)
     bystats(cbind(pressure, cholesterol), age.decile, 
             fun=function(y) c(Median.pressure   =median(y[,1]),
                               Median.cholesterol=median(y[,2])))
     y <- cbind(pressure, cholesterol)
     bystats(y, age.decile, 
             fun=function(y) apply(y, 2, median))   # same result as the previous call
     bystats(y, age.decile, fun=function(y) apply(y, 2, quantile, c(.25,.75)))
     # This computes the 0.25 and 0.75 quantiles of each of the 2 variables.
     latex(bystats2(death, race, sex, fun=table))
     ## End(Not run)

