aggregate.formula           package:memisc           R Documentation

_D_a_t_a _F_r_a_m_e_s _a_n_d _T_a_b_l_e_s _o_f _D_e_s_c_r_i_p_t_i_v_e _S_t_a_t_i_s_t_i_c_s

_D_e_s_c_r_i_p_t_i_o_n:

     'aggregate.formula' constructs a data frame of summaries
     conditional on given values of independent variables given by a
     formula. It is a method of the generic function 'aggregate'
     applied to 'formula' objects.

     'genTable' does the same, but produces a 'table'.

     'fapply' is a generic function that dispatches on its 'data'
     argument. It is called internally by 'aggregate.formula' and
     'genTable'. Methods for this function can be used to adapt
     'aggregate.formula' and 'genTable' to data sources other than data
     frames.

_U_s_a_g_e:

     ## S3 method for class 'formula':
     aggregate(x, data=parent.frame(), subset=NULL,
           sort = TRUE, names=NULL, addFreq=TRUE, as.vars=1,
           drop.constants=TRUE,...)

     genTable(formula, data=parent.frame(), subset=NULL,
           names=NULL, addFreq=TRUE,...)

     fapply(formula,data,...) # calls UseMethod("fapply",data)
     ## Default S3 method:
     fapply(formula, data, subset=NULL,
           names=NULL, addFreq=TRUE,...)

_A_r_g_u_m_e_n_t_s:

x, formula: a formula. The right hand side includes one or more
          grouping variables separated by '+'. These may be factors,
          numeric, or character vectors. The left hand side may be
          empty, a numerical variable, a factor, or an expression. See
          details below.

    data: an environment, a data frame, or an object coercible into a
          data frame.

  subset: an optional vector specifying a subset of observations to be
          used.

    sort: a logical value; determines the order in which the aggregated
          data appear in the data frame returned by
          'aggregate.formula'. If 'sort' is TRUE, the returned data
          frame is sorted by the values of the grouping variables. If
          'sort' is FALSE, the order of the resulting data frame
          corresponds to the order in which the values of the grouping
          variables appear in the original data frame.

   names: an optional character vector giving names to the result(s)
          yielded by the expression on the left hand side of 'formula'.
          This argument may be redundant if the left hand side results
          in a named vector. (See the example below.)

 addFreq: a logical value. If TRUE and 'data' is a table or a data
          frame with a variable named "Freq", a call to 'table',
          'Table', 'percent', or 'nvalid' is supplied with an
          additional argument 'Freq', and a call to 'table' is
          translated into a call to 'Table'.

 as.vars: an integer; relevant only if the left hand side of the
          formula returns an array or a matrix. It specifies which
          dimension (rows, columns, layers, etc.) will be transformed
          into variables. Defaults to columns in the case of matrices
          and to the highest dimensional extent in the case of arrays.

drop.constants: logical; should variables that are constant across
          levels be dropped from the result?

     ...: further arguments, passed to methods or ignored.
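
     The two row orders selected by 'sort' can be pictured with plain
     base R. This is only a sketch of the ordering concept on a made-up
     data frame 'd' with a hypothetical grouping variable 'g'; it does
     not call 'aggregate.formula' and makes no claim about how memisc
     implements the sorting.

```r
# Made-up data: 'g' is a grouping variable whose values are not in sorted order.
d <- data.frame(g = c("b", "a", "b", "c", "a"), x = 1:5,
                stringsAsFactors = FALSE)

# Order of appearance in the data (the situation described for sort = FALSE).
first_seen <- unique(d$g)

# Sorted by the values of the grouping variable (the situation for sort = TRUE).
sorted <- sort(unique(d$g))

print(first_seen)
print(sorted)
```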

_D_e_t_a_i_l_s:

     If an expression is given as the left hand side of the formula,
     its value is computed for each combination of values of the
     variables on the right hand side. If the right hand side is a dot,
     then all variables in 'data' are added to the right hand side of
     the formula.

     If no expression is given as the left hand side, then the
     frequency counts for the respective value combinations of the
     right hand side variables are computed.

     If a single factor is on the left hand side, then the left hand
     side is translated into an appropriate call to 'table()'. Note
     that 'addFreq' also takes effect in this case.

     If a single numeric variable is on the left hand side, frequency
     counts weighted by this variable are computed. In these cases,
     'genTable' is equivalent to 'xtabs' and 'aggregate.formula' is
     equivalent to 'as.data.frame(xtabs(...))'.
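
     The base R side of this equivalence can be sketched without
     memisc. The snippet below assumes only the built-in
     'UCBAdmissions' table; the object names 'adm', 'tab', and 'df' are
     illustrative. Going by the description above, a call such as
     'genTable(Freq~Admit+Dept,data=adm)' should correspond to 'tab',
     and the matching 'aggregate' call to 'df', but neither memisc call
     is run here.

```r
# Base R only; memisc is not needed to run this sketch.
# Turn the built-in table into a data frame with columns
# Admit, Gender, Dept and the count variable Freq.
adm <- as.data.frame(UCBAdmissions)

# A numeric variable on the left hand side acts as a weight:
# the result holds frequency counts of Admit by Dept, weighted by Freq.
tab <- xtabs(Freq ~ Admit + Dept, data = adm)

# The same weighted counts as a data frame, one row per value combination.
df <- as.data.frame(tab)

print(tab)
print(head(df))
```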

_V_a_l_u_e:

     'aggregate.formula' results in a data frame with conditional
     summaries and unique value combinations of conditioning variables.

     'genTable' returns a table, that is, an array with class
     '"table"'.

_S_e_e _A_l_s_o:

     aggregate.data.frame, xtabs

_E_x_a_m_p_l_e_s:

     ex.data <- expand.grid(mu=c(0,100),sigma=c(1,10))[rep(1:4,rep(100,4)),]
     ex.data <- within(ex.data,
                       x<-rnorm(
                         n=nrow(ex.data),
                         mean=mu,
                         sd=sigma
                         )
                       )

     aggregate(~mu+sigma,data=ex.data)
     aggregate(mean(x)~mu+sigma,data=ex.data)
     aggregate(mean(x)~mu+sigma,data=ex.data,names="Average")
     aggregate(c(mean(x),sd(x))~mu+sigma,data=ex.data)
     aggregate(c(Mean=mean(x),StDev=sd(x),N=length(x))~mu+sigma,data=ex.data)
     genTable(c(Mean=mean(x),StDev=sd(x),N=length(x))~mu+sigma,data=ex.data)

     aggregate(table(Admit)~.,data=UCBAdmissions)
     aggregate(Table(Admit,Freq)~.,data=UCBAdmissions)
     aggregate(Admit~.,data=UCBAdmissions)
     aggregate(percent(Admit)~.,data=UCBAdmissions)
     aggregate(percent(Admit)~Gender,data=UCBAdmissions)
     aggregate(percent(Admit)~Dept,data=UCBAdmissions)
     aggregate(percent(Gender)~Dept,data=UCBAdmissions)
     aggregate(percent(Admit)~Dept,data=UCBAdmissions,Gender=="Female")
     genTable(percent(Admit)~Dept,data=UCBAdmissions,Gender=="Female")

