assoc                  package:vcd                  R Documentation

_E_x_t_e_n_d_e_d _A_s_s_o_c_i_a_t_i_o_n _P_l_o_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     Produce an association plot indicating deviations from a specified
     independence model in a possibly high-dimensional contingency
     table.

_U_s_a_g_e:

     ## Default S3 method:
     assoc(x, row_vars = NULL, col_vars = NULL, compress = TRUE,
       xlim = NULL, ylim = NULL,
       spacing = spacing_conditional(sp = 0), spacing_args = list(),
       split_vertical = NULL, keep_aspect_ratio = FALSE, residuals_type = "Pearson",
       xscale = 0.9, yspace = unit(0.5, "lines"), main = NULL, sub = NULL,
       ..., gp_axis = gpar(lty = 3))
     ## S3 method for class 'formula':
     assoc(formula, data = NULL, subset, na.action, ...,
       main = NULL, sub = NULL)

_A_r_g_u_m_e_n_t_s:

       x: a contingency table in array form with optional category
          labels specified in the 'dimnames(x)' attribute, or an object
          inheriting from the '"ftable"' class (such as '"structable"'
          objects).

row_vars: a vector of integers giving the indices, or a character
          vector giving the names of the variables to be used for the
          rows of the association plot.

col_vars: a vector of integers giving the indices, or a character
          vector giving the names of the variables to be used for the
          columns of the association plot.

compress: logical; if 'FALSE', the space between the rows (columns) is
          chosen such that the _total_ heights (widths) of the rows
          (columns) are all equal.  If 'TRUE', the space between rows
          and columns is fixed and hence the plot is more "compressed".

    xlim: a 2 x k matrix of doubles, where k is the total number of
          columns of the plot.  The columns of 'xlim' correspond to the
          columns of the association plot, the rows describe the column
          ranges (minimums in the first row, maximums in the second
          row).  If 'xlim' is 'NULL', the ranges are determined from
          the residuals according to 'compress' (if 'TRUE': widest
          range from each column, if 'FALSE': from the whole
          association plot matrix).

    ylim: a 2 x k matrix of doubles, where k is the total number of
          rows of the plot.  The columns of 'ylim' correspond to the
          rows of the association plot, the rows describe the row
          ranges (minimums in the first row, maximums in the second
          row).  If 'ylim' is 'NULL', the ranges are determined from
          the residuals according to 'compress' (if 'TRUE': widest
          range from each row, if 'FALSE': from the whole association
          plot matrix).

 spacing: a spacing object, a spacing function, or a corresponding
          generating function (see 'strucplot' for more information). 
          The default is the spacing-generating function
          'spacing_conditional' that is (by default) called with the
          argument list 'spacing_args' (see 'spacings' for more
          details).

spacing_args: list of arguments for the spacing-generating function, if
          specified (see 'strucplot' for more information).

split_vertical: vector of logicals of length k, where k is the number
          of margins of 'x' (default: 'FALSE'). Values are recycled as
          needed. A 'TRUE' component indicates that the corresponding
          dimension is folded into the columns, 'FALSE' folds the
          dimension into the rows.

keep_aspect_ratio: logical indicating whether the aspect ratio should
          be fixed or not.

residuals_type: a character string indicating the type of residuals to
          be computed. Currently, only Pearson residuals are supported.

  xscale: scale factor resizing the tiles' widths, thus adding
          additional space between the tiles.

  yspace: object of class '"unit"' specifying additional space
          separating the rows.

 gp_axis: object of class '"gpar"' specifying the visual aspects of the
          tiles' baseline.

 formula: a formula object with both left and right hand sides
          specifying the column and row variables of the flat table.

    data: a data frame, list or environment containing the variables to
          be cross-tabulated, or a contingency table (see below).

  subset: an optional vector specifying a subset of observations to be
          used. Ignored if 'data' is a contingency table.

na.action: a function which indicates what should happen when the data
          contain 'NA's. Ignored if 'data' is a contingency table.

main, sub: either a logical, or a character string used for plotting
          the main (sub) title.  If logical and 'TRUE', the name of the
          'data' object is used.

     ...: other parameters passed to 'strucplot'.
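
     To make the 2 x k layout described for 'ylim' concrete, here is a
     minimal base R sketch for a two-way table.  It is an illustration
     only and not the package's internal code: the object names
     ('ylim_per_row', 'ylim_common') are hypothetical, and reading
     "widest range from each row" as the per-row range of the Pearson
     residuals is an assumption.

```r
## Illustration only: shapes of 2 x k range matrices as described for 'ylim'.
## Pearson residuals for a two-way table (hair color by eye color).
f <- margin.table(HairEyeColor, c(1, 2))
e <- outer(rowSums(f), colSums(f)) / sum(f)   # expected counts under independence
d <- unclass((f - e) / sqrt(e))               # plain matrix of residuals

## One range per plot row: minimums in row 1, maximums in row 2.
## (Assumed reading of "widest range from each row".)
ylim_per_row <- apply(d, 1, range)

## One common range taken from the whole residual matrix,
## repeated for every plot row.
ylim_common <- matrix(range(d), nrow = 2, ncol = nrow(d),
                      dimnames = list(c("min", "max"), rownames(d)))

ylim_per_row
ylim_common
```

     Both matrices have two rows and one column per row of the table,
     which is the shape the 'ylim' entry asks for.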

_D_e_t_a_i_l_s:

     Association plots were suggested by Cohen (1980) and extended by
     Friendly (1992).  They provide a means for visualizing the
     residuals of an independence model for a contingency table.

     'assoc' is a generic function and currently has a default method
     and a formula interface.  Both are high-level interfaces to the
     'strucplot' function, and produce (extended) association plots. 
     Most of the functionality is described there, such as
     specification of the independence model, labeling, legend,
     spacing, shading, and other graphical parameters.

     For a contingency table, the signed contribution to Pearson's
     chi^2 for cell {ij... k} is


      d_{ij...k} = (f_{ij...k} - e_{ij...k}) / sqrt(e_{ij...k})


     where f_{ij...k} and e_{ij...k} are the observed and expected
     counts corresponding to the cell.  In the association plot, each
     cell is represented by a rectangle that has (signed) height
     proportional to d_{ij...k} and width proportional to
     sqrt(e_{ij...k}), so that the area of the box is proportional to
     the difference in observed and expected frequencies.  The
     rectangles in each row are positioned relative to a baseline
     indicating independence (d_{ij...k} = 0).  If the observed
     frequency of a cell is greater than the expected one, the box
     rises above the baseline, and falls below otherwise.
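
     The relationship between residual, box height, box width and box
     area can be checked numerically.  The following is a minimal
     sketch in base R for a two-way table under simple independence;
     the object names ('f', 'e', 'd', 'X2') are chosen for illustration
     and are not part of the package.

```r
## Two-way table: hair color by eye color, aggregated over sex.
f <- margin.table(HairEyeColor, c(1, 2))

## Expected counts under independence: (row total * column total) / n.
e <- outer(rowSums(f), colSums(f)) / sum(f)

## Signed contributions to Pearson's chi^2 (the Pearson residuals).
d <- (f - e) / sqrt(e)

## Height ~ d and width ~ sqrt(e), hence area ~ f - e.
stopifnot(isTRUE(all.equal(d * sqrt(e), f - e, check.attributes = FALSE)))

## The squared residuals add up to Pearson's chi^2 statistic.
X2 <- sum(d^2)
stopifnot(isTRUE(all.equal(X2, unname(chisq.test(f)$statistic))))

round(d, 2)
```

     Cells with positive entries in 'd' correspond to boxes above the
     baseline, cells with negative entries to boxes below it.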

     Additionally, the rectangles can be colored according to a
     specified shading scheme (see Meyer et al., 2003).  Package 'vcd'
     offers a range of _residual-based_ shadings (see the shadings help
     page). Some of them allow, e.g., the visualization of test
     statistics.

     Unlike the 'assocplot' function in the 'graphics' package, this
     function allows the visualization of contingency tables with more
     than two dimensions. Similar to the construction of 'flat' tables
     (like objects of class '"ftable"' or '"structable"'), the
     dimensions are folded into rows and columns.
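
     The folding of dimensions into rows and columns can be previewed
     with 'ftable' from base R.  This is a sketch of the flat-table
     idea only; the choice of 'Hair' and 'Sex' as row variables is an
     arbitrary example, and it does not show how 'assoc' arranges the
     plot internally.

```r
## Fold the three-way table into a flat two-way layout:
## 'Hair' and 'Sex' go into the rows, 'Eye' into the columns.
ft <- ftable(HairEyeColor, row.vars = c("Hair", "Sex"), col.vars = "Eye")
ft

## 4 hair colors x 2 sexes give 8 rows; 4 eye colors give 4 columns.
dim(ft)
```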

     The layout is very flexible: the specification of shading,
     labeling, spacing, and legend is modularized (see 'strucplot' for
     details).

_V_a_l_u_e:

     The '"structable"' visualized is returned invisibly.

_A_u_t_h_o_r(_s):

     David Meyer <David.Meyer@R-project.org>

_R_e_f_e_r_e_n_c_e_s:

     A. Cohen (1980), On the graphical display of the significant
     components in a two-way contingency table. _Communications in
     Statistics-Theory and Methods_, *A9*, 1025-1041.

     M. Friendly (1992), Graphical methods for categorical data. _SAS
     User Group International Conference Proceedings_, *17*, 190-200.
     <URL: http://www.math.yorku.ca/SCS/sugi/sugi17-paper.html>

     D. Meyer, A. Zeileis, K. Hornik (2003), Visualizing independence
     using extended association plots. _Proceedings of the 3rd
     International Workshop on Distributed Statistical Computing_, K.
     Hornik, F. Leisch, A. Zeileis (eds.), ISSN 1609-395X. <URL:
     http://www.ci.tuwien.ac.at/Conferences/DSC-2003/Proceedings/>

_S_e_e _A_l_s_o:

     'mosaic', 'strucplot', 'structable'

_E_x_a_m_p_l_e_s:

     data(HairEyeColor)
     ## Aggregate over sex:
     (x <- margin.table(HairEyeColor, c(1, 2)))

     ## Ordinary association plot:
     assoc(x)
     ## and with residual-based shading (of independence)
     assoc(x, main = "Relation between hair and eye color", shade = TRUE)

     ## Aggregate over eye color:
     (x <- margin.table(HairEyeColor, c(1, 3)))
     chisq.test(x)
     assoc(x, main = "Relation between hair color and sex", shade = TRUE)

     ## Visualize multi-way tables:
     assoc(aperm(HairEyeColor), expected = ~ (Hair + Eye) * Sex,
           labeling_args = list(just_labels = c(Eye = "left"),
                                offset_labels = c(right = -0.5),
                                offset_varnames = c(right = 1.2),
                                rot_labels = c(right = 0),
                                tl_varnames = c(Eye = TRUE))
     )

     assoc(aperm(UCBAdmissions), expected = ~ (Admit + Gender) * Dept, compress = FALSE,
           labeling_args = list(abbreviate = c(Gender = TRUE), rot_labels = 0)
     )

