xYplot                 package:Hmisc                 R Documentation

_x_y_p_l_o_t _a_n_d _d_o_t_p_l_o_t _w_i_t_h _M_a_t_r_i_x _V_a_r_i_a_b_l_e_s _t_o _P_l_o_t _E_r_r_o_r _B_a_r_s _a_n_d _B_a_n_d_s

_D_e_s_c_r_i_p_t_i_o_n:

     A utility function 'Cbind' returns the first argument as a vector
     and combines all other arguments into a matrix stored as an
     attribute called '"other"'.  The arguments can be named (e.g.,
     'Cbind(pressure=y,ylow,yhigh)') or a 'label' attribute may be
     pre-attached to the first argument. In either case, the name or
     label of the first argument is stored as an attribute '"label"' of
     the object returned by 'Cbind'.  Storing other vectors as a matrix
     attribute facilitates plotting error bars, etc., as 'trellis'
     really wants the x- and y-variables to be vectors, not matrices.
     If a single argument is given to 'Cbind' and that argument is a
     matrix with column dimnames, the first column is taken as the main
     vector and remaining columns are taken as '"other"'. A subscript
     method for 'Cbind' objects subscripts the 'other' matrix along
     with the main 'y' vector.
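
     The structure described above can be inspected directly. The
     following is a minimal sketch, assuming the Hmisc package is
     installed; the data values and the variable names 'ylow',
     'yhigh', 'yy' and 'sub' are made up for illustration.

```r
library(Hmisc)

# Illustrative values only (not from any real data set)
y     <- c(10, 12, 15)
ylow  <- y - 1
yhigh <- y + 1

# The first argument is the main vector; the rest go into "other"
yy <- Cbind(pressure = y, ylow, yhigh)

attr(yy, "label")   # name supplied for the first argument
attr(yy, "other")   # matrix holding the remaining columns

# Subscripting keeps the main vector and the "other" matrix aligned
sub <- yy[2:3]
attr(sub, "other")
```

     Because the extra columns travel as an attribute, the object can
     be used on the left-hand side of an 'xYplot' formula like an
     ordinary vector.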

     The 'xYplot' function is a substitute for 'xyplot' that allows for
     simulated multi-column 'y'. It uses by default the 'panel.xYplot'
     and 'prepanel.xYplot' functions to do the actual work. The
     'method' argument passed to 'panel.xYplot' from 'xYplot' allows
     you to make error bars, the upper-only or lower-only portions of
     error bars, alternating lower-only and upper-only bars, bands, or
     filled bands.  'panel.xYplot' decides how to alternate upper and
     lower bars according to whether the median 'y' value of the
     current main data line is above the median 'y' for all 'groups' of
     lines or not.  If the median is above the overall median, only the
     upper bar is drawn; otherwise only the lower bar is drawn. For
     'bands' (but not 'filled bands'), any
     number of other columns of 'y' will be drawn as lines having the
     same thickness, color, and type as the main data line.  If
     plotting bars, bands, or filled bands and only one additional
     column is specified for the response variable, that column is
     taken as the half width of a precision interval for 'y', and the
     lower and upper values are computed automatically as 'y' plus or
     minus the value of the additional column variable.
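
     The two rules in this paragraph (limits derived from a half
     width, and the choice of bar side for alternating bars) can be
     sketched in base R. This is an illustration of the description
     only, not the code used inside 'panel.xYplot'. The data and all
     variable names are hypothetical, and drawing the lower bar when
     the group median is not above the overall median is an
     assumption based on the word "alternating".

```r
# Illustrative data: two groups of three points each
y  <- c(1, 2, 3, 5, 6, 7)
hw <- rep(0.5, 6)                          # half-width of the interval
g  <- factor(rep(c("a", "b"), each = 3))

# With a single extra column, limits are y minus/plus the half-width
lower <- y - hw
upper <- y + hw

# Alternating-bar rule as described in the text: compare each
# group's median with the median of all y values
overall <- median(y)
grp_med <- sapply(split(y, g), median)
side    <- ifelse(grp_med > overall, "upper", "lower")
side
```

     With Hmisc loaded, the half-width form would be requested with a
     call such as 'xYplot(Cbind(y, hw) ~ x)', where 'x' is whatever
     variable is plotted on the horizontal axis.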

     When a 'groups' variable is present, 'panel.xYplot' will create a
     function in frame 0 called 'Key' that when invoked will draw a key
     describing the 'groups' labels, point symbols, and colors. By
     default, the key is outside the graph.  If 'Key(locator(1))' is
     specified, the key will appear so that its upper left corner is at
     the coordinates of the mouse click.  For R/Lattice the first two
     arguments of 'Key' ('x' and 'y') are fractions of the page,
     measured from the lower left corner, and the default placement is
     at 'x=0, y=1'.

     When 'method="quantiles"' is specified, 'xYplot' automatically
     groups the 'x' variable into intervals containing a target of 'nx'
     observations each, and within each 'x' group computes three
     quantiles of 'y' and plots these as three lines. The mean 'x'
     within each 'x' group is taken as the 'x'-coordinate. This will
     make a useful empirical display for large datasets in which
     scatter diagrams are too busy to show patterns of central tendency
     and variability.  You can also specify a general function of a
     data vector that returns a matrix of statistics for the 'method'
     argument. Arguments can be passed to that function via a list
     'methodArgs'.  The statistic in the first column should be the
     measure of central tendency. Examples of useful 'method' functions
     are those listed under the help file for 'summary.formula' such as
     'smean.cl.normal'.
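
     The grouping-and-quantile computation can be approximated in
     base R as follows. This is a rough sketch under stated
     assumptions, not the internal implementation: it uses 'cut' with
     quantile-based breaks as a stand-in for 'cut2', the data are
     simulated, and all variable names are hypothetical.

```r
set.seed(1)
x <- runif(200)
y <- x + rnorm(200, sd = 0.2)

# Target group size: the documented default is min(40, n/4)
nx      <- min(40, length(x) / 4)
ngroups <- max(1, floor(length(x) / nx))

# Stand-in for cut2(): intervals with roughly nx observations each
breaks <- quantile(x, probs = seq(0, 1, length.out = ngroups + 1))
g      <- cut(x, breaks = breaks, include.lowest = TRUE)

# x-coordinate of each group is the mean x within the group
xbar <- tapply(x, g, mean)

# Three quantiles of y per group, central one listed first
q <- t(sapply(split(y, g), quantile, probs = c(.5, .25, .75)))

cbind(xbar, q)
```

     Each row of the result corresponds to one 'x' group, and the
     three quantile columns are what would be drawn as three lines.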

     'Dotplot' is a substitute for 'dotplot' allowing for a matrix
     x-variable, automatic superpositioning when 'groups' is present,
     and creation of a 'Key' function.  When the x-variable (created by
     'Cbind' to simulate a matrix) contains a total of 3 columns, the
     first column specifies where the dot is positioned, and the last 2
     columns specify starting and ending points for intervals.  The
     intervals are shown using line type, width, and color from the
     trellis 'plot.line' list. By default, you will usually see a
     darker line segment for the low and high values, with the dotted
     reference line elsewhere. A good choice of the 'pch' argument for
     such plots is '3' (plus sign) if you want to emphasize the
     interval more than the point estimate.  When the x-variable
     contains a total of 5 columns, the 2nd and 5th columns are treated
     as the 2nd and 3rd are treated above, and the 3rd and 4th columns
     define an inner line segment that will have twice the thickness of
     the outer segments. In addition, tick marks separate the outer and
     inner segments.  This type of display (an example of which
     appeared in _The Elements of Graphing Data_ by Cleveland) is very
     suitable for displaying two confidence levels (e.g., 0.9 and 0.99)
     or the 0.05, 0.25, 0.75, 0.95 sample quantiles, for example.  For
     this display, the central point displays well with a default
     circle symbol.

     'setTrellis' sets nice defaults for Trellis graphics, assuming
     that the graphics device has already been opened if using
     postscript, etc. By default, it sets panel strips to blank and
     reference dot lines to thickness 1 instead of the Trellis default
     of 2.

     'numericScale' is a utility function that facilitates using
     'xYplot' to plot variables that are not considered to be numeric
     but which can readily be converted to numeric using
     'as.numeric()'.  A good example of this is 'timeDate' variables in
     S-Plus 5 and 6. 'numericScale' converts the variable into an
     ordinary numeric variable.  If it is a 'timeDate' variable, two
     attributes are added to the resulting variable: 'scales.major' and
     'scales.minor'. These are each lists with elements 'at' to specify
     a vector of numeric values for tick marks, and a corresponding
     character vector 'labels' with formatted values (e.g., using time
     or date formats).  When you use such a variable with 'xYplot' and
     do not specify a corresponding 'scales' element, tick marks and
     scale labeling are taken from 'scales.major'.  The 'at' element
     for 'scales.minor' is used by 'panel.xYplot' to add minor tick
     marks. 'numericScale' by default will keep the name of the input
     variable as a 'label' attribute for the new numeric variable.

_U_s_a_g_e:

     Cbind(...)

     xYplot(formula, data = sys.frame(sys.parent()), groups,
            subset, xlab=NULL, ylab=NULL, ylim=NULL,
            panel=panel.xYplot, prepanel=prepanel.xYplot, scales=NULL,
            minor.ticks=NULL, ...)

     panel.xYplot(x, y, subscripts, groups=NULL, 
                  type=if(is.function(method) || method=='quantiles') 
                    'b' else 'p',
                  method=c("bars", "bands", "upper bars", "lower bars", 
                           "alt bars", "quantiles", "filled bands"), 
                  methodArgs=NULL, label.curves=TRUE, abline,
                  probs=c(.5,.25,.75), nx,
                  cap=0.015, lty.bar=1, 
                  lwd=plot.line$lwd, lty=plot.line$lty, pch=plot.symbol$pch, 
                  cex=plot.symbol$cex, font=plot.symbol$font, col=NULL, 
                  lwd.bands=NULL, lty.bands=NULL, col.bands=NULL, 
                  minor.ticks=NULL, col.fill=NULL, ...)

     prepanel.xYplot(x, y, ...)

     Dotplot(formula, data = sys.frame(sys.parent()), groups, subset, 
             xlab = NULL, ylab = NULL, ylim = NULL,
             panel=panel.Dotplot, prepanel=prepanel.Dotplot,
             scales=NULL, ...)

     prepanel.Dotplot(x, y, ...)

     panel.Dotplot(x, y, groups = NULL,
                   pch  = dot.symbol$pch, 
                   col  = dot.symbol$col, cex = dot.symbol$cex, 
                   font = dot.symbol$font, abline, ...)

     setTrellis(strip.blank=TRUE, lty.dot.line=2, lwd.dot.line=1)

     numericScale(x, label=NULL, skip.weekends=FALSE, ...)

_A_r_g_u_m_e_n_t_s:

     ...: for 'Cbind' '...' is any number of additional numeric
          vectors. Unless you are using 'Dotplot' (which allows for
          either 2 or 4 "other" variables) or 'xYplot' with
          'method="bands"', vectors after the first two are ignored. 
          If drawing bars and only one extra variable is given in
          '...', upper and lower values are computed as described
          above. If the second argument to 'Cbind' is a matrix, that
          matrix is stored in the '"other"' attribute and arguments
          after the second are ignored.

          Can also be other arguments to pass to 'labcurve' or 'Key',
          or extra arguments sent from 'numericScale' to 'axis.time'. 

 formula: a 'trellis' formula consistent with 'xyplot' or 'dotplot'  

       x: 'x'-axis variable.  For 'numericScale' 'x' is any vector such
          that 'as.numeric(x)' returns a numeric vector suitable for x-
          or y-coordinates. 

       y: a vector, or an object created by 'Cbind' for 'xYplot'. 'y'
          represents the main variable to plot, i.e., the variable used
          to draw the main lines. For 'Dotplot' the first argument to
          'Cbind' will be the main 'x'-axis variable.   

    data: 

  subset: 

    ylim: 

subscripts: 

  groups: 

    type: 

  scales: 

   panel: 

prepanel: 

    xlab: 

    ylab: see 'trellis.args'.  'xlab' and 'ylab' get default values
          from '"label"' attributes. 

  method: defaults to '"bars"' to draw error-bar type plots.  See
          meaning of other values above.  'method' can be a function. 
          Specifying 'method=quantile',
          'methodArgs=list(probs=c(.5,.25,.75))' is the same as
          specifying 'method="quantiles"' without specifying 'probs'. 

methodArgs: a list containing optional arguments to be passed to the
          function specified in 'method' 

label.curves: set to 'FALSE' to suppress invocation of 'labcurve' to
          label primary curves where they are most separated or to draw
          a legend in an empty spot on the panel.  You can also set
          'label.curves' to a list of options to pass to 'labcurve'. 
          These options can also be passed as '...' to 'xYplot'. See
          the examples below. 

  abline: a list of arguments to pass to 'panel.abline' for each panel,
          e.g. 'list(a=0, b=1, col=3)' to draw the line of identity
          using color 3. 

   probs: a vector of three quantiles with the quantile corresponding
          to the central line listed first. By default 'probs=c(.5,
          .25, .75)'. You can also specify 'probs' through
          'methodArgs=list(probs=...)'. 

      nx: target number of observations for each 'x' group (see the 'm'
          argument of 'cut2'). 'nx' defaults to the minimum of 40 and the
          number of points in the current stratum divided by 4. Set
          'nx=FALSE' or 'nx=0' if 'x' is already discrete and requires
          no grouping. 

     cap: the half-width of horizontal end pieces for error bars, as a
          fraction of the length of the 'x'-axis 

 lty.bar: line type for bars 

lwd, lty, pch, cex, font, col: see 'trellis.args'.  These are vectors
          when 'groups' is present, and the order of their elements
          corresponds to the different 'groups', regardless of how many
          bands or bars are drawn. If you don't specify 'lty.bands',
          for example, all band lines within each group will have the
          same 'lty'. 

lty.bands, lwd.bands, col.bands: used to allow 'lty', 'lwd', 'col' to
          vary across the different band lines for different 'groups'.
          These parameters are vectors or lists whose elements
          correspond to the added band lines (i.e., they ignore the
          central line, whose line characteristics are defined by
          'lty', 'lwd', 'col'). For example, suppose that 4 lines are
          drawn in addition to the central line. Specifying
          'lwd.bands=1:4' will cause line widths of 1:4 to be used for
          every group, regardless of the value of 'lwd'.  To vary
          characteristics over the 'groups' use e.g.
          'lwd.bands=list(rep(1,4), rep(2,4))' or 'list(c(1,2,1,2),
          c(3,4,3,4))'. 

minor.ticks: a list with elements 'at' and 'labels' specifying
          positions and labels for minor tick marks to be used on the
          x-axis of each panel, if any. This is intended for 'timeDate'
          variables. 

col.fill: used to override default colors used for the bands in
          'method="filled bands"'. This is a vector when 'groups' is
          present, and the order of the elements corresponds to the
          different 'groups', regardless of how many bands are drawn. 
          The default colors for 'filled bands' are pastel colors
          matching the default colors in 'superpose.line$col'
          ('plot.line$col'). 

strip.blank: set to 'FALSE' to not make the panel strip backgrounds
          blank  

lty.dot.line: line type for dot plot reference lines (default = 2 for
          dotted; use 1 for solid) 

lwd.dot.line: line thickness for reference lines for dot plots (default
          = 1)  

   label: a scalar character string to be used as a variable label
          after 'numericScale' converts the  variable to numeric form  

skip.weekends: see 'axis.time' 

_D_e_t_a_i_l_s:

     Unlike 'xyplot', 'xYplot' senses the presence of a 'groups'
     variable and automatically invokes 'panel.superpose' instead of
     'panel.xyplot'. The same is true for 'Dotplot' vs. 'dotplot'.

_V_a_l_u_e:

     'Cbind' returns its first argument as a vector carrying the
     '"other"' and '"label"' attributes described above.  Other
     functions return standard 'trellis' results.

_S_i_d_e _E_f_f_e_c_t_s:

     Draws plots.  'panel.xYplot' also creates the 'Key' function in
     the session frame.

_A_u_t_h_o_r(_s):

     Frank Harrell 
      Department of Biostatistics 
      Vanderbilt University 
      f.harrell@vanderbilt.edu 
      Madeline Bauer 
      Department of Infectious Diseases 
      University of Southern California School of Medicine 
      mbauer@usc.edu

_S_e_e _A_l_s_o:

     'xyplot', 'panel.xyplot', 'summarize', 'label', 'labcurve',
     'errbar', 'dotplot',  'reShape', 'setps', 'cut2', 'panel.abline'

_E_x_a_m_p_l_e_s:

     # Plot 6 smooth functions.  Superpose 3, panel 2.
     # Label curves with p=1,2,3 where most separated 
     d <- expand.grid(x=seq(0,2*pi,length=150), p=1:3, shift=c(0,pi)) 
     xYplot(sin(x+shift)^p ~ x | shift, groups=p, data=d, type='l') 
     # Use a key instead, use 3 line widths instead of 3 colors 
     # Put key in most empty portion of each panel
     xYplot(sin(x+shift)^p ~ x | shift, groups=p, data=d, 
            type='l', keys='lines', lwd=1:3, col=1) 
     # Instead of implicitly using labcurve(), put a 
     # single key outside of panels (for S-Plus) or at
     # lower left corner (for R)
     xYplot(sin(x+shift)^p ~ x | shift, groups=p, data=d, 
            type='l', label.curves=FALSE, lwd=1:3, col=1, lty=1:3) 
     Key()       # S-Plus
     Key(0,.1)   # R

     # Show the median and quartiles of height given age, stratified 
     # by sex and race.  Draws 2 sets (male, female) of 3 lines per panel.
     # xYplot(height ~ age | race, groups=sex, method='quantiles')

     # Examples of plotting raw data
     dfr <- expand.grid(month=1:12, continent=c('Europe','USA'), 
                        sex=c('female','male'))
     set.seed(1)
     dfr <- upData(dfr,
                   y=month/10 + 1*(sex=='female') + 2*(continent=='Europe') + 
                     runif(48,-.15,.15),
                   lower=y - runif(48,.05,.15),
                   upper=y + runif(48,.05,.15))

     xYplot(Cbind(y,lower,upper) ~ month,subset=sex=='male' & continent=='USA',
            data=dfr)
     xYplot(Cbind(y,lower,upper) ~ month|continent, subset=sex=='male',data=dfr)
     xYplot(Cbind(y,lower,upper) ~ month|continent, groups=sex, data=dfr); Key() 
     # add ,label.curves=FALSE to suppress use of labcurve to label curves where
     # farthest apart

     xYplot(Cbind(y,lower,upper) ~ month,groups=sex,
                                   subset=continent=='Europe', data=dfr) 
     xYplot(Cbind(y,lower,upper) ~ month,groups=sex, type='b',
                                   subset=continent=='Europe', keys='lines',
                                   data=dfr)
     # keys='lines' causes labcurve to draw a legend where the panel is most empty

     xYplot(Cbind(y,lower,upper) ~ month,groups=sex, type='b', data=dfr,
                                   subset=continent=='Europe',method='bands') 
     xYplot(Cbind(y,lower,upper) ~ month,groups=sex, type='b', data=dfr,
                                   subset=continent=='Europe',method='upper')

     label(dfr$y) <- 'Quality of Life Score'   
     # label() is in Hmisc; this is equivalent to
     # attr(y,'label') <- 'Quality...'; the label becomes the y-axis label 
     # can also specify Cbind('Quality of Life Score'=y,lower,upper) 
     xYplot(Cbind(y,lower,upper) ~ month, groups=sex,
            subset=continent=='Europe', method='alt bars',
             offset=if(.R.)unit(.1,'inches') else .4, type='b', data=dfr)   
     # offset passed to labcurve to label .4 y units away from curve
     # for R (using grid/lattice), offset is specified using the grid
     # unit function, e.g., offset=unit(.4,'native') or
     # offset=unit(.1,'inches') or unit(.05,'npc')

     # The following example uses the summarize function in Hmisc to 
     # compute the median and outer quartiles.  The outer quartiles are 
     # displayed using "error bars"
     set.seed(111)
     dfr <- expand.grid(month=1:12, year=c(1997,1998), reps=1:100)
     month <- dfr$month; year <- dfr$year
     y <- abs(month-6.5) + 2*runif(length(month)) + year-1997
     s <- summarize(y, llist(month,year), smedian.hilow, conf.int=.5) 
     xYplot(Cbind(y,Lower,Upper) ~ month, groups=year, data=s, 
            keys='lines', method='alt', type='b')
     # Can also do:
     s <- summarize(y, llist(month,year), quantile, probs=c(.5,.25,.75),
                    stat.name=c('y','Q1','Q3')) 
     xYplot(Cbind(y, Q1, Q3) ~ month, groups=year, data=s, 
            type='b', keys='lines') 
     # Or:
     xYplot(y ~ month, groups=year, keys='lines', nx=FALSE, method='quantile',
            type='b') 
     # nx=FALSE means to treat month as a discrete variable

     # To display means and bootstrapped nonparametric confidence intervals 
     # use:
     s <- summarize(y, llist(month,year), smean.cl.boot) 
     s
     xYplot(Cbind(y, Lower, Upper) ~ month | year, data=s, type='b')
     # Can also use Y <- cbind(y, Lower, Upper); xYplot(Cbind(Y) ~ ...) 
     # Or:
     xYplot(y ~ month | year, nx=FALSE, method=smean.cl.boot, type='b')

     # This example uses the summarize function in Hmisc to 
     # compute the median and outer quartiles.  The outer quartiles are 
     # displayed using "filled bands"

     s <- summarize(y, llist(month,year), smedian.hilow, conf.int=.5) 

     # filled bands: default fill = pastel colors matching solid colors
     # in superpose.line (this works differently in R)
     xYplot ( Cbind ( y, Lower, Upper ) ~ month, groups=year, 
          method="filled bands" , data=s, type="l")

     # note colors based on levels of selected subgroups, not first two colors
     xYplot ( Cbind ( y, Lower, Upper ) ~ month, groups=year, 
          method="filled bands" , data=s, type="l",
          subset=(year == 1998 | year == 2000), label.curves=FALSE )

     # filled bands using black lines with selected solid colors for fill
     xYplot ( Cbind ( y, Lower, Upper ) ~ month, groups=year, 
          method="filled bands" , data=s, label.curves=FALSE,
          type="l", col=1, col.fill = 2:3)
     Key(.35,1,col = 2:3) #use fill colors in key

     # A good way to check for stable variance of residuals from ols 
     # xYplot(resid(fit) ~ fitted(fit), method=smean.sdl) 
     # smean.sdl is defined with summary.formula in Hmisc

     # Plot y vs. a timeDate variable x
     # xYplot(y ~ numericScale(x, label='Label for X') | country) 
     # For this example could omit label= and specify 
     #    y ~ numericScale(x) | country, xlab='Label for X'

     # Here is an example of using xYplot with several options
     # to change various Trellis parameters,
     # xYplot(y ~ x | z, groups=v, pch=c('1','2','3'),
     #        layout=c(3,1),     # 3 panels side by side
     #        ylab='Y Label', xlab='X Label',
     #        main=list('Main Title', cex=1.5),
     #        par.strip.text=list(cex=1.2),
     #        strip=function(...) strip.default(..., style=1),
     #        scales=list(alternating=FALSE))

     #
     # Dotplot examples
     #

     s <- summarize(y, llist(month,year), smedian.hilow, conf.int=.5) 

     setTrellis()            # blank conditioning panel backgrounds 
     Dotplot(month ~ Cbind(y, Lower, Upper) | year, data=s) 
     # or Cbind(...), groups=year, data=s

     # Display a 5-number (5-quantile) summary (2 intervals, dot=median) 
     # Note that summarize produces a matrix for y, and Cbind(y) trusts the 
     # first column to be the point estimate (here the median) 
     s <- summarize(y, llist(month,year), quantile,
                    probs=c(.5,.05,.25,.75,.95), type='matrix') 
     Dotplot(month ~ Cbind(y) | year, data=s) 
     # Use factor(year) to make actual years appear in conditioning title strips

     # Dotplot(z ~ x | g1*g2)                 
     # 2-way conditioning 
     # Dotplot(z ~ x | g1, groups=g2); Key()  
     # Key defines symbols for g2

     # If the data are organized so that the mean, lower, and upper 
     # confidence limits are in separate records, the Hmisc reShape 
     # function is useful for assembling these 3 values as 3 variables 
     # in a single observation, e.g., assuming type has values such as 
     # c('Mean','Lower','Upper'):
     # a <- reShape(y, id=month, colvar=type) 
     # This will make a matrix with 3 columns named Mean Lower Upper 
     # and with 1/3 as many rows as the original data 

