whatis              package:YaleToolkit              R Documentation

_D_a_t_a _f_r_a_m_e _s_u_m_m_a_r_y

_D_e_s_c_r_i_p_t_i_o_n:

     Summarize the characteristics of variables (columns) in a data
     frame.

_U_s_a_g_e:

     whatis(x, var.name.truncate = 20, type.truncate = 14)

_A_r_g_u_m_e_n_t_s:

       x: a data frame

var.name.truncate: maximum length (in characters) to which variable
          names are truncated.  The default is 20; anything less than
          12 is narrower than the column label in the resulting data
          frame and needlessly discards information.

type.truncate: maximum length (in characters) for truncation of
          variable type; '14' is the full width, but '4' works well if
          space is at a premium.
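
     As a rough illustration of what the two truncation arguments do,
     the sketch below cuts labels to a maximum width in base R.  The
     helper 'truncate_label()' is hypothetical and is not part of
     YaleToolkit; whether 'whatis()' truncates in exactly this way is
     an assumption.  The final call uses only the signature shown under
     Usage and runs only if the package is installed.

```r
# Hypothetical helper (not part of YaleToolkit): keep at most `width`
# characters of each label.
truncate_label <- function(s, width) substr(s, 1, width)

var_names <- c("Sepal.Length", "a.very.long.variable.name.indeed")

# Variable names cut to the default width of 20 characters.
truncate_label(var_names, 20)

# "ordered factor" is 14 characters long, so a width of 14 keeps it whole,
# while a width of 4 keeps only a short prefix.
truncate_label("ordered factor", 14)
truncate_label("ordered factor", 4)

# With the package installed, the documented arguments are passed like this.
if (requireNamespace("YaleToolkit", quietly = TRUE)) {
  print(YaleToolkit::whatis(iris, var.name.truncate = 12, type.truncate = 4))
}
```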

_D_e_t_a_i_l_s:

     The function 'whatis()' provides a basic examination of some
     characteristics of each variable (column) in a data frame.

_V_a_l_u_e:

     A list of characteristics describing the variables in the data
     frame, 'x'. Each component of the list has 'length(x)' values, one
     for each variable in the data frame 'x'.  

variable.name: from the 'names(x)' attribute, possibly truncated to
          'var.name.truncate' characters in length.

    type: possible values include '"pure factor"', '"mixed factor"',
          '"ordered factor"', '"character"', and '"numeric"'.
          'whatis()' checks whether a factor or vector holds character
          values, numeric values, or both.  A factor whose levels
          include both character and numeric values is a mixed factor;
          a factor whose levels are purely character or purely numeric
          is a pure factor.  Non-factors are either character or
          numeric.

 missing: the number of 'NA's in the variable.

distinct.values: the number of distinct values in the variable, equal
          to 'length(table(variable))'.

precision: the number of decimal places of precision.

     min: the minimum value (if numeric) or the first value
          (alphabetically), as appropriate.

     max: the maximum value (if numeric) or the last value
          (alphabetically) as appropriate.
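
     The components above can be approximated in base R.  The sketch
     below is not the YaleToolkit implementation: the helpers
     'classify_column()' and 'describe_column()' are hypothetical, and
     the rule that a factor level counts as numeric when
     'as.numeric()' can parse it is an assumption.  Precision is left
     out because its exact definition is not given here.

```r
# Hypothetical helpers; a sketch only, NOT the YaleToolkit implementation.

# Classify one column using the type names listed above.
classify_column <- function(v) {
  if (is.factor(v)) {
    if (is.ordered(v)) return("ordered factor")
    # Assumption: a level is "numeric" if as.numeric() can parse it.
    lev_is_num <- !is.na(suppressWarnings(as.numeric(levels(v))))
    if (any(lev_is_num) && !all(lev_is_num)) "mixed factor" else "pure factor"
  } else if (is.numeric(v)) {
    "numeric"
  } else {
    "character"
  }
}

# Summarize one column: type, NA count, distinct values, min and max.
describe_column <- function(v) {
  # min() and max() are not defined for unordered factors, so compare labels.
  vals <- if (is.factor(v)) as.character(v) else v
  list(type            = classify_column(v),
       missing         = sum(is.na(v)),
       distinct.values = length(table(v)),  # table() drops NA by default
       min             = min(vals, na.rm = TRUE),
       max             = max(vals, na.rm = TRUE))
}

df <- data.frame(a = c(1.5, 2, 2, NA, 3.25),
                 c = factor(c("Apple", "8", "Orange", "8", NA)))

# One summary per column, mirroring the "one value per variable" layout.
str(lapply(df, describe_column))
```

     Because 'table()' ignores 'NA' by default, missing values are
     counted under 'missing' and do not add to 'distinct.values'.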

_A_u_t_h_o_r(_s):

     John W. Emerson, Walton Green

_R_e_f_e_r_e_n_c_e_s:

     Special thanks to John Hartigan and the students of 'Statistical
     Case Studies' of 2004 for their help troubleshooting and
     developing the function 'whatis()'.

_S_e_e _A_l_s_o:

     'str', for a compact display of an object's structure.

_E_x_a_m_p_l_e_s:

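       # stringsAsFactors = TRUE keeps b, c, and d as factors on R >= 4.0.0,
       # where data.frame() no longer converts strings to factors by default.
       mydf <- data.frame(a = rnorm(100),
                          b = sample(c("Cat", "Dog"), 100, replace = TRUE),
                          c = sample(c("Apple", "Orange", "8"), 100, replace = TRUE),
                          d = sample(c("Blue", "Red"), 100, replace = TRUE),
                          stringsAsFactors = TRUE)
       mydf$d <- as.character(mydf$d)  # d becomes a plain character vector
       whatis(mydf)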

       data(iris)
       whatis(iris)

       data(NewHavenResidential)
       whatis(NewHavenResidential)

