Memisc                package:memisc                R Documentation

_I_n_t_r_o_d_u_c_t_i_o_n _t_o _t_h_e _m_e_m_i_s_c _P_a_c_k_a_g_e

_D_e_s_c_r_i_p_t_i_o_n:

     This package collects an assortment of tools that are intended to
     make work with 'R' easier for the author of this package and are
     submitted to the public in the hope that they will also be
     useful to others.

     The tools in this package can be grouped into five major
     categories:

        *  Data preparation and management

        *  Data analysis

        *  Presentation of analysis results

        *  Simulation

        *  Programming

_D_a_t_a _p_r_e_p_a_r_a_t_i_o_n _a_n_d _m_a_n_a_g_e_m_e_n_t:

     *Survey Items*

     'memisc' provides facilities to work with what users of other
     packages like SPSS, SAS, or Stata know as `variable labels',
     `value labels' and `user-defined missing values'. In the context
     of this package these aspects of the data are represented by the
     '"description"', '"labels"', and '"missing.values"' attributes of
     a data vector.  These facilities are useful, for example, if you
     work with survey data that contain coded items like vote intention
     that may have the following structure:

     Question: ``If there was a parliamentary election next Tuesday,
     which party would you vote for?''

        1  Conservative Party
        2  Labour Party
        3  Liberal Democrat Party
        4  Scottish National Party
        5  Plaid Cymru
        6  Green Party
        7  British National Party
        8  Other party
       96  Not allowed to vote
       97  Would not vote
       98  Would vote, do not know yet for which party
       99  No answer

     A statistical package like SPSS allows one to attach labels like
     `Conservative Party', `Labour Party', etc. to the codes 1, 2, 3,
     etc., to mark the codes 96, 97, 98, 99 as `missing', and thus to
     exclude these codes from statistical analyses. 'memisc' provides
     similar facilities. Labels can be attached to codes by calls like
     'labels(x) <- something' and expanded by calls like 'labels(x) <-
     labels(x) + something'; codes can be marked as `missing' by calls
     like 'missing.values(x) <- something' and 'missing.values(x) <-
     missing.values(x) + something'.
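
     The calls described above can be sketched as follows. The response
     vector 'vote' is made up for illustration, and only some of the
     codes from the list above are labelled; this is a minimal sketch,
     not a complete coding scheme.

```r
library(memisc)

# Hypothetical responses to the vote-intention question (illustration only)
vote <- c(1, 2, 2, 3, 97, 1, 99, 4, 98, 2)

# Attach value labels to some of the codes
labels(vote) <- c(
  "Conservative Party" = 1,
  "Labour Party"       = 2,
  "Would not vote"     = 97,
  "No answer"          = 99
)

# Expand the existing set of labels
labels(vote) <- labels(vote) + c(
  "Liberal Democrat Party"  = 3,
  "Scottish National Party" = 4,
  "Do not know yet"         = 98
)

# Mark codes as missing, then extend the set of missing values
missing.values(vote) <- c(97, 99)
missing.values(vote) <- missing.values(vote) + 98

# Show the labelled item and which observations count as missing
print(vote)
table(is.missing(vote))
```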

     'memisc' defines a class called "data.set", which is similar to
     the class "data.frame". The main difference is that it is
     especially geared toward containing survey item data.
     Transformations of and within "data.set" objects retain the
     information about value labels, missing values etc. Using
     'as.data.frame' sets the data up for R's statistical functions,
     but doing this explicitly is seldom necessary. See 'data.set'.
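
     As a rough illustration, a small "data.set" can be built from a
     labelled item and a plain numeric vector and then converted with
     'as.data.frame'. The variables 'vote' and 'age' and their values
     are invented for this sketch.

```r
library(memisc)

# A small data set with one labelled survey item and one plain variable
# (all values are made up for illustration)
ds <- data.set(
  vote = as.item(c(1, 2, 2, 97, 1, 99),
                 labels = c("Conservative Party" = 1,
                            "Labour Party"       = 2,
                            "Would not vote"     = 97,
                            "No answer"          = 99),
                 missing.values = c(97, 99)),
  age = c(34, 51, 27, 63, 45, 38)
)
print(ds)

# Explicit conversion into an ordinary data frame
df <- as.data.frame(ds)
str(df)
```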

     *More Convenient Import of External Data*

     Survey data sets are often relatively large and can contain up to
     a few thousand variables. A specific analysis, however, usually
     needs only a relatively small subset of these variables. Although
     modern computers have enough RAM to load such data sets completely
     into an R session, it is inefficient to load everything only to
     drop most of the variables afterwards. Loading such a large data
     set completely can also be time-consuming, because R has to
     allocate space for each of the many variables. Loading just the
     subset of variables needed for an analysis is more convenient and
     tends to be much quicker. Thus this package
     provides facilities to load such subsets of variables, without the
     need to load a complete data set. Further, the loading of data
     from SPSS files is organized in such a way that all information
     about variable labels, value labels, and user-defined missing
     values is retained. This is made possible by the definition of
     'importer' objects, for which a 'subset' method exists. 'importer'
     objects contain only the information about the variables in the
     external data set but not the data. The data itself is loaded into
     memory when the functions 'subset' or 'as.data.set' are used.
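
     The two-step workflow might look like the sketch below. It assumes
     that the SPSS portable file of the 1948 American National Election
     Study is bundled with the installed version of 'memisc' under
     "anes/NES1948.ZIP" and that the helper 'UnZip' is available; with
     your own data, a file path would take its place.

```r
library(memisc)

# Unpack an example SPSS portable file (assumed to ship with 'memisc')
por <- UnZip("anes/NES1948.ZIP", "NES1948.POR", package = "memisc")

# Step 1: create the importer; only information about the variables
# is read at this point, not the data
nes1948 <- spss.portable.file(por)
print(nes1948)
head(names(nes1948))

# Step 2: the data are read only now. Using subset() with a 'select'
# argument that names variables listed by names(nes1948) would read
# just those variables instead of the complete file.
nes1948.ds <- as.data.set(nes1948)
```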

     *Recoding*

     'memisc' also contains facilities for recoding survey items.
     Simple recodings, for example collapsing answer categories, can be
     done using the function 'recode'. More complex recodings, for
     example the construction of indices from multiple items, and
     complex case distinctions, can be done using the function 'cases'.
     This function may also be useful for programming, in so far as it
     is a generalization of 'ifelse'.
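
     A short sketch of both functions, using made-up data: 'recode'
     collapses six answer categories into three, and 'cases' plays the
     role of a multi-way 'ifelse'. The conditions passed to 'cases' are
     chosen to be mutually exclusive.

```r
library(memisc)

# A labelled item with six answer categories (illustrative values)
x <- as.item(c(1, 2, 3, 4, 5, 6),
             labels = c(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6))

# Collapse the six categories into three labelled ones
x3 <- recode(x,
             "A" = 1 <- 1:2,
             "B" = 2 <- 3:4,
             "C" = 3 <- 5:6)
print(x3)

# cases() with labelled conditions returns a factor
z <- c(-2, 0, 3)
sgn <- cases("negative" = z < 0,
             "zero"     = z == 0,
             "positive" = z > 0)

# cases() with 'value <- condition' pairs returns the assigned values
val <- cases(1 <- z < 0,
             2 <- z == 0,
             3 <- z > 0)
```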

     *Code Books*

     There is a function 'codebook' which produces a code book of an
     external data set or an internal "data.set" object. A codebook
     contains in a conveniently formatted way concise information about
     every variable in a data set, such as which value labels and
     missing values are defined and some univariate statistics.
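
     For a single item, a code book entry could be produced as in this
     sketch; the item and its description are invented for
     illustration.

```r
library(memisc)

# A labelled item with a description and user-defined missing values
vote <- as.item(c(1, 2, 2, 97, 1, 99),
                labels = c("Conservative Party" = 1,
                           "Labour Party"       = 2,
                           "Would not vote"     = 97,
                           "No answer"          = 99),
                missing.values = c(97, 99),
                annotation = c(description = "Vote intention"))

# Build and print the code book entry
cb <- codebook(vote)
print(cb)
```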

     An extended example of all these facilities is contained in the
     vignette "anes48", and in 'demo(anes48)'.

_D_a_t_a _A_n_a_l_y_s_i_s:

     *Tables and Data Frames of Descriptive Statistics*

     'genTable' is a generalization of 'xtabs': instead of counts, it
     can report descriptive statistics like means or variances
     conditional on levels of factors. Conditional percentages of a
     factor can also be obtained using this function.

     In addition, a 'formula' method for the 'aggregate' generic
     function is provided. It has the same syntax as 'genTable', but
     gives a data frame of descriptive statistics instead of a 'table'
     object.
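
     As an illustration of the formula interface, the following sketch
     tabulates the mean and standard deviation of 'mpg' within levels
     of 'cyl' in the built-in 'mtcars' data. The labels 'Mean' and 'SD'
     are chosen here for readability.

```r
library(memisc)

# Conditional descriptive statistics instead of counts
tab <- genTable(c(Mean = mean(mpg), SD = sd(mpg)) ~ cyl, data = mtcars)
print(tab)
```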

     *Per-Subset Analysis*

     'By' is a variant of the standard function 'by': Conditioning
     factors are specified by a formula and are obtained from the data
     frame the subsets of which are to be analysed. Therefore there is
     no need to 'attach' the data frame or to use the dollar operator.
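
     A minimal sketch with the built-in 'mtcars' data: one regression
     of 'mpg' on 'wt' is fitted for each level of 'cyl', with the
     conditioning factor given as a one-sided formula.

```r
library(memisc)

# Fit the same model separately within each level of 'cyl'
fits <- By(~ cyl, lm(mpg ~ wt), data = mtcars)

# Collect the coefficients of the per-subset fits
lapply(fits, coef)
```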

     *Graphical Model Comparison*

     'Termplot' is a variant and an extension of 'termplot': The plots
     are similar to those of 'termplot' but use 'lattice' graphics.
     Also, 'Termplot' can be used on more than one model and makes it
     possible to compare the fit of linear or non-linear effect
     specifications of different models.

     Use 'example(Termplot)' or 'demo(Termplot)' for an example.

_P_r_e_s_e_n_t_a_t_i_o_n _o_f _R_e_s_u_l_t_s _o_f _S_t_a_t_i_s_t_i_c_a_l _A_n_a_l_y_s_i_s:

     *Publication-Ready Tables of Coefficients*

     Journals of the Political and Social Sciences usually require that
     estimates of regression models are presented in the following
     form:


         ==================================================
                         Model 1     Model 2     Model 3
         --------------------------------------------------
         Coefficients
         (Intercept)     30.628***    6.360***   28.566***
                         (7.409)     (1.252)     (7.355)
         pop15           -0.471**                -0.461**
                         (0.147)                 (0.145)
         pop75           -1.934                  -1.691
                         (1.041)                 (1.084)
         dpi                          0.001      -0.000
                                     (0.001)     (0.001)
         ddpi                         0.529*      0.410*
                                     (0.210)     (0.196)
         --------------------------------------------------
         Summaries
         R-squared         0.262       0.162       0.338
         adj. R-squared    0.230       0.126       0.280
         N                50          50          50
         ==================================================

     Such tables of coefficient estimates can be produced by 'mtable'.
     To see some of the possibilities of this function, use
     'example(mtable)'.
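
     A table of the kind shown above can be sketched with three linear
     models fitted to the built-in 'LifeCycleSavings' data. The model
     specifications are inferred from the coefficient names in the
     table and should be read as an assumption.

```r
library(memisc)

# Three nested and non-nested specifications for the savings ratio
lm1 <- lm(sr ~ pop15 + pop75, data = LifeCycleSavings)
lm2 <- lm(sr ~ dpi + ddpi, data = LifeCycleSavings)
lm3 <- lm(sr ~ pop15 + pop75 + dpi + ddpi, data = LifeCycleSavings)

# Combine the estimates into one table with selected summary statistics
tab <- mtable("Model 1" = lm1,
              "Model 2" = lm2,
              "Model 3" = lm3,
              summary.stats = c("R-squared", "adj. R-squared", "N"))
print(tab)
```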

     *LaTeX Representation of R Objects*

     Output produced by 'mtable' can be transformed into LaTeX tables
     by an appropriate method of the generic function 'toLatex' which
     is defined in the package 'utils'. In addition, 'memisc' defines
     'toLatex' methods for matrices and 'ftable' objects. Note that
     results produced by 'genTable' can be coerced into 'ftable'
     objects. Also, a default method for the 'toLatex' function is
     defined which coerces its argument to a matrix and applies the
     matrix method of 'toLatex'.
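
     The conversion to LaTeX might be used as in the following sketch,
     which applies 'toLatex' to a one-model 'mtable' and to a small
     matrix. The model and the matrix are chosen only for illustration.

```r
library(memisc)

# LaTeX code for a table of coefficients
tab <- mtable("Model 1" = lm(sr ~ pop15 + pop75, data = LifeCycleSavings))
tex <- toLatex(tab)
writeLines(tex)

# LaTeX code for a plain matrix
m <- matrix(1:4, nrow = 2, ncol = 2,
            dimnames = list(c("a", "b"), c("x", "y")))
toLatex(m)
```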

_S_i_m_u_l_a_t_i_o_n:

     The 'memisc' package defines a function 'Simulate', which can be
     used to conduct simulation experiments: For a given number of
     replications and given sets of parameters (which can be varied
     across experimental conditions) data are generated and can be
     summarized afterwards by other methods.

     Use 'example(Simulate)', 'demo(monte.carlo)',
     'demo(lm.monte.carlo)', 'demo(random.walk)', or 'demo(schellings)'
     for examples.

_P_r_o_g_r_a_m_m_i_n_g:

     *Looping over Variables*

     Sometimes users want to construct loops that run over variables
     rather than values, for example in order to set the missing
     values of a battery of items. For this purpose, the package
     contains the function 'foreach'. To set 8 and 9 as missing values
     for the items 'knowledge1', 'knowledge2', 'knowledge3', one can
     use


         foreach(x=c(knowledge1,knowledge2,knowledge3),
             missing.values(x) <- 8:9)

     *Changing Names of Objects and Labels of Factors*

     'R' already makes it possible to change the names of an object.
     Substituting the 'names' or 'dimnames' can be done with some
     programming tricks. This package defines the functions 'rename',
     'dimrename', 'colrename', and 'rowrename' that implement these
     tricks in a convenient way, so that programmers (like the author
     of this package) need not reinvent the wheel in every instance of
     changing names of an object.
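
     For instance, names can be replaced selectively as in this small
     sketch; the old name is given on the left and the new name as a
     character string on the right. The objects are made up for
     illustration.

```r
library(memisc)

# Rename the elements of a named vector
x <- c(a = 1, b = 2)
y <- rename(x, a = "A", b = "B")

# Rename the columns and rows of a matrix
m <- matrix(1:4, nrow = 2, ncol = 2,
            dimnames = list(c("a", "b"), c("A", "B")))
m.col <- colrename(m, A = "first", B = "second")
m.row <- rowrename(m, a = "r1", b = "r2")
```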

     *Dimension-Preserving Versions of 'lapply' and 'sapply'*

     If a function that is involved in a call to 'sapply' returns an
     array or a matrix, the dimensional information gets lost. Also, if
     a list object to which 'lapply' or 'sapply' is applied has a
     dimension attribute, the result loses this information. The
     functions 'Lapply' and 'Sapply' defined in this package preserve
     such dimensional information.
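
     The difference can be seen in a small sketch with a hypothetical
     helper 'f' that returns a 2 x 2 matrix: 'sapply' flattens each
     result into a column, whereas 'Sapply' is expected to keep the
     matrix dimensions and add one for the elements looped over.

```r
library(memisc)

# Hypothetical helper returning a 2 x 2 matrix for each input
f <- function(i) matrix(i, nrow = 2, ncol = 2)

# sapply() flattens each matrix into a column of length 4
flat <- sapply(1:3, f)
dim(flat)

# Sapply() keeps the dimensional information of the results
kept <- Sapply(1:3, f)
dim(kept)
```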

     *Combining Vectors and Arrays by Names*

     The generic function 'collect' collects several objects of the
     same mode into one object, using their names, 'rownames',
     'colnames' and/or 'dimnames'. There are methods for atomic
     vectors, arrays (including matrices), and data frames. For example


     x <- c(a=1,b=2)
     y <- c(a=10,c=30)
     collect(x,y)

     leads to


        x  y
     a  1 10
     b  2 NA
     c NA 30

     *Reordering of Matrices and Arrays*

     The 'memisc' package includes a 'reorder' method for arrays and
     matrices. For example, the matrix method by default reorders the
     rows of a matrix according to the results of a function.

_O_t_h_e_r _F_a_c_i_l_i_t_i_e_s:

     'mkHelpDir' is a modification of 'make.packages.html' that
     constructs HTML help pages permanently instead of temporarily.

