importers               package:memisc               R Documentation

_O_b_j_e_c_t _O_r_i_e_n_t_e_d _I_n_t_e_r_v_a_c_e _t_o _F_o_r_e_i_g_n _F_i_l_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Importer objects are objects that refer to an external data file.
     Currently, only Stata files and SPSS system, portable, and
     fixed-column files are supported.

     Data are actually imported by `translating' an importer object
     into a 'data.set' using 'as.data.set' or 'subset'.
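
     The two import routes named above can be sketched as follows. This
     is a minimal illustration, not a definitive recipe: it assumes
     that package "memisc" is installed and reuses the NES 1948 file
     from the Examples section; the object names 'por', 'imp', 'full',
     and 'part' are chosen here for illustration only.

```r
# Sketch only: runs when memisc is available, otherwise does nothing.
if (requireNamespace("memisc", quietly = TRUE)) {
  library(memisc)

  # Unpack the example file shipped with memisc and create the importer.
  por <- UnZip("anes/NES1948.ZIP", "NES1948.POR", package = "memisc")
  imp <- spss.portable.file(por)   # reads the header, not the data

  # Route 1: translate the whole file into a data.set.
  full <- as.data.set(imp)

  # Route 2: load only selected variables.
  part <- subset(imp, select = c(v480018, v480047))

  print(dim(full))
  print(dim(part))
}
```

     Both calls read from the same importer object; 'subset' avoids
     loading variables that are not needed.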

     The 'importer' mechanism is more flexible and extensible than
     'read.spss' and 'read.dta' of package "foreign", as most of the
     parsing of the file headers is done in R. Importer objects are
     also designed to load large data sets efficiently. Most
     importantly, importer objects support the 'labels',
     'missing.values', and 'description's provided by this package.

_U_s_a_g_e:

     spss.fixed.file(file,
       columns.file,
       varlab.file=NULL,
       codes.file=NULL,
       missval.file=NULL,
       count.cases=TRUE
       )

     spss.portable.file(file,
       varlab.file=NULL,
       codes.file=NULL,
       missval.file=NULL,
       count.cases=TRUE)

     spss.system.file(file,
       varlab.file=NULL,
       codes.file=NULL,
       missval.file=NULL,
       count.cases=TRUE)

     Stata.file(file)

     ## The most important methods for "importer" objects are:
     ## S4 method for signature 'importer':
     subset(x, subset, select, drop = FALSE, ...)

     ## S4 method for signature 'importer':
     as.data.set(x,row.names=NULL,optional=NULL,
                         compress.storage.modes=TRUE,...)

_A_r_g_u_m_e_n_t_s:

       x: an object that inherits from class '"importer"'.

    file: character string; the path to the file containing the data

columns.file: character string; the path to an SPSS/PSPP syntax file
          with a 'DATA LIST FIXED' statement

varlab.file: character string; the path to an SPSS/PSPP syntax file
          with a 'VARIABLE LABELS' statement

codes.file: character string; the path to an SPSS/PSPP syntax file with
          a 'VALUE LABELS' statement

missval.file: character string; the path to an SPSS/PSPP syntax file
          with a 'MISSING VALUES' statement

count.cases: logical; should cases in file be counted? This takes
          effect only if the data file does not already contain
          information about the number of cases.

  subset: a logical vector or an expression containing variables from
          the external data file that evaluates to logical. 

  select: a vector of variable names from the external data file. This
          may also be a named vector, where the names give the names
          into which the variables from the external data file are
          renamed.

    drop: a logical value that determines what happens if only one
          column is selected. If TRUE and only one column is selected,
          'subset' returns only a single 'item' object and not a
          'data.set'.

row.names: ignored, present only for compatibility.

optional: ignored, present only for compatibility.

compress.storage.modes: logical value; if TRUE floating point values
          are converted to integers if possible without loss of
          information.

     ...: other arguments; ignored.
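
     What 'compress.storage.modes' means by converting floating point
     values to integers "without loss of information" can be
     illustrated in base R. The helper 'compress_mode' below is
     hypothetical and is not the code used inside memisc; it only
     demonstrates the idea under the assumption that a vector is
     converted when all of its non-missing values are whole numbers
     within the integer range.

```r
# Hypothetical helper, not part of memisc: convert a double vector to
# integer storage only when every non-missing value is a whole number
# that fits into R's integer range.
compress_mode <- function(x) {
  if (!is.double(x)) return(x)
  ok <- is.na(x) |
    (is.finite(x) & x == trunc(x) & abs(x) <= .Machine$integer.max)
  if (all(ok)) as.integer(x) else x
}

a <- compress_mode(c(1, 2, NA, 4))  # whole numbers: converted
b <- compress_mode(c(1.5, 2, 3))    # fractional value: left unchanged

print(storage.mode(a))
print(storage.mode(b))
```

     Integer storage needs less memory per value than double storage,
     which matters for large data sets.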

_D_e_t_a_i_l_s:

     A call to a `constructor' for an importer object, that is,
     'spss.fixed.file', 'spss.portable.file', 'spss.system.file', or
     'Stata.file', causes R to read in the header of the data file
     and/or the syntax files that contain information about the
     variables, such as the columns that they occupy (in case of
     'spss.fixed.file'), variable labels, value labels and missing
     values.

     The information in the file header and/or the accompanying files
     is then processed to prepare the file for importing. Thus the
     inner structure of an 'importer' object may well vary according to
     what type of file is to be imported and what additional
     information is given.

     The 'as.data.set' and 'subset' methods for '"importer"' objects
     internally use the generic functions 'seekData', 'readData', and
     'readSubset', which have methods for the subclasses of
     '"importer"'. These functions are not callable from outside the
     package, however.

_V_a_l_u_e:

     'spss.fixed.file', 'spss.portable.file', 'spss.system.file', and
     'Stata.file' return, respectively, objects of class
     '"spss.fixed.importer"', '"spss.portable.importer"',
     '"spss.system.importer"', or '"Stata.importer"', which, by
     inheritance, are also objects of class '"importer"'.
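
     The class relationship described above can be checked directly. A
     minimal sketch, assuming that package "memisc" is installed and
     using the NES 1948 portable file from the Examples section; the
     names 'por' and 'imp' are illustrative.

```r
# Sketch only: runs when memisc is available, otherwise does nothing.
if (requireNamespace("memisc", quietly = TRUE)) {
  library(memisc)

  por <- UnZip("anes/NES1948.ZIP", "NES1948.POR", package = "memisc")
  imp <- spss.portable.file(por)

  # The constructor returns the specific importer class ...
  print(class(imp))

  # ... which also counts as an "importer" by inheritance.
  print(is(imp, "spss.portable.importer"))
  print(is(imp, "importer"))
}
```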

     Objects of class '"importer"' have at least the following two
     slots:

     ptr: an external pointer

variables: a list of objects of class '"item.vector"' which provides a
          `prototype' for the '"data.set"' objects returned by the
          'as.data.set' and 'subset' methods for objects of class
          '"importer"'


     The 'as.data.frame' method for 'importer' objects does the actual data
     import and returns a data frame. Note that in contrast to
     'read.spss', the variable names of the resulting data frame will
     be lower case. If long variable names are defined (in case of a
     PSPP/SPSS system file), they take precedence and are _not_ coerced
     to lower case.

_S_e_e _A_l_s_o:

     'codebook', 'description', 'read.spss'

_E_x_a_m_p_l_e_s:

     # Extract American National Election Study of 1948
     nes1948.por <- UnZip("anes/NES1948.ZIP","NES1948.POR",
                          package="memisc")

     # Get information about the variables contained.
     nes1948 <- spss.portable.file(nes1948.por)

     # The data are not yet loaded:
     show(nes1948)

     # ... but one can see what variables are present:
     description(nes1948)

     # Now a subset of the data is loaded:
     vote.socdem.48 <- subset(nes1948,
                   select=c(
                       v480018,
                       v480029,
                       v480030,
                       v480045,
                       v480046,
                       v480047,
                       v480048,
                       v480049,
                       v480050
                       ))

     # Let's make the names more descriptive:
     vote.socdem.48 <- rename(vote.socdem.48,
                       v480018 = "vote",
                       v480029 = "occupation.hh",
                       v480030 = "unionized.hh",
                       v480045 = "gender",
                       v480046 = "race",
                       v480047 = "age",
                       v480048 = "education",
                       v480049 = "total.income",
                       v480050 = "religious.pref"
             )

     # It is also possible to do both
     # in one step:
     # vote.socdem.48 <- subset(nes1948,
     #              select=c(
     #                  vote           = v480018,
     #                  occupation.hh  = v480029,
     #                  unionized.hh   = v480030,
     #                  gender         = v480045,
     #                  race           = v480046,
     #                  age            = v480047,
     #                  education      = v480048,
     #                  total.income   = v480049,
     #                  religious.pref = v480050
     #                  ))


     # We examine the data more closely:
     codebook(vote.socdem.48)

     # ... and conduct some analyses.
     #
     t(genTable(percent(vote)~occupation.hh,data=vote.socdem.48))

     # We consider only the two main candidates.
     vote.socdem.48 <- within(vote.socdem.48,{
       truman.dewey <- vote
       valid.values(truman.dewey) <- 1:2
       truman.dewey <- relabel(truman.dewey,
                   "VOTED - FOR TRUMAN" = "Truman",
                   "VOTED - FOR DEWEY"  = "Dewey")
       })

     summary(truman.relig.glm <- glm((truman.dewey=="Truman")~religious.pref,
         data=vote.socdem.48,
         family="binomial"
     ))

