data.set               package:memisc               R Documentation

_D_a_t_a _S_e_t _O_b_j_e_c_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     '"data.set"' objects are collections of '"item"' objects, with
     semantics similar to those of data frames. They are distinguished
     from data frames so that coercion by 'as.data.frame' leads to a
     data frame that contains only vectors and factors. Nevertheless,
     most methods for data frames are inherited by data sets, except
     for the method for the 'within' generic function. For the 'within'
     method for data sets, see the Details section.

     Thus data preparation using data sets retains all information
     about item annotations, labels, missing values etc., while the
     (mostly automatic) conversion of data sets into data frames makes
     the data amenable to R's statistical functions.

_U_s_a_g_e:

     data.set(...,row.names = NULL, check.rows = FALSE, check.names = TRUE,
         stringsAsFactors = default.stringsAsFactors(),
                      document = NULL)
     as.data.set(x, row.names=NULL, optional=NULL, ...)
     is.data.set(x)
     ## S3 method for class 'data.set':
     as.data.frame(x, row.names = NULL, optional = FALSE, ...)
     ## S4 method for signature 'data.set':
     within(data, expr, ...)

_A_r_g_u_m_e_n_t_s:

     ...: For the 'data.set' function, several vectors or items; for
          'within', further arguments, which are ignored.

row.names, check.rows, check.names, stringsAsFactors, optional: 
          arguments as in 'data.frame' or 'as.data.frame',
          respectively. 

document: NULL or a character vector that contains documentation of
          the data.

       x: for 'is.data.set(x)', any object; for 'as.data.set(x,...)',
          an object to be coerced into a data set; for
          'as.data.frame(x,...)', a "data.set" object.

    data: a data set, that is, an object of class "data.set". 

    expr: an expression, or several expressions enclosed in curly
          braces.

_D_e_t_a_i_l_s:

     The 'as.data.frame' method for data sets is just a copy of the
     method for list. Consequently, all items in the data set are
     coerced in accordance to their 'measurement' setting, see 'item'
     and 'measurement'.
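
     As a minimal sketch of this coercion (the variable names 'party'
     and 'age' and their values are made up for illustration), a
     nominal item should come out as a factor and a ratio-scale item
     as a numeric vector, with declared missing values becoming 'NA':

```r
library(memisc)

# Hypothetical toy data; code 9 stands for a refused answer.
ds <- data.set(
  party = c(1, 2, 2, 9),
  age   = c(34, 51, 27, 45)
)

ds <- within(ds, {
  labels(party) <- c(Left = 1, Right = 2, "No answer" = 9)
  missing.values(party) <- 9
  measurement(party) <- "nominal"   # to be coerced into a factor
  measurement(age)   <- "ratio"     # to be coerced into a numeric vector
})

# Coercion follows the 'measurement' setting of each item.
df <- as.data.frame(ds)
str(df)
```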

     The 'within' method for data sets has the same effect as the
     'within' method for data frames, apart from two differences:
     results of the computations are coerced into items if they have
     the appropriate length, and results of any other length are
     automatically dropped.
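
     The length rule can be checked with a small sketch. The names
     'x', 'y' and 'z' are hypothetical, and the use of 'names' and
     'methods::is' to inspect the result is an assumption about how
     one might verify it, not part of the documented interface:

```r
library(memisc)

ds <- data.set(x = 1:6)

ds <- within(ds, {
  y <- 6:1   # same length as the data set: expected to be kept as an item
  z <- 1:2   # different length: expected to be dropped
})

names(ds)                       # 'y' should be listed, 'z' should not
methods::is(ds[["y"]], "item")  # the kept result should be an item
```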

     Currently only one method for the generic function 'as.data.set'
     is defined: a method for "importer" objects.

_V_a_l_u_e:

     'data.set' and the 'within' method for data sets return a
     "data.set" object, 'is.data.set' returns a logical value, and
     'as.data.frame' returns a data frame.

_E_x_a_m_p_l_e_s:

     Data <- data.set(
               vote = sample(c(1,2,3,8,9,97,99),size=300,replace=TRUE),
               region = sample(c(rep(1,3),rep(2,2),3,99),size=300,replace=TRUE),
               income = exp(rnorm(300,sd=.7))*2000
               )

     Data <- within(Data,{
       description(vote) <- "Vote intention"
       description(region) <- "Region of residence"
       description(income) <- "Household income"
       wording(vote) <- "If a general election would take place next Tuesday,
                         the candidate of which party would you vote for?"
       wording(income) <- "All things taken into account, how much do all
                         household members earn in sum?"
       foreach(x=c(vote,region),{
         measurement(x) <- "nominal"
         })
       measurement(income) <- "ratio"
       labels(vote) <- c(
                         Conservatives         =  1,
                         Labour                =  2,
                         "Liberal Democrats"   =  3,
                         "Don't know"          =  8,
                         "Answer refused"      =  9,
                         "Not applicable"      = 97,
                         "Not asked in survey" = 99)
       labels(region) <- c(
                         England               =  1,
                         Scotland              =  2,
                         Wales                 =  3,
                         "Not applicable"      = 97,
                         "Not asked in survey" = 99)
       foreach(x=c(vote,region,income),{
         annotation(x)["Remark"] <- "This is not a real survey item, of course ..."
         })
       missing.values(vote) <- c(8,9,97,99)
       missing.values(region) <- c(97,99)

       # These two variables do not appear in the
       # resulting data set, since they have the wrong length.
       junk1 <- 1:5
       junk2 <- matrix(5,4,4)
       
     })
     # Since data sets may be huge, only a
     # part of them is 'show'n
     Data
     # If we insist on seeing all, we can use 'print' instead
     ## Not run: 
       print(Data)
     ## End(Not run)
     str(Data)
     summary(Data)

     Data[[1]]
     Data[1,]
     head(as.data.frame(Data))

     EnglandData <- subset(Data,region == "England")
     EnglandData

     xtabs(~vote+region,data=Data)
     xtabs(~vote+region,data=within(Data, vote <- include.missings(vote)))

