svydesign               package:survey               R Documentation

_S_u_r_v_e_y _s_a_m_p_l_e _a_n_a_l_y_s_i_s.

_D_e_s_c_r_i_p_t_i_o_n:

     Specify a complex survey design.

_U_s_a_g_e:

     svydesign(ids, probs=NULL, strata = NULL, variables = NULL, fpc=NULL,
     data = NULL, nest = FALSE, check.strata = !nest, weights=NULL,pps=FALSE,...)
     ## Default S3 method:
     svydesign(ids, probs=NULL, strata = NULL, variables = NULL, fpc=NULL,
     data = NULL, nest = FALSE, check.strata = !nest, weights=NULL,pps=FALSE,variance=c("HT","YG"),...)
     ## S3 method for class 'imputationList':
     svydesign(ids, probs = NULL, strata = NULL, variables = NULL, 
         fpc = NULL, data, nest = FALSE, check.strata = !nest, weights = NULL, pps=FALSE,
          ...)
     ## S3 method for class 'character':
     svydesign(ids, probs = NULL, strata = NULL, variables = NULL, 
         fpc = NULL, data, nest = FALSE, check.strata = !nest, weights = NULL, pps=FALSE,
         dbtype = "SQLite", dbname, ...)

_A_r_g_u_m_e_n_t_s:

     ids: Formula or data frame specifying cluster ids from largest
          level to smallest level; '~0' or '~1' is a formula for no
          clusters.

   probs: Formula or data frame specifying cluster sampling
          probabilities

  strata: Formula or vector specifying strata; use 'NULL' for no strata

variables: Formula or data frame specifying the variables measured in
          the survey. If 'NULL', the 'data' argument is used.

     fpc: Finite population correction: see Details below

 weights: Formula or vector specifying sampling weights as an
          alternative to 'probs'

    data: Data frame to look up variables in the formula arguments, or
          database table name, or 'imputationList' object; see below

    nest: If 'TRUE', relabel cluster ids to enforce nesting within
          strata

check.strata: If 'TRUE', check that clusters are nested in strata

     pps: '"brewer"' to use Brewer's approximation for PPS  sampling
          without replacement. '"overton"' to use Overton's
          approximation. An object of class 'HR' to use the Hartley-Rao
          approximation. An object of class 'ppsmat' to use the
          Horvitz-Thompson estimator.

  dbtype: name of database driver to pass to 'dbDriver'

  dbname: name of database (eg file name for SQLite)

variance: For 'pps' without replacement, use 'variance="YG"' for the
          Yates-Grundy estimator instead of the Horvitz-Thompson
          estimator

     ...: for future expansion

_D_e_t_a_i_l_s:

     The 'svydesign' object combines a data frame and all the survey
     design information needed to analyse it.  These objects are used
     by the survey modelling and summary functions.  The 'ids' argument
     is always required; the 'strata', 'fpc', 'weights' and 'probs'
     arguments are optional. If these variables are specified they must
     not have any missing values.

     By default, 'svydesign' assumes that all PSUs, even those in
     different strata, have a unique value of the 'id' variable. This
     allows some data errors to be detected. If your PSUs reuse the
     same identifiers across strata then set 'nest=TRUE'.
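
     As an illustration of the nesting check, the sketch below uses a
     small made-up data frame ('toy', with hypothetical columns
     'stratum', 'psu', 'w' and 'y') in which PSU labels 1 and 2 are
     reused in both strata.  Under these assumptions the default call
     is expected to stop with an error, while 'nest=TRUE' relabels the
     PSUs within strata.

```r
library(survey)

# Hypothetical toy data: PSU ids 1 and 2 appear in both stratum A and B
toy <- data.frame(
  stratum = rep(c("A", "B"), each = 4),
  psu     = rep(c(1, 2, 1, 2), each = 2),
  w       = 10,
  y       = c(3, 5, 4, 6, 8, 7, 9, 10)
)

# Default (nest = FALSE, check.strata = TRUE): reused ids are flagged
clash <- try(
  svydesign(ids = ~psu, strata = ~stratum, weights = ~w, data = toy),
  silent = TRUE
)

# nest = TRUE treats PSU 1 in stratum A and PSU 1 in stratum B as distinct
nested <- svydesign(ids = ~psu, strata = ~stratum, weights = ~w,
                    data = toy, nest = TRUE)
svymean(~y, nested)
```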

     The finite population correction (fpc) is used to reduce the
     variance when a substantial fraction of the total population of
     interest has been sampled. It may not be appropriate if the target
     of inference is the process generating the data rather than the
     statistics of a particular finite population.

     The finite population correction can be specified either as the
     total population size in each stratum or as the fraction of the
     total population that has been sampled. In either case the
     relevant population size is counted in sampling units.  That is,
     sampling 100 units from a population stratum of size 500 can be
     specified as 500 or as 100/500=0.2.  The exception is PPS
     sampling without replacement, where the sampling probability
     (which will be different for each PSU) must be used.
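
     The two ways of giving 'fpc' can be compared on the 'apistrat'
     data, where the 'fpc' column holds the stratum population size.
     The sketch below adds a sampling-fraction column ('frac', a name
     introduced here for illustration) and checks that both designs
     give the same standard error, up to rounding.

```r
library(survey)
data(api)

# Stratum sample sizes, matched back to each row
n_h <- table(apistrat$stype)
n_row <- as.numeric(n_h[as.character(apistrat$stype)])

# 'frac' is an illustrative column: sampled fraction n_h / N_h per stratum
apistrat$frac <- n_row / apistrat$fpc

# fpc given as population size, then as sampling fraction
d_size <- svydesign(ids = ~1, strata = ~stype, fpc = ~fpc,  data = apistrat)
d_frac <- svydesign(ids = ~1, strata = ~stype, fpc = ~frac, data = apistrat)

se_size <- SE(svymean(~api00, d_size))
se_frac <- SE(svymean(~api00, d_frac))
```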

     If population sizes are specified but not sampling probabilities
     or weights, the sampling probabilities will be computed from the
     population sizes assuming simple random sampling within strata. 
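
     A brief check of this rule on 'apistrat': with only 'fpc' given,
     the weights stored in the design should equal the stratum
     population size divided by the stratum sample size.  The name
     'expected_w' is introduced here for illustration.

```r
library(survey)
data(api)

# No weights or probs supplied: they are derived from the population sizes
d <- svydesign(ids = ~1, strata = ~stype, fpc = ~fpc, data = apistrat)

# Expected weight under simple random sampling within strata: N_h / n_h
n_h <- table(apistrat$stype)
expected_w <- apistrat$fpc / as.numeric(n_h[as.character(apistrat$stype)])

computed_w <- as.numeric(weights(d))
```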

     For multistage sampling the 'ids' argument should specify a formula
     with the cluster identifiers at each stage.  If subsequent stages
     are stratified 'strata' should also be specified as a formula with
     stratum identifiers at each stage.  The population size for each
     level of sampling should also be specified in 'fpc'. If 'fpc' is
     not specified then sampling is assumed to be with replacement at
     the top level and only the first stage of clustering is used in
     computing variances. If 'fpc' is specified but for fewer stages
     than 'ids', sampling is assumed to be complete for subsequent
     stages.   The variance calculations for multistage sampling assume
     simple or stratified random sampling within clusters at each stage
     except possibly the last.
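
     The effect of omitting 'fpc' can be sketched with the two-stage
     'apiclus2' sample.  Assuming the behaviour described above, a
     design with two stages of ids but no 'fpc' should give the same
     standard error as one that lists only the first stage.

```r
library(survey)
data(api)

# Two-stage design with population sizes; used here only to obtain weights
d2 <- svydesign(ids = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)
w <- weights(d2)

# Without fpc, sampling is treated as with replacement at the top level
d_wr_two <- svydesign(ids = ~dnum + snum, weights = w, data = apiclus2)
d_wr_one <- svydesign(ids = ~dnum,        weights = w, data = apiclus2)

se_two <- SE(svymean(~api00, d_wr_two))
se_one <- SE(svymean(~api00, d_wr_one))
```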

     For PPS sampling without replacement it is necessary to specify
     the probabilities for each stage of sampling using the 'fpc'
     argument, and an overall 'weights' argument should not be given.
     At the moment, multistage or stratified PPS sampling without
     replacement is supported only with 'pps="brewer"', or by giving
     the full joint probability matrix using 'ppsmat'. [Cluster
     sampling is supported by all methods, but not subsampling within
     clusters].  
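
     A sketch of two PPS specifications, using the 'election_pps'
     sample and the 'election_jointprob' matrix that accompany the
     'election' data: Brewer's approximation, and the full joint
     probability matrix with the Yates-Grundy variance.  Both designs
     use the same per-unit probabilities 'p', so the point estimates
     should agree and only the standard errors differ.

```r
library(survey)
data(election)

# Brewer's approximation: needs only the per-unit sampling probabilities
dpps_br <- svydesign(ids = ~1, fpc = ~p, data = election_pps,
                     pps = "brewer")

# Full joint inclusion probabilities with the Yates-Grundy estimator
dpps_yg <- svydesign(ids = ~1, fpc = ~p, data = election_pps,
                     pps = ppsmat(election_jointprob), variance = "YG")

tot_br <- svytotal(~Bush + Kerry + Nader, dpps_br)
tot_yg <- svytotal(~Bush + Kerry + Nader, dpps_yg)
```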

     The 'dim', '"["', '"[<-"' and 'na.action' methods for
     'survey.design' objects operate on the data frame specified by
     'variables' and ensure that the design information is properly
     updated to correspond to the new data frame.  With the '"[<-"'
     method the new value can be a 'survey.design' object instead of a
     data frame, but only the data frame is used. See also
     'subset.survey.design' for a simple way to select subpopulations.
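
     A short sketch of these methods on the stratified 'apistrat'
     design: 'dim' reports the size of the underlying data frame, and
     'subset' selects a subpopulation while keeping the design
     information consistent.  The object names are illustrative.

```r
library(survey)
data(api)

dstrat <- svydesign(ids = ~1, strata = ~stype, weights = ~pw,
                    fpc = ~fpc, data = apistrat)

# Rows and columns of the data frame held by the design
dim_full <- dim(dstrat)

# Subpopulation: elementary schools only
d_elem <- subset(dstrat, stype == "E")
dim_elem <- dim(d_elem)

svymean(~api00, d_elem)
```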

     The 'model.frame' method extracts the observed data.

     If the strata with only one PSU are not self-representing (or they
     are, but 'svydesign' cannot tell based on 'fpc') then the handling
     of these strata for variance computation is determined by
     'options("survey.lonely.psu")'.  See 'svyCprod' for details.
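
     The sketch below shows the option in use on a made-up data frame
     ('toy', hypothetical) in which stratum B contains a single PSU.
     With the '"fail"' setting the variance computation is expected to
     stop with an error; '"adjust"' is one of the alternative settings
     and lets the computation proceed.

```r
library(survey)

# Hypothetical toy data: stratum B has only one PSU
toy <- data.frame(
  stratum = c("A", "A", "A", "B"),
  y       = c(1, 2, 4, 3),
  w       = c(2, 2, 2, 5)
)
d <- svydesign(ids = ~1, strata = ~stratum, weights = ~w, data = toy)

# Remember the current setting so it can be restored afterwards
old <- options(survey.lonely.psu = "fail")
res_fail <- try(svymean(~y, d), silent = TRUE)

# "adjust" centres the lonely stratum at the overall mean
options(survey.lonely.psu = "adjust")
res_adj <- svymean(~y, d)

options(old)
```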

     'data' may be a character string giving the name of a table or
     view in a relational database that can be accessed through the
     'DBI' or 'ODBC' interfaces. For DBI interfaces 'dbtype' should be
     the name of the database driver and 'dbname' should be the name by
     which the driver identifies the specific database (eg file name
     for SQLite). For ODBC databases 'dbtype' should be '"ODBC"' and
     'dbname' should be the registered DSN for the database. On the
     Windows GUI, 'dbname=""' will produce a dialog box for interactive
     selection. 

     The appropriate database interface package must already be loaded
     (eg 'RSQLite' for SQLite, 'RODBC' for ODBC).  The survey design
     object will contain only the design meta-data, and actual
     variables will be loaded from the database as needed.  Use 'close'
     to close the database connection and 'open' to reopen the
     connection, eg, after loading a saved object.

     The database interface does not attempt to modify the underlying
     database and so can be used with read-only permissions on the
     database.

     If 'data' is an 'imputationList' object (from the "mitools"
     package), 'svydesign' will return a 'svyimputationList' object
     containing a set of designs. Use 'with.svyimputationList' to do
     analyses on these designs and 'MIcombine' to combine the results.

_V_a_l_u_e:

     An object of class 'survey.design'.

_A_u_t_h_o_r(_s):

     Thomas Lumley

_S_e_e _A_l_s_o:

     'as.svrepdesign' for converting to replicate weight designs,
     'subset.survey.design' for domain estimates,
     'update.survey.design' to add variables.

     'mitools' package for using multiple imputations

     'svyrecvar' and 'svyCprod' for details of variance estimation

     'election' for examples of PPS sampling without replacement.

     <URL: http://faculty.washington.edu/tlumley/survey/> for examples
     of database-backed objects.

_E_x_a_m_p_l_e_s:

     data(api)
     # stratified sample
     dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
     # one-stage cluster sample
     dclus1<-svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
     # two-stage cluster sample: weights computed from population sizes.
     dclus2<-svydesign(id=~dnum+snum, fpc=~fpc1+fpc2, data=apiclus2)

     ## multistage sampling has no effect when fpc is not given, so
     ## these are equivalent.
     dclus2wr<-svydesign(id=~dnum+snum, weights=weights(dclus2), data=apiclus2)
     dclus2wr2<-svydesign(id=~dnum, weights=weights(dclus2), data=apiclus2)

     ## syntax for stratified cluster sample
     ##(though the data weren't really sampled this way)
     svydesign(id=~dnum, strata=~stype, weights=~pw, data=apistrat,
     nest=TRUE)

     ## PPS sampling without replacement
     data(election)
     dpps<- svydesign(id=~1, fpc=~p, data=election_pps, pps="brewer")

     ##database example: requires RSQLite
     ## Not run: 
     library(RSQLite)
     dbclus1<-svydesign(id=~dnum, weights=~pw, fpc=~fpc,
     data="apiclus1",dbtype="SQLite", dbname=system.file("api.db",package="survey"))

     ## End(Not run)

