sensitivity             package:accuracy             R Documentation

_D_a_t_a _P_e_r_t_u_r_b_a_t_i_o_n_s _b_a_s_e_d _S_e_n_s_i_t_i_v_i_t_y _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     This function replicates a statistical analysis using slightly
     perturbed versions of the original input data, and then analyzes
     the sensitivity of the model to such changes.  This can be used to
     draw attention to inferences that cannot be supported confidently
     given the current data, model, and algorithm/implementation.

_U_s_a_g_e:

     sensitivity(data, statistic, ..., ptb.R = 50, ptb.ran.gen = NULL, ptb.s = NULL,
             summarize = FALSE, keepData = FALSE, ptb.rangen.ismatrix = FALSE)
     perturb(...)

_A_r_g_u_m_e_n_t_s:

    data: data matrix to be perturbed 

statistic: statistical model to be run on the data 

     ...: additional arguments to statistical model

   ptb.R: number of replications for perturbation analysis 

ptb.ran.gen: a single function, or a vector of functions, to be used
          to perturb each vector; see 'PTBi' 

   ptb.s: a size, or vector of sizes, to be used in the vector
          perturbation functions

summarize: if true, return a sensitivity summary, as would
          'summary(sensitivity())'. This  reduces system memory use and
          can significantly speed up the analysis of large datasets. 

keepData: for debugging, store data for each perturbation 

ptb.rangen.ismatrix: if true, expects 'ptb.ran.gen' to be a matrix
          function, for use with a correlated noise structure; see below

_D_e_t_a_i_l_s:

     Sensitivity to numerical inaccuracy and measurement error is very
     hard to measure formally. These empirical sensitivity tests draw
     attention to inferences that cannot be supported confidently given
     the current data, model, and algorithm/implementation.

     The empirical approach works by replicating the original analysis
     while slightly perturbing the original input data in different
     ways. The sensitivity of the model estimates (e.g. estimated
     coefficients, standard errors, and log-likelihoods) is then
     summarized. 

     The sensitivity analysis cannot be used to prove the accuracy of a
     particular method, but is useful in drawing attention to potential
     problems. Further experimentation and analysis may be necessary to
     determine the specific cause of the problem: numerical
     instability, statistical sensitivity to measurement error, or
     ill-conditioning.
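
     The approach in outline can be sketched in a few lines of base R,
     independently of this package's API (the 0.1% uniform noise level
     and the choice of perturbing only the predictors are illustrative
     assumptions, not package defaults):

         # Refit the model on many slightly perturbed copies of the data
         # and inspect the spread of the estimated coefficients.
         data(longley)
         set.seed(1)
         betas <- t(replicate(50, {
           d <- longley
           pred <- setdiff(names(d), "Employed")   # perturb predictors only
           d[pred] <- lapply(d[pred],
                             function(v) v * (1 + runif(length(v), -0.001, 0.001)))
           coef(lm(Employed ~ ., data = d))
         }))
         apply(betas, 2, sd)  # large spreads flag unstable estimates

     'sensitivity' automates this loop and, in addition, records the
     perturbation settings and provides 'summary', 'plot', and 'anova'
     methods for the replicated results.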

_V_a_l_u_e:

     Returns a list containing the result of each model run, along with
     attributes recording the settings used in the perturbations. Use
     'summary' to summarize the results or to extract a matrix of the
     model parameters across the entire set of runs.

_N_o_t_e:

     If ptb.ran.gen is not specified,  then 'PTBdefault' will be used,
     with q=ptb.s, for each variable in the input data. 

     Note that 'sensitivity' was originally called 'perturb'. The name
     was changed to avoid a conflict with another package that was
     introduced afterwards.

_A_u_t_h_o_r(_s):

     Micah Altman <Micah_Altman@harvard.edu> <URL:
     http://www.hmdc.harvard.edu/micah_altman/>

_R_e_f_e_r_e_n_c_e_s:

     Altman, M., J. Gill and M. P. McDonald.  2003.  _Numerical Issues
     in Statistical Computing for the Social Scientist_.  John Wiley &
     Sons. <URL: http://www.hmdc.harvard.edu/numerical_issues/>

_S_e_e _A_l_s_o:

     'PTBi', 'sensitivityZelig', 'PTBdefault', 'PTBdiscrete'

_E_x_a_m_p_l_e_s:

     # Examine the sensitivity of the GLM from Venables & Ripley (2002, p. 189),
     # as described in the 'glm' documentation.
     # 
     # Perturb the two independent variables using +/- 0.025
     #       (relative to the size of each observation)
     # uniformly distributed noise. The dependent variable is not modified.
     # 
     # The summary should show that the estimated coefficients are not
     # substantively affected by the noise.

     if (!is.R()){
       # workaround MASS data bug in SPLUS
       require(MASS,quietly=TRUE)
     }

     data(anorexia,package="MASS")
     panorexia = sensitivity(anorexia, glm, Postwt ~ Prewt + Treat + offset(Prewt),
          family=gaussian, 
         ptb.R=100, ptb.ran.gen=list(PTBi,PTBus,PTBus), ptb.s=c(1,.005,.005) )
     summary(panorexia)

     # Use the classic longley dataset. The model is numerically unstable, 
     # and much more sensitive to noise.  Even small amounts of noise
     # tremendously alter some of the estimated coefficients:
     # 
     # In this example we do not perturb the dependent variable (Employed) or 
     # the year variable, so we assign them 'PTBi' or NULL in ptb.ran.gen.

     data(longley)
     plongley = sensitivity(longley,lm,Employed~.) # defaults

     # Alternatively, choose specific perturbation functions
     #
     plongley2 = sensitivity(longley, lm, Employed ~ ., ptb.R = 100,
         ptb.ran.gen = c(list(PTBi), replicate(5, PTBus, simplify = FALSE), list(PTBi)),
         ptb.s = c(1, replicate(5, .001), 1))

     # summarize the range of the estimates
     sp=summary(plongley)
     print(sp)
     plot(sp) # plots boxplots of the distribution of the coefficients under perturbation

     # models with anova methods can also be summarized this way
     anova(plongley) 

     ## Not run: 
     # plots different replications
     plot(plongley) # plots the perturbed replications individually, pausing in between

     # plots anova results (where applicable)
     plot(anova(plongley))
     ## End(Not run)

     # look in summary object to extract more ...
     names(attributes(sp))

     # extract the matrix of coefficients from all runs
     # (named 'coefs' to avoid masking the 'coef' function)
     coefs = attr(sp, "coef.betas.m")
     summary(coefs)



     # Example where the model does not accept a dataset as an argument...

     # 'mle' does not accept a dataset argument, so we need to
     # create a wrapper function
     #
     if (is.R()) {
          library(stats4)
          mleD<-function(data,lld,...) {
                # construct LL function with embedded data
                f=formals(lld)
                f[1]=NULL
                ll <-function()  {
                   cl=as.list(match.call())
                   cl[1]=NULL
                   cl$data=as.name("data")
                   do.call(lld,cl)
                }
                formals(ll)=f

                # call mle
                mle(ll,...)
          }

          dat=as.data.frame(cbind( 0:10 , c(26, 17, 13, 12, 20, 5, 9, 8, 5, 4, 8) ))
                                                         
          llD<-function(data, ymax=15, xhalf=6)
              -sum(stats::dpois(data[[2]], lambda=ymax/(1+data[[1]]/xhalf), log=TRUE))
          
          
          print(summary(sensitivity(dat, mleD,llD)))
          
     }

     # An example of using correlated noise by supplying a matrix noise function
     # Note that the function can be an anonymous function supplied in the call itself

       
       if (require(MASS, quietly=TRUE)) {

         # Noise dimensions must match the data: n rows, one column per
         # variable, so the covariance matrix is ncol(x) x ncol(x).
         plongleym = sensitivity(longley, lm, Employed ~ .,
           ptb.rangen.ismatrix = TRUE,
           ptb.ran.gen =
           function(x, size=1) {
                  mvrnorm(n=dim(x)[1], mu=rep(0, dim(x)[2]),
                          Sigma=matrix(.9, nrow=dim(x)[2], ncol=dim(x)[2]))*size + x }
         )
         print(summary(plongleym))
       }
       
      

