sbfit               package:scaleboot               R Documentation

_F_i_t_t_i_n_g _M_o_d_e_l_s _t_o _B_o_o_t_s_t_r_a_p _P_r_o_b_a_b_i_l_i_t_i_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     'sbfit' is used to fit parametric models to multiscale bootstrap
     probabilities by the maximum likelihood method.

_U_s_a_g_e:

     sbfit(x, ...)

     ## Default S3 method:
     sbfit(x,nb,sa,models=NULL,nofit=FALSE,...)

     ## S3 method for class 'matrix':
     sbfit(x,nb,sa,models=NULL,names.hp=rownames(x),
           nofit=FALSE,cluster=NULL,...)

     ## S3 method for class 'data.frame':
     sbfit(x,...)

     ## S3 method for class 'scaleboot':
     sbfit(x,models=names(x$fi),...)

     ## S3 method for class 'scalebootv':
     sbfit(x,models=attr(x,"models"),...)

_A_r_g_u_m_e_n_t_s:

       x: an object used to select a method. For 'sbfit.default', 'x'
          is denoted by 'bp' and is a vector of bootstrap probabilities
          for a hypothesis. For 'sbfit.matrix', 'x' is denoted by 'bps'
          and is a matrix with row vectors of 'bp' for several
          hypotheses.

      nb: vector of numbers of bootstrap replicates. A short vector (or
          scalar) is cyclically extended to match the size of 'bp'.

      sa: vector of scales in sigma squared (sigma^2). Should be the
          same size as 'bp'.

  models: character vector of model names. Valid model names are
          'poly.m' for m>=1 and 'sing.m' for m>=3. The default is set
          by 'sboptions()$models', whose default is
          'c("poly.1","poly.2","poly.3","sing.3")'. If 'models' is an
          integer value, 'sbmodelnames(m=models)' is used.

   nofit: logical. If TRUE, fitting is not performed.

names.hp: character vector of hypothesis names.

 cluster: 'snow' cluster object which may be generated by function
          'makeCluster'.

     ...: further arguments passed to or from other methods.
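
     The cyclic extension of 'nb' described above can be illustrated
     in base R. This is only a sketch of the recycling rule, using
     made-up values for 'bp'; it matches the 'rep(nb,length=length(bp))'
     expression given for the 'nb' component of the returned object.

```r
# Hypothetical bootstrap probabilities at four scales (illustration only)
bp <- c(0.01, 0.05, 0.10, 0.20)

# A scalar nb means the same number of replicates at every scale
nb_scalar <- rep(10000, length.out = length(bp))

# A short vector is recycled until it matches the length of bp
nb_short <- rep(c(10000, 20000), length.out = length(bp))

nb_scalar
nb_short
```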

_D_e_t_a_i_l_s:

     'sbfit.default' fits parametric models to 'bp' by maximizing the
     log-likelihood of a binomial model. Multiscale bootstrap
     resampling should be performed before calling 'sbfit' to prepare
     'bp', where 'bp[i]' is the bootstrap probability of a hypothesis
     calculated with 'nb[i]' bootstrap replicates at scale
     sigma^2='sa[i]'. The scale is defined as sigma^2=n/n', where n is
     the sample size of the data and n' is the sample size of the
     replicated data used for bootstrap resampling.
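
     As a small illustration of the scale definition sigma^2=n/n', the
     following base R sketch converts a set of target scales into
     replicate sample sizes n'. The sample size 'n' is a made-up value,
     and rounding n' to an integer is an assumption of this sketch, not
     something stated on this page.

```r
n  <- 1000                            # hypothetical sample size of the data
sa <- 9^seq(-1, 1, length.out = 13)   # target scales (sigma squared)

n_rep     <- round(n / sa)            # replicate sample sizes n' (rounded, assumed)
sa_actual <- n / n_rep                # scales actually realized after rounding

cbind(sa, n_rep, sa_actual)
```

     Small scales correspond to replicates larger than the data, and
     large scales to replicates smaller than the data.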

     Each model specifies a function 'psi(beta,s)'=psi(sigma^2 | beta)
     with a parameter vector beta, where 's' stands for the scale
     sigma^2. The model describes how the bootstrap probability
     changes along the scale. Let 'cnt[i]=bp[i]*nb[i]' be the number
     of bootstrap replicates at scale 'sa[i]' in which the hypothesis
     of interest is observed. Then 'cnt[i]' is assumed to be
     binomially distributed with number of trials 'nb[i]' and success
     probability '1-pnorm(psi(beta,s=sa[i])/sqrt(sa[i]))'. Currently,
     'sbpsi.poly' and 'sbpsi.sing' are available as psi functions. The
     estimated model parameters are accessed by the 'coef.scaleboot'
     method.
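
     The binomial model above can be sketched in base R without the
     package. The helper names 'psi_poly' and 'neg_loglik' are
     hypothetical, the polynomial form psi(beta,s) = beta[1] +
     beta[2]*s + ... is an assumption about the "poly" family, and
     'optim' with its default settings merely stands in for whatever
     optimizer 'sbfit' uses internally.

```r
# Assumed polynomial psi: beta[1] + beta[2]*s + beta[3]*s^2 + ...
psi_poly <- function(beta, s) {
  powers <- outer(s, seq_along(beta) - 1, `^`)  # length(s) x length(beta)
  drop(powers %*% beta)
}

# Negative binomial log-likelihood with success probability
# 1 - pnorm(psi(beta, s) / sqrt(s)), evaluated on the log scale
neg_loglik <- function(beta, cnt, nb, sa) {
  z     <- psi_poly(beta, sa) / sqrt(sa)
  log_p <- pnorm(z, lower.tail = FALSE, log.p = TRUE)  # log(p)
  log_q <- pnorm(z, log.p = TRUE)                      # log(1 - p)
  -sum(lchoose(nb, cnt) + cnt * log_p + (nb - cnt) * log_q)
}

# Counts and scales taken from the Examples section of this page
cnt <- c(0,0,0,0,6,220,1464,3565,5430,6477,6754,6687,5961)
nb  <- rep(100000, length(cnt))
sa  <- 10^seq(-2, 2, length.out = 13)

init <- c(2, 0.1)   # rough starting values for a two-parameter model
fit  <- optim(init, neg_loglik, cnt = cnt, nb = nb, sa = sa)
fit$par      # estimated beta
-fit$value   # log-likelihood at the estimate
```

     Working with 'log.p = TRUE' keeps the likelihood finite at small
     scales, where the success probability is extremely close to zero.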

     The model fitting is performed in the order specified in 'models',
     and the initial values for numerical optimization of the
     likelihood function are prepared by using previously estimated
     model parameters. Thus, "poly.(m-1)" should be specified before
     "poly.m", and "poly.(m-1)" and "sing.(m-1)" should be specified
     before "sing.m".

     'sbfit.matrix' calls 'sbfit.default' repeatedly, once for each row
     vector 'bp' of the matrix 'bps'.  Parallel computing is performed
     when 'cluster' is non-NULL.
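
     The row-by-row pattern can be sketched as follows. This is not the
     package's implementation: 'fit_rows' and 'fit_one' are hypothetical
     names, 'fit_one' stands in for a single-hypothesis fit, and
     'parallel::parLapply' is used here only as one plausible way to
     dispatch rows to a cluster.

```r
# Apply a single-hypothesis fitting function to each row of bps,
# optionally in parallel when a cluster object is supplied.
fit_rows <- function(bps, fit_one, cluster = NULL, ...) {
  rows <- lapply(seq_len(nrow(bps)), function(i) bps[i, ])
  names(rows) <- rownames(bps)
  if (is.null(cluster)) {
    lapply(rows, fit_one, ...)                        # sequential
  } else {
    parallel::parLapply(cluster, rows, fit_one, ...)  # one task per row
  }
}

# Toy usage with a trivial stand-in for the fitting function
bps <- rbind(t1 = c(0.9, 0.5, 0.2), t2 = c(0.0, 0.1, 0.3))
res <- fit_rows(bps, function(bp) mean(bp))
res
```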

     'sbfit.scaleboot' calls 'sbfit.default' with the 'bp', 'nb', and
     'sa' components of 'x' to refit with another 'models' argument.
     It discards the previous fitting result and recomputes the model
     parameters.

     'sbfit.scalebootv' calls 'sbfit.matrix' with the 'bps', 'nb', and
     'sa' components in the attributes of 'x'.

_V_a_l_u_e:

     'sbfit.default' and 'sbfit.scaleboot' return an object of class
     '"scaleboot"', and 'sbfit.matrix' and 'sbfit.scalebootv' return an
     object of class '"scalebootv"'. 

     An object of class '"scaleboot"' is a list containing at least the
     following components: 

      bp: the vector of bootstrap probabilities used.

      nb: the 'rep(nb,length=length(bp))' used.

      sa: the 'sa' used. 

      fi: list vector of fitted results for 'models' used.  Each list
          consists of components '"par"' (estimated parameter), '"mag"'
          (magnification factor for '"par"' to make the actual
          parameter vector 'beta=par*mag'), '"value"' (maximum
          log-likelihood), '"hessian"' (hessian matrix), '"var"'
          (variance estimate of '"par"'), '"mask"' (logical vector
          indicating parameter elements which are not at boundaries),
          '"init"' (initial values used for optimization), '"psi"' (psi
          function name of the model), '"df"' (degrees of freedom),
          '"rss"' (analogous to the residual sum of squares, but
          actually defined as 2*(lik0-lik), where lik0 and lik are the
          maximized log-likelihoods of the non-restricted model and
          the model of interest, respectively), '"pfit"' (p-value for
          '"rss"'), '"aic"' (AIC value of the model relative to the
          non-restricted model).
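
     The goodness-of-fit quantities can be sketched in base R. Several
     points here are assumptions rather than statements from this page:
     the non-restricted model is taken to have one free probability per
     scale, 'df' is taken as the number of scales minus the number of
     model parameters, "pfit" is taken as an upper-tail chi-squared
     probability, and the relative AIC is taken as 'rss - 2*df'. The
     parameter values in 'beta' are made up for illustration.

```r
# Counts and scales taken from the Examples section of this page
cnt <- c(0,0,0,0,6,220,1464,3565,5430,6477,6754,6687,5961)
nb  <- rep(100000, length(cnt))
sa  <- 10^seq(-2, 2, length.out = 13)

# Non-restricted model: fitted probability equals the observed bp
lik0 <- sum(dbinom(cnt, nb, cnt / nb, log = TRUE))

# Model of interest: hypothetical two-parameter fit (values made up)
beta  <- c(2, 0.13)
p_hat <- pnorm((beta[1] + beta[2] * sa) / sqrt(sa), lower.tail = FALSE)
lik   <- sum(dbinom(cnt, nb, p_hat, log = TRUE))

rss  <- 2 * (lik0 - lik)                      # deviance-like statistic
df   <- length(cnt) - length(beta)            # assumed degrees of freedom
pfit <- pchisq(rss, df, lower.tail = FALSE)   # assumed form of "pfit"
aic  <- rss - 2 * df                          # assumed form of relative "aic"

c(rss = rss, df = df, pfit = pfit, aic = aic)
```

     Under these assumptions, a negative 'aic' favors the model of
     interest over the non-restricted model.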


     An object of class '"scalebootv"' is a vector of '"scaleboot"'
     objects, and in addition, it has attributes '"models"', '"bps"',
     '"nb"', and '"sa"'.

_A_u_t_h_o_r(_s):

     Hidetoshi Shimodaira <shimo@is.titech.ac.jp>

_R_e_f_e_r_e_n_c_e_s:

     Shimodaira, H. (2002).  An approximately unbiased test of
     phylogenetic tree selection, _Systematic Biology_, 51, 492-508.

     Shimodaira, H. (2004).  Approximately unbiased tests of regions
     using multistep-multiscale bootstrap resampling, _Annals of
     Statistics_, 32, 2616-2641. 

     Shimodaira, H. (2006).  Approximately unbiased tests for singular
     surfaces via multiscale bootstrap resampling, Research Report
     B-430, Dept. Mathematical and Computing Sciences, Tokyo Institute
     of Technology, Tokyo.

_S_e_e _A_l_s_o:

     'sbpsi', 'summary.scaleboot', 'plot.scaleboot', 'coef.scaleboot',
     'sbaic'.

_E_x_a_m_p_l_e_s:

     ## Testing a hypothesis
     ## Examples of fitting models to a vector of bp's
     ## mam15.relltest$t4 of data(mam15), but
     ## using a different set of scales (sigma^2 values).
     ## Below, sigma^2 ranges from 0.01 to 100 in sa[i].
     ## This very large range is only for illustration.
     ## Typically, a range of around 0.1 to 10
     ## is recommended for much better model fitting.
     ## In other examples, we have used
     ## sa = 9^seq(-1,1,length=13).

     cnt <- c(0,0,0,0,6,220,1464,3565,5430,6477,6754,
              6687,5961) # observed frequencies at scales
     nb <- 100000 # number of replicates at each scale
     bp <- cnt/nb # bootstrap probabilities (bp's)
     sa <- 10^seq(-2,2,length=13) # scales (sigma squared)
     ## model fitting to bp's 
     f <- sbfit(bp,nb,sa) # model fitting ("scaleboot" object)
     f # print the result of fitting
     plot(f,legend="topleft") # observed bp's and fitted curves
     ## approximately unbiased p-values
     summary(f) # calculate and print p-values
     ## refitting with models up to "poly.4" and "sing.4"
     f <- sbfit(f,models=1:4) # integer models go through sbmodelnames(m=models)
     f # print the result of fitting
     plot(f,legend="topleft") # observed bp's and fitted curves
     summary(f) # calculate and print p-values

     ## Testing multiple hypotheses (only two here)
     ## Examples of fitting models to vectors of bp's
     ## mam15.relltest[c("t1","t2")]
     cnt1 <- c(99982,99247,95068,86876,77802,68562,57842,45529,
         34499,26731,21158,16663,12982) # cnt for "t1"
     cnt2 <- c(0,0,0,0,56,926,3614,7162,10068,11623,
          12432,13361,13518) # cnt for "t2"
     cnts <- rbind(cnt1,cnt2)
     nb <- 100000 # number of replicates at each scale
     bps <- cnts/nb # row vectors are bp's
     sa <- 10^seq(-2,2,length=13) # scales (sigma squared)
     fv <- sbfit(bps,nb,sa) # returns a "scalebootv" object
     fv # print the result of fitting
     plot(fv) # multiple plots
     summary(fv) # calculate and print p-values

