MSClaio2008              package:nsRFA              R Documentation

_M_o_d_e_l _S_e_l_e_c_t_i_o_n _C_r_i_t_e_r_i_a

_D_e_s_c_r_i_p_t_i_o_n:

     Model selection criteria for the frequency analysis of
     hydrological extremes, from Laio et al. (2008).

_U_s_a_g_e:

      MSClaio2008 (sample, dist=c("NORM","LN","GUMBEL","EV2","GEV","P3","LP3"), 
                   crit=c("AIC", "AICc", "BIC", "ADC"))
      ## S3 method for class 'MSClaio2008':
      print (x, digits=max(3, getOption("digits") - 3), ...)
      ## S3 method for class 'MSClaio2008':
      summary (object, ...)
      ## S3 method for class 'MSClaio2008':
      plot (x, ...)

_A_r_g_u_m_e_n_t_s:

  sample: data sample (a numeric vector, as in the Examples)

    dist: distributions: normal '"NORM"', 2 parameter log-normal
          '"LN"', Gumbel '"GUMBEL"', Frechet '"EV2"', Generalized
          Extreme Value '"GEV"', Pearson type III '"P3"', log-Pearson
          type III '"LP3"'

    crit: Model-selection criteria: Akaike Information Criterion
          '"AIC"', Akaike Information Criterion corrected '"AICc"',
          Bayesian Information Criterion '"BIC"', Anderson-Darling
          Criterion '"ADC"'

       x: object of class 'MSClaio2008', output of 'MSClaio2008()'

  object: object of class 'MSClaio2008', output of 'MSClaio2008()'

  digits: minimal number of "significant" digits, see 'print.default'

     ...: other arguments

_D_e_t_a_i_l_s:

     The following lines are extracted from Laio et al. (2008). See the
     paper for more details and references.

     *Model selection criteria*

     The problem of model selection can be formalized as follows: a
     sample of n data, D=(x1, ..., xn), arranged in ascending order is
     available, sampled from an unknown parent distribution f(x);  Nm
     operating models, Mj, j=1, ..., Nm, are used to represent the
     data.  The operating models are in the form of probability
     distributions, Mj = gj(x, O*), with parameters O* estimated from
     the available data sample D.  The aim of model selection is to
     identify the model Mopt that is best suited to represent the
     data, i.e. the model that is closest in some sense to the parent
     distribution f(x).

     Three different model selection criteria are considered here,
     namely, the Akaike Information Criterion (AIC), the Bayesian
     Information Criterion (BIC), and the Anderson-Darling Criterion
     (ADC). Of the three methods, the first two belong to the category
     of classical literature approaches, while the third derives from a
     heuristic interpretation of the results of a standard
     goodness-of-fit test (see Laio, 2004).

     *Akaike Information Criterion*

     The Akaike Information Criterion (AIC) for the j-th operational
     model can be computed as

                     AICj = -2 ln (Lj(O*)) + 2 pj

     where

                      Lj(O*) = prod (gj(xi, O*))

     is the likelihood function, evaluated at the point O = O*
     corresponding to the maximum likelihood estimator of the parameter
     vector O and pj is the number of estimated parameters of the j-th
     operational model. In practice, after the computation of the AICj,
     for all of the operating models, one selects the model with the
     minimum AIC value, AICmin.
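
     As a minimal sketch of the AIC formula above, the following
     base-R code fits a normal model by maximum likelihood and
     evaluates the criterion by hand. The data vector 'x' and the
     helper 'aic_by_hand' are assumptions made for illustration; they
     are not part of nsRFA and do not reproduce its internals.

```r
# Illustrative data (hypothetical values, not taken from FEH1000)
x <- c(12.1, 15.3, 9.8, 20.4, 17.6, 11.2, 14.9, 25.0, 13.3, 16.7)
n <- length(x)

# Maximum likelihood estimates for the normal model
mu_hat    <- mean(x)
sigma_hat <- sqrt(mean((x - mu_hat)^2))  # ML estimate divides by n, not n - 1

# Log-likelihood at the ML estimate: ln L(O*) = sum over i of ln g(xi, O*)
loglik <- sum(dnorm(x, mean = mu_hat, sd = sigma_hat, log = TRUE))

# AIC = -2 ln L(O*) + 2 p, with p the number of estimated parameters
aic_by_hand <- function(loglik, p) -2 * loglik + 2 * p

aic_norm <- aic_by_hand(loglik, p = 2)   # normal model: mean and sd
aic_norm
```

     Repeating the same steps for each candidate distribution and
     taking the smallest value gives AICmin.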

     When the sample size, n, is small, with respect to the number of
     estimated parameters, p, the AIC may perform inadequately. In
     those cases a second-order variant of AIC, called AICc, should be
     used:

            AICcj = -2 ln (Lj(O*)) + 2 pj (n/(n - pj - 1))

     Indicatively, AICc should be used when n/p < 40.
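
     The small-sample correction can be sketched as follows. The
     helpers 'aicc_by_hand' and 'use_aicc' and the log-likelihood
     value are hypothetical, chosen only to show how the extra
     penalty shrinks as n grows relative to p.

```r
# Second-order AIC: AICc = -2 ln L(O*) + 2 p (n / (n - p - 1))
aicc_by_hand <- function(loglik, n, p) {
  stopifnot(n - p - 1 > 0)   # the correction is undefined otherwise
  -2 * loglik + 2 * p * (n / (n - p - 1))
}

# Rule of thumb from the text: prefer AICc when n / p < 40
use_aicc <- function(n, p) n / p < 40

loglik <- -30                # hypothetical log-likelihood value
p      <- 3

aic        <- -2 * loglik + 2 * p
aicc_small <- aicc_by_hand(loglik, n = 20,   p = p)  # short record
aicc_large <- aicc_by_hand(loglik, n = 2000, p = p)  # long record

c(AIC = aic, AICc_n20 = aicc_small, AICc_n2000 = aicc_large)
c(use_aicc(20, p), use_aicc(2000, p))
```

     For the long record the two criteria nearly coincide, so the
     correction matters only when n is small relative to p.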

     *Bayesian Information Criterion*

     The Bayesian Information Criterion (BIC) for the j-th operational
     model reads

                   BICj = -2 ln (Lj(O*)) + ln(n) pj

     In practice, after the computation of the BICj, for
     all of the operating models, one selects the model with the
     minimum BIC value, BICmin.
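
     A short sketch comparing the BIC and AIC penalties follows. The
     log-likelihoods, the model labels and the helper 'bic_by_hand'
     are assumptions for illustration only. Because ln(n) exceeds 2
     once n is 8 or more, BIC penalizes each extra parameter more
     heavily than AIC for all but very short samples.

```r
# BIC = -2 ln L(O*) + ln(n) p
bic_by_hand <- function(loglik, n, p) -2 * loglik + log(n) * p

n      <- 50
# Hypothetical maximized log-likelihoods for two candidate models
loglik <- c(two_par = -120.0, three_par = -119.5)
p      <- c(two_par = 2,      three_par = 3)

aic <- -2 * loglik + 2 * p
bic <- bic_by_hand(loglik, n, p)

rbind(AIC = aic, BIC = bic)
best_bic <- names(which.min(bic))  # model attaining BICmin
best_bic
```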

     *Anderson-Darling Criterion*

     The Anderson-Darling criterion has the form:

       ADCj = 0.0403 + 0.116 ((ADj - epsj)/betaj)^(etaj/0.851)

     if 1.2 epsj < ADj,

       ADCj = [0.0403 + 0.116 ((0.2 epsj)/betaj)^(etaj/0.851)]
              (ADj - 0.2 epsj) / epsj

     if 1.2 epsj >= ADj, where ADj is the discrepancy measure
     characterizing the criterion, the Anderson-Darling statistic 'A2'
     in 'GOFlaio2004', and epsj, betaj and etaj are
     distribution-dependent coefficients that are tabulated by Laio [2004,
     Tables 3 and 5] for a set of seven distributions commonly employed
     for the frequency analysis of extreme events.  In practice, after
     the computation of the ADCj, for all of the operating models, one
     selects the model with the minimum ADC value, ADCmin.
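
     The piecewise definition can be sketched in R as below. The
     second branch is read here as the bracketed term multiplied by
     (ADj - 0.2 epsj) / epsj, which makes the two branches meet at
     ADj = 1.2 epsj. The helper 'adc_by_hand' is hypothetical, and
     the coefficient values are placeholders, NOT the tabulated
     values of Laio (2004); use the published tables for real work.

```r
# Anderson-Darling criterion from the AD statistic and the
# distribution-dependent coefficients eps, beta and eta
adc_by_hand <- function(ad, eps, beta, eta) {
  if (1.2 * eps < ad) {
    0.0403 + 0.116 * ((ad - eps) / beta)^(eta / 0.851)
  } else {
    (0.0403 + 0.116 * ((0.2 * eps) / beta)^(eta / 0.851)) *
      (ad - 0.2 * eps) / eps
  }
}

# Placeholder coefficients for illustration only (not from Laio, 2004)
eps <- 0.2; beta <- 0.1; eta <- 1

ad_knot <- 1.2 * eps                                  # branch boundary
below   <- adc_by_hand(ad_knot,        eps, beta, eta)  # second branch
above   <- adc_by_hand(ad_knot + 1e-9, eps, beta, eta)  # first branch
c(below = below, above = above)

# Larger AD statistics give larger ADC values
adc_values <- sapply(c(0.1, 0.3, 0.6, 1.0), adc_by_hand,
                     eps = eps, beta = beta, eta = eta)
adc_values
```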

_V_a_l_u_e:

     'MSClaio2008' returns the values of the chosen criteria 'crit'
     (see Details), applied to 'sample', for every distribution in
     'dist'.

     'plot.MSClaio2008' plots the empirical distribution function of
     'sample' (Weibull plotting position) on a log-normal probability
     plot, plots the candidate distributions 'dist' (whose parameters
     are estimated by maximum likelihood, see 'MLlaio2004'), and
     highlights the ones chosen by the criteria 'crit'.
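
     As a rough sketch of the coordinates behind such a plot, the
     code below computes the Weibull plotting position i/(n + 1) and
     the corresponding standard normal quantiles. The data vector is
     hypothetical, and the axis layout actually used by
     'plot.MSClaio2008' may differ from this illustration.

```r
# Illustrative data (hypothetical values, not taken from FEH1000)
x <- c(12.1, 15.3, 9.8, 20.4, 17.6, 11.2, 14.9, 25.0, 13.3, 16.7)

x_sorted <- sort(x)             # sample arranged in ascending order
n        <- length(x_sorted)

# Weibull plotting position: non-exceedance frequency of the i-th value
F_emp <- seq_len(n) / (n + 1)

# Coordinates for a log-normal probability plot: log of the data
# against standard normal quantiles of the empirical frequencies
log_x <- log(x_sorted)
z     <- qnorm(F_emp)

head(cbind(x = x_sorted, F = F_emp, log_x = log_x, z = z))
```

     If the parent distribution were log-normal, the points
     (log_x, z) would fall approximately on a straight line.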

_N_o_t_e:

     For information on the package and the Author, and for all the
     references, see 'nsRFA'.

_S_e_e _A_l_s_o:

     'GOFlaio2004', 'MLlaio2004'.

_E_x_a_m_p_l_e_s:

     data(FEH1000)

     sitedata <- am[am[,1]==53004, ] # data of site 53004
     serieplot(sitedata[,4], sitedata[,3])
     MSC <- MSClaio2008(sitedata[,4])
     MSC
     summary(MSC)
     plot(MSC)

     sitedata <- am[am[,1]==69023, ] # data of site 69023
     serieplot(sitedata[,4], sitedata[,3])
     MSC <- MSClaio2008(sitedata[,4], crit=c("AIC", "ADC"))
     MSC
     summary(MSC)
     plot(MSC)

     sitedata <- am[am[,1]==83802, ] # data of site 83802
     serieplot(sitedata[,4], sitedata[,3])
     MSC <- MSClaio2008(sitedata[,4], dist=c("GEV", "P3", "LP3"))
     MSC
     summary(MSC)
     plot(MSC)

     # short sample, high positive L-CA
     sitedata <- am[am[,1]==40012, ] # data of site 40012
     serieplot(sitedata[,4], sitedata[,3])
     MSC <- MSClaio2008(sitedata[,4])
     MSC
     summary(MSC)
     plot(MSC)

     # negative L-CA
     sitedata <- am[am[,1]==68002, ] # data of site 68002
     serieplot(sitedata[,4], sitedata[,3])
     MSC <- MSClaio2008(sitedata[,4])
     MSC
     summary(MSC)
     plot(MSC)

