BayesianMCMC              package:nsRFA              R Documentation

_B_a_y_e_s_i_a_n _M_C_M_C _f_r_e_q_u_e_n_c_y _a_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     Bayesian Markov Chain Monte Carlo algorithm for flood frequency
     analysis with historical and other information.

_U_s_a_g_e:

      BayesianMCMC (xcont, xhist=NA, infhist=NA, suphist=NA, 
                    nbans=NA, seuil=NA, nbpas=1000, nbchaines=3, 
                    confint=c(0.05, 0.95), dist="GEV",
                    apriori=function(...){1}, 
                    parameters0=NA, varparameters0=NA)
      ## S3 method for class 'BayesianMCMC':
      plot (x, which=1, ask=FALSE, ...)
      ## S3 method for class 'BayesianMCMC':
      print (x, ...)

_A_r_g_u_m_e_n_t_s:

       x: object of class 'BayesianMCMC', output of function
          'BayesianMCMC'

   xcont: vector of systematic data

   xhist: vector of historical data

 infhist: lower limit for the historical data

 suphist: upper limit for the historical data

   nbans: period (in years) over which the threshold has been exceeded
          by historical data

   seuil: threshold exceeded by historical data

   nbpas: number of iterations for the MCMC algorithm

nbchaines: number of chains for the MCMC algorithm

 confint: confidence limits for the flood quantiles

    dist: distribution: normal '"NORM"', log-normal with 2 parameters
          '"LN"', Exponential '"EXP"', Gumbel '"GUMBEL"', Generalized
          Extreme Value '"GEV"', Generalized Logistic '"GENLOGIS"',
          Generalized Pareto '"GENPAR"', log-normal with 3 parameters
          '"LN3"', Pearson type III '"P3"', log-Pearson type III
          '"LP3"'

 apriori: function of the model parameters, proportional to their
          a-priori (guessed) distribution. The default function
          always returns 1, i.e. no a-priori information is used

parameters0: initial values of the parameters for the MCMC algorithm

varparameters0: initial values of the parameter variances for the MCMC
          algorithm

   which: a number or a vector of numbers defining the graphs to
          plot (see details)

     ask: if TRUE, the interactive mode is run

     ...: other arguments

_D_e_t_a_i_l_s:

     *Bayesian inference*

     Bayesian inference uses a numerical estimate of the degree of
     belief in a hypothesis before evidence has been observed and
     calculates a numerical estimate of the degree of belief in the
     hypothesis after evidence has been observed. The name `Bayesian'
     comes from the frequent use of Bayes' theorem in the inference
     process. In our case the problem is: what is the probability
     that a frequency function P (of the type defined in 'dist') has
     parameters T, given that we have observed the realizations D
     (defined in 'xcont', 'xhist', 'infhist', 'suphist', 'nbans',
     'seuil')? Bayes' theorem reads

                      P(T|D) = P(D|T)*P(T)/P(D)

     where  P(T|D) is the conditional probability of T, given D (it is
     also called the posterior probability because it is derived from
     or depends upon the specified value of D) and is the result we are
     interested in; P(T) is the prior probability or marginal
     probability of T (`prior' in the sense that it does not take into
     account any information about D), and can be given using the input
     'apriori' (it can be used to account for causal information); 
     P(D|T) is the conditional probability of D given T and it is
     defined choosing 'dist' and depending on the availability of
     historical data; P(D) is the prior or marginal probability of D,
     and acts as a normalizing constant. Intuitively, Bayes' theorem in
     this form describes the way in which one's beliefs about observing
     T are updated by having observed D.

     Since complex models can rarely be treated in closed form by a
     Bayesian analysis, mainly because of the extreme difficulty of
     computing the normalizing factor P(D), simulation-based Monte
     Carlo techniques such as the MCMC approaches are used.
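
     As an illustration of the idea (this is a toy sketch, not the
     internal code of 'BayesianMCMC'), the unnormalized posterior
     P(D|T)*P(T) can be evaluated directly, and the unknown constant
     P(D) cancels whenever two parameter values are compared:

```r
# Illustrative sketch: unnormalized posterior for a normal model with
# unknown mean mu and a standard normal prior (toy example, not nsRFA
# internals).
D <- c(1.2, 0.8, 1.5)                        # observed realizations
logpost <- function(mu) {
  sum(dnorm(D, mean=mu, sd=1, log=TRUE)) +   # log P(D|T), the likelihood
    dnorm(mu, mean=0, sd=1, log=TRUE)        # log P(T), the prior
}
# The normalizing constant P(D) cancels in ratios of posterior densities,
# which is all that MCMC needs:
exp(logpost(1) - logpost(0))
```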

     *MCMC Metropolis algorithm*

     Markov chain Monte Carlo (MCMC) methods (which include random walk
     Monte Carlo methods) are a class of algorithms for sampling from
     probability distributions based on constructing a Markov chain
     that has the desired distribution as its equilibrium distribution.
     The state of the chain after a large number of steps is then used
     as a sample from the desired distribution. The quality of the
     sample improves as a function of the number of steps.

     The MCMC is performed here through a simple Metropolis algorithm,
     i.e. a Metropolis-Hastings algorithm with symmetric proposal
     density. The Metropolis-Hastings algorithm can draw samples from
     any probability distribution P(x), requiring only that a function
     proportional to the density can be calculated at x. In Bayesian
     applications, the normalization factor is often extremely
     difficult to compute, so the ability to generate a sample without
     knowing this constant of proportionality is a major virtue of the
     algorithm. The algorithm generates a Markov chain in which each
     state xt+1 depends only on the previous state xt. The algorithm
     uses a Gaussian proposal density N(xt, Sx), which depends on the
     current state xt, to generate a new proposed sample x'. This
     proposal is accepted as the next value xt+1 = x' if a value a
     drawn from U(0,1) satisfies

                           a < P(x')/P(xt)

     If the proposal is not accepted, then the current value of x is
     retained (xt+1=xt).
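
     The accept/reject rule above can be sketched in a few lines of R.
     This is an illustrative one-dimensional Metropolis sampler for a
     standard normal target, not the internal implementation of
     'BayesianMCMC':

```r
# Minimal one-dimensional Metropolis sampler (illustrative sketch).
metropolis <- function(logtarget, x0, sd.prop, niter) {
  x <- numeric(niter)
  x[1] <- x0
  acc <- 0
  for (t in 2:niter) {
    xprop <- rnorm(1, mean=x[t-1], sd=sd.prop)   # symmetric Gaussian proposal
    loga <- logtarget(xprop) - logtarget(x[t-1]) # log of P(x')/P(xt)
    if (log(runif(1)) < loga) {                  # accept if a < P(x')/P(xt)
      x[t] <- xprop
      acc <- acc + 1
    } else {
      x[t] <- x[t-1]                             # otherwise retain current state
    }
  }
  list(chain=x, rate=acc/(niter-1))
}

# Sample from a standard normal target, known only up to a constant:
set.seed(1)
out <- metropolis(function(x) -x^2/2, x0=0, sd.prop=2.4, niter=10000)
mean(out$chain[-(1:1000)])   # posterior mean estimate, near 0 after burn-in
</imports>
```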

     The Markov chain is started from a random initial value x0 and the
     algorithm is run for many iterations until this initial state is
     forgotten.  These samples, which are discarded, are known as
     burn-in.  The remaining set of accepted values of x represent a
     sample from the distribution P(x). As a Gaussian proposal density
     (or a lognormal one for strictly positive parameters) is used, the
     variance parameter Sx^2 has to be tuned during the burn-in period.
     This is done by calculating the acceptance rate, which is the
     fraction of proposed samples that is accepted in a window of the
     last N samples.  The desired acceptance rate depends on the target
     distribution; however, it has been shown theoretically that the
     ideal acceptance rate for a one-dimensional Gaussian distribution
     is approx 44%, decreasing to approx 23% for an N-dimensional
     Gaussian target distribution. If Sx^2 is too small, the chain will
     mix slowly (i.e., the acceptance rate will be too high, so the
     sampling will move around the space slowly and converge slowly to
     P(x)). If Sx^2 is too large, the acceptance rate will be very low
     because the proposals are likely to land in regions of much lower
     probability density. The desired acceptance rate is fixed here at
     34%.
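
     The burn-in tuning described above can be sketched as follows;
     the window size and the multiplicative update rule are assumptions
     made for this sketch, not the exact scheme used in 'BayesianMCMC':

```r
# Illustrative adaptation of the proposal standard deviation towards a
# 34% acceptance rate during burn-in (window size and update factor are
# assumptions for this sketch, not nsRFA's exact scheme).
tune <- function(sd.prop, accepted, N, target=0.34) {
  rate <- accepted / N               # acceptance rate over the last N samples
  if (rate > target) sd.prop * 1.1   # too many acceptances: widen proposals
  else               sd.prop / 1.1   # too few acceptances: narrow proposals
}

sd0 <- 1
tune(sd0, accepted=60, N=100)   # rate 60% > 34%, so sd increases
tune(sd0, accepted=10, N=100)   # rate 10% < 34%, so sd decreases
```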

     The MCMC algorithm is based on code developed by Eric Gaume in
     Scilab. It is still unstable and not all the distributions have
     been tested.

_V_a_l_u_e:

     'BayesianMCMC' returns the following values:

     'parameters' (nbpas)x(nbchaines) matrix with the sets of
     parameters simulated by the MCMC algorithm;

     'parametersML' set of parameters corresponding to the maximum
     likelihood;

     'returnperiods' return periods for which 'quantilesML' and
     'intervals' are calculated;

     'quantilesML' quantiles corresponding to 'returnperiods' for the
     distribution whose parameters are 'parametersML';

     'intervals' confidence intervals for the quantiles 'quantilesML'
     at the limits 'confint';

     'varparameters' (nbpas)x(nbchaines)x(number of parameters) array
     with the variances simulated by the MCMC algorithm;

     'vraisdist' likelihoods of the parameter sets 'parameters';

     'plot.BayesianMCMC' plots the following figures:

     '1' data as plotting position, fitted distribution (maximum
     likelihood) and confidence intervals;

     '2' diagnostic plot of the MCMC simulation (parameters);

     '3' diagnostic plot of the MCMC simulation (likelihood and MCMC
     acceptance rate);

     '4' posterior distribution of parameters obtained with the MCMC
     simulation (cloud plots);

     '5' a-priori distribution of parameters (contour plots);

_N_o_t_e:

     For information on the package and the Author, and for all the
     references, see 'nsRFA'.

_S_e_e _A_l_s_o:

     .

_E_x_a_m_p_l_e_s:

     set.seed(2988)
     serie <- rand.GEV(120, xi=40, alfa=20, k=-0.4)
     serie100 <- serie[1:100]
     serie100[serie100 < 250] <- NA
     serie20 <- serie[101:120]
     serie <- c(serie100, serie20)

     plot(serie, type="h", ylim=c(0, 600), xlab="", 
          ylab="Annual flood peaks [m3/s]", lwd=3)
     abline(h=0)
     points(serie100, col=2)

     # Using only systematic data
     only_sist <- BayesianMCMC (xcont=serie20, xhist=NA, infhist=NA, suphist=NA, 
                                nbans=NA, seuil=NA,
                                nbpas=5000, nbchaines=3, 
                                confint=c(0.05, 0.95), dist="GEV")
     plot(only_sist, which=c(1:3), ask=TRUE, ylim=c(1,600))


     # Adding the information that the threshold 250 m3/s was exceeded 
     #   3 times in the past 100 years
     with_hist_thresh <- BayesianMCMC (xcont=serie20, xhist=NA, infhist=rep(250,3), 
                                       suphist=NA, nbans=100, seuil=250,
                                       nbpas=5000, nbchaines=3, 
                                       confint=c(0.05, 0.95), dist="GEV")
     plot(with_hist_thresh, which=c(1:3), ask=TRUE, ylim=c(1,600))


     # Assuming that the 3 historical events are known with high uncertainty
     with_hist_limits <- BayesianMCMC (xcont=serie20, xhist=NA, 
                                       infhist=c(320,320,250), 
                                       suphist=c(360,400,270), 
                                       nbans=100, seuil=250,
                                       nbpas=5000, nbchaines=3, 
                                       confint=c(0.05, 0.95), dist="GEV")
     plot(with_hist_limits, which=c(1:3), ask=TRUE, ylim=c(1,600))


     # Assuming that the 3 historical events are perfectly known
     with_hist_known <- BayesianMCMC (xcont=serie20, xhist=serie100[!is.na(serie100)], 
                                      infhist=NA, suphist=NA, 
                                      nbans=100, seuil=250,
                                      nbpas=5000, nbchaines=3, 
                                      confint=c(0.05, 0.95), dist="GEV")
     plot(with_hist_known, which=c(1:3), ask=TRUE, ylim=c(1,600))



     # Using one reasonable a-priori distribution
     fNORM3 <- function (x) {
      # x = vector of values
      # mu = vector of means
      mu = c(44, 26, -0.40)
      # CM = covariance matrix
      CM = matrix(c(13, 7.8, -0.055,
                    7.8, 15, -0.42,
                    -0.055, -0.42, 0.056), nrow=3, ncol=3)
      CMm1 <- solve(CM)
      term2 <- exp(-((x - mu) %*% CMm1 %*% (x - mu))/2)
      term1 <- 1/(2*pi)^(3/2)/sqrt(det(CM))
      term1*term2
     }

     with_hist_known2 <- BayesianMCMC (xcont=serie20, xhist=serie100[!is.na(serie100)], 
                                       infhist=NA, suphist=NA,
                                       nbans=100, seuil=250,
                                       nbpas=5000, nbchaines=3, apriori=fNORM3,
                                       confint=c(0.05, 0.95), dist="GEV")
     plot(with_hist_known2, 5)
     plot(with_hist_known2, 4)
     plot(with_hist_known, 4)
     plot(with_hist_known)
     plot(with_hist_known2)

     ## Not run: 
     # Using one non-reasonable a-priori distribution
     fNORM3 <- function (x) {
      # x = vector of values
      # mu = vector of means
      mu = c(30, 50, -0.10)
      # CM = covariance matrix
      CM = matrix(c(13, 7.8, -0.055,
                    7.8, 15, -0.42,
                    -0.055, -0.42, 0.056), nrow=3, ncol=3)
      CMm1 <- solve(CM)
      term2 <- exp(-((x - mu) %*% CMm1 %*% (x - mu))/2)
      term2
     }

     with_hist_known3 <- BayesianMCMC (xcont=serie20, xhist=serie100[!is.na(serie100)], 
                                       infhist=NA, suphist=NA,
                                       nbans=100, seuil=250,
                                       nbpas=5000, nbchaines=3, apriori=fNORM3,
                                       confint=c(0.05, 0.95), dist="GEV")
     plot(with_hist_known3, 5)
     plot(with_hist_known3, 4)
     plot(with_hist_known, 4)
     plot(with_hist_known)
     plot(with_hist_known3)
     ## End(Not run)

