nsRFA-package             package:nsRFA             R Documentation

_N_o_n-_s_u_p_e_r_v_i_s_e_d _R_e_g_i_o_n_a_l _F_r_e_q_u_e_n_c_y _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     The estimation of hydrological variables in ungauged basins is a
     very important topic for many purposes, from research to
     engineering applications and water management (see the PUB
     project, Sivapalan et al., 2003).  Regardless of the method used
     to perform such estimation, the underlying idea is to transfer the
     hydrological information from gauged to ungauged sites.  When
     observations of the same variable at different measuring sites are
     available and many data samples are used together as source of
     information, the methods are called regional methods.  The well
     known regional frequency analysis (e.g. Hosking and Wallis, 1997),
     where the interest is in the assessment of the frequency of
     hydrological events, belong to this class of methods.  In
     literature, the main studied variable is the flood peak and the
     most used regional approach is the index-flood method of Dalrymple
     (1960), in which it is implicitly assumed that the flood frequency
     distribution for different sites belonging to a homogeneous region
     is the same except for a site-specific scale factor, the
     index-flood (see Hosking and Wallis, 1997, for details).  Hence,
     the estimation of the flood frequency distribution for an ungauged
     site can be divided into two parts: estimation of the index-flood
     (more in general, the index-value) through linear/non-linear
     relations with climatic and basin descriptors; estimation of the
     adimentional flood frequency distribution, the growth curve,
     assigning the ungauged basin to one homogeneous region.

     'nsRFA' is a collection of statistical tools for objective
     (non-supervised) applications of the Regional Frequency Analysis
     methods in hydrology. This does not mean that Regional Frequency
     Analysis should be non-supervised. These tools are addressed to
     experts, to help their expert judgement. The functions in 'nsRFA'
     allow the application of the index-flood method in the following
     points:

       -  regionalization of the index-value;
       -  formation of homogeneous regions for the growth curves;
       -  fit of a distribution function to the empirical growth curve of each region;

     *Regionalization of the index-value*

     The index-value can be either the sample mean (e.g. Hosking and
     Wallis, 1997) or the sample median (e.g. Robson and Reed, 1999) or
     another scale parameter. Many methodological approaches are
     available for the index-value estimation, and their differences
     can be related to the amount of information available. Excluding
     direct methods, that use information provided by flow data
     available at the station of interest, regional estimation methods
     require ancillary hydrological and physical information. Those
     methods can be divided in two classes: the multiregressive
     approach and the hydrological simulation approach. For both of
     them, the best estimator is the one that optimizes some criterion,
     such as the minimum error, the minimum variance or the maximum
     efficiency. Due to its simplicity, the most frequently used method
     is the multiregressive approach (see e.g. Kottegoda & Rosso, 1998;
     Viglione et al., 2007a), that relates the index-flow to catchment
     characteristics, such as climatic indices, geologic and
     morphologic parameters, land cover type, etc., through linear or
     non-linear equations.

     R provides the functions 'lm' and 'nls' for linear and non-linear
     regressions (package 'stats'). With the package 'nsRFA', a tool to
     select the best linear regressions given a set of candidate
     descriptors, 'bestlm', is provided. In 'REGRDIAGNOSTICS' several
     functions are provided to analyze the output of 'lm', such as: 
     the coefficient of determination (classic and adjusted); the
     Student t significance test; the variance inflation factor (VIF);
     the root mean squared error (RMSE); the mean absolute error (MAE);
     the prediction intervals; predicted values by a jackknife
     (cross-validation) procedure. The functions in 'DIAGNOSTICS'
     provide more general diagnostics of model results (that can be
     also non-linear), comparing estimated values with observed values.

     More details are provided in vignettes:

       'nsRFA_ex01'  How to use the package nsRFA (example 1):
                     Regional frequency analysis of the annual flows in Piemonte
                     and Valle d'Aosta

     that can be accessed via 'vignette("nsRFA_ex01",
     package="nsRFA")'.

     *Formation of homogeneous regions for the growth curves*

     Different techniques exist, for example those that lead to the
     formation of fixed regions through cluster analysis (Hosking and
     Wallis, 1997, Viglione, 2007), or those based on the method of the
     region of influence (ROI, Burn, 1990). The regional procedure can
     be divided into two parts: the formation of regions and the
     assignment of an ungauged site to one of them. Concerning the
     first point, the sites are grouped according to their similarity
     in terms of those basin descriptors that are assumed to explain
     the shape of the growth curve. This shape is usually quantified in
     a parametric way. For instance, the coefficient of variation (CV)
     or the L-CV of the curve can be used for this purpose. The package
     'nsRFA' provide the functions in 'moments' and 'Lmoments' to
     calculate sample moments and L-moments. Subsequently, the selected
     parameter is related with basin descriptors through a linear or a
     more complex model. A regression analysis is performed with
     different combination of descriptors, and descriptors that are
     strongly related with the parameter are used to group sites in
     regions. The same tools used for the regionalization of the index
     value, i.e. 'bestlm', 'REGRDIAGNOSTICS' and 'DIAGNOSTICS', can be
     used if the parametric method is chosen.

     'nsRFA' also provide a non-parametric approach that considers the
     dimensionless growth curve as a whole (see, Viglione et al., 2006;
     Viglione, 2007).  The multiregressive approach can still be used
     if we reason in terms of (dis)similarity between pairs of basins
     in the following way: (1) for each couple of stations, a
     dissimilarity index between non-dimensional curves is calculated
     using a quantitatively predefined metric, for example using the
     Anderson-Darling statistic ('A2'), and organising the distances in
     a matrix with 'AD.dist'; (2) for each basin descriptor, the
     absolute value (or another metric) of the difference between its
     measure in two basins is used as distance between them, using
     'dist' of the package 'stats' to obtain distance matrices; (4) a
     multiregressive approach ('bestlm', 'lm') is applied considering
     the matrices as variables and the basin descriptors associated to
     the best regression are chosen; (5) the significance of the linear
     relationship between distance matrices is assessed through the
     Mantel test with 'mantel.lm'.

     In the suitable-descriptor's space, stations with similar
     descriptor values can be grouped into disjoint regions through a
     cluster analysis (using functions in 'traceWminim') or the ROI
     method can be used adapting a region to the ungauged basin
     ('roi'). In both cases, the homogeneity of the regions can be
     assessed with the functions in 'HOMTESTS', where the Hosking and
     Wallis heterogeneity measures ('HW.tests', see Hosking and Wallis,
     1997) and the Anderson-Darling homogeneity test
     ('ADbootstrap.test', see Viglione et al., 2007b) are provided.

     More details are provided in vignettes:

       'nsRFA_ex01'  How to use the package nsRFA (example 1):
                     Regional frequency analysis of the annual flows in Piemonte
                     and Valle d'Aosta
       'nsRFA_ex02'  How to use the package nsRFA (example 2):
                     Region-Of-Influence approach, some FEH examples

     that can be accessed via 'vignette("nsRFA_ex01",
     package="nsRFA")'.

     *Fit of a distribution function to the empirical growth curve of
     each region*

     Once an homogeneous region is defined, the empirical growth curves
     can be pooled together and a probability distribution can be
     fitted to the pooled sample. The choice of the best distribution
     can be assisted by a Model Selection Criteria with 'MSClaio2008'
     (see, Laio et al., 2008). The parameters of the selected
     distribution can be estimated using the method of moments
     ('moment_estimation'), L-moments ('par.GEV', 'par.genpar',
     'par.gamma', ...) or maximum-likelihood ('MLlaio2004').
     Goodness-of-fit tests are also available: the Anderson-Darling
     goodness of fit test with 'GOFlaio2004' (Laio. 2004), and
     Monte-Carlo based tests with 'GOFmontecarlo'. Confidence intervals
     for the fitted distribution can be calculated with a Markov Chain
     Monte Carlo algorithm, using 'BayesianMCMC'.

     More details are provided in vignettes:

       'nsRFA_ex01'   How to use the package nsRFA (example 1):
                      Regional frequency analysis of the annual flows in Piemonte
                      and Valle d'Aosta
       'MSClaio2008'  Model selection techniques for the frequency analysis
                      of hydrological extremes: the MSClaio2008 R function

     that can be accessed via 'vignette("nsRFA_ex01",
     package="nsRFA")'.

     *Other functions*

     'varLmoments' provides distribution-free unbiased estimators of
     the variances and covariances of sample L-moments, as described in
     Elamir and Seheult (2004).

     More details are provided in vignettes:

       'Fig1ElamirSeheult'  Figure 1 in Elamir and Seheult (2004)

_D_e_t_a_i_l_s:


       Package:  nsRFA
       Version:  0.6

     The package provides several tools for Regional Frequency Analysis
     of hydrological variables. The first version dates to 2006 and was
     developed in Turin at the Politecnico by Alberto Viglione.

     For a complete list of the functions, use 'library(help="nsRFA").'

     *Main changes in version 0.6*

       0.6-9:  new vignette 'Fig11GriffisStedinger';
       0.6-8:  exponential and Gumbel distributions added in 'GOFmontecarlo';
       0.6-6:  some plotting position/probability plots have been added in 'DISTPLOTS';
       0.6-4:  refinement of function 'BayesianMCMC';
       0.6-2:  new vignette 'nsRFA_ex02';
       0.6-2:  refinement of function 'BayesianMCMC';
       0.6-0:  new vignette 'nsRFA_ex01';
       0.6-0:  new function 'bestlm';
       0.6-0:  the plotting position/probability plots in 'DISTPLOTS' have been reshaped;
       0.6-0:  this list of changes has been added;

_A_u_t_h_o_r(_s):

     Alberto Viglione

     Maintainer: Alberto Viglione <viglione@hydro.tuwien.ac.at>

_R_e_f_e_r_e_n_c_e_s:

     All the manual references are listed here:

     Beirlant, J., Goegebeur, Y., Teugels, J., Segers, J., 2004.
     Statistics of  Extremes: Theory and Applications. John Wiley and
     Sons Inc., 490 pp.

     Burn, D.H., 1990. Evaluation of regional flood frequency analysis
     with a region of influence approach. Water Resources Research
     26(10), 2257-2265.

     Castellarin, A., Burn, D.H., Brath, A., 2001. Assessing the
     effectiveness of hydrological similarity measures for flood
     frequency analysis. Journal of Hydrology 241, 270-285.

     Chowdhury, J.U., Stedinger, J.R., Lu, L.H., Jul. 1991.
     Goodness-of-fit tests for regional generalized extreme value flood
     distributions. Water Resources Research 27(7), 1765-1776.

     D'Agostino, R., Stephens, M., 1986. Goodness-of-fit techniques.
     Vol.68 of Statistics: textBooks and monographs. Department of
     Statistics, Southern Methodist University, Dallas, Texas.

     Dalrymple, T., 1960. Flood frequency analyses. Vol. 1543-A of
     Water Supply Paper. U.S. Geological Survey, Reston, Va.

     Durbin, J., Knott, M., 1971. Components of Cramer-von Mises
     statistics.  London School of Economics and Political Science, pp.
     290-307.

     El Adlouni, S., Bob\'ee, B., Ouarda, T.B.M.J., 2008. On the tails
     of extreme  event distributions in hydrology. Journal of Hydrology
     355(1-4), 16-33.

     Elamir, E. A.H., Seheult, A.H., 2004. Exact variance structure of
     sample l-moments. Journal of Statistical Planning and Inference
     124, 337-359.

     Everitt, B., 1974. Cluster Analysis. Social Science Research
     Council.  Halsted Press, New York.

     Fill, H., Stedinger, J., 1998. Using regional regression within
     index flood procedures and an empirical bayesian estimator.
     Journal of Hydrology 210(1-4), 128-145.

     Greenwood, J., Landwehr, J., Matalas, N., Wallis, J., 1979.
     Probability weighted moments: Definition and relation to
     parameters of several distributions expressible in inverse form.
     Water Resources Research 15, 1049-1054.

     Hosking, J., 1986. The theory of probability weighted moments.
     Tech. Rep. RC12210, IBM Research, Yorktown Heights, NY.

     Hosking, J., 1990. L-moments: analysis and estimation of
     distributions using linear combinations of order statistics. J.
     Royal Statistical Soc. 52, 105-124.

     Hosking, J., Wallis, J., 1993. Some statistics useful in regional
     frequency analysis. Water Resources Research 29(2), 271-281.

     Hosking, J., Wallis, J., 1997. Regional Frequency Analysis: An
     Approach Based on L-Moments. Cambridge University Press.

     Institute of Hydrology, 1999. Flood Estimation Handbook, Institute
     of  Hydrology, Oxford.

     Kendall, M., Stuart, A., 1961-1979. The Advanced Theory of
     Statistics. Charles Griffin & Company Limited, London.

     Kottegoda, N.T., Rosso, R., 1997. Statistics, Probability, and
     Reliability for Civil and Environmental Engineers, international
     Edition. McGraw-Hill Companies.

     Laio, F., 2004. Cramer-von Mises and Anderson-Darling goodness of
     fit tests for  extreme value distributions with unknown
     parameters, Water Resour. Res.,  40, W09308,
     doi:10.1029/2004WR003204.

     Laio, F., Tamea, S., 2007. Verification tools for probabilistic
     forecast of continuous hydrological variables. Hydrology and Earth
     System Sciences 11, 1267-1277.

     Laio, F., Di Baldassarre G., Montanari A., 2008. Model selection
     techniques  for the frequency analysis of hydrological extremes,
     Water Resour. Res.,  Under Revision.

     Regione Piemonte, 2004. Piano di tutela delle acque. Direzione
     Pianificazione  Risorse Idriche.

     Robson, A., Reed, D., 1999. Statistical procedures for flood
     frequency estimation. In: Flood Estimation HandBook. Vol.~3.
     Institute of Hydrology Crowmarsh Gifford, Wallingford,
     Oxfordshire.

     Sankarasubramanian, A., Srinivasan, K., 1999. Investigation and
     comparison of sampling properties of l-moments and conventional
     moments. Journal of Hydrology 218, 13-34.

     Scholz, F., Stephens, M., 1987. K-sample Anderson-Darling tests.
     Journal of  American Statistical Association, 82(399), pp.
     918-924.

     Sivapalan, M., Takeuchi, K., Franks, S.W., Gupta, V.K., Karambiri,
     H., Lakshmi, V., Liang, X., McDonnell, J.J., Mendiondo, E.M.,
     O'Connell, P.E., Oki, T., Pomeroy, J.W, Schertzer, D., Uhlenbrook,
     S., Zehe, E., 2003. IAHS decade on Predictions in  Ungauged Basins
     (PUB), 2003-2012: Shaping an exciting future for the hydrological 
     sciences, Hydrological Sciences - Journal - des Sciences
     Hydrologiques, 48(6),  857-880.

     Smouse, P.E., Long, J.C., Sokal, R.R., 1986. Multiple regression
     and correlation  extensions of the Mantel test of matrix
     correspondence. Systematic Zoology,  35(4), 627-632.

     Stedinger, J., Lu, L., 1995. Appraisal of regional and index flood
     quantile estimators. Stochastic Hydrology and Hydraulics 9(1),
     49-75.

     Viglione, A., Claps, P., Laio, F., 2006. Utilizzo di criteri di
     prossimit\`a  nell'analisi regionale del deflusso annuo, XXX
     Convegno di Idraulica e  Costruzioni Idrauliche - IDRA 2006, Roma,
     10-15 Settembre 2006.

     Viglione, A., 2007. Metodi statistici non-supervised per la stima
     di grandezze  idrologiche in siti non strumentati, PhD thesis at
     the Politecnico of Turin, February 2007.

     Viglione, A., Claps, P., Laio, F., 2007a. Mean annual runoff
     estimation in  North-Western Italy, In: G. La Loggia (Ed.) Water
     resources assessment  and management under water scarcity
     scenarios, CDSU Publ. Milano. 

     Viglione, A., Laio, F., Claps, P., 2007b. A comparison of
     homogeneity tests for regional frequency analysis. Water Resources
     Research 43~(3).

     Vogel, R., Fennessey, N., 1993. L moment diagrams should replace
     product moment diagrams. Water Resources Research 29(6),
     1745-1752.

     Vogel, R., Wilson, I., 1996. Probability distribution of annual
     maximum, mean, and minimum streamflows in the united states.
     Journal of Hydrologic Engineering 1(2), 69-76.

     Ward, J., 1963. Hierarchical grouping to optimize an objective
     function, Journal  of the American Statistical Association, 58,
     pp. 236-244.

