ExtremesData            package:fExtremes            R Documentation

_E_x_p_l_o_r_a_t_i_v_e _D_a_t_a _A_n_a_l_y_s_i_s

_D_e_s_c_r_i_p_t_i_o_n:

     A collection and description of functions for exploratory data
     analysis, including data preprocessing of extreme values. The
     tools include plot functions for empirical distributions, quantile
     plots, graphs exploring the properties of exceedances over a
     threshold, plots for the maximum/sum ratio and for the development
     of records. The data preprocessing includes tools to separate data
     beyond a threshold value, to compute blockwise data like block
     maxima, and to decluster point process data. 

     The plot functions are:

       'emdPlot'          Plot of empirical distribution function,
       'qqPlot'           Normal quantile-quantile plot,
       'qqbayesPlot'      Normal QQ-Plot with 95 percent intervals,
       'qPlot'            Exponential/Pareto quantile plot,
       'mePlot'           Plot of mean excesses over a threshold,
       'mrlPlot'          another variant, mean residual life plot,
       'mxfPlot'          another variant, with confidence intervals,
       'msratioPlot'      Plot of the ratio of maximum and sum,
       'recordsPlot'      Record development compared with iid data,
       'ssrecordsPlot'    another variant, investigates subsamples,
       'xacfPlot'         ACF of exceedances over a threshold,
       'interactivePlot'  a framework for interactive plot displays,
       'gridVector'       creates all grid points from two vectors x and y.

     The functions for data preprocessing are:

       'findThreshold'  Upper threshold for a given number of extremes,
       'blocks'         Create data blocks on vectors and time series,
       'blockMaxima'    Block Maxima from a vector or a time series,
       'deCluster'      Declusters clustered point process data.

_U_s_a_g_e:

     emdPlot(x, doplot = TRUE, plottype = c("", "x", "y", "xy"), labels = TRUE, ...)

     qqPlot(x, doplot = TRUE, labels = TRUE, ...) 
     qqbayesPlot(x, doplot = TRUE, labels = TRUE, ...)
     qPlot(x, xi = 0, trim = NA, threshold = NA, doplot = TRUE, labels = TRUE, ...)

     mePlot(x, doplot = TRUE, labels = TRUE, ...)
     mrlPlot(x, conf = 0.95, umin = NA, umax = NA, nint = 100, doplot = TRUE, 
          plottype = c("autoscale", ""), labels = TRUE, ...)  
     mxfPlot(x, tail = 0.05, doplot = TRUE, labels = TRUE, ...)  
        
     msratioPlot(x, p = 1:4, doplot = TRUE, plottype = c("autoscale", ""), 
         labels = TRUE, ...) 
        
     recordsPlot(x, conf = 0.95, doplot = TRUE, labels = TRUE, ...)
     ssrecordsPlot(x, subsamples = 10, doplot = TRUE, plottype = c("lin", "log"),
         labels = TRUE, ...)

     xacfPlot(x, threshold = 0.95, lag.max = 15, doplot = TRUE, ...)

     interactivePlot(x, choices = paste("Plot", 1:9), 
         plotFUN = paste("plot.", 1:9, sep = ""), which = "all", ...)
     gridVector(x, y)

     findThreshold(x, n = NA)
     blocks(x, block = "month", FUN = max)
     blockMaxima(x, block = "month", details = FALSE, doplot = TRUE, ...)
     deCluster(x, run = NA, doplot = TRUE)

_A_r_g_u_m_e_n_t_s:

   block: [blocks][blockMaxima] - 
          the block size. A numeric value is interpreted as the number
          of data values in each successive block. All the data are
          used, so the last block may not contain 'block' observations.
          If 'x' has a 'times' attribute holding the times/dates of
          each observation (as an object of class '"POSIXct"', or an
          object that can be converted to that class, see
          'as.POSIXct'), then 'block' may instead take the character
          values '"month"', '"quarter"', '"semester"' or '"year"'. By
          default monthly blocks from daily data are assumed. 

 choices: [interactivePlot] - 
          a vector of character strings for the choice menu. By
          default '"Plot 1"' ... '"Plot 9"', allowing for at most 9
          plots. 

    conf: [mrlPlot][recordsPlot] - 
          a confidence level. By default 0.95, i.e. 95%. 

 details: [blockMaxima] - 
           a logical. Should details be printed? 

  doplot: a logical. Should the results be plotted? By default 'TRUE'. 

     FUN: the function to be applied. Additional arguments are passed
          by the '...' argument. 

  labels: a logical. Whether or not x- and y-axes should be
          automatically labelled and a default main title should be
          added to the plot. By default 'TRUE'. 

 lag.max: [xacfPlot] - 
          maximum number of lags at which to calculate the
          autocorrelation functions. The default value is 15. 

    nint: [mrlPlot] - 
          the number of intervals, see 'umin' and 'umax'. The default
          value is 100. 

       n: [findThreshold] - 
          a numeric value or vector giving the number of extremes above
          the threshold. If 'n' is not specified, 'n' is set to an
          integer representing 5% of the data from the whole data set
          'x'. 

       p: [msratioPlot] - 
          the power exponents, a numeric vector. By default a sequence
          from 1 to 4 in unit integer steps. 

 plotFUN: [interactivePlot] - 
          a vector of character strings naming the plot functions. By
          default '"plot.1"' ... '"plot.9"', allowing for at most 9
          plots. 

plottype: [emdPlot] - 
          which axes should be on a log scale: '"x"' x-axis only;
          '"y"' y-axis only; '"xy"' both axes; '""' neither axis. 
          [mrlPlot][msratioPlot] - 
          a character string. If set to '"autoscale"', the scale of the
          plot is determined automatically; any other string allows
          user-specified scale information to be passed through the
          '...' argument. 
          [ssrecordsPlot] - 
          one of two options, either '"lin"' or '"log"'. The default
          creates a linear plot. 

     run: [deCluster] - 
           parameter to be used in the runs method; any two consecutive
           threshold exceedances separated by more than this number of 
          observations/days are considered to belong to different
          clusters. 

subsamples: [ssrecordsPlot] - 
           the number of subsamples, by default 10, an integer value. 

    tail: [mxfPlot] - 
          the threshold determined from the relative number of data
          points defining the tail, a numeric value; by default 0.05,
          which says that 5% of the data make up the tail. 

threshold, trim: [qPlot] - 
          'threshold' is a numeric value at which data are to be
          left-truncated and 'trim' is a numeric value at which data
          are to be right-truncated; both are 'NA' by default. 
          [xacfPlot] - 
          'threshold' is the threshold value, by default 0.95, i.e.
          95%. 

umin, umax: [mrlPlot] - 
          range of threshold values. If 'umin' and/or 'umax' are not
          specified, then by default they are set to the following
          values: 'umin=mean(x)' and 'umax=max(x)'. 

   which: plot selection, which graph should be displayed? If 'which'
          is the character string '"ask"', the user is interactively
          asked which plot to display; if it is a logical vector of
          length 'N', those plots which are set to 'TRUE' are
          displayed; if it is the character string '"all"', all plots
          are displayed. 

    x, y: numeric data vectors or, in the case of 'x', an object to be
          plotted. 
          [findThreshold][blocks][blockMaxima][deCluster] - 
          a numeric data vector from which 'findThreshold' and
          'blockMaxima' determine the threshold values and block
          maxima values. For the function 'deCluster' the argument 'x'
          represents a numeric vector of threshold exceedances with a
          'times' attribute which should be a numeric vector containing
          either the indices or the times/dates of each exceedance (if
          times/dates, the attribute should be an object of class
          '"POSIXct"' or an object that can be converted to that class;
          see 'as.POSIXct'). 
          [gridVector] - 
          two numeric vectors which span the two-dimensional grid. 

      xi: the shape parameter of the generalized Pareto distribution. 

     ...: additional arguments passed to the FUN or plot function. 

_D_e_t_a_i_l_s:

     *Empirical Distribution Function:* 

     The function 'emdPlot' is a simple exploratory function. A
     straight line on the double log scale indicates Pareto tail
     behaviour. 
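
     The double log diagnostic can be sketched in base R. The helper
     'emp_tail' below is hypothetical and is not the 'emdPlot' source;
     it assumes the plotting positions '1 - ppoints(n)' for the
     empirical tail probability. For an exact Pareto sample the
     fitted slope on the log-log scale should be close to minus the
     tail index.

```r
# Sketch only: a hypothetical helper, not the fExtremes implementation.
emp_tail <- function(x) {
  x <- sort(as.numeric(x))
  # Empirical survival function 1 - F(x) at the ordered data points
  data.frame(x = x, surv = 1 - ppoints(length(x)))
}

set.seed(1)
alpha <- 2                          # assumed Pareto tail index
x <- runif(5000)^(-1 / alpha)       # P(X > t) = t^(-alpha) for t >= 1
tail_df <- emp_tail(x)

# On the log-log scale a Pareto tail is a straight line with slope -alpha
fit <- lm(log(surv) ~ log(x), data = tail_df)
slope <- unname(coef(fit)[2])
```

     Calling 'plot(tail_df$x, tail_df$surv, log = "xy")' then draws the
     corresponding picture with both axes on a log scale.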

     *Quantile-Quantile Plot:* 

     The function 'qqPlot' produces a normal QQ-plot. Note that
     'qqPlot' is not a synonym for the base R function 'qqplot', which
     produces a quantile-quantile plot of two datasets. To help with
     assessing the relevance of sampling variability on just "how
     close" to the normal the data appear, 'qqbayesPlot' adds
     approximate posterior 95 percent intervals for the quantile
     function at each point. 'qPlot' creates a QQ-plot for threshold
     data. If 'xi' is zero the reference distribution is the
     exponential; if 'xi' is non-zero the reference distribution is
     the generalized Pareto with that value of 'xi'. In the case of
     the exponential, the plot is interpreted as follows: concave
     departures from a straight line are a sign of heavy-tailed
     behaviour, convex departures show thin-tailed behaviour. 
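
     A minimal base R sketch of the exponential case ('xi = 0')
     follows. The helper 'exp_qq' is hypothetical; it assumes that the
     ordered data go on the x-axis and the exponential quantiles on
     the y-axis, which is the orientation under which the concave and
     convex reading above applies. For exponential data the points
     should lie close to a straight line.

```r
# Sketch only: a hypothetical helper, not the fExtremes implementation.
exp_qq <- function(x) {
  x <- sort(as.numeric(x))
  # Pair each order statistic with the matching exponential quantile
  data.frame(data = x, expq = qexp(ppoints(length(x))))
}

set.seed(2)
qq <- exp_qq(rexp(2000, rate = 3))

# A correlation near 1 means the QQ points are close to a straight line
r <- cor(qq$data, qq$expq)
```

     Calling 'plot(qq$data, qq$expq)' draws the QQ-plot itself.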

     *Mean Excess Function Plot:* 

     Three variants to plot the mean excess function are available: a
     sample mean excess plot over increasing thresholds, and two mean
     excess function plots with confidence intervals for discrimination
     in the tails of a distribution. In general, an upward trend in a
     mean excess function plot shows heavy-tailed behaviour. In
     particular, a straight line with positive gradient above some
     threshold is a sign of Pareto behaviour in the tail. A downward
     trend shows thin-tailed behaviour, whereas a line with zero
     gradient shows an exponential tail. Some hints: because the upper
     plotting points are averages of only a handful of extreme
     excesses, they may be omitted for a prettier plot. For 'mrlPlot'
     and 'mxfPlot' the upper tail is investigated; for the lower tail
     reverse the sign of the data vector 'x'. 
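
     The sample mean excess function itself is easy to sketch in base
     R. The helper 'mean_excess' below is hypothetical and only
     illustrates the definition e(u) = mean(x - u | x > u). For
     exponential data with rate 2 the curve should stay near the
     constant 1/2, the zero-gradient case described above.

```r
# Sketch only: a hypothetical helper, not the fExtremes implementation.
mean_excess <- function(x, u) {
  # For each threshold, average the excesses of the points above it
  vapply(u, function(t) mean(x[x > t] - t), numeric(1))
}

set.seed(3)
x <- rexp(10000, rate = 2)
# Thresholds at the 0%, 10%, ..., 90% sample quantiles
u <- quantile(x, probs = seq(0, 0.9, by = 0.1), names = FALSE)
e <- mean_excess(x, u)
```

     Plotting 'e' against 'u' gives a roughly horizontal line here; a
     heavy-tailed sample would give an upward trend instead.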

     *Plot of the Maximum/Sum Ratio:* 

     The ratio of maximum and sum is a simple tool for detecting heavy
     tails of a distribution and for giving a rough estimate of the
     order of its finite moments. Sharp increases in the curves of a
     'msratioPlot' are a sign of heavy-tailed behaviour. 
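
     The running ratio can be sketched in base R as follows. The
     helper 'ms_ratio' is hypothetical; it assumes the ratio is
     computed as max(|x_i|^p) / sum(|x_i|^p) over the first n
     observations. When the p-th moment is finite the ratio is
     expected to drift towards zero, while for a Cauchy sample with
     p = 2 it keeps jumping.

```r
# Sketch only: a hypothetical helper, not the fExtremes implementation.
ms_ratio <- function(x, p = 1) {
  y <- abs(x)^p
  # Running maximum divided by running sum, for n = 1, 2, ...
  cummax(y) / cumsum(y)
}

set.seed(4)
r_norm   <- ms_ratio(rnorm(8000), p = 2)    # finite second moment
r_cauchy <- ms_ratio(rcauchy(8000), p = 2)  # infinite second moment
```

     Plotting 'r_norm' and 'r_cauchy' against the index n shows the
     contrast between the two cases.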

     *Plot of the Development of Records:* 

     These functions investigate the development of records in a
     dataset and calculate the expected behaviour for iid data.
     'recordsPlot' counts records and reports the observations at which
     they occur. In addition, subsamples can be investigated with the
     help of the function 'ssrecordsPlot'. 
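
     Counting records and the iid benchmark can be sketched in base R.
     Both helpers below are hypothetical. The sketch assumes that a
     record is an observation strictly larger than all earlier ones,
     and uses the standard result that an iid sample of size n from a
     continuous distribution has 1 + 1/2 + ... + 1/n expected records.

```r
# Sketch only: hypothetical helpers, not the fExtremes implementation.
record_index <- function(x) {
  n <- length(x)
  # The first observation is always a record; later ones must exceed
  # the running maximum of everything before them
  which(c(TRUE, x[-1] > cummax(x)[-n]))
}

expected_records <- function(n) sum(1 / seq_len(n))

x <- c(2, 1, 3, 3, 5, 4)
idx <- record_index(x)            # positions 1, 3 and 5
n_expected <- expected_records(length(x))
```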

     *ACF Plot of Exceedances over a Threshold:* 

     This function plots the autocorrelation functions of heights and
     distances of exceedances over a threshold. 
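
     The two series involved can be sketched in base R. This is an
     illustration only: it assumes that the default 'threshold = 0.95'
     is read as a quantile level, that heights are the excesses over
     the threshold, and that distances are the index gaps between
     successive exceedances. None of this is taken from the 'xacfPlot'
     source.

```r
# Sketch only: assumptions as stated above, not the fExtremes code.
set.seed(6)
x <- rt(3000, df = 4)
u <- quantile(x, probs = 0.95, names = FALSE)

idx     <- which(x > u)     # positions of the exceedances
heights <- x[idx] - u       # sizes of the excesses over the threshold
gaps    <- diff(idx)        # distances between successive exceedances

# Autocorrelations of both series, computed without plotting
acf_heights <- acf(heights, lag.max = 15, plot = FALSE)
acf_gaps    <- acf(gaps, lag.max = 15, plot = FALSE)
```

     For iid data neither series should show marked autocorrelation.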

     *Finding Thresholds:*  

     The function 'findThreshold' finds a threshold so that a given
     number of extremes lie above. When the data are tied a threshold
     is found so that at least the specified number of extremes lie
     above. 
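
     The handling of ties can be sketched in base R. The helper
     'find_threshold' is hypothetical; it assumes "above" means
     strictly greater than the threshold and picks the next smaller
     distinct data value below the n-th largest observation.

```r
# Sketch only: a hypothetical helper, not the fExtremes implementation.
find_threshold <- function(x, n) {
  x <- sort(as.numeric(x), decreasing = TRUE)
  distinct <- unique(x)
  # Position of the n-th largest value among the distinct values
  pos <- match(x[n], distinct)
  # Step one distinct value further down so that x[n] itself lies above
  distinct[pmin(pos + 1, length(distinct))]
}

x <- c(1, 2, 2, 3, 5, 5, 8, 9)
thr <- find_threshold(x, n = c(2, 3))
```

     Here 'thr' is 5 and 3. Two values lie above 5, and because of the
     tie at 5 four values (not three) lie above 3.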

     *Computing Block Maxima:*  

     The function 'blockMaxima' calculates block maxima from a vector
     or a time series, whereas the function 'blocks' is more general
     and allows for the calculation of an arbitrary function 'FUN' on
     blocks. 
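
     For a numeric block size the idea can be sketched in base R. The
     helper 'block_apply' is hypothetical and covers only the case of
     a fixed number of observations per block, not calendar blocks
     such as '"month"'.

```r
# Sketch only: a hypothetical helper, not the fExtremes implementation.
block_apply <- function(x, block, FUN = max) {
  # Observations 1..block form group 1, the next 'block' form group 2, ...
  grp <- ceiling(seq_along(x) / block)
  as.vector(tapply(x, grp, FUN))
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6)
bmax <- block_apply(x, block = 3)              # block maxima
bmin <- block_apply(x, block = 3, FUN = min)   # any other function
```

     The last block holds only two observations, which matches the
     remark in the description of 'block' that the final block may be
     shorter than the others.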

     *De-Clustering Point Processes:*  

     The function 'deCluster' declusters clustered point process data
     so that the Poisson assumption is more tenable over a high
     threshold.
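
     The runs method described under the argument 'run' can be
     sketched in base R. The helper 'decluster_runs' is hypothetical;
     it assumes that each cluster is represented by its largest
     exceedance, which is a common convention but is not confirmed by
     this page.

```r
# Sketch only: a hypothetical helper, not the fExtremes implementation.
decluster_runs <- function(values, times, run) {
  # A gap longer than 'run' between exceedances starts a new cluster
  cluster <- cumsum(c(TRUE, diff(times) > run))
  # Keep the position of the largest exceedance within each cluster
  keep <- as.vector(tapply(seq_along(values), cluster,
                           function(i) i[which.max(values[i])]))
  list(values = values[keep], times = times[keep])
}

times  <- c(1, 2, 4, 20, 21, 50)
values <- c(3, 7, 5, 2, 6, 4)
dc <- decluster_runs(values, times, run = 5)
```

     With 'run = 5' the six exceedances fall into three clusters, and
     'dc' keeps the values 7, 6 and 4 at times 2, 21 and 50.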

_V_a_l_u_e:

     'findThreshold'  
      returns a numeric vector of suitable thresholds. 

     'blockMaxima'  
      returns a numeric vector of block maxima data.

     'deCluster'  
      returns an object for the declustered point process.

_N_o_t_e:

     The plots are labeled by default with an x-label, a y-label and a
     main title. If the argument 'labels' is set to 'FALSE', no
     x-label, y-label or main title is added to the graph. To add
     user-defined label strings "..." just use the function
     'title(xlab="...", ylab="...", main="...")'.

_A_u_t_h_o_r(_s):

     Some of the functions were implemented from Alec Stephenson's
     R-package 'evir' ported from Alexander McNeil's S library 'EVIS',
     _Extreme Values in S_, some from Alec Stephenson's R-package
     'ismev' based on Stuart Coles' code from his book, _An
     Introduction to Statistical Modeling of Extreme Values_, and some
     were written by Diethelm Wuertz.

_R_e_f_e_r_e_n_c_e_s:

     Coles, S. (2001); _An Introduction to Statistical Modeling of
     Extreme Values_, Springer.

     Embrechts, P., Klueppelberg, C., Mikosch, T. (1997); _Modelling
     Extremal Events_, Springer.

_E_x_a_m_p_l_e_s:

      
     ## SOURCE("fExtremes.51A-ExtremesData")

     ## emdPlot -
        xmpExtremes("\nStart: Empirical Distribution Function >")
        # Danish fire insurance data show Pareto tail behaviour:
        par(mfrow = c(2, 2))
        data(danish)
        emdPlot(danish, plottype = "xy", labels = FALSE)
        title(xlab = "x", ylab = "1-F(x)", main = "Danish Fire")   
        # BMW Stocks:
        data(bmw)
        emdPlot(bmw, plottype = "xy", labels = FALSE)
        title(xlab = "x", ylab = "1-F(x)", main = "BMW Stocks")  
        # Simulated Student-t:
        emdPlot(rt(5000, 4), plottype = "xy") 
      
     ## qqPlot -
        xmpExtremes("\nNext: Quantile-Quantile Plot >")
        # QQ-Plot of Simulated Normal rvs:
        par(mfrow = c(2, 2))
        set.seed(4711)
        qqPlot(rnorm(5000))
        text(-3.5, 3, pos = 4, "Simulated Normal rvs")
        # QQ-Plot of simulated Student-t rvs:
        qqPlot(rt(5000, 4))
        text(-3.5, 11.0, pos = 4, "Simulated Student-t rvs")
        # QQ-Plot of BMW log returns:
        data(bmw)
        qqPlot(bmw)
        text(-3.5, 0.09, pos = 4, "BMW log returns")     
        
     ## qPlot -
        xmpExtremes("\nNext: QQ-Plot of Heavy Tails >")
        # QQ-Plot of heavy-tailed Danish fire insurance data:
        data(danish)
        qPlot(danish) 
      
     ## mePlot -
        xmpExtremes("\nNext: Mean Excess Plot >")
        # Sample mean excess plot of heavy-tailed Danish fire 
        # insurance data 
        par(mfrow = c(3, 2))
        data(danish)
        mePlot(danish, labels = FALSE)
        title(xlab = "u", ylab = "e", main = "mePlot - Danish Fire Data")
        
     ## mrlPlot -
        xmpExtremes("\nNext: Mean Residual Life Plot >")
        # Sample mean residual life plot of heavy-tailed Danish fire 
        # insurance data 
        mrlPlot(danish, labels = FALSE)
        title(xlab = "u", ylab = "e", main = "mrlPlot - Danish Fire Data")
      
     ## mxfPlot -
        xmpExtremes("\nNext: Mean Excess Function Plot >")
        # Plot the mean excess functions for randomly distributed 
        # residuals  
        par(mfrow = c(2, 2))
        n = 10000    
        set.seed(4711)
        xlab = "Threshold: u"; ylab = "Mean Excess: e"
        mxfPlot(rnorm(n), tail = 0.5, labels = FALSE)
        title(xlab = xlab, ylab = ylab, main = "mxf Plot - Normal DF")
        set.seed(7138)
        mxfPlot(rexp(n, 2), tail = 0.5, labels = FALSE)
        title(xlab = xlab, ylab = ylab, main = "mxfPlot - Exponential DF")
        abline(1/2, 0)
        set.seed(6952)
        mxfPlot(rlnorm(n, 0, 2), tail = 0.5, xlim = c(0,90), 
          ylim = c(0, 120), labels = FALSE)
        title(xlab = xlab, ylab = ylab, main = "mxfPlot - Lognormal DF")
        set.seed(8835)
        mxfPlot(rgpd(n, 1/2), tail = 0.10, xlim = c(0,200), 
          ylim=c(0,200), labels = FALSE)
        title(xlab = xlab, ylab = ylab, main = "mxfPlot - Pareto")
        abline(0, 1)  
      
     ## msratioPlot -
        xmpExtremes("\nNext: Maximum/Sum Ratio Plot >")
        # Examples for Ratio of Maximum and Sum Plots:
        par(mfrow = c(3, 2))
        data(bmw)
        xlab = "n"; ylab = "R(n)"
        msratioPlot (rnorm(8000), labels = FALSE)
        title(xlab = xlab, ylab = ylab, main = "Standard Normal")
        msratioPlot (rexp(8000), labels = FALSE)
        title(xlab = xlab, ylab = ylab, main = "Exponential")
        msratioPlot (rt(8000, 4), labels = FALSE)
        title(xlab = xlab, ylab = ylab, main = "Student-t")
        msratioPlot (rcauchy(8000), labels = FALSE)
        title(xlab = xlab, ylab = ylab, main = "Cauchy")
        msratioPlot (bmw, labels = FALSE)
        title(xlab = xlab, ylab = ylab, main = "BMW Returns")
       
     ## recordsPlot -
        xmpExtremes("\nNext: Records Plot >")
        # Record fire insurance losses in Denmark
        par(mfrow = c(2, 2))
        data(danish)
        recordsPlot(danish)
        text(1, 7.9, pos = 4, "Danish Fire")
        # BMW Stocks
        data(bmw)
        recordsPlot(bmw)
        text(1, 12.8, pos = 4, "BMW Shares")
           
     ## ssrecordsPlot -
        xmpExtremes("\nNext: Subsample Record Plot >")
        # Record fire insurance losses in Denmark
        ssrecordsPlot(danish)
        text(1, 9.2, pos = 4, "Danish Fire")
        # BMW Stocks
        ssrecordsPlot(bmw)
        text(1, 10.5, pos = 4, "BMW Shares")  
      
     ## xacfPlot -
        xmpExtremes("\nNext: ACF Plot of Exceedences >")
        # Plot ACF of heights/distances of exceedances over threshold:
        par(mfrow = c(2, 2))
        data(bmw)
        xacfPlot(bmw)
        
     ## findThreshold -
        xmpExtremes("\nStart: Find Threshold >")
        # Find thresholds giving (at least) 10, 50 and 100 exceedances 
        # for Danish Fire data
        data(danish)
        findThreshold(danish, n = c(10, 50, 100))    
        
     ## blockMaxima -
        xmpExtremes("\nNext: Compute Block Maxima >")
        # Block Maxima (Minima) for the right and left tails 
        # of the BMW log returns:
        data(bmw)
        par(mfrow = c(2, 1))
        blockMaxima( bmw, block = 100)
        blockMaxima(-bmw, block = 100)     
      
     ## deCluster -
        xmpExtremes("\nNext: De-Cluster Exceedences >")
        # Decluster the 200 exceedances of a particular  
        # threshold in the negative BMW log-return data
        par(mfrow = c(2, 2))
        fit = potFit(-bmw, nextremes = 200) 
        deCluster(fit$fit$data, 30)   

