DataPreprocessing         package:fExtremes         R Documentation

_E_x_t_r_e_m_e_s _D_a_t_a _P_r_e_p_r_o_c_e_s_s_i_n_g

_D_e_s_c_r_i_p_t_i_o_n:

     A collection and description of functions for data preprocessing
     of extreme values. This includes tools to separate data beyond a
     threshold value, to compute blockwise data like block maxima, and
     to decluster point process data.

     The functions are:

       'blockMaxima'    Block Maxima from a vector or a time series,
       'findThreshold'  Upper threshold for a given number of extremes,
       'pointProcess'   Peaks over Threshold from a vector or a time series,
       'deCluster'      Declusters clustered point process data.

_U_s_a_g_e:

     blockMaxima(x, block = c("monthly", "quarterly"), doplot = FALSE)
     findThreshold(x, n = floor(0.05*length(as.vector(x))), doplot = FALSE)
     pointProcess(x, u = quantile(x, 0.95), doplot = FALSE)
     deCluster(x, run = 20, doplot = TRUE)

_A_r_g_u_m_e_n_t_s:

   block: the block size. A numeric value is interpreted as the number
          of data values in each successive block. All the data are
          used, so the last block may contain fewer than 'block'
          observations. If 'x' has a 'times' attribute containing the
          times/dates of each observation (in an object of class
          '"POSIXct"', or an object that can be converted to that
          class, see 'as.POSIXct'), then 'block' may instead take one
          of the character values listed in Usage, '"monthly"' or
          '"quarterly"'. By default monthly blocks from daily data are
          assumed.

  doplot: a logical value. Should the results be plotted? By default
          'FALSE', except for 'deCluster', where the default is 'TRUE'.

       n: a numeric value or vector giving the number of extremes above
          the threshold. By default, 'n' is set to an integer
          representing 5% of the observations in 'x',
          'floor(0.05*length(as.vector(x)))'.

     run: parameter to be used in the runs method; any two consecutive 
          threshold exceedances separated by more than this number of 
          observations/days are considered to belong to different
          clusters. 

       u: a numeric value giving the threshold above which the data
          are retained. By default the 95% quantile of 'x',
          'u = quantile(x, 0.95)'.

       x: a numeric data vector or time series from which
          'findThreshold', 'blockMaxima' and 'pointProcess' determine
          the threshold values, the block maxima and the peaks over the
          threshold. For the function 'deCluster' the argument 'x'
          represents a numeric vector of threshold exceedances with a
          'times' attribute which should be a numeric vector containing
          either the indices or the times/dates of each exceedance (if
          times/dates, the attribute should be an object of class
          '"POSIXct"' or an object that can be converted to that class;
          see 'as.POSIXct').

_D_e_t_a_i_l_s:

     *Computing Block Maxima:*  

     The function 'blockMaxima' calculates block maxima from a vector
     or a time series, whereas the function 'blocks' is more general
     and allows for the calculation of an arbitrary function 'FUN' on
     blocks.
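
     The numeric-block case can be illustrated with a minimal base R
     sketch. The helper 'block_max_sketch' is hypothetical and is not
     part of fExtremes; it only assumes that blocks are formed from
     successive runs of 'block' observations, with a possibly shorter
     last block.

```r
# Hypothetical helper (not the fExtremes implementation): maxima over
# successive blocks of 'block' observations; the last block may be shorter.
block_max_sketch <- function(x, block) {
  x <- as.vector(x)
  # Block label for every observation: 1, 1, ..., 2, 2, ...
  idx <- ceiling(seq_along(x) / block)
  # Maximum within each block, returned as a plain numeric vector
  as.vector(tapply(x, idx, max))
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3)
bm <- block_max_sketch(x, block = 4)
print(bm)

# Block minima (left tail) via the maxima of the negated series
bmin <- -block_max_sketch(-x, block = 4)
print(bmin)
```

     Negating the series before taking maxima is the same device used
     in the Examples to study the left tail.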

     *Finding Thresholds:*  

     The function 'findThreshold' finds a threshold so that a given
     number of extremes lie above it. When the data contain ties, a
     threshold is found so that at least the specified number of
     extremes lie above it.
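
     One way to obtain this behaviour is sketched below in base R. The
     helper 'find_threshold_sketch' is hypothetical; it assumes that
     exceedances are counted strictly above the threshold and is not
     claimed to be the code used by 'findThreshold'.

```r
# Hypothetical helper: for each n, return a threshold such that at
# least n observations lie strictly above it.
find_threshold_sketch <- function(x, n) {
  xs <- sort(as.numeric(x), decreasing = TRUE)
  vals <- unique(xs)                  # distinct values, largest first
  pos <- match(xs[n], vals)           # position of the n-th largest value
  pos <- pmin(pos + 1, length(vals))  # step down to the next distinct value
  vals[pos]
}

x <- c(1, 2, 2, 3, 5, 5, 5, 8, 9)
u <- find_threshold_sketch(x, n = c(2, 3))
print(u)
# Number of observations strictly above each threshold
print(sapply(u, function(ui) sum(x > ui)))
```

     In this toy data the value 5 is tied three times, so asking for
     three exceedances yields a threshold with five values above it.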

     *De-Clustering Point Processes:*  

     The function 'deCluster' declusters clustered point process data
     so that the Poisson assumption is more tenable over a high
     threshold.
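
     The runs method described under the argument 'run' can be
     sketched in base R as follows. The helper 'decluster_sketch' and
     the toy data are hypothetical; the sketch assumes that each
     cluster is represented by its largest exceedance, which may
     differ from what 'deCluster' does in detail.

```r
# Hypothetical helper: runs declustering of threshold exceedances.
# 'values' are the exceedances, 'times' their indices or times.
decluster_sketch <- function(values, times, run) {
  # A gap longer than 'run' between consecutive exceedances
  # starts a new cluster
  cluster <- c(0, cumsum(diff(times) > run))
  # Position of the largest exceedance within each cluster
  keep <- as.vector(tapply(seq_along(values), cluster,
                           function(i) i[which.max(values[i])]))
  list(values = values[keep], times = times[keep])
}

x <- c(0.1, 2.5, 3.0, 0.2, 0.3, 0.1, 0.2, 4.0, 2.2, 0.1)
u <- 2
times <- which(x > u)   # positions of the exceedances
exceed <- x[times]      # exceedances of the threshold u
dc <- decluster_sketch(exceed, times, run = 2)
print(dc)
```

     Here the four exceedances fall into two clusters, so only two
     values remain after declustering.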

_V_a_l_u_e:

     'blockMaxima'
     returns a timeSeries object or a numeric vector of block maxima
     data.

     'findThreshold'
     returns a numeric value or vector of suitable thresholds.

     'pointProcess'
     returns a timeSeries object or a numeric vector of peaks over a
     threshold.

     'deCluster'
     returns a timeSeries object or a numeric vector for the
     declustered point process.

_A_u_t_h_o_r(_s):

     Some of the functions were implemented from Alec Stephenson's
     R-package 'evir', ported from Alexander McNeil's S library 'EVIS'
     (_Extreme Values in S_); some from Alec Stephenson's R-package
     'ismev', based on Stuart Coles' code from his book _An
     Introduction to Statistical Modeling of Extreme Values_; and some
     were written by Diethelm Wuertz.

_R_e_f_e_r_e_n_c_e_s:

     Coles, S. (2001); _An Introduction to Statistical Modeling of
     Extreme Values_, Springer.

     Embrechts, P., Klueppelberg, C., Mikosch, T. (1997); _Modelling
     Extremal Events for Insurance and Finance_, Springer.

_E_x_a_m_p_l_e_s:

      
     ## findThreshold -
        # Thresholds giving (at least) 10, 50 and 100 exceedances for Danish data:
        x = as.timeSeries(data(danishClaims))
        findThreshold(x, n = c(10, 50, 100))    
        
     ## blockMaxima -
        # Block maxima of BMW log returns, and block maxima of the
        # negated series (i.e. block minima, left tail):
        BMW = as.timeSeries(data(bmwRet))
        colnames(BMW) = "BMW.RET"
        head(BMW)
        x = blockMaxima(BMW, block = 65)
        head(x)
        y = blockMaxima(-BMW, block = 65)
        head(y)
        y = blockMaxima(-BMW, block = "monthly")
        head(y)

        
     ## pointProcess -
        # Values above the threshold in the negative BMW log-return data
        # ('x' here is the block maxima series computed above):
        PP = pointProcess(x = -BMW, u = quantile(as.vector(x), 0.75))
        PP
        nrow(PP)
      
     ## deCluster -
        # Decluster the 200 exceedances of a particular  
        DC = deCluster(x = PP, run = 15, doplot = TRUE) 
        DC
        nrow(DC)

