ProbForecastGOP       package:ProbForecastGOP       R Documentation

_P_r_o_b_a_b_i_l_i_s_t_i_c _W_e_a_t_h_e_r _F_o_r_e_c_a_s_t_s _u_s_i_n_g _t_h_e _G_O_P _m_e_t_h_o_d

_D_e_s_c_r_i_p_t_i_o_n:

     This function generates probabilistic forecasts of weather fields 
     using the Geostatistical Output Perturbation method as described
     by Gel,  Raftery and Gneiting (2004).

_U_s_a_g_e:

     ProbForecastGOP(day, obs, forecast, id, coord1, coord2, cut.points=NULL, 
     max.dist=NULL, nbins=300, variog.model="exponential", max.dist.fit=NULL, 
     init.val=NULL, fix.nugget=FALSE, coord1.grid, coord2.grid, forecast.grid, 
     n.sim=99, out="SIM", n.displ=4, qt.displ=c(10,50,90))

_A_r_g_u_m_e_n_t_s:

     day: numeric vector containing the day of observation.

     obs: numeric vector containing the observed weather quantity.

forecast: numeric vector containing the forecasted weather quantity.

      id: vector with the id of the metereological stations.

  coord1: vector containing the longitudes of the metereological
          stations.

  coord2: vector containing the latitudes of the metereological 
          stations.

cut.points: numeric vector containing the cutpoints used for variogram
          binning.

max.dist: a numerical value giving the upper bound for the  distance
          considered in the variogram computation.

   nbins: a numerical value giving the number of bins for variogram
          binning.

variog.model: character string giving the name of the parametric model
          to fit to the empirical variogram. Implemented models are
          'exponential',  'spherical', 'gauss', 'gencauchy', and
          'matern'.

max.dist.fit: number giving the maximum distance considered when
          fitting the variogram.

init.val: numeric vector giving the initial values for the parameters
          in the variogram model. The number of initial values to be
          entered depends on the parametric model specified. 

          If the variog.model specified is 'exponential', 'spherical'
          or  'gauss', then the parameters required are, in order, the
          nugget  effect, the variance and the range. 

          If the variog.model specified is 'gencauchy', the parameters
          required  are, in order, the nugget effect, the variance, the
          range, the smoothness  parameter 'a' and the long-range
          parameter 'b'.

          If the variog.model specified is 'matern' the parameters
          required are,  in order, the nugget effect, the variance, the
          range, and the smoothness  parameter 'a'.

          For more details on the valid range for the parameters and
          for the  equation  of the variogram models listed above, look
          below at the section "Details".

fix.nugget: logical field indicating whether the nugget should be
          considered fixed or not. 

          If 'TRUE' the nugget effect will be assumed to be constant,
          and a value for the fixed nugget effect can be also provided.
          If the value provided is different from the one entered in
          the 'init.val' field, then the value of the nugget effect is
          taken to be the one entered in the 'init.val' field. 

          If 'FALSE' the nugget effect will be estimated along with the
          other parameters in the variogram model.

coord1.grid: numeric vector containing the longitudes of the grid
          points for the forecast field.

coord2.grid: numeric vector containing the latitudes of the grid points
          for the forecast field.

forecast.grid: numeric vector containing the forecast grid.

   n.sim: number of realizations to be simulated.

     out: character string indicating the output produced by the
          function. Possible values are 'VARIOG', 'FIT' and 'SIM'. 

          If 'VARIOG' is specified, the function computes the empirical
           variogram of the forecast errors averaged over time and
          displays a plot of the empirical variogram. 

          If 'FIT' is specified, the function, in addition, fits a
          parametric model to the empirical variogram, returns
          estimates of the parameters and display both the empirical
          and the fitted variogram.

          If 'SIM' is specified, the function simulates also
          realizations of the weather random field using the parametric
          variogram model specified. The function displays the
          simulated weather fields and some specified percentiles.

 n.displ: number of realizations to be displayed on screen.

qt.displ: numeric vector containing the quantiles to be displayed.

_D_e_t_a_i_l_s:

     This function generates probabilistic weather field forecasts by
     generating ensemble members using the Geostatistical Output
     Perturbation method presented by Gel, Raftery and Gneiting (2004).

     The Geostatistical Output Perturbation method consists of the
     following steps:

     1. determination of the residuals, by regressing the forecasts on
     the observed weather quantity and computation of their empirical
     variogram. 

     2. estimation of the parameters of the variogram model fitted to
     the empirical variogram. 

     3. simulation of weather random fields using the estimated
     parametric variogram model.

     - _Computation of the empirical variogram of the residuals_

     The empirical variogram of the residuals is calculated by
     determining, for each day, the distance among all pairs of
     stations that have been observed in the same day and by
     calculating for each day the sum of all the squared differences in
     the residuals within each bin. These sums are then averaged over
     time, with weights for each bin given by the sum over time of the
     number of pairs of stations within the bin. 

     The formula used is:

   gamma(h) = sum_d frac{1}{2N_{(h,d)}} (sum_h (Y(i+h,d)-Y(i,d))^2)


     - _Estimation of the parameters of the parametric variogram model 
     fitted to the empirical variogram._

     The function estimates the parameters in the variogram model by
     minimizing the weighted least-square loss function


 LOSS(theta) = sum_k n_k [ ( frac{hat{gamma_k} - gamma_k(theta)}{gamma_k(theta)} )^2 ]


     where n_k is the number of pairs contributing to the variogram 
     computation in the kth bin, hat{gamma_k} is the value of the 
     empirical variogram in the kth bin and gamma_k(theta) is the value
     of the estimated parametric variogram model at the midpoint of the
     kth bin.

     - Parametric variogram models -

     The parametric variogram models implemented are: exponential,
     spherical,  gaussian, generalized Cauchy and Whittle-Matern.

     - 'exponential'

     The equation of the exponential variogram with parameters the
     nugget  effect, rho, the variance, sigma^{2}, and the range,  r,
     is given by:


         gamma(d) = rho+sigma^{2} cdot (1-exp(- frac{d}{r}))


     where d is the distance between the two locations and  gamma(d) is
     the value of the exponential variogram at distance  d. The nugget
     effect, rho, is a non-negative number, while  the variance,
     sigma^2, and the range, r, are positive  numbers.

     - 'spherical'

     The equation of the spherical variogram with parameters the nugget
     effect,  rho, the variance, sigma^2, and the range, r, is given 
     by:


 gamma(d) = rho+sigma^{2} cdot (frac{3}{2}  cdotfrac{d}{r}-frac{1}{2} cdot frac{d^3}{r^3})


     where d is the distance between the two locations and  gamma(d) is
     the value of the spherical variogram at distance  d. The nugget
     effect, rho, is a non-negative number, while  the variance,
     sigma^2, and the range, r, are positive  numbers.

     - 'gauss'

     The equation of the gaussian variogram with parameters the nugget
     effect,  rho, the variance, sigma^2, and the range, r, is given 
     by:


       gamma(d) = rho+sigma^{2} cdot (1-exp(- frac{d^2}{r^2} ))


     where d is the distance between the two locations and  gamma(d) is
     the value of the gaussian variogram at distance  d. The nugget
     effect, rho, is a non-negative number, while  the variance,
     sigma^2, and the range, r, are positive  numbers.

     - 'gencauchy'

     The equation of the generalized Cauchy variogram with parameters
     the  nugget effect, rho, the variance, sigma^2, the range,  r, 
     the smoothness parameter, a, and the long-range parameter, b, is
     given by:


 gamma(d) = rho+sigma^{2} cdot  (1-(1+frac{d^a}{r^a})^{- frac{b}{a}})


     where d is the distance between the two locations and  gamma(d) is
     the value of the generalized Cauchy variogram at  distance d. The
     nugget effect, rho is a non-negative number,  the variance,
     sigma^2 and the range, r, are positive numbers,  the smoothness
     parameter, a, is a number in (0,2], and the long-range parameter,
     b, is a positive number.

     - 'matern'

     The equation of the Whittle-Matern variogram with parameters the
     nugget  effect, rho, the variance, sigma^2, the range, r, and  the
     smoothness parameter, a, is given by:


 gamma(d) = rho+sigma^{2} cdot (1-(frac{2^{1-a}}{Gamma(a)}cdot  frac{d^a}{r^a} cdot K_{a}(frac{d}{r}))


     where d is the distance between the two locations, gamma(d)  is
     the value of the Whittle-Matern variogram at distance d,  Gamma is
     the gamma function and K_a is the Bessel function of  the third
     kind with parameter a. The nugget effect, rho, is a  non-negative
     number, the variance, sigma^2, the range, r, and the smoothness
     parameter, a, are positive numbers.

     - Defaults-

     If the vector with the cutpoints is not specified, the cutpoints
     are determined so that there are 'nbins' bins with approximately
     the same number of pairs per bin. If both the vector with the
     cutpoints and the number of bins, 'nbins', are unspecified, the
     function by default determines the cutpoints so that there are 300
     bins with approximately the same number of pairs per bin. 

     The default value for the maximum distance considered in the
     variogram computation is the 90-th percentile of the distances
     between all stations. 

     The default value for the maximum distance is frac{1}{2 cdot 
     sqrt{2}} times the maximum distance considered in the empirical 
     variogram. 

     Default for the initial values of the parameters is 'NULL', in
     which  case the initial values for the parameters are determined
     internally and depend on the empirical variogram values. By
     default, the nugget effect is not considered constant. Thus, the
     default value for the 'fix.nugget' field is 'FALSE'.

     The default for 'out' is 'SIM': by default, 99 weather  random
     fields are simulated, and 4 of them are displayed. If no  vector
     of percentiles, is provided, the 10th, 50th and 90th percentiles
     are determined and displayed.

_V_a_l_u_e:

     The function returns both a numerical and a graphical summary. The
     graphical and numerical summary provided depend on the
     specification given in the field 'out'.

     The graphical summary consists of a plot of the empirical
     variogram, if  'out' is equal to 'VARIOG'. 

     It consists of a plot of the empirical variogram with superimposed
     the fitted parametric model, if 'out' is equal to 'FIT'. 

     If 'out' is equal to 'SIM', the graphical summary consists of 
     several plots: a plot of the empirical variogram with superimposed
     the fitted  parametric model, plots of the simulated weather
     fields, and plots of  specified percentiles of the weather random
     field. The plots are printed on multiple pages, and before
     displaying each page, the user will be asked for input.

     The numerical summary include the following: 

bias.coeff: additive and multiplicative bias obtained by linear
          regression of the forecasted weather quantity on the observed
          weather quantity. Always provided.

se.bias.coeff: standard errors of the bias coefficients. Always
          provided.

 res.var: variance of the forecast errors. Always provided.

bin.midpoints: numeric vector with midpoints of the bins used in the
          empirical variogram computation. Always provided.

number.pairs: numeric vector with the number of pairs per bin. Always
          provided.

empir.variog: numeric vector with the empirical variogram values.
          Always provided.

   model: name of the parametric model fitted to the empirical
          variogram. Provided only if 'out' is equal to 'FIT' or 'SIM'.

  nugget: Estimate of the nugget effect. Provided only if 'out' is
          equal to 'FIT' or 'SIM'.

variance: Estimate of the variance. Provided only if 'out' is equal to
          'FIT' or 'SIM'.

   range: Estimate of the range. Provided only if 'out' is  equal to
          'FIT' or 'SIM'.

additional.par: Numeric vector with the estimates of the  additional
          parameters required by the parametric variogram model (only
          if  the model specified is either 'whittlematern' or
          'gencauchy').  This is provided only if 'out' is equal to
          'FIT' or 'SIM'.

sim.fields: 3-dimensional array where each layer contains the values of
          the simulated weather field at the gridded locations.
          Provided only if 'out' is equal to 'SIM'.

pct.fields: 3-dimensional array where each layer contains the specified
          percentiles of the weather field. Provided only if 'out' is
          equal to 'SIM'.

_N_o_t_e:

     Computation of the empirical variogram might require substantial 
     computing time. Hence, it is advisable to run the function once
     with  'out' equal to 'VARIOG', save part of the output and use it
     as input for the 'Variog.fit' function to estimate a parametric
     variogram model. Finally, 'Field.sim' will simulate realizations
     of the weather field from the specified geostatistical model.

_A_u_t_h_o_r(_s):

     Gel, Y., Raftery, A. E., Gneiting, T., Berrocal, V. J.
     veronica@stat.washington.edu.

_R_e_f_e_r_e_n_c_e_s:

     Gel, Y., Raftery, A. E., Gneiting, T. (2004). Calibrated 
     probabilistic mesoscale weather field forecasting: The
     Geostatistical  Output Perturbation (GOP) method (with
     discussion). _Journal of the  American Statistical Association_,
     *Vol. 99 (467)*, 575-583.

     Gel, Y., Raftery, A. E., Gneiting, T., Berrocal, V. J. (2004). 
     Rejoinder. _Journal of the American Statistical Association_, 
     *Vol. 99 (467)*, 588-590.  

     Cressie, N. A. C. (1993). _Statistics for Spatial Data_ (revised
     ed.). Wiley: New York.

     Gneiting, T., Schlather, M. (2004). Stochastic models that
     separate  fractal dimension and the Hurst effect. _SIAM Review_,
     *46*, 269-282.

     Stein, M. L. (1999). _Interpolation of Spatial Data - Some Theory
     for Kriging_. Springer-Verlag: New York.

     Schlather, M. (2001). Simulation and analysis of Random Fields. _R
      News 1(2)_, 18-20.

     Nychka, D. (2004). _The fields package_. Available at:
     http:lib.stat.cmu.edu/R/CRAN/doc/package/fields.pdf.

_S_e_e _A_l_s_o:

     'Emp.variog' for computation of empirical variogram  of forecast
     errors averaged over time, 'Variog.fit' for  estimation of
     parameters in a geostatistical model using Weighted Least  Square,
     'Field.sim' for realizations of weather fields with  the specified
     covariance structure.

_E_x_a_m_p_l_e_s:

     ## Loading data
     library(fields)
     library(RandomFields)
     data(slp)
     day <-slp$date.obs
     id <- slp$id.stat
     coord1 <- slp$lon.stat
     coord2 <- slp$lat.stat
     obs <- slp$obs
     forecast <-slp$forecast

     data(gridlong)
     coord1.grid <- gridlong$gridded.lon

     data(gridlat)
     coord2.grid <- gridlat$gridded.lat

     data(forecast.grid)
     forecast.grid <- forecast.grid$gridded.forecast

     ## Specified cutpoints, default values for all the other fields.
     ## Only empirical variogram computation
     empirical.variog <- ProbForecastGOP(day=day,obs=obs,forecast=forecast,
     id=id,coord1=coord1,coord2=coord2,cut.points=seq(0,1000,by=2),
     max.dist=NULL,nbins=NULL,variog.model="exponential",max.dist.fit=NULL,
     init.val=NULL,fix.nugget=FALSE,coord1.grid=coord1.grid,
     coord2.grid=coord2.grid,forecast.grid=forecast.grid,n.sim=10,
     out="VARIOG",n.displ=4,qt.displ=c(5,50,95))

     ## Unspecified cutpoints.
     ## Fit of a parametric variogram to an empirical variogram.
     fitted.variog <- ProbForecastGOP(day=day,obs=obs,forecast=forecast,id=id,
     coord1=coord1,coord2=coord2,cut.points=NULL,max.dist=NULL,nbins=NULL,
     variog.model="exponential",max.dist.fit=NULL,init.val=NULL,fix.nugget=FALSE,
     coord1.grid=coord1.grid,coord2.grid=coord2.grid,forecast.grid=forecast.grid,
     n.sim=10,out="FIT",n.displ=4,qt.displ=c(5,50,95))

     ## Unspecified cutpoints. 
     ## Whole routine.
     simulation.fields <- 
     ProbForecastGOP(day=day,obs=obs,forecast=forecast,id=id,coord1=coord1,
     coord2=coord2,cut.points=NULL,max.dist=NULL,nbins=NULL,variog.model=NULL,
     max.dist.fit=NULL,init.val=NULL,fix.nugget=FALSE,coord1.grid=coord1.grid,
     coord2.grid=coord2.grid,forecast.grid=forecast.grid,n.sim=4,out="SIM",
     n.displ=4,qt.displ=c(5,50,95))

