sic2004                package:gstat                R Documentation

_S_p_a_t_i_a_l _I_n_t_e_r_p_o_l_a_t_i_o_n _C_o_m_p_a_r_i_s_o_n _2_0_0_4 _d_a_t_a _s_e_t: _N_a_t_u_r_a_l _A_m_b_i_e_n_t _R_a_d_i_o_a_c_t_i_v_i_t_y

_D_e_s_c_r_i_p_t_i_o_n:

     The text below is copied from <URL:
     http://www.ai-geostats.org/events/sic2004/index.htm>, subsection
     Data.

     The variable used in the SIC 2004 exercise is natural ambient
     radioactivity measured in Germany. The data, provided kindly by
     the German Federal Office for Radiation Protection (BfS), are
     gamma dose rates reported by means of the national automatic
     monitoring network (IMIS).

     Within the framework of SIC2004, a rectangular area was used to
     select 1008 monitoring stations (from a total of around 2000
     stations). For these 1008 stations, 11 days of measurements were
     randomly selected during the last 12 months and the average daily
     dose rates calculated for each day. This gives 11 data sets.

     Prior information (sic.train): 10 data sets of 200 points have
     been prepared; all 10 share the same monitoring station locations.
     These locations have been randomly selected (see Figure 1). The
     data sets differ only in their Z values, since each set
     corresponds to 1 day of measurement made during the last 14
     months. No information will be provided on the date of
     measurement. These 10 data sets (10 days of measurements) can be
     used as prior information to tune the parameters of the mapping
     algorithms. No other information will be provided about these
     sets. Participants are of course free to gather more information
     about the variable from the literature and other sources.

     The 200 monitoring stations above were randomly taken from a
     larger set of 1008 stations. The locations of the remaining 808
     monitoring stations are given in sic.pred. Participants in SIC2004
     will have to estimate the values of the variable at these 808
     locations.

     The SIC2004 data (sic.test, variable first): The exercise consists
     of using 200 measurements made on an 11th day (THE data of the
     exercise) to estimate the values observed at the remaining 808
     locations (hence the question marks used as symbols in the maps
     shown in Figure 3). These measurements will be provided for two
     weeks only (15th of September until 1st of October 2004) on a web
     page restricted to the participants. The true values observed at
     these 808 locations will be released only at the end of the
     exercise to allow participants to write their manuscripts
     (sic.valid).

     In addition, a joker data set was released (sic.test, variable
     second), which contains an anomaly. The anomaly was generated by a
     simulation model, and does not represent measured levels.

_U_s_a_g_e:

     data(sic2004) # 

_F_o_r_m_a_t:

     The data frames contain the following columns:

     _r_e_c_o_r_d this integer value is the number (unique value) of the
          monitoring station chosen by us.

     _x X-coordinate of the monitoring station indicated in meters

     _y Y-coordinate of the monitoring station indicated in meters

     _d_a_y_0_1 mean gamma dose rate measured during 24 hours, at day01.
          Units are nanoSieverts/hour

     _d_a_y_0_2 same, for day 02

     _d_a_y_0_3 ...

     _d_a_y_0_4 ...

     _d_a_y_0_5 ...

     _d_a_y_0_6 ...

     _d_a_y_0_7 ...

     _d_a_y_0_8 ...

     _d_a_y_0_9 ...

     _d_a_y_1_0 ...

     _f_i_r_s_t the data observed at the 11-th day

     _s_e_c_o_n_d the joker data set, containing an anomaly not present in
          the training data
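
     As a quick check of the layout described above, the following
     sketch loads the data and summarizes the ten training-day columns
     of sic.train. It assumes the gstat package is installed; the names
     day_cols, day_means and day_range are illustrative helpers, not
     part of the package.

```r
# Load the SIC2004 data sets from gstat without attaching the package.
data(sic2004, package = "gstat")

# Names of the ten training-day columns: day01 ... day10.
day_cols <- sprintf("day%02d", 1:10)

# Confirm that the documented columns are present in sic.train.
stopifnot(all(c("record", "x", "y", day_cols) %in% names(sic.train)))

# Mean and range of the daily gamma dose rate (nSv/h) across stations.
day_means <- sapply(sic.train[day_cols], mean)
day_range <- sapply(sic.train[day_cols], range)

print(round(day_means, 1))
print(round(day_range, 1))
```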

_N_o_t_e:

     The data set sic.grid provides a set of points on a regular grid
     (almost 10000 points) covering the area; this is convenient for
     interpolation.

     The coordinates have been projected around a point located in the
     South West of Germany.

_R_e_f_e_r_e_n_c_e_s:

     <URL: http://www.ai-geostats.org/>

     <URL: http://www.ai-geostats.org/events/sic2004/index.htm>

_E_x_a_m_p_l_e_s:

     data(sic2004) 
     # FIGURE 1. Locations of the 200 monitoring stations for the 11 data sets. 
     # The values taken by the variable are known.
     plot(y~x,sic.train,pch=1,col="red", asp=1)

     # FIGURE 2. Locations of the 808 remaining monitoring stations at which 
     # the values of the variable must be estimated.
     plot(y~x,sic.pred,pch="?", asp=1, cex=.8) # Figure 2

     # FIGURE 3. Locations of the 1008 monitoring stations (exhaustive data sets). 
     # Red circles are used to estimate values located at the question marks
     plot(y~x,sic.train,pch=1,col="red", asp=1)
     points(y~x, sic.pred, pch="?", cex=.8)
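
     The estimation task described above can be sketched with inverse
     distance weighting, one of many possible mapping algorithms. This
     is a minimal illustration rather than the method used in the
     exercise: it uses the first training day (day01) as a stand-in for
     the exercise data, and the power idp = 2 is an arbitrary choice.
     The name idw_pred is illustrative.

```r
library(gstat)
data(sic2004)

# Inverse distance weighting of the day01 training values, predicted
# at the locations in sic.pred. Coordinates are taken from the x and y
# columns of both data frames; idp is the distance weighting power.
idw_pred <- idw(day01 ~ 1, locations = ~ x + y, data = sic.train,
                newdata = sic.pred, idp = 2)

# var1.pred holds the interpolated dose rates (nSv/h), one per location.
summary(idw_pred$var1.pred)
```

     Because IDW predictions are weighted averages of the observations,
     they always stay within the range of the training values.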

