DCluster              package:DCluster              R Documentation

_A _p_a_c_k_a_g_e _f_o_r _t_h_e _d_e_t_e_c_t_i_o_n _o_f  _s_p_a_t_i_a_l _c_l_u_s_t_e_r_s _o_f _d_i_s_e_a_s_e_s
_f_o_r _c_o_u_n_t _d_a_t_a

_D_e_s_c_r_i_p_t_i_o_n:

     DCluster is a collection of several methods related to the
     detection of spatial clusters of diseases. Many widely used
     methods, such as Openshaw's GAM, Besag and Newell, Kulldorff and
     Nagarwalla, and others have been implemented.

     Besides calculating these statistics, the bootstrap can be used to
     test their departure from the null hypothesis, which is that there
     is no clustering in the study area. Four sampling methods can be
     used to perform the simulations: permutation, Multinomial, Poisson
     and Poisson-Gamma.

     Minor modifications have been made to the methods so that they use
     the standardized expected number of cases instead of the
     population, since it gives a better approximation to the number of
     cases expected in each area.

_I_n_t_r_o_d_u_c_t_i_o_n:

     We'll always suppose that we are working on a study region which
     is divided into _n_ non-overlaping smaller areas where data are
     measured. Data measured are usually people suffering from a
     disease or even deaths. This will be refered as _Observed number
     of cases_. For a given area, its observed number of cases will be
     denoted by O_i and the sum of these quantities over the whole
     study region will be O_+.

     _Population_ and _Standardized Expected number of cases_ are
     defined in the same way and are denoted by P_i and E_i,
     respectively. Their sums over the whole study region are denoted
     by P_+ and E_+.

     The basic assumption for the data is that they are independent
     observations from a Poisson distribution whose mean is theta_i
     E_i, where theta_i is the relative risk. That is,


                  O_i ~ Po(theta_i E_i); i=1, ..., n
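
     The model above can be illustrated by simulating counts in base R.
     This is only a sketch: the values of E and theta are made up for
     illustration and do not come from the package or from real data.

```r
# Illustrative values (assumed, not taken from any real data set)
set.seed(1)
n <- 5
E <- c(2.5, 10, 4, 7.5, 1)      # standardized expected cases E_i
theta <- c(1, 1, 2, 1, 0.5)     # relative risks theta_i

# O_i ~ Po(theta_i * E_i), drawn independently for each area
O <- rpois(n, lambda = theta * E)

# Ratio O_i / E_i, a crude estimate of the relative risk theta_i
smr <- O / E
```

     Under the model, each ratio O_i/E_i has expectation theta_i.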

_N_u_l_l _h_y_p_o_t_h_e_s_e_s:

     Null hypotheses is usually equal relative risks, that is


                 H_0: theta_1= ... = theta_n = lambda


     lambda may be considered to be known (one, which means standard
     risk) or unknown. In the latter case, E_i must be corrected
     slightly by multiplying it by the overall relative risk O_+/E_+.
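
     The correction for unknown lambda is a single rescaling, sketched
     here in base R with made-up counts. The variable names are
     illustrative and are not part of the package.

```r
# Illustrative data (assumed)
O <- c(3, 12, 9, 6, 0)          # observed cases O_i
E <- c(2.5, 10, 4, 7.5, 1)      # standardized expected cases E_i

# Overall relative risk O_+ / E_+
lambda_hat <- sum(O) / sum(E)

# Corrected expected cases: E_i * O_+ / E_+
E_corrected <- E * lambda_hat
```

     After the correction, the expected cases add up to the observed
     total, that is, sum(E_corrected) equals sum(O).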

_C_o_d_e _s_t_r_u_c_t_u_r_e:

     Function names follow a common format, which is a follows:

__m_e_t_h_o_d _n_a_m_e_._s_t_a_t Calculate the statistic itself.

__m_e_t_h_o_d _n_a_m_e_._b_o_o_t Perform a non-parametric bootstrap.

__m_e_t_h_o_d _n_a_m_e_._p_b_o_o_t Perform a parametric bootstrap.

     Openshaw's G.A.M. has been implemented in a general form in a
     function called _gam_, which some other methods (Kulldorff &
     Nagarwalla, Besag & Newell) also use, since they are based on a
     window scan of the whole region. At every point of the grid, a
     function is called to determine whether that point is a cluster
     or not. The name of this function is _shortened method
     name_.iscluster.

     This function calculates the local value of the statistic involved
     and its significance by means of the bootstrap. The interface
     provided through function _gam_ is quite straightforward to use,
     and it can handle the three methods mentioned as well as others
     supplied by the user.
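
     The window scan described above can be sketched in base R as
     follows. This is not the package's code: the names toy.iscluster
     and scan_grid, their arguments, and the choice of local statistic
     (the count of cases inside a circular window, tested by Poisson
     sampling) are assumptions made for illustration only.

```r
# Hypothetical local test: is the window centred at (x0, y0) a cluster?
toy.iscluster <- function(data, x0, y0, radius, R = 99, alpha = 0.05) {
  # Areas whose centroid falls inside the circular window
  inside <- (data$x - x0)^2 + (data$y - y0)^2 <= radius^2
  obs <- sum(data$Observed[inside])            # local statistic
  mu <- sum(data$Expected[inside])             # expected cases in window
  sims <- rpois(R, mu)                         # replicates under H_0
  pvalue <- (1 + sum(sims >= obs)) / (R + 1)   # Monte Carlo p-value
  data.frame(x = x0, y = y0, statistic = obs, pvalue = pvalue,
             cluster = pvalue <= alpha)
}

# Hypothetical scan: call the local test at every point of a regular grid
scan_grid <- function(data, step, radius, iscluster = toy.iscluster, ...) {
  gx <- seq(min(data$x), max(data$x), by = step)
  gy <- seq(min(data$y), max(data$y), by = step)
  grid <- expand.grid(x = gx, y = gy)
  res <- Map(function(x0, y0) iscluster(data, x0, y0, radius, ...),
             grid$x, grid$y)
  do.call(rbind, res)
}

# Made-up data: four areas on a unit square, one with an excess of cases
set.seed(42)
d <- data.frame(Observed = c(2, 3, 15, 4), Expected = c(3, 3, 4, 4),
                x = c(0, 1, 0, 1), y = c(0, 0, 1, 1))
out <- scan_grid(d, step = 1, radius = 0.5)
```

     Passing the local test as an argument mirrors the idea that one
     scanning routine can serve several methods, including ones
     supplied by the user.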

_B_o_o_t_s_t_r_a_p _p_r_o_c_e_d_u_r_e_s:

     Four possible bootstrap models have been provided in order to
     estimate sampling distributions of the statistics provided. The
     first one is a non-parametric bootstrap, which performs
     permutations over the observed number of cases, while the three
     others are parametric bootstrap based on Multinomial, Poisson and
     Poisson-Gamma distributions.

     The permutation method takes the observed numbers of cases and
     permutes them among all regions, in order to assess whether the
     risk is uniform across the whole study area. It should be used
     with care, since a permutation can assign more observed cases to
     a sparsely populated area than its population.
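
     A minimal base R sketch of one permutation replicate, with made-up
     data, also shows the caveat about small populations. The variable
     names are illustrative.

```r
# Illustrative data (assumed); the last area has a very small population
set.seed(1)
O <- c(3, 12, 9, 6, 0)               # observed cases
P <- c(500, 2000, 800, 1500, 5)      # population at risk

# One replicate: shuffle the observed counts among the areas
O_perm <- sample(O)

# Areas where the permuted count exceeds the population (the caveat)
impossible <- O_perm > P
```

     The total number of cases is unchanged by the permutation; only
     its allocation to areas varies.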

     Multinomial sampling is based on conditioning the Poisson
     framework on O_+. This way, (O_1, ..., O_n) follows a multinomial
     distribution of size O_+ and probabilities (E_1/E_+, ...,
     E_n/E_+).
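
     One multinomial replicate can be drawn in base R with rmultinom.
     The data below are made up and the variable names are
     illustrative.

```r
# Illustrative data (assumed)
set.seed(1)
O <- c(3, 12, 9, 6, 0)          # observed cases
E <- c(2.5, 10, 4, 7.5, 1)      # standardized expected cases

# (O_1, ..., O_n) ~ Multinomial(O_+, (E_1/E_+, ..., E_n/E_+))
O_sim <- as.vector(rmultinom(1, size = sum(O), prob = E / sum(E)))
```

     Every replicate has exactly O_+ cases, which is what conditioning
     on the total means.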

     Poisson sampling just generates observed number of cases from a
     Poisson distribution whose mean is E_i.
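
     As a sketch of how such replicates can be turned into a test, the
     base R code below draws Poisson samples and computes a Monte
     Carlo p-value. Pearson's chi-square statistic and the helper
     chisq_stat are chosen here only for illustration; the data are
     made up and lambda is taken as known and equal to one.

```r
# Illustrative data (assumed)
set.seed(123)
O <- c(3, 12, 9, 6, 0)          # observed cases
E <- c(2.5, 10, 4, 7.5, 1)      # standardized expected cases

# Illustrative statistic: Pearson's chi-square, sum((O_i - E_i)^2 / E_i)
chisq_stat <- function(O, E) sum((O - E)^2 / E)
t_obs <- chisq_stat(O, E)

# Parametric bootstrap: O_i ~ Po(E_i) under the null hypothesis
R <- 999
t_sim <- replicate(R, chisq_stat(rpois(length(E), E), E))

# Monte Carlo p-value, counting the observed value among the replicates
p_value <- (1 + sum(t_sim >= t_obs)) / (R + 1)
```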

     Poisson-Gamma sampling is based on the Poisson-Gamma model
     proposed by _Clayton and Kaldor_ (1987):


                   O_i | theta_i ~ Po(theta_i E_i)



                       theta_i ~ Ga(nu, alpha)


     The marginal distribution of O_i, not conditioned on theta_i, is
     Negative Binomial with size nu and probability alpha/(alpha+E_i).
     The two parameters can be estimated using an Empirical Bayes
     approach from the Expected and Observed number of cases. Function
     _empbaysmooth_ is provided for this purpose.
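
     The Negative Binomial result can be checked numerically in base
     R. The sketch below assumes that alpha is the rate parameter of
     the Gamma distribution, which is the reading consistent with the
     probability alpha/(alpha+E_i). The values of nu, alpha and E are
     made up; in practice they would be estimated from the data.

```r
# Illustrative parameter values (assumed)
nu <- 2
alpha <- 1.5
E_i <- 4
k <- 0:10

# Poisson-Gamma mixture probabilities by numerical integration over theta
mix <- sapply(k, function(kk) {
  integrate(function(th) {
    dpois(kk, th * E_i) * dgamma(th, shape = nu, rate = alpha)
  }, lower = 0, upper = Inf)$value
})

# Negative Binomial with size nu and probability alpha / (alpha + E_i)
nb <- dnbinom(k, size = nu, prob = alpha / (alpha + E_i))
max_diff <- max(abs(mix - nb))   # small, up to integration error

# One Poisson-Gamma replicate for a whole region
set.seed(1)
E <- c(2.5, 10, 4, 7.5, 1)
O_sim <- rnbinom(length(E), size = nu, prob = alpha / (alpha + E))
```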

_D_a_t_a:

     One of the parameters, which is usually called _data_, passed to
     many of the functions in this package is a dataframe which
     contains the data for each of the regions used in the analysis.
     Besides, its columns must be labeled:


   *_O_b_s_e_r_v_e_d* Observed number of cases.

   *_E_x_p_e_c_t_e_d* Standardised expected number of cases.

   *_P_o_p_u_l_a_t_i_o_n* Population at risk.

   *_x* Easting coordinate of the region centroid.

   *_y* Northing coordinate of the region centroid.

_R_e_f_e_r_e_n_c_e_s:

     Clayton, David and Kaldor, John (1987). Empirical Bayes Estimates
     of Age-standardized Relative Risks for Use in Disease Mapping.
     Biometrics 43, 671-681.

     Lawson et al. (eds.) (1999). Disease Mapping and Risk Assessment
     for Public Health. John Wiley and Sons, Inc.

     Lawson, A. B. (2001). Statistical Methods in Spatial Epidemiology.
     John Wiley and Sons, Inc.

