traceWminim              package:nsRFA              R Documentation

_C_l_u_s_t_e_r _a_n_a_l_y_s_i_s: _d_i_s_j_o_i_n_t _r_e_g_i_o_n_s

_D_e_s_c_r_i_p_t_i_o_n:

     Formation of disjoint regions for Regional Frequency Analysis.

_U_s_a_g_e:

      traceWminim (X, centers)
      sumtraceW (clusters, X)
      nearest (clusters, X)

_A_r_g_u_m_e_n_t_s:

       X: a numeric matrix of characteristics, or an object that can be
          coerced to such a matrix (such as a numeric vector or a data
          frame with all numeric columns)

 centers: the number of clusters

clusters: a numeric vector giving the cluster to which each element
          of 'X' is assigned

_D_e_t_a_i_l_s:

     The Euclidean distance is used. Given p different classification
     variables, the distance between two elements i and j is:

          d_ij = sqrt{1/p sum[h from 1 to p](x_hi - x_hj)^2}

     where x_hi is the value of the h-th variable of the i-th element.
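
     As an illustration only, the distance above can be reproduced with
     base R. The helper name 'scaled_dist' is hypothetical and is not
     part of 'nsRFA'; the sketch assumes one element per row of 'X' and
     one classification variable per column.

```r
# Hypothetical helper (not part of nsRFA): distance matrix d_ij as defined above.
scaled_dist <- function(X) {
  X <- as.matrix(X)
  p <- ncol(X)  # number of classification variables
  # dist() gives sqrt(sum_h (x_hi - x_hj)^2); dividing by sqrt(p) adds the 1/p factor
  as.matrix(dist(X, method = "euclidean")) / sqrt(p)
}

X <- rbind(c(0, 0), c(3, 4), c(6, 8))
D <- scaled_dist(X)
D[1, 2]   # equals sqrt((3^2 + 4^2) / 2)
```

     Because the 1/p factor is the same for every pair, it rescales all
     distances equally and does not change which elements are closest
     to each other.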

     The function 'traceWminim' combines a hierarchical clustering
     algorithm, that of Ward (1963), with an optimisation procedure
     that minimises:

        W = sum[i from 1 to k](sum[j from 1 to ni]delta_ij^2)

     where k is the number of clusters (obtained initially with Ward's
     algorithm), ni is the number of sites in the i-th cluster and
     delta_ij is the Euclidean distance between the j-th element of the
     i-th group and the center of mass of the i-th cluster. W is
     calculated with 'sumtraceW'. The algorithm consists of moving a
     site from one cluster to another whenever this makes W decrease.
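
     The procedure described above can be sketched in base R as
     follows. This is a minimal illustration, not the package's
     implementation: the names 'trace_w' and 'relocate_min_w' are
     hypothetical, the initial partition is assumed to come from
     'hclust' with method "ward.D2" (available in R >= 3.1.0), and
     delta_ij is taken as the unscaled Euclidean distance, which
     differs from the scaled one only by a constant factor and so
     leads to the same minimiser.

```r
# Hypothetical helper: W = sum over clusters of squared distances to the centroid.
trace_w <- function(clusters, X) {
  X <- as.matrix(X)
  sum(sapply(unique(clusters), function(g) {
    Xg <- X[clusters == g, , drop = FALSE]
    sum(sweep(Xg, 2, colMeans(Xg))^2)  # subtract the centroid, square, sum
  }))
}

# Hypothetical sketch: Ward start, then greedy single-site relocations.
relocate_min_w <- function(X, k) {
  X <- as.matrix(X)
  # Initial partition from Ward's hierarchical algorithm (assumed method name)
  cl <- cutree(hclust(dist(X), method = "ward.D2"), k = k)
  improved <- TRUE
  while (improved) {
    improved <- FALSE
    for (i in seq_len(nrow(X))) {
      if (sum(cl == cl[i]) == 1) next  # never empty a cluster
      for (g in setdiff(seq_len(k), cl[i])) {
        trial <- cl
        trial[i] <- g
        # Accept the move only if W strictly decreases
        if (trace_w(trial, X) < trace_w(cl, X) - 1e-12) {
          cl <- trial
          improved <- TRUE
        }
      }
    }
  }
  cl
}

X <- rbind(c(0, 0), c(0, 1), c(10, 10), c(10, 11))
cl <- relocate_min_w(X, 2)
trace_w(cl, X)
```

     Each accepted move strictly lowers W and the number of possible
     partitions is finite, so the loop terminates. The result is a
     local minimum of W and is not guaranteed to be the global one.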

_V_a_l_u_e:

     'traceWminim' returns a vector assigning each element of 'X' to
     one of the 'centers' clusters.

     'sumtraceW' gives W (it is used by 'traceWminim').

     'nearest' gives the sites nearest to the centers of mass of the
     clusters (it is used by 'traceWminim').
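
     To illustrate the idea behind 'nearest', the sketch below finds,
     for each cluster, the row of 'X' closest to that cluster's center
     of mass. The name 'nearest_to_centroid' is hypothetical, and the
     return format (one row index per cluster, in increasing order of
     cluster label) is an assumption that may differ from what the
     package function returns.

```r
# Hypothetical helper: index of the site closest to each cluster's center of mass.
nearest_to_centroid <- function(clusters, X) {
  X <- as.matrix(X)
  sapply(sort(unique(clusters)), function(g) {
    idx <- which(clusters == g)
    Xg <- X[idx, , drop = FALSE]
    d2 <- rowSums(sweep(Xg, 2, colMeans(Xg))^2)  # squared distance to the centroid
    idx[which.min(d2)]
  })
}

X <- rbind(c(0, 0), c(1, 0), c(5, 0), c(10, 10), c(10, 12), c(10, 11))
clusters <- c(1, 1, 1, 2, 2, 2)
res <- nearest_to_centroid(clusters, X)
res   # rows 2 and 6 are closest to the two centroids
```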

_N_o_t_e:

     For information on the package and the Author, and for all the
     references, see 'nsRFA'.

_S_e_e _A_l_s_o:

     'roi', 'AD.dist'.

_E_x_a_m_p_l_e_s:

     data(hydroSIMN)
     parameters
     summary(parameters)

     # traceWminim
     param <- parameters[c("Hm","Ybar")]
     n <- dim(param)[1]; k <- dim(param)[2]
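     # standardise each column; mean() and sd() no longer work
     # column-wise on data frames, so use colMeans() and apply()
     param.norm <- (param - matrix(colMeans(param),nrow=n,ncol=k,
                    byrow=TRUE))/matrix(apply(param,2,sd),
                    nrow=n,ncol=k,byrow=TRUE)
     clusters <- traceWminim(param.norm,4)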
     names(clusters) <- parameters["cod"][,]
     clusters

     annualflows
     summary(annualflows)
     x <- annualflows["dato"][,]
     cod <- annualflows["cod"][,]

     fac <- factor(annualflows["cod"][,],
                   levels=names(clusters[clusters==1]))
     x1 <- annualflows[!is.na(fac),"dato"]
     cod1 <- annualflows[!is.na(fac),"cod"]
     #HW.tests(x1,cod1)          # it takes some time

     fac <- factor(annualflows["cod"][,],
                   levels=names(clusters[clusters==3]))
     x3 <- annualflows[!is.na(fac),"dato"]
     cod3 <- annualflows[!is.na(fac),"cod"]
     #HW.tests(x3,cod3)          # it takes some time

