dissimilarity             package:arules             R Documentation

_D_i_s_s_i_m_i_l_a_r_i_t_y _C_o_m_p_u_t_a_t_i_o_n

_D_e_s_c_r_i_p_t_i_o_n:

     Provides the generic function 'dissimilarity' and the S4 methods
     to compute and return distances for binary data in a 'matrix',
     'transactions' or 'associations' object.

_U_s_a_g_e:

     dissimilarity(x, y = NULL, method = NULL, args = NULL, ...)
     ## S4 method for signature 'itemMatrix':
     dissimilarity(x, y = NULL, method = NULL, args = NULL,
             which = "transactions")
     ## S4 method for signature 'associations':
     dissimilarity(x, y = NULL, method = NULL, args = NULL,
             which = "transactions")
     ## S4 method for signature 'matrix':
     dissimilarity(x, y = NULL, method = NULL, args = NULL)

_A_r_g_u_m_e_n_t_s:

       x: the set of elements (e.g., 'matrix, itemMatrix, transactions,
           itemsets, rules'). 

       y: 'NULL' or a second set to calculate cross dissimilarities. 

  method: the distance measure to be used. The implemented measures
          are listed below (the default is '"jaccard"'):

          '"_j_a_c_c_a_r_d"': the number of items which occur in both
               elements divided by the total number of items in the
               elements (Sneath, 1957). The dissimilarity is 1 minus
               this ratio. This measure is often also called
               _binary_, _asymmetric binary_, etc.

          '"_m_a_t_c_h_i_n_g"': the _Matching coefficient_ defined by Sokal and
               Michener (1958). This coefficient gives the same weight
               to the presence and absence of items.

          '"_d_i_c_e"': _Dice's coefficient_ defined by Dice (1945).
               Similar to _Jaccard_ but gives double the weight to
               agreeing items.

          '"_a_f_f_i_n_i_t_y"': a measure based on the 'affinity', a similarity
               measure between items. It is defined as the average
               _affinity_ between the items in two transactions (see
               Aggarwal et al. (2002)).

    args: a list of additional arguments for the methods.

          For calculating '"affinity"' for associations, the affinities
          between the items in the transactions are needed and passed
          to the method as the first element in 'args'.

   which: a character string indicating if the dissimilarity should be
          calculated between transactions (default) or items (use
          '"items"').

     ...: further arguments.
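
     The measures listed under 'method' can be sketched in base R as
     follows. This is only an illustration of the textbook
     definitions, written as 1 minus the similarity; the helper names
     ('pair_counts', 'jaccard_dissim', 'matching_dissim',
     'dice_dissim', 'item_affinity', 'affinity_dissim') and the toy
     matrix are hypothetical and not part of the package, and the
     package's own implementation may differ in detail.

```r
# Hypothetical helpers, not part of arules. x and y are logical vectors
# over the same items (TRUE = item present in the element).
pair_counts <- function(x, y) {
  c(both    = sum(x & y),    # items present in both elements
    only_x  = sum(x & !y),   # items present only in x
    only_y  = sum(!x & y),   # items present only in y
    neither = sum(!x & !y))  # items absent from both elements
}

# Jaccard: shared items over items present in at least one element.
jaccard_dissim <- function(x, y) {
  n <- pair_counts(x, y)
  1 - n[["both"]] / (n[["both"]] + n[["only_x"]] + n[["only_y"]])
}

# Matching: joint presence and joint absence get the same weight.
matching_dissim <- function(x, y) {
  n <- pair_counts(x, y)
  1 - (n[["both"]] + n[["neither"]]) / sum(n)
}

# Dice: like Jaccard, but agreeing items count twice.
dice_dissim <- function(x, y) {
  n <- pair_counts(x, y)
  1 - 2 * n[["both"]] / (2 * n[["both"]] + n[["only_x"]] + n[["only_y"]])
}

# Item affinity (assumed here to be the Jaccard similarity between items,
# computed from co-occurrence counts). m is a 0/1 matrix with
# transactions in rows and items in columns. Items that never occur
# yield NaN on the diagonal.
item_affinity <- function(m) {
  m <- m + 0                    # coerce logical input to numeric
  co <- crossprod(m)            # co[i, j] = transactions containing i and j
  n_i <- diag(co)               # number of transactions containing item i
  co / (outer(n_i, n_i, "+") - co)
}

# Affinity-based dissimilarity: 1 minus the average affinity over all
# pairs of items taken from x and from y.
affinity_dissim <- function(x, y, A) {
  1 - mean(A[x, y, drop = FALSE])
}

# Toy data: three transactions over the items a, b, c, d.
m <- rbind(t1 = c(1, 1, 0, 0),
           t2 = c(1, 0, 1, 0),
           t3 = c(1, 1, 1, 0))
colnames(m) <- c("a", "b", "c", "d")
x <- m["t1", ] == 1
y <- m["t2", ] == 1
A <- item_affinity(m)

jaccard_dissim(x, y)
matching_dissim(x, y)
dice_dissim(x, y)
affinity_dissim(x, y, A)
```

     In the toy data, item 'd' never occurs, so it lowers the matching
     dissimilarity (joint absence counts as agreement) but leaves the
     Jaccard and Dice values unchanged.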

_V_a_l_u_e:

     returns an object of class 'dist'.
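
     A 'dist' object can be inspected and passed on to clustering
     functions in the usual way. The sketch below uses only base R:
     it stands in 'dist(method = "binary")' from package 'stats' for
     the '"jaccard"' measure (the "binary" name mentioned under
     'method'), so it runs without transaction data. The toy matrix
     and the variable names are assumptions made for illustration.

```r
# Toy 0/1 data: four transactions (rows) over four items (columns).
m <- rbind(t1 = c(1, 1, 0, 0),
           t2 = c(1, 0, 1, 0),
           t3 = c(1, 1, 1, 0),
           t4 = c(0, 0, 0, 1))

# Base R stand-in for a Jaccard-type dissimilarity; returns class 'dist'.
d <- dist(m, method = "binary")
class(d)

# A 'dist' object stores only the lower triangle; convert it to a full
# symmetric matrix to look up individual pairs by name.
dm <- as.matrix(d)
dm["t1", "t2"]

# Hierarchical clustering works directly on the 'dist' object.
hc <- hclust(d, method = "average")

# Cut the tree into two groups; t4 shares no item with the others.
groups <- cutree(hc, k = 2)
groups
```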

_R_e_f_e_r_e_n_c_e_s:

     Sneath, P. H. A. (1957) Some thoughts on bacterial classification.
     _Journal of General Microbiology_ 17, pages 184-200.

     Sokal, R. R. and Michener, C. D. (1958) A statistical method for
     evaluating systematic relationships. _University of Kansas
     Science Bulletin_ 38, pages 1409-1438.

     Dice, L. R. (1945) Measures of the amount of ecologic association
     between species. _Ecology_ 26, pages 297-302.

     Charu C. Aggarwal, Cecilia Procopiuc, and Philip S. Yu. (2002)
     Finding localized associations in market basket data. _IEEE Trans.
     on Knowledge and Data Engineering_ 14(1):51-62.

_S_e_e _A_l_s_o:

     'affinity', 'dist-class', 'itemMatrix-class',
     'associations-class'.

_E_x_a_m_p_l_e_s:

     ## cluster items in Groceries with support > 5%
     data("Groceries")

     s <- Groceries[, itemFrequency(Groceries) > 0.05]
     d_jaccard <- dissimilarity(s, which = "items")
     ## hclust's "ward" method was renamed "ward.D" in R 3.1.0
     plot(hclust(d_jaccard, method = "ward.D"))


     ## cluster transactions for a sample of Adult
     data("Adult")
     s <- sample(Adult, 200) 

     ##  calculate Jaccard distances and do hclust
     d_jaccard <- dissimilarity(s)
     plot(hclust(d_jaccard))

     ## calculate affinity-based distances and do hclust
     d_affinity <- dissimilarity(s, method = "affinity")
     plot(hclust(d_affinity))

     ## cluster rules
     rules <- apriori(Adult)
     rules <- subset(rules, subset = lift > 2)

     ## we need to supply the item affinities from the dataset (sample)
     d_affinity <- dissimilarity(rules, method = "affinity", 
       args = list(affinity = affinity(s)))
     plot(hclust(d_affinity))

