pr_DB                 package:proxy                 R Documentation

_R_e_g_i_s_t_r_y _o_f _p_r_o_x_i_m_i_t_i_e_s

_D_e_s_c_r_i_p_t_i_o_n:

     Registry containing similarities and distances.

_U_s_a_g_e:

     pr_DB
     pr_DB$get_field(name)
     pr_DB$get_fields()
     pr_DB$get_field_names()
     pr_DB$set_field(name, default = NA, type = NA, is_mandatory = FALSE,
                     is_modifiable = TRUE, validity_FUN = NULL)

     pr_DB$entry_exists(name)
     pr_DB$get_entry(name)
     pr_DB$get_entries(name = NULL, pattern = NULL)
     pr_DB$get_entry_names(name)
     pr_DB$set_entry(...)
     pr_DB$modify_entry(...)
     pr_DB$delete_entry(name)

     ## S3 method for class 'pr_DB':
     summary(object, verbosity = c("short", "long"), ...)

_A_r_g_u_m_e_n_t_s:

    name: character string representing the name of an entry
          (case-insensitive).

 pattern: regular expression to be matched to all fields of class
          '"character"' in all entries.

 default: optional default value for the field.

    type: optional character string specifying the class to be required
          for this field. If 'type' is a character vector with more
          than two elements, the entries will be used as fixed set of
          alternatives. If 'type' is not a character string or vector,
          the class will be inferred from the argument given.

is_mandatory: logical specifying whether new entries are required to
          have a value for this field.

is_modifiable: logical specifying whether entries can be changed with
          respect to that field.

validity_FUN: optional function or character string with the name of a
          function that checks the validity of a field entry. Such a
          function gets the value to be investigated as argument, and
          should stop with an error message if the value is not
          correct.

  object: a registry object.

verbosity: character string controlling the verbosity of the output of
          the summary method for the registry. '"short"' gives just a
          list, '"long"' also gives the formulas.

     ...: for 'pr_DB$set_entry' and 'pr_DB$modify_entry': named list of
          fields to be modified in or added to the registry (see
          details). This must include the index field ('"names"').

_D_e_t_a_i_l_s:

     'pr_DB' represents the registry of all proximity measures
     available. For each measure, it comprises meta-information that
     can be queried and extended. Also, new measures can be added. This
     is done using the following accessor functions of the 'pr_DB'
     object:

     'get_field_names()' returns a character vector with all field
     names. 'get_field()' returns the information for a specific field
     as a list with components named as described above. 'get_fields()'
     returns a list with all field entries. 'set_field()' is used to
     create new fields in the registry (the default value will be set
     in all entries).

     'get_entry_names()' returns a character vector with (the first
     alias of) all entries. 'entry_exists()' is a predicate checking if
     an entry with the specified alias exists in the registry.
     'get_entry()' returns the specified entry if it exists (and, by
     default, gives an error if it does not). 'get_entries()' is used
     to query more than one entry: either those matching 'name'
     exactly, or those where the regular expression in 'pattern'
     matches _any_ character field in an entry. By default, all entries
     are returned. 'delete_entry()' removes an existing entry from the
     registry (note that only user-provided entries can be deleted).
     'set_entry()' and 'modify_entry()' require a named list of
     arguments used as field entries. At least the 'names' index field
     is required. 'set_entry()' also checks for all other mandatory
     fields. If specified in the field metadata, each field entry and
     the entry as a whole are checked for validity. Note that only
     user-specified fields and/or entries can be modified; the data
     shipped with the package are read-only.

     The registry fields currently available are as follows:

     _F_U_N Function to register (see below).

     _n_a_m_e_s Character vector with an alias(es) for the measure.

     _P_R_E_F_U_N Optional function (or function name) for preprocessing code
          (see below).

     _P_O_S_T_F_U_N Optional function (or function name) for postprocessing
          code (see below).

     _d_i_s_t_a_n_c_e logical indicating whether this measure is a distance
          ('TRUE') or similarity ('FALSE').

     _c_o_n_v_e_r_t Optional Function or function name for converting between
          similarities and distances when needed.

     _t_y_p_e Optional, the scale the measure applies to ('"metric"',
          '"ordinal"', '"nominal"', '"binary"', or '"other"'). If
          'NULL', it is assumed to apply to some other unknown scale.

     _l_o_o_p logical indicating whether 'FUN' is just a measure, and
          therefore, if 'dist' shall do the loop over all pairs of
          observations/variables, or if 'FUN' does the loop on its own.

     '_C__F_U_N' logical indicating whether 'FUN' is a C function.

     _a_b_c_d logical; if 'TRUE' and binary data (or data to be interpreted
          as such) are supplied, the number of concordant and
          discordant pairs is precomputed for every two binary data
          vectors and supplied to the measure function.

     _f_o_r_m_u_l_a Optional character string with the symbolic representation
          of the formula.

     _r_e_f_e_r_e_n_c_e Optional reference (character).

     _d_e_s_c_r_i_p_t_i_o_n Optional description (character). Ideally, describes
          the context in which the measure can be applied.  

     A function specified as 'FUN' parameter has mandatory arguments
     'x' and 'y' (if 'abcd' is 'FALSE'), and 'a', 'b', 'c', 'd', 'n'
     otherwise. Additionally, it gets all optional parameters specified
     by the user in the '...' argument of the 'dist' and 'simil'
     functions, possibly changed and/or complemented by the
     corresponding (optional) 'PREFUN' function. It must return the
     (dis-)similarity value computed from the arguments. 'x' and 'y'
     are two vectors from the data matrix (matrices) supplied. If
     'abcd' is 'TRUE', it is assumed that binary measures will be
     used, and the counts of concordant and discordant pairs
     (x_k, y_k), together with their total number 'n', are precomputed
     and supplied instead of 'x' and 'y'. 'a', 'b', 'c', and 'd' are
     the counts of all (TRUE, TRUE), (TRUE, FALSE), (FALSE, TRUE), and
     (FALSE, FALSE) pairs, respectively.
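
     As a rough illustration of the 'abcd' calling convention, the
     following base R sketch counts the four kinds of pairs for two
     logical vectors and passes them to a measure function. The names
     'count_abcd' and 'my_jaccard' are hypothetical and not part of the
     package, and the sketch assumes that 'n' is the total number of
     pairs; the package precomputes these counts itself when 'abcd' is
     'TRUE'.

```r
# Hypothetical helper (not part of proxy): count the four kinds of
# pairs (x_k, y_k) for two vectors interpreted as logical.
count_abcd <- function(x, y) {
  x <- as.logical(x)
  y <- as.logical(y)
  list(
    a = sum(x & y),    # (TRUE, TRUE)
    b = sum(x & !y),   # (TRUE, FALSE)
    c = sum(!x & y),   # (FALSE, TRUE)
    d = sum(!x & !y),  # (FALSE, FALSE)
    n = length(x)      # assumed: total number of pairs
  )
}

# Hypothetical measure with the signature used when 'abcd' is TRUE:
# a Jaccard-type similarity that ignores the (FALSE, FALSE) pairs.
my_jaccard <- function(a, b, c, d, n) a / (a + b + c)

x <- c(TRUE, TRUE, FALSE, FALSE, TRUE)
y <- c(TRUE, FALSE, TRUE, FALSE, TRUE)

counts <- count_abcd(x, y)
sim <- do.call(my_jaccard, counts)
sim
```

     A function with this signature could then be registered through
     'pr_DB$set_entry()' with the 'abcd' field set to 'TRUE'.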

     A function specified as 'PREFUN' parameter has mandatory arguments
     'x', 'y', 'p', and 'reg_entry', with 'y' and 'p' possibly being
     'NULL' depending on the task at hand. 'x' and 'y' are the data
     objects, 'p' is a (possibly empty) list with all specified
     proximity parameters, and 'reg_entry' is the registry entry (a
     named list containing all information specified in 'reg_add'). The
     preprocessing function is allowed to change all of this
     information, and if so, is required to return *all* arguments as a
     named list in the same order.
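
     The shape of such a preprocessing function might look roughly as
     follows. This is only a sketch: 'my_prefun' and the proximity
     parameter 'power' are made up for illustration and are not defined
     by the package.

```r
# Hypothetical preprocessing function: fills in a default for a
# made-up proximity parameter 'power' if the user did not set one.
my_prefun <- function(x, y, p, reg_entry) {
  if (is.null(p)) p <- list()                 # 'p' may be NULL
  if (is.null(p[["power"]])) p[["power"]] <- 2
  # Return *all* arguments as a named list, in the original order.
  list(x = x, y = y, p = p, reg_entry = reg_entry)
}

# Stand-alone call with toy inputs ('y' is NULL, 'p' is empty).
out <- my_prefun(matrix(1:6, nrow = 3), NULL, list(),
                 list(names = "demo"))
names(out)
```

     Note that 'list(y = y)' keeps the 'y' component even when 'y' is
     'NULL', so the returned list always has all four names.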

     A function specified as 'POSTFUN' parameter has two mandatory
     arguments: 'result' and 'p'. 'result' will contain the computed
     raw data, i.e. a vector of length n * (n - 1) / 2 for
     auto-distances (see 'dist' for details on 'dist' objects), or a
     matrix for cross-distances. 'p' contains the specified proximity
     parameters. Post-processing functions need to return the 'result'
     object (even if unmodified).
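
     A post-processing function following this contract could be
     sketched as below. The name 'my_postfun' and the choice of taking
     square roots are purely illustrative assumptions.

```r
# Hypothetical post-processing function: takes the square root of
# the raw values and returns the (modified) 'result' object.
my_postfun <- function(result, p) {
  result <- sqrt(result)
  result
}

# Raw values for n = 3 objects: 3 * (3 - 1) / 2 = 3 pairs.
raw <- c(4, 9, 16)
post <- my_postfun(raw, p = list())

# The same function also works on a matrix of cross-distances,
# because sqrt() keeps the dimensions.
post_cross <- my_postfun(matrix(c(1, 4, 9, 16), nrow = 2), p = list())
```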

     A function specified as 'convert' parameter should preserve the
     type of its argument.

_A_u_t_h_o_r(_s):

     David Meyer David.Meyer@R-project.org

_S_e_e _A_l_s_o:

     'dist'

_E_x_a_m_p_l_e_s:

     ## create a new distance measure
     mydist <- function(x,y) x * y

     ## create a new entry in the registry with two aliases
     pr_DB$set_entry(FUN = mydist, names = c("test", "mydist"))

     ## look it up (index is case insensitive):
     pr_DB$get_entry("TEST")

     ## modify the content of the description field in the new entry
     pr_DB$modify_entry(names = "test", description = "foo function")

     ## create a new field
     pr_DB$set_field("New")

     ## look up the test entry again (two ways)
     pr_DB$get_entry("test")
     pr_DB[["test"]]

     ## show total number of entries
     length(pr_DB)

     ## show all entries matching "foo" in any character field
     pr_DB$get_entries(pattern = "foo")

     ## show more details
     summary(pr_DB, "long")

     ## get all entries in a list (and extract first two ones)
     pr_DB$get_entries()[1:2]

     ## get all entries as a data frame (select first 3 fields)
     as.data.frame(pr_DB)[,1:3]

     ## delete test entry
     pr_DB$delete_entry("test")

     ## check if it is really gone
     pr_DB$entry_exists("test")

