digest                package:digest                R Documentation

_C_r_e_a_t_e _h_a_s_h _f_u_n_c_t_i_o_n _d_i_g_e_s_t_s _f_o_r _a_r_b_i_t_r_a_r_y _R _o_b_j_e_c_t_s

_D_e_s_c_r_i_p_t_i_o_n:

     The 'digest' function applies a cryptographic hash function to
     arbitrary R objects. By default, the objects are internally
     serialized, and one of the currently implemented algorithms (the
     MD5 and SHA-1 hash functions, or the CRC32 checksum) is used to
     compute a compact digest of the serialized object.

     In order to compare this implementation with others, serialization
     of the input argument can also be turned off, in which case the
     input argument must be a character string, and the digest of that
     string is returned.
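
     The difference between the two modes can be sketched as follows.
     This is a minimal illustration, assuming the 'digest' package is
     installed; the variable names are hypothetical. The expected MD5
     value for "abc" is the RFC 1321 test vector.

```r
library(digest)  # assumes the digest package is installed

# Raw string: hashes the characters "abc" themselves, so the result
# can be compared with the RFC 1321 test vector for "abc".
raw_md5 <- digest("abc", algo = "md5", serialize = FALSE)
stopifnot(identical(raw_md5, "900150983cd24fb0d6963f7d28e17f72"))

# Default: hashes the serialized R object. The serialized form holds
# more than the three characters, so the digest differs.
obj_md5 <- digest("abc", algo = "md5")
stopifnot(!identical(raw_md5, obj_md5))

# Objects with the same content give the same digest.
same <- identical(digest(list(1, "a")), digest(list(1, "a")))
stopifnot(same)
```

     Digests of serialized objects are therefore useful for comparing
     R objects with each other, but not for comparison with the output
     of other hashing tools.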

_U_s_a_g_e:

     digest(object, algo=c("md5", "sha1", "crc32"), 
            serialize=TRUE, file=FALSE, length=Inf)

_A_r_g_u_m_e_n_t_s:

  object: An arbitrary R object which will then be passed to the
          'serialize' function, unless the 'serialize' argument is set
          to 'FALSE'

    algo: The algorithm to be used; currently available choices are
          'md5', which is also the default, 'sha1' and 'crc32'.

serialize: A logical variable indicating whether the object should be
          serialized using 'serialize'. Setting this to 'FALSE' makes
          it possible to compare the digest of a given character
          string with known control output.

    file: A logical variable indicating whether the object is a file
          name.

  length: Number of characters to process. By default, when 'length' is
          set to 'Inf', the whole string or file is processed.
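
     The 'file' and 'length' arguments can be illustrated with a small
     temporary file. This is a sketch only, assuming the 'digest'
     package is installed and that 'length' truncates a string in the
     same way as a file, as described above; the file contents and
     variable names are made up for the illustration.

```r
library(digest)  # assumes the digest package is installed

tmp <- tempfile()
cat("abcdef", file = tmp)  # writes exactly six characters, no newline

# The same bytes, once passed as a string and once read from a file.
from_string <- digest("abcdef", algo = "md5", serialize = FALSE)
from_file   <- digest(tmp, algo = "md5", serialize = FALSE, file = TRUE)
stopifnot(identical(from_string, from_file))

# 'length = 3' processes only the first three characters, "abc".
first_three <- digest("abcdef", algo = "md5", serialize = FALSE, length = 3)
abc_only    <- digest("abc", algo = "md5", serialize = FALSE)
stopifnot(identical(first_three, abc_only))

unlink(tmp)
```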

_D_e_t_a_i_l_s:

     Cryptographic hash functions are well researched and documented.
     The MD5 algorithm by Ron Rivest is specified in RFC 1321. The
     SHA-1 algorithm is specified in FIPS-180-1. CRC32 is described in
     <URL: ftp://ftp.rocksoft.com/cliens/rocksoft/papers/crc_v3.txt>.

     For MD5 and SHA-1, this R implementation relies on two standalone
     implementations in C by Christophe Devine. For CRC32, code from
     the zlib library by Jean-loup Gailly and Mark Adler is used.

     Please note that this package is not meant to be used for
     cryptographic purposes, for which more comprehensive (and widely
     tested) libraries such as OpenSSL should be used. Also, it is
     known that CRC32 is not collision-resistant. For SHA-1, results
     published in 2005 indicate certain cryptographic weaknesses as
     well. For more details, see for example <URL:
     http://www.schneier.com/blog/archives/2005/02/cryptanalysis_o.html>.

_V_a_l_u_e:

     The 'digest' function returns a character string of a fixed length
     containing the requested digest of the supplied R object. For MD5,
     a string of length 32 is returned; for SHA-1, a string of length
     40 is returned; for CRC32 a string of length 8.
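
     The stated lengths can be checked directly. The following is a
     small sketch, assuming the 'digest' package is installed; it uses
     the string "abc" with serialization turned off, and the names
     'algos' and 'lens' are hypothetical.

```r
library(digest)  # assumes the digest package is installed

algos <- c("md5", "sha1", "crc32")

# Number of hexadecimal characters in each digest of "abc".
lens <- sapply(algos, function(a) {
  nchar(digest("abc", algo = a, serialize = FALSE))
})
print(lens)

stopifnot(all(lens == c(32, 40, 8)))
```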

_A_u_t_h_o_r(_s):

     Dirk Eddelbuettel <edd@debian.org> for the R interface; Antoine
     Lucas for the integration of crc32; Jarek Tuszynski for the
     file-based operations; Christophe Devine for the hash function
     implementations for sha-1 and md5; Jean-loup Gailly and Mark Adler
     for crc32.

_R_e_f_e_r_e_n_c_e_s:

     MD5: <URL: http://www.ietf.org/rfc/rfc1321.txt>. 

     SHA-1: <URL: http://www.itl.nist.gov/fipspubs/fip180-1.htm>.

     CRC32:  <URL:
     ftp://ftp.rocksoft.com/cliens/rocksoft/papers/crc_v3.txt>. 

     <URL: http://www.cr0.net:8040/code/crypto> for the underlying C
     functions used here for sha-1 and md5, and further references.

     <URL: http://zlib.net> for documentation on the zlib library which
     supplied the code for crc32.

_S_e_e _A_l_s_o:

     'serialize', 'md5sum'

_E_x_a_m_p_l_e_s:

     ## Standard RFC 1321 test vectors
     md5Input <-
       c("",
         "a",
         "abc",
         "message digest",
         "abcdefghijklmnopqrstuvwxyz",
         "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789",
         paste("12345678901234567890123456789012345678901234567890123456789012",
               "345678901234567890", sep=""))
     md5Output <-
       c("d41d8cd98f00b204e9800998ecf8427e",
         "0cc175b9c0f1b6a831c399e269772661",
         "900150983cd24fb0d6963f7d28e17f72",
         "f96b697d7cb7938d525a2f31aaf161d0",
         "c3fcd3d76192e4007dfb496cca67e13b",
         "d174ab98d277d9f5a5611c2c9f419d9f",
         "57edf4a22be3c955ac49da2e2107b67a")

     for (i in seq(along=md5Input)) {
       md5 <- digest(md5Input[i], serialize=FALSE)
       stopifnot(identical(md5, md5Output[i]))
     }

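     ## FIPS 180-1 test vectors. c() drops 'NULL', so sha1Input has two
     ## elements; the third expected value (in FIPS 180-1, the digest of
     ## one million repetitions of "a") is not checked by the loop below
     sha1Input <-
       c("abc",
         "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq",
         NULL)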
     sha1Output <- 
       c("a9993e364706816aba3e25717850c26c9cd0d89d",
         "84983e441c3bd26ebaae4aa1f95129e5e54670f1",
         "34aa973cd4c4daa4f61eeb2bdbad27316534016f")

     for (i in seq(along=sha1Input)) {
       sha1 <- digest(sha1Input[i], algo="sha1", serialize=FALSE)
       stopifnot(identical(sha1, sha1Output[i]))
     }

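     ## c() drops 'NULL', so crc32Input has two elements and the third
     ## expected value is not checked by the loop below
     crc32Input <-
       c("abc",
         "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq",
         NULL)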
     crc32Output <- 
       c("352441c2",
         "171a3f5f",
         "2ef80172")

     for (i in seq(along=crc32Input)) {
       crc32 <- digest(crc32Input[i], algo="crc32", serialize=FALSE)
       stopifnot(identical(crc32, crc32Output[i]))
     }

     # one of the FIPS-180-1 test vectors
     sha1 <- digest("abc", algo="sha1", serialize=FALSE)
     stopifnot(identical(sha1, "a9993e364706816aba3e25717850c26c9cd0d89d"))

     # example of a digest of a standard R list structure
     digest(list(LETTERS, data.frame(a=letters[1:5], b=matrix(1:10,ncol=2))))

     # test 'length' parameter and file input
     fname <- file.path(R.home(), "COPYING")
     x <- readChar(fname, file.info(fname)$size) # read file
     for (alg in c("sha1", "md5", "crc32")) {
       # partial file
       h1 <- digest(x, length=18000, algo=alg, serialize=FALSE)
       h2 <- digest(fname, length=18000, algo=alg, serialize=FALSE, file=TRUE)
       h3 <- digest(substr(x, 1, 18000), algo=alg, serialize=FALSE)
       stopifnot(identical(h1, h2), identical(h1, h3))
       # whole file
       h1 <- digest(x, algo=alg, serialize=FALSE)
       h2 <- digest(fname, algo=alg, serialize=FALSE, file=TRUE)
       stopifnot(identical(h1, h2))
     }

     # compare md5 algorithm to other tools
     library(tools)
     fname <- file.path(R.home(), "COPYING")
     h1 <- as.character(md5sum(fname))
     h2 <- digest(fname, algo="md5", file=TRUE)
     stopifnot(identical(h1, h2))

