refdata                 package:ref                 R Documentation

_s_u_b_s_e_t_t_a_b_l_e _r_e_f_e_r_e_n_c_e _t_o _m_a_t_r_i_x _o_r _d_a_t_a._f_r_a_m_e

_D_e_s_c_r_i_p_t_i_o_n:

     Function 'refdata' creates objects of class refdata, which behave
     much like matrices or data.frames but allow much more
     memory-efficient handling.

_U_s_a_g_e:

     # -- usage for R CMD CHECK, see below for human readable version -----------
     refdata(x)
     [.refdata(x, i = NULL, j = NULL, drop = FALSE, ref = FALSE)
     [<-.refdata(x, i = NULL, j = NULL, ref = FALSE, value)
      ## S3 method for class 'refdata':
      dim(x, ref = FALSE)
      ## S3 method for class 'refdata':
      dim(x) <- value
      ## S3 method for class 'refdata':
      dimnames(x, ref = FALSE)
      ## S3 method for class 'refdata':
      dimnames(x, ref = FALSE) <- value

     # -- most important usage for human beings (does not pass R CMD CHECK) -----
     # rd <- refdata(x)                   # create reference
     # rd[]                               # get all data
     # rd[i, j]                           # get part of data
     # rd[i, j, ref=TRUE]                 # get new reference on part of data
     # rd[i, j] <- value                  # modify part of data (now rd is reference on local copy of the data)
     # rd[i, j, ref=TRUE] <- value        # modify part of original data (respecting subsetting history)
     # dim(rd)                            # dim of (subsetted) data
     # dim(rd, ref=TRUE)                  # dim of original data
     # dimnames(rd)                       # dimnames of (subsetted) data
     # dimnames(rd, ref=TRUE)             # dimnames of original data
     # dimnames(rd) <- value              # modify dimnames (now rd is reference on local copy of the data)
     # dimnames(rd, ref=TRUE) <- value    # modify complete dimnames of original object (NOT respecting subsetting history)

_A_r_g_u_m_e_n_t_s:

       x: a matrix or data.frame or any other 2-dimensional object that
          has operators "[" and "[<-" defined 

       i: row index 

       j: col index 

     ref: FALSE by default. In subsetting: FALSE returns data, TRUE
          returns new refdata object. In assignments: FALSE modifies a
          local copy and returns a refdata object embedding it, TRUE
          modifies the original. 

    drop: FALSE by default, i.e. returned data always have a dimension
          attribute. TRUE drops dimensions in some cases; the exact
          result depends on whether a 'matrix' or 'data.frame' is
          embedded 

   value: some value to be assigned 
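
     The dependence of 'drop=TRUE' on the embedded object can be
     illustrated with plain base R subsetting. This is only a sketch
     and assumes that refdata passes 'drop' on to the "[" operator of
     the embedded object; it does not use the ref package itself.

```r
# Base R only: how 'drop' behaves for a matrix versus a data.frame.
m  <- matrix(1:6, nrow = 3)
df <- as.data.frame(m)

m_keep  <- m[1, , drop = FALSE]    # 1 x 2 matrix, dim attribute kept
df_keep <- df[1, , drop = FALSE]   # 1 x 2 data.frame
m_drop  <- m[1, , drop = TRUE]     # plain vector, dim attribute dropped
df_row  <- df[1, , drop = TRUE]    # list-like result, not an atomic vector
df_col  <- df[, 1, drop = TRUE]    # plain vector holding the first column

dim(m_keep)
dim(m_drop)
```

     With 'drop=FALSE' both object types return a 2-dimensional
     result, which is the consistency the default aims for.
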

_D_e_t_a_i_l_s:

     Refdata objects store 2D data in one environment and index
     information in another environment. Derived refdata objects
     usually share the data environment but not the index environment.
     The index information is stored in a standardized and
     memory-efficient form generated by 'optimal.index'.
     Thus refdata objects can be copied, subsetted and even
     modified without duplicating the data in memory.
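
     The two-environment design described above can be sketched in
     base R. The names 'make_ref', 'sub_ref' and 'get_data' and the
     class "sketchref" are hypothetical and chosen for illustration
     only; the sketch stores plain integer indices rather than the
     form produced by 'optimal.index', and it is not the package's
     implementation.

```r
# Hypothetical sketch, NOT the ref package's code.
make_ref <- function(x) {
  dat <- new.env()                 # data environment, shared by derived refs
  dat$x <- x
  ind <- new.env()                 # index environment, one per reference
  ind$i <- seq_len(nrow(x))
  ind$j <- seq_len(ncol(x))
  structure(list(), dat = dat, ind = ind, class = "sketchref")
}

sub_ref <- function(r, i, j) {
  old <- attr(r, "ind")
  ind <- new.env()
  ind$i <- old$i[i]                # compose new indices with the stored ones
  ind$j <- old$j[j]
  # the data environment is reused, so no data is duplicated
  structure(list(), dat = attr(r, "dat"), ind = ind, class = "sketchref")
}

get_data <- function(r) {
  ind <- attr(r, "ind")
  attr(r, "dat")$x[ind$i, ind$j, drop = FALSE]
}

m  <- matrix(1:20, nrow = 5)
r1 <- make_ref(m)
r2 <- sub_ref(r1, -1, 1:2)         # drop the first row, keep columns 1 and 2
r3 <- sub_ref(r2, -1, 1)           # subset the subset again

shared <- identical(attr(r1, "dat"), attr(r3, "dat"))  # same data environment
get_data(r3)
```

     Only the short index vectors are copied when a reference is
     derived; the matrix itself lives once in the shared environment.
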
     Empty square bracket subsetting ('rd[]') returns the data; square
     bracket subsetting ('rd[i, j]') returns subsets of the data as
     expected.
     An additional argument ('rd[i, j, ref=TRUE]') returns a
     reference that stores the subsetting indices. Such a reference
     behaves transparently, as if a smaller matrix/data.frame were
     stored, and can be subsetted again recursively. With ref=TRUE,
     indices are always interpreted as row/col indices, i.e. 'x[i]' and
     'x[cbind(i, j)]' are undefined (and raise stop errors).
     Standard square bracket assignment ('rd[i, j] <- value') creates
     a reference to a locally modified copy of the (potentially
     subsetted) data.
     An additional argument ('rd[i, j, ref=TRUE] <- value') modifies
     the original data, properly respecting the subsetting
     history.
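
     The difference between modifying a local copy and modifying the
     original rests on R's environment semantics, which can be shown
     in base R. The helper names 'modify_copy' and 'modify_original'
     are hypothetical; this sketch illustrates the underlying
     mechanism only and does not reproduce the refdata assignment
     methods.

```r
# Base R only: copy semantics of values versus reference semantics
# of environments.
e <- new.env()
e$x <- matrix(0, nrow = 2, ncol = 2)

modify_copy <- function(env) {
  local_copy <- env$x          # ordinary R value, copied when modified
  local_copy[1, 1] <- -1
  local_copy
}

modify_original <- function(env) {
  env$x[1, 1] <- 99            # changes the matrix held in the environment
  invisible(env)
}

changed <- modify_copy(e)
before  <- e$x[1, 1]           # still 0: the shared data is untouched
modify_original(e)
after   <- e$x[1, 1]           # now 99: visible to every holder of 'e'
```
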
     A method 'dim(refdata)' returns the dim of the (indexed) data;
     the dim of the original (non-indexed) data can be accessed using
     parameter 'ref=TRUE'. Assignment via 'dim(refdata)<-' is not
     possible.
     A method 'dimnames(refdata)' returns the dimnames of the (indexed)
     data or, with parameter 'ref=TRUE', of the original data.
     Assignment is possible but not recommended; parameter 'ref' decides
     whether the original data is modified or a copy is created.

_V_a_l_u_e:

     an object of class refdata (appended to class attributes of data),
     which is an empty list with two attributes 

     dat: the environment where the data x and its dimension dim are
          stored

     ind: the environment where the indexes i, j and the effective
          subset sizes ni, nj are stored
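
     How the stored fields could answer 'dim' queries is sketched
     below in base R. The field names follow the description above,
     but the helper 'sketch_dim' is hypothetical and plain integer
     indices are assumed; the actual methods may work differently.

```r
# Hypothetical sketch, NOT the ref package's code.
x <- matrix(1:20, nrow = 5)

dat <- new.env()
dat$x   <- x
dat$dim <- dim(x)            # dimension of the original data

ind <- new.env()
ind$i  <- 2:5                # rows selected so far
ind$j  <- 1:2                # columns selected so far
ind$ni <- length(ind$i)      # effective number of rows
ind$nj <- length(ind$j)      # effective number of columns

sketch_dim <- function(dat, ind, ref = FALSE) {
  if (ref) dat$dim else c(ind$ni, ind$nj)
}

sketch_dim(dat, ind)              # 4 2: dim of the subsetted view
sketch_dim(dat, ind, ref = TRUE)  # 5 4: dim of the original data
```
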

_N_o_t_e:

     The refdata code is currently R only (not implemented for S+).
     Please note the following differences from matrices and
     data.frames:

     '_x[]' you need to write 'x[]' in order to get the data

     '_d_r_o_p=_F_A_L_S_E' by default drop=FALSE which gives consistent
          behaviour for matrices and data.frames. You can use the $- or
          [[-operator to extract single column vectors which are
          granted to be of a consistent data type. However, currently $
          and [[ are only wrappers to [. They might be performance
          tuned in later versions.

     '_x[_i]' single index subsetting is not defined, use 'x[][i]'
          instead, but beware of differences between matrices and
          dataframes

     '_x[_c_b_i_n_d()]' matrix index subsetting is not defined, use
          'x[][cbind(i, j)]' instead

     '_r_e_f=_T_R_U_E' parameter 'ref' needs to be used sensibly to exploit
          the advantages of refdata objects
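
     The differences between matrices and data.frames mentioned for
     single index subsetting are those of base R. The following
     sketch shows them on plain objects, i.e. on what 'x[]' would
     return for each kind of embedded data; it does not use the ref
     package.

```r
# Base R only: single-index and matrix-index subsetting.
m  <- matrix(1:6, nrow = 3)
df <- as.data.frame(m)

m_single  <- m[2]     # matrix: one element, counted in column-major order
df_single <- df[2]    # data.frame: the second column, still a data.frame
m_pairs   <- m[cbind(c(1, 3), c(2, 1))]   # elements (1,2) and (3,1)

m_single
dim(df_single)
m_pairs
```
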

_A_u_t_h_o_r(_s):

     Jens Oehlschlägel

_S_e_e _A_l_s_o:

     'Extract', 'matrix', 'data.frame', 'optimal.index', 'ref'

_E_x_a_m_p_l_e_s:

       ## Simple usage Example
       x <- cbind(1:5, 5:1)            # take a matrix or data frame
       rx <- refdata(x)                # wrap it into a refdata object
       rx                              # see the autoprinting
       rm(x)                           # delete original to save memory
       rx[]                            # extract all data
       rx[-1, ]                        # extract part of data
       rx2 <- rx[-1, , ref=TRUE]       # create refdata object referencing part of data (only index, no data is duplicated)
       rx2                             # compare autoprinting
       rx2[]                           # extract 'all' data
       rx2[-1, ]                       # extract part of (part of) data
       cat("for more examples look at the help pages\n")

      ## Not run: 
       # Memory saving demos
       square.matrix.size <- 1000
       recursion.depth.limit <- 10
       non.referenced.matrix <- matrix(1:(square.matrix.size*square.matrix.size),
                                       nrow=square.matrix.size, ncol=square.matrix.size)
       rownames(non.referenced.matrix) <- paste("a", seq_len(square.matrix.size), sep="")
       colnames(non.referenced.matrix) <- paste("b", seq_len(square.matrix.size), sep="")
       referenced.matrix <- refdata(non.referenced.matrix)
       recurse.nonref <- function(m, depth.limit=10){
         x <- m[1,1]   # need read access here to create local copy
         gc()
         cat("depth.limit=", depth.limit, "  memory.size=", memsize.wrapper(), "\n", sep="")
         if (depth.limit)
           Recall(m[-1, -1, drop=FALSE], depth.limit=depth.limit-1)
         invisible()
       }
       recurse.ref <- function(m, depth.limit=10){
         x <- m[1,1]   # read access, otherwise nothing happens
         gc()
         cat("depth.limit=", depth.limit, "  memory.size=",  memsize.wrapper(), "\n", sep="")
         if (depth.limit)
           Recall(m[-1, -1, ref=TRUE], depth.limit=depth.limit-1)
         invisible()
       }
       gc()
       memsize.wrapper()
       recurse.ref(referenced.matrix, recursion.depth.limit)
       gc()
       memsize.wrapper()
       recurse.nonref(non.referenced.matrix, recursion.depth.limit)
       gc()
       memsize.wrapper()
       rm(recurse.nonref, recurse.ref, non.referenced.matrix, referenced.matrix, square.matrix.size, recursion.depth.limit)
       
     ## End(Not run)
       cat("for even more examples look at regression.test.refdata()\n")
       regression.test.refdata()  # testing correctness of refdata functionality

