| cbc.read.table {colbycol} | R Documentation |
cbc.read.table reads text data files that are too large to be loaded into memory with read.table.
It reads the file line by line, splits it into one physical file per column, reads these files back into R one at a time and saves each of them as an efficient, native R data file.
cbc.read.table(file, tmp.dir = tempfile( pattern = "dir" ), sep = "\t", header = TRUE, ...)
| Argument | Description |
| --- | --- |
| file | file to be loaded |
| tmp.dir | path to the (empty) directory where temporary files are stored |
| sep | field separator for the data in file |
| header | whether the first line of file contains column names |
| ... | other parameters passed to read.table internally |
This function invokes a Python script that reads file line by line, splits each line into tokens at sep, and writes each column to a separate text file in tmp.dir.
These files are then read into R one at a time, and each text file is replaced by a native R data file written with save.
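The splitting and conversion steps above can be sketched in base R. This is not the package's implementation: colbycol delegates the splitting to a Python script, whereas the hypothetical split_by_column() below does everything in R. It reads the input in chunks of lines so that only a bounded part of the file is held in memory, appends each field to its own column file, and then converts each column file to an .RData file with save. The file naming scheme (col1.txt, col1.RData, ...) and the returned list are assumptions made for illustration.

```r
# Base-R sketch of the column-by-column technique. colbycol performs the
# splitting with a Python script; split_by_column() is hypothetical.
split_by_column <- function(file, tmp.dir, sep = "\t", header = TRUE,
                            chunk = 10000L) {
  dir.create(tmp.dir, showWarnings = FALSE, recursive = TRUE)
  con <- file(file, open = "r")
  on.exit(close(con), add = TRUE)

  # The first line fixes the number of columns (and their names if header).
  first <- readLines(con, n = 1L)
  tokens <- strsplit(first, sep, fixed = TRUE)[[1]]
  col.names <- if (header) tokens else paste0("V", seq_along(tokens))
  txt <- file.path(tmp.dir, paste0("col", seq_along(tokens), ".txt"))
  outs <- lapply(txt, file, open = "w")

  # Append field j of every line to the j-th column file.
  write_chunk <- function(lines) {
    parts <- strsplit(lines, sep, fixed = TRUE)
    for (j in seq_along(outs)) {
      writeLines(vapply(parts, `[`, character(1), j), outs[[j]])
    }
  }
  if (!header) write_chunk(first)
  repeat {
    lines <- readLines(con, n = chunk)  # at most `chunk` lines held at a time
    if (length(lines) == 0L) break
    write_chunk(lines)
  }
  invisible(lapply(outs, close))

  # Read each column back on its own; replace the text file with an .RData file.
  rdata <- sub("\\.txt$", ".RData", txt)
  for (j in seq_along(txt)) {
    column <- read.table(txt[j], sep = sep, quote = "", comment.char = "",
                         blank.lines.skip = FALSE,
                         stringsAsFactors = FALSE)[[1]]
    save(column, file = rdata[j])
    file.remove(txt[j])
  }
  # Only metadata is returned; the data stays on disk in tmp.dir.
  list(tmp.dir = tmp.dir, col.names = col.names, files = rdata)
}
```

For example, split_by_column("big.txt", tmp.dir = "/path/to/empty/dir", sep = "\t") would leave one .RData file per column in that directory (both paths are placeholders). Being a naive splitter, the sketch does not handle quoted fields that contain sep, and strsplit drops a trailing empty field, which then shows up as NA.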
The function returns a small object holding only the metadata needed to access the data; the data itself stays on disk in tmp.dir.
Provide tmp.dir as a full (absolute) path: a relative path is resolved against the working directory, so if the working directory changes later, the temporary files can no longer be found.
If tmp.dir is not supplied, a temporary directory is created inside the session's temporary directory and is deleted when the R session ends.
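The advice about absolute paths can be illustrated with base R alone. The sketch below creates an empty directory (the name cbc_store is hypothetical), records it both as a relative and as an absolute path, and then changes the working directory: only the absolute path still points at the directory that was created.

```r
# Why tmp.dir should be absolute: a relative path is resolved against the
# current working directory. The directory name "cbc_store" is hypothetical.
base <- normalizePath(tempdir())
old.wd <- setwd(base)
dir.create("cbc_store", showWarnings = FALSE)  # an empty directory
rel.dir <- "cbc_store"                 # relative: meaning depends on getwd()
abs.dir <- normalizePath(rel.dir)      # absolute: fixed location

setwd(dirname(base))                   # the working directory changes
dir.exists(rel.dir)  # now looked up under dirname(base), not under base
dir.exists(abs.dir)  # still the directory created above
setwd(old.wd)        # restore the original working directory
```

The absolute path could then be passed as cbc.read.table(file, tmp.dir = abs.dir, sep = "\t"), following the Usage line above.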
Extra arguments passed through ... to the internal read.table calls should be used with caution.
An object of class colbycol holding the metadata required to access the original file's data, which now resides in tmp.dir.
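One possible way to realise such a metadata-only object is an S3 class whose columns are loaded on demand. The class name cbc_sketch and the functions make_cbc_sketch() and get_col_sketch() are hypothetical and not part of colbycol. The sketch assumes one .RData file per column, named col1.RData, col2.RData, ..., each holding an object called column. Defining dim() and dimnames() methods lets base nrow() and colnames() answer from the metadata without loading any column.

```r
# Hypothetical metadata-only class; not colbycol's actual implementation.
make_cbc_sketch <- function(tmp.dir, col.names, nrow) {
  files <- file.path(tmp.dir, paste0("col", seq_along(col.names), ".RData"))
  structure(list(tmp.dir = tmp.dir, col.names = col.names,
                 nrow = as.integer(nrow), files = files),
            class = "cbc_sketch")
}

# dim() and dimnames() methods let base nrow() and colnames() work
# from the metadata alone.
dim.cbc_sketch <- function(x) c(x$nrow, length(x$col.names))
dimnames.cbc_sketch <- function(x) list(NULL, x$col.names)

# Load a single column, selected by position or by name.
get_col_sketch <- function(x, col) {
  j <- if (is.character(col)) match(col, x$col.names) else as.integer(col)
  if (is.na(j) || j < 1L || j > length(x$files)) stop("unknown column: ", col)
  e <- new.env()
  load(x$files[j], envir = e)  # only this column is read into memory
  e$column
}
```

Keeping the object down to a few paths and names is what makes it cheap to pass around; the cost of reading data is paid only when a column is actually requested.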
Carlos J. Gil Bellosta
None
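library( colbycol )
# Split the bundled test file into per-column files in a temporary directory
cbc.data <- cbc.read.table( system.file("data", "cbc.test.data.txt", package = "colbycol"), sep = "\t" )
# Inspect the number of rows and the column names
nrow( cbc.data )
colnames( cbc.data )
# Retrieve individual columns by position or by name
col.01 <- cbc.get.col( cbc.data, 1)
col.02 <- cbc.get.col( cbc.data, "col02" )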