TextDocCol {tm}    R Documentation

Description

Constructs a text document collection.
Usage

## S4 method for signature 'Source'
TextDocCol(object,
           readerControl = list(reader = object@DefaultReader,
                                language = "en_US", load = FALSE),
           dbControl = list(useDb = FALSE, dbName = "", dbType = "DB1"),
           ...)
Arguments

object
a Source object.
readerControl
a list with the named components reader, a reading function capable of
handling the file format found in object; language, giving the text's
language; and load, a logical value indicating whether the documents should
be loaded into memory immediately (load = TRUE) or only when needed
(load = FALSE). Loading on demand keeps memory demands low for large document
collections. If object does not support loading on demand, the documents are
always loaded immediately, i.e., this argument is overridden.
dbControl
a list with the named components useDb, a logical value indicating whether
database support should be activated; dbName, the filename of the database
holding the sourced-out objects; and dbType, a valid database type as
supported by the filehash package. With database support activated, tm tries
to keep as few resources as possible in memory, storing them in the database
instead.
...
optional arguments for the reader.
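The load component of readerControl decides when document contents are read. The base R sketch below illustrates the load-on-demand idea behind load = FALSE; it is not tm's implementation, and the helper lazy_document() and the toy file contents are hypothetical. Each document keeps only its file path and reads the file the first time its content is requested.

```r
# Hypothetical sketch of load on demand (not tm's implementation):
# a document remembers its file path and reads the file only on first access.
lazy_document <- function(path) {
  force(path)            # capture the path now, not lazily
  state <- new.env()
  state$content <- NULL  # nothing has been read from disk yet
  state$reads <- 0L      # how many times the file was actually read
  list(
    loaded  = function() !is.null(state$content),
    content = function() {
      if (is.null(state$content)) {
        state$content <- readLines(path, warn = FALSE)
        state$reads <- state$reads + 1L
      }
      state$content
    },
    reads   = function() state$reads
  )
}

# Two toy documents in temporary files.
paths <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines("first document", paths[1])
writeLines("second document", paths[2])

# Building the collection reads nothing, much like load = FALSE.
docs <- lapply(paths, lazy_document)
sapply(docs, function(d) d$loaded())  # FALSE FALSE

docs[[2]]$content()                   # reads only the second file
sapply(docs, function(d) d$loaded())  # FALSE TRUE
```

Later calls to content() return the cached text without touching the file again.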
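Database support set through dbControl keeps objects on disk and fetches them by key. As a rough illustration only, the sketch below stands in for a filehash database with one .rds file per object; the helpers db_create(), db_insert() and db_fetch() are hypothetical and are not part of tm or filehash.

```r
# Hypothetical stand-in for a filehash database (not tm's or filehash's code):
# each object is saved to its own .rds file and read back by key.
db_create <- function(dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  dir
}
db_insert <- function(db, key, value) {
  saveRDS(value, file.path(db, paste0(key, ".rds")))
  invisible(key)
}
db_fetch <- function(db, key) {
  readRDS(file.path(db, paste0(key, ".rds")))
}

db <- db_create(file.path(tempdir(), "demo_db"))
texts <- list(doc1 = "first document", doc2 = "second document")
for (key in names(texts)) db_insert(db, key, texts[[key]])

# Keep only the keys in memory; the texts themselves now live on disk.
collection <- names(texts)
rm(texts)

db_fetch(db, collection[2])  # "second document"
```

The trade-off is that memory use stays small as the collection grows, at the cost of a disk read on every fetch.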
Value

An S4 object of class TextDocCol, which extends class list, holding a
collection of text documents.
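Because TextDocCol extends list, ordinary list operations such as length() and [[ apply to a collection. The class below is a hypothetical minimal stand-in, defined only to show how an S4 class that contains list behaves; tm's actual class definition is not reproduced here.

```r
library(methods)

# Hypothetical minimal class for illustration; tm's real TextDocCol is not reproduced here.
setClass("DocCollectionSketch", contains = "list")

coll <- new("DocCollectionSketch", list("first document", "second document"))

is(coll, "list")  # TRUE: the object is also a list
length(coll)      # 2
coll[[2]]         # "second document"
```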
Author(s)

Ingo Feinerer
Examples

txt <- system.file("texts", "txt", package = "tm")
## Not run:
(TextDocCol(DirSource(txt),
            readerControl = list(reader = readPlain, language = "en_US",
                                 load = TRUE),
            dbControl = list(useDb = TRUE, dbName = "oviddb",
                             dbType = "DB1")))
## End(Not run)
reut21578 <- system.file("texts", "reut21578", package = "tm")
TextDocCol(DirSource(reut21578),
           readerControl = list(reader = readReut21578XML, language = "en_US",
                                load = FALSE))