| readPDF {tm} | R Documentation |
Returns a function that reads in a Portable Document Format (PDF) document, extracting both its text and its meta data.
readPDF(PdfinfoOptions = "", PdftotextOptions = "", ...)
PdfinfoOptions |
options passed over to pdfinfo. |
PdftotextOptions |
options passed over to pdftotext. |
... |
arguments for the generator function. |
Formally this function is a function generator, i.e., it returns a
function (which reads in a text document) with a well-defined
signature. The returned function can still access the arguments passed
to the generator (e.g., options to pdfinfo or pdftotext) via lexical
scoping.
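The generator pattern can be illustrated with a minimal sketch in plain R. The names make_reader and options are hypothetical and chosen only for illustration; this is not the tm implementation and it does not call pdftotext or pdfinfo.

```r
# Hypothetical generator: returns a reader with the fixed signature
# (elem, language, id). Not part of tm.
make_reader <- function(options = "", ...) {
  # 'options' is not an argument of the returned function; the inner
  # function finds it in its enclosing environment (lexical scoping).
  function(elem, language, id) {
    list(
      content  = elem$content,
      language = language,
      id       = id,
      options  = options
    )
  }
}

reader <- make_reader(options = "-layout")
doc <- reader(elem = list(content = "some text", uri = "doc.pdf"),
              language = "eng", id = "id1")
doc$options
```

The returned reader exposes only elem, language and id, yet still sees the options given to the generator.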
Note that this PDF reader needs both the tools pdftotext and
pdfinfo installed and accessible on your system.
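One way to check this requirement before calling the reader is sketched below. It assumes that both tools are expected on the search path and uses only base R's Sys.which; the variable names are illustrative.

```r
# Look up both external tools on the search path.
# Sys.which() returns "" for a program it cannot find.
tools <- Sys.which(c("pdftotext", "pdfinfo"))
not_found <- names(tools)[tools == ""]

if (length(not_found) > 0) {
  message("Not found on the search path: ",
          paste(not_found, collapse = ", "))
} else {
  message("Both pdftotext and pdfinfo were found.")
}
```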
A function with the signature elem, language, id:
elem |
A list with the two named elements content
and uri. The first element must hold the document to
be read in; the second element must hold a call to extract this
document. The call is evaluated upon a request for load on demand. |
language |
A character vector giving the text's language. |
id |
A character vector representing a unique identification
string for the returned text document. |
The function returns a PlainTextDocument representing the text
and meta data in content.
Ingo Feinerer
Use getReaders to list available reader functions.
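library(tm)
# Locate the sample PDF shipped with the tm package.
f <- system.file("texts", "pdf", "pdfarchiving.pdf", package = "tm")
# Generate a reader that passes -layout to pdftotext, then apply it to the file.
pdf <- readPDF(PdftotextOptions = "-layout")(elem = list(uri = f), language = "eng", id = "id1")
meta(pdf)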