| read.gnumeric.sheet {gnumeric} | R Documentation |
Read data from a sheet of a gnumeric (or other common spreadsheet or database) file to a data.frame.
Requires an external program, ssconvert (normally installed with gnumeric (http://projects.gnome.org/gnumeric/) in PATH.
Calls ssconvert to convert the input to CSV. ssconvert can read several file formats (see Details below).
Note: During conversion to CSV ssconvert also evaluates formulas (e.g. =sum(A1:A3)) in cells, and emits the result instead of the formula.
df<-read.gnumeric.sheet(file,
head=FALSE,
sheet.name='Sheet1',
top.left='A1',
bottom.right='IV65536',
drop.empty.rows="bottom",
drop.empty.columns="right",
colnames.as.sheet=FALSE,
rownames.as.sheet=colnames.as.sheet,
quiet=TRUE,
LANG='C',
import.encoding=NA,
...
)
file |
Name of gnumeric file (or other file type readable by
gnumeric) to read from.
This may also be an URL, i.e. like 'http://example.com/path/file.gnumeric'
|
head |
When TRUE, use first row of requested gnumeric sheet
range as column names in the resulting data.frame |
sheet.name |
Name of sheet as appears in gnumeric. Sheet names containing space character do not work. sheet.name=NA Omits sheet name from the ssconvert command line.For gnumeric files this will read the sheet that was 'current' in gnumeric when the file was saved. |
top.left |
Top left corner of requested gnumeric sheet
range, e.g. 'A1' |
bottom.right |
Bottom right corner of requested gnumeric sheet
range. The default for read.gnumeric.sheet is 'IV65536': this
causes a lot of unused lines to be printed by ssconvert then
parsed by read.csv, thus you might want to override it to
speed up reading.
Use read.gnumeric.sheet.info to read actual
bottom.right cell name from a gnumeric file (but not other formats).
|
drop.empty.rows |
One of
c('none','top','bottom','both','all').
'all' drops all empty lines from the requested range, even
those that are between two non-empty rows.
'both' drops empty lines below the last non-empty row and
above the first non-empty.
'top', 'bottom' and 'none' as you would expect.
|
drop.empty.columns |
One of
c('none','left','right','both','all')Similar to drop.empty.rows, but for columns.
|
colnames.as.sheet |
Rename columns to 'A', 'B', 'C', ... to
have names corresponding to gnumeric column names. |
rownames.as.sheet |
Rename rows to '1', '2', '3', ... to have
names corresponding to gnumeric row indices. Note: this means
df['1',], not df[1,] in the result (rownames are
strings, not integers). Note: when deciding row names only
top.left and head are accounted for, but not
e.g. skip (which may be passed to read.csv via ...).
|
quiet |
When TRUE, do not print command executed, and (on unix platforms) also redirect stderr of the external program ssconvert to /dev/null |
LANG |
Under unix, passed to ssconvert in the environment
variable 'LANG'. The default value ('C') is intended to avoid
using decimal comma in the emitted CSV file. |
import.encoding |
If not NA, passed to ssconvert as its
--import-encoding parameter.
|
... |
Extra arguments, passed to read.csv |
Data from the gnumeric file is dumped as .csv using the ssconvert program provided with gnumeric.
ssconvert supports several input formats, thus the input file does not have to be a gnumeric file. The formats supported may be listed with
ssconvert --list-importers
from a shell prompt.
For me this prints (with ssconvert version '1.8.4')
ID | Description
Gnumeric_xbase:xbase | Xbase (*.dbf) file format
Gnumeric_Excel:excel | MS Excel (tm) (*.xls)
Gnumeric_Excel:xlsx | MS Excel (tm) 2007
Gnumeric_html:html | HTML (*.html, *.htm)
Gnumeric_oleo:oleo | GNU Oleo (*.oleo)
Gnumeric_applix:applix | Applix (*.as)
Gnumeric_QPro:qpro | Quattro Pro (*.wb1, *.wb2, *.wb3)
Gnumeric_paradox:paradox | Paradox database or
| primary index file
Gnumeric_sc:sc | SC/xspread
Gnumeric_XmlIO:sax | Gnumeric XML (*.gnumeric)
Gnumeric_lotus:lotus | Lotus 123 (*.wk1, *.wks, *.123)
Gnumeric_XmlIO:dom | Gnumeric XML (*.gnumeric) Old
| slow importer
Gnumeric_dif:dif | Data Interchange Format (*.dif)
Gnumeric_Excel:excel_xml | MS Excel (tm) 2003 SpreadsheetML
Gnumeric_OpenCalc:openoffice | Open/Star Calc (*.sxc, *.ods)
Gnumeric_plan_perfect:pln | Plan Perfect Format (PLN) import
Gnumeric_sylk:sylk | MultiPlan (SYLK)
Gnumeric_mps:mps | Linear and integer program (*.mps)
| file format
Gnumeric_stf:stf_csvtab | Comma or tab separated
| values (CSV/TSV)
Gnumeric_stf:stf_assistant | Text import (configurable)
But the actual list may depend on which import plugins are installed for gnumeric.
| Format | Source | Status |
| .gnumeric | gnumeric | works |
| .xls | gnumeric | works |
| .html | gnumeric [Save as / HTML 4.0] | works |
| .html | Openoffice Calc [Save as/HTML Document] | works |
| .ods | Openoffice Calc | might work[1] |
| Other formats | not tested |
[1] during .ods import ssconvert emits some messages on stdout (corrupts output). Worked around on unix with | grep , on the output. (The problem has been corrected in libgsf on 2009-08-20, thus the workaround is expected to be removable by 2010)
read.gnumeric.range for a variant with default
arguments more suited for reading an exact cell range of a sheet.
read.gnumeric.sheet.info to read actual
bottom.right cell name from a gnumeric file (but not other formats).
read.gnumeric.sheets to read all sheets
from a gnumeric file (but not other formats).
read.xls for reading Microsoft Excel files
(possibly from a http:// URL)
read.DIF for reading Data Interchange Format (DIF)
files.
read.dbf for Xbase (.dbf) files.
## Read all data from 'Sheet1'
## Not run:
df <- read.gnumeric.sheet( file="file.gnumeric" );
df <- read.gnumeric.sheet( file="file.gnumeric",
sheet.name='Sheet1' );
## Read from Excel sheet named 'Sheet3' the range C3:D50,
## rename columns to 'C' and 'D', rows to '3' ... '50',
## then drop all empty rows.
##
df<-read.gnumeric.sheet( "file.xls",
sheet.name='Sheet3',
top.left='C3',
bottom.right='D50',
drop.empty.rows="all",
drop.empty.columns="none",
colnames.as.sheet=TRUE
)
## Read from "file.gnumeric", 'Sheet1' data in 'A1:E100',
## Use first row (of selected range) as column names.
## Drop empty rows and columns from bottom and right.
df<-read.gnumeric.sheet("file.gnumeric", head=TRUE,
bottom.right='E100')
## Why does it not work? Set quiet=FALSE to see
## the command executed (and on unix, diagnostic
## messages from ssconvert).
df<-read.gnumeric.sheet( "file.ods", quiet=FALSE )
## End(Not run)