cape.dkit.xlsfile: Excel spreadsheet data interface¶
This module provides a class XLSFile for reading and writing
tabular data from Excel-like spreadsheets with the file extension
.xls or .xlsx. It relies on two third-party libraries readily
available from the Python Package Index (PyPI):
xlrdfor reading spreadsheets
xlsxwriterfor writing them
These can be readily installed on any machine with both Python and access to the internet (even without elevated privileges) using the commands below:
$ pip install --user xlrd $ pip install --user xlsxwriter
Because CAPE may also be used on machines without regular access to the
internet, this module does not raise an ImportError in the case that
these third-party modules are not available. However, the module will
provide no functionality if these modules are not available.
- class cape.dkit.xlsfile.XLSFile(fname=None, sheet=None, **kw)¶
Class for reading
.xlsand.xlsxfiles- Call:
>>> db = XLSFile(fname, sheet=0, **kw) >>> db = XLSFile(wb, sheet=0, **kw) >>> db = XLSFile(ws, **kw)
- Inputs:
- Outputs:
- db:
cape.dkit.ftypes.xlsfile.XLSFile XLS file interface
- db.cols:
list[str] List of columns read
- db.opts:
dict Options for this interface
- db[col]:
np.ndarray|list Numeric array or list of strings for each column
- db:
- Versions:
2019-12-12
@ddalle: First version
- get_autoskip(ws, **kw)¶
Automatically determine number of rows and columns to skip
- Call:
>>> wsopts = db.get_autoskip(ws, **kw)
- Inputs:
- db:
cape.dkit.ftypes.xlsfile.XLSFile XLS file interface
- ws:
xlrd.sheet.Sheet Direct access to a worksheet
- skiprows: {
None} |int>= 0 Number of rows to skip before reading data
- subcols: {
0} |int>= 0 Number of cols to skip right of first col (for arrays)
- subrows: {
0} |int>= 0 Number of rows below header row to skip
- skipcols: {
None} |int>= 0 Number of columns to skip before first data column
- maxcols: {
None} |int> skipcols Maximum column number of data
- db:
- Outputs:
- wsopts:
XLSSheetOpts Options from db.opts_by_sheet[ws.name],
- wsopts:
- Versions:
2019-12-26
@ddalle: Split fromread_xls_header()2020-01-14
@ddalle: Moved everything to smaller funcs2020-02-07
@ddalle: UsingXLSSheetOpts
- get_worksheet_opts(sheet, **kw)¶
Get or create specific options for each worksheet
- Call:
>>> opts = db.get_worksheet_opts(sheet, **kw)
- Inputs:
- Outputs:
- opts:
XLSFileOpts| db._optscls Combined options from db.opts and kw
- opts:
- Effects:
- db.opts_by_sheet[sheet]:
XLSFileOpts Set to opts
- db.opts_by_sheet[sheet]:
- Versions:
2020-02-06
@ddalle: First version
- read_xls(fname, sheet=None, **kw)¶
Read an
.xlsor.xlsxfile- Call:
>>> db.read_xls(fname, sheet, **kw) >>> db.read_xls(wb, sheet, **kw) >>> db.read_xls(wb, **kw) >>> db.read_xls(ws, **kw)
- Inputs:
- Versions:
2019-12-12
@ddalle: First version2019-12-26
@ddalle: Support “array” worksheets2020-02-07
@ddalle: UsingXLSSheetOpts
- read_xls_coldata(ws, cols, **kw)¶
Read column data from one
.xlsor.xlsxworksheet- Call:
>>> db.read_xls_coldata(ws, cols, **kw)
- Inputs:
- db:
cape.dkit.ftypes.xlsfile.XLSFile XLS file interface
- ws:
xlrd.sheet.Sheet Direct access to a worksheet
- cols:
list[str] List of column names to process
- skiprows: {
None} |int>= 0 Number of rows to skip before reading data
- subrows: {
0} |int> 0 Number of rows below header row to skip
- skipcols: {
None} |int>= 0 Number of columns to skip before first data column
- maxcols: {
None} |int> skipcols Maximum column number of data
- db:
- Effects:
- db[col]:
list|np.ndarray Column of data read from each column
- db._n[col]:
int Length of each column
- db[col]:
- Versions:
2019-12-12
@ddalle: First version
- read_xls_header(ws, **kw)¶
Read header row from a worksheet
- Call:
>>> cols = db.read_xls_header(ws, **kw)
- Inputs:
- db:
cape.dkit.ftypes.xlsfile.XLSFile XLS file interface
- ws:
xlrd.sheet.Sheet Direct access to a worksheet
- colspec: {
None} |list List of column names col or
(col, colwidth); using this option preempts reading column names from header row- skiprows: {
None} |int>= 0 Number of rows to skip before reading data
- subrows: {
None} |int>= 0 Number of rows below header row to skip
- skipcols: {
None} |int>= 0 Number of columns to skip before first data column
- maxcols: {
None} |int> skipcols Maximum column number of data
- db:
- Outputs:
- Versions:
2019-12-12
@ddalle: First version2020-01-16
@ddalle: Full scalar/array support2020-02-07
@ddalle: UsingXLSSheetOpts
- read_xls_workbook(wb, **kw)¶
Read
.xlsor.xlsxworkbook with multiple worksheets- Call:
>>> db.read_xls_workbook(wb, **kw)
- Inputs:
- db:
cape.dkit.ftypes.xlsfile.XLSFile XLS file interface
- wb:
xlrd.Book Direct access to a workbook
- ndim: {
0} |1 Dimensionality of one row of data column(s) to read
- skiprows: {
None} |int>= 0 Number of rows to skip before reading data
- subrows: {
None} |int>= 0 Number of rows below header row to skip
- skipcols: {
None} |int>= 0 Number of columns to skip before first data column
- maxrows: {
None} |int> skiprows Maximum row number of data
- maxcols: {
None} |int> skipcols Maximum column number of data
- db:
- Versions:
2020-01-08
@jmeeroff: First version2020-02-07
@ddalle: UsingXLSSheetOpts
- read_xls_worksheet(ws, **kw)¶
Read one worksheet of an
.xlsor.xlsxfile- Call:
>>> db.read_xls_worksheet(ws, **kw)
- Inputs:
- db:
cape.dkit.ftypes.xlsfile.XLSFile XLS file interface
- ws:
xlrd.sheet.Sheet Direct access to a worksheet
- db:
- Versions:
2019-12-12
@ddalle: First version2019-12-26
@ddalle: Support “array” worksheets2020-01-10
@jmeeroff: Array read as fallback2020-01-16
@ddalle: Unified two worksheet methods2020-02-07
@ddalle: UsingXLSSheetOpts
- write_xls(fname, cols=None, **kw)¶
Write data to one or more worksheets
- Call:
>>> db.write_xls(fname, cols=None, **kw) >>> db.write_xls(wb, cols=None, **kw)
- Inputs:
- db:
cape.dkit.ftypes.xlsfile.XLSFile XLS file interface
- fname: {db.fname} |
str Name of XLS file to create
- wb:
xlsxwriter.workbook.Workbook Workbook (write) interface
- cols:
list[str] List of data columns to write
- SheetNames | sheets:
list[str] List of worksheets to write
- SheetCols | sheetcols:
dict[list] List of columns to write to each worksheet
- SheetWriters:
dict[callable] Handles to standalone writers for some worksheets
- SheetPreWriters:
dict[callable] Functions to write some things to worksheet before also writing data columns
- SheetPostWriters:
dict[callable] Functions to write to worksheet after writing data
- SheetWritersSelfArg:
list|set List/set of sheets whose Writers take db as an arg
- SheetWritersWorksheetArg:
list|set List/set of sheets whose Writers take worksheet instead of workbook as arg
- SkipCols: {
0} |int> 0 Number of columns to skip before writing
- SkipRows: {
0} |int> 0 Number of rows to skip before writing
- ColMasks: {
{}} |dict Masks (subset indices) for some cols
- Translators: {
{}} |dict Column name translators (applied in reverse)
- Prefix: {
None} |dict|str Prefix (applied in reverse)
- Suffix: {
None} |dict|str Suffix (applied in reverse)
- db:
- Versions:
2020-05-21
@ddalle: First version
- class cape.dkit.xlsfile.XLSFileDefn(_optsdict=None, _warnmode=1, **kw)¶
- class cape.dkit.xlsfile.XLSFileOpts(_optsdict=None, _warnmode=1, **kw)¶
- class cape.dkit.xlsfile.XLSSheetOpts(_optsdict=None, _warnmode=1, **kw)¶