`cape.dkit.xlsfile`: Excel spreadsheet data interface¶

This module provides a class XLSFile for reading and writing tabular data from Excel-like spreadsheets with the file extension .xls or .xlsx. It relies on two third-party libraries readily available from the Python Package Index (PyPI):

xlrd for reading spreadsheets

xlsxwriter for writing them

These can be readily installed on any machine with both Python and access to the internet (even without elevated privileges) using the commands below:

$ pip install --user xlrd
$ pip install --user xlsxwriter

Because CAPE may also be used on machines without regular access to the internet, this module does not raise an ImportError in the case that these third-party modules are not available. However, the module will provide no functionality if these modules are not available.

class cape.dkit.xlsfile.XLSFile(fname=None, sheet=None, **kw)¶

Class for reading .xls and .xlsx files

Call:

>>> db = XLSFile(fname, sheet=0, **kw)
>>> db = XLSFile(wb, sheet=0, **kw)
>>> db = XLSFile(ws, **kw)

Inputs:

fname: str: Name of .xls or .xlsx file to read
sheet: {0} | int | str: Worksheet name or number
wb: xlrd.book.Book: Open workbook (spreadsheet file)
ws: xlrd.sheet.Sheet: Direct access to a worksheet

Outputs:

db: cape.dkit.ftypes.xlsfile.XLSFile: XLS file interface
db.cols: list[str]: List of columns read
db.opts: dict: Options for this interface
db[col]: np.ndarray | list: Numeric array or list of strings for each column

Versions:

2019-12-12 @ddalle: First version

get_autoskip(ws, **kw)¶

Automatically determine number of rows and columns to skip

Call:

>>> wsopts = db.get_autoskip(ws, **kw)

Inputs:

db: cape.dkit.ftypes.xlsfile.XLSFile: XLS file interface
ws: xlrd.sheet.Sheet: Direct access to a worksheet
skiprows: {None} | int >= 0: Number of rows to skip before reading data
subcols: {0} | int >= 0: Number of cols to skip right of first col (for arrays)
subrows: {0} | int >= 0: Number of rows below header row to skip
skipcols: {None} | int >= 0: Number of columns to skip before first data column
maxcols: {None} | int > skipcols: Maximum column number of data

Outputs:

wsopts: XLSSheetOpts: Options from db.opts_by_sheet[ws.name],

Versions:

2019-12-26 @ddalle: Split from read_xls_header()
2020-01-14 @ddalle: Moved everything to smaller funcs
2020-02-07 @ddalle: Using XLSSheetOpts

get_worksheet_opts(sheet, **kw)¶

Get or create specific options for each worksheet

Call:

>>> opts = db.get_worksheet_opts(sheet, **kw)

Inputs:

db: cape.dkit.ftypes.xlsfile.XLSFile: XLS file interface
sheet: str: Name of worksheet in question
kw: dict: Keyword options valid to db._optscls

Outputs:

opts: XLSFileOpts | db._optscls: Combined options from db.opts and kw

Effects:

db.opts_by_sheet[sheet]: XLSFileOpts: Set to opts

Versions:

2020-02-06 @ddalle: First version

read_xls(fname, sheet=None, **kw)¶

Read an .xls or .xlsx file

Call:

>>> db.read_xls(fname, sheet, **kw)
>>> db.read_xls(wb, sheet, **kw)
>>> db.read_xls(wb, **kw)
>>> db.read_xls(ws, **kw)

Inputs:

db: cape.dkit.ftypes.xlsfile.XLSFile: XLS file interface
fname: str: Name of .xls or .xlsx file to read
sheet: {0} | int | str: Worksheet name or number
wb: xlrd.book.Book: Open workbook (spreadsheet file)
ws: xlrd.sheet.Sheet: Direct access to a worksheet

Versions:

2019-12-12 @ddalle: First version
2019-12-26 @ddalle: Support “array” worksheets
2020-02-07 @ddalle: Using XLSSheetOpts

read_xls_coldata(ws, cols, **kw)¶

Read column data from one .xls or .xlsx worksheet

Call:

>>> db.read_xls_coldata(ws, cols, **kw)

Inputs:

db: cape.dkit.ftypes.xlsfile.XLSFile: XLS file interface
ws: xlrd.sheet.Sheet: Direct access to a worksheet
cols: list[str]: List of column names to process
skiprows: {None} | int >= 0: Number of rows to skip before reading data
subrows: {0} | int > 0: Number of rows below header row to skip
skipcols: {None} | int >= 0: Number of columns to skip before first data column
maxcols: {None} | int > skipcols: Maximum column number of data

Effects:

db[col]: list | np.ndarray: Column of data read from each column
db._n[col]: int: Length of each column

Versions:

2019-12-12 @ddalle: First version

read_xls_header(ws, **kw)¶

Read header row from a worksheet

Call:

>>> cols = db.read_xls_header(ws, **kw)

Inputs:

db: cape.dkit.ftypes.xlsfile.XLSFile: XLS file interface
ws: xlrd.sheet.Sheet: Direct access to a worksheet
colspec: {None} | list: List of column names col or (col, colwidth); using this option preempts reading column names from header row
skiprows: {None} | int >= 0: Number of rows to skip before reading data
subrows: {None} | int >= 0: Number of rows below header row to skip
skipcols: {None} | int >= 0: Number of columns to skip before first data column
maxcols: {None} | int > skipcols: Maximum column number of data

Outputs:

cols: list[str]: List of column names if read

Versions:

2019-12-12 @ddalle: First version
2020-01-16 @ddalle: Full scalar/array support
2020-02-07 @ddalle: Using XLSSheetOpts

read_xls_workbook(wb, **kw)¶

Read .xls or .xlsx workbook with multiple worksheets

Call:

>>> db.read_xls_workbook(wb, **kw)

Inputs:

db: cape.dkit.ftypes.xlsfile.XLSFile: XLS file interface
wb: xlrd.Book: Direct access to a workbook
ndim: {0} | 1: Dimensionality of one row of data column(s) to read
skiprows: {None} | int >= 0: Number of rows to skip before reading data
subrows: {None} | int >= 0: Number of rows below header row to skip
skipcols: {None} | int >= 0: Number of columns to skip before first data column
maxrows: {None} | int > skiprows: Maximum row number of data
maxcols: {None} | int > skipcols: Maximum column number of data

Versions:

2020-01-08 @jmeeroff : First version
2020-02-07 @ddalle: Using XLSSheetOpts

read_xls_worksheet(ws, **kw)¶

Read one worksheet of an .xls or .xlsx file

Call:

>>> db.read_xls_worksheet(ws, **kw)

Inputs:

db: cape.dkit.ftypes.xlsfile.XLSFile: XLS file interface
ws: xlrd.sheet.Sheet: Direct access to a worksheet

Versions:

2019-12-12 @ddalle: First version
2019-12-26 @ddalle: Support “array” worksheets
2020-01-10 @jmeeroff: Array read as fallback
2020-01-16 @ddalle: Unified two worksheet methods
2020-02-07 @ddalle: Using XLSSheetOpts

write_xls(fname, cols=None, **kw)¶

Write data to one or more worksheets

Call:

>>> db.write_xls(fname, cols=None, **kw)
>>> db.write_xls(wb, cols=None, **kw)

Inputs:

db: cape.dkit.ftypes.xlsfile.XLSFile: XLS file interface
fname: {db.fname} | str: Name of XLS file to create
wb: xlsxwriter.workbook.Workbook: Workbook (write) interface
cols: list[str]: List of data columns to write
SheetNames | sheets: list[str]: List of worksheets to write
SheetCols | sheetcols: dict[list]: List of columns to write to each worksheet
SheetWriters: dict[callable]: Handles to standalone writers for some worksheets
SheetPreWriters: dict[callable]: Functions to write some things to worksheet before also writing data columns
SheetPostWriters: dict[callable]: Functions to write to worksheet after writing data
SheetWritersSelfArg: list | set: List/set of sheets whose Writers take db as an arg
SheetWritersWorksheetArg: list | set: List/set of sheets whose Writers take worksheet instead of workbook as arg
SkipCols: {0} | int > 0: Number of columns to skip before writing
SkipRows: {0} | int > 0: Number of rows to skip before writing
ColMasks: {{}} | dict: Masks (subset indices) for some cols
Translators: {{}} | dict: Column name translators (applied in reverse)
Prefix: {None} | dict | str: Prefix (applied in reverse)
Suffix: {None} | dict | str: Suffix (applied in reverse)

Versions:

2020-05-21 @ddalle: First version

class cape.dkit.xlsfile.XLSFileDefn(_optsdict=None, _warnmode=1, **kw)¶

class cape.dkit.xlsfile.XLSFileOpts(_optsdict=None, _warnmode=1, **kw)¶

class cape.dkit.xlsfile.XLSSheetOpts(_optsdict=None, _warnmode=1, **kw)¶

`cape.dkit.xlsfile`: Excel spreadsheet data interface¶

Previous topic

Next topic

This Page

cape.dkit.xlsfile: Excel spreadsheet data interface¶

`cape.dkit.xlsfile`: Excel spreadsheet data interface¶