cape.attdb.ftypes.xlsfile: Excel spreadsheet data interface

This module provides a class XLSFile for reading and writing tabular data from Excel-like spreadsheets with the file extension .xls or .xlsx. It relies on two third-party libraries readily available from the Python Package Index (PyPI):

  • xlrd for reading spreadsheets

  • xlsxwriter for writing them

These can be readily installed on any machine with both Python and access to the internet (even without elevated privileges) using the commands below:

$ pip install --user xlrd
$ pip install --user xlsxwriter

Because CAPE may also be used on machines without regular access to the internet, this module does not raise an ImportError in the case that these third-party modules are not available. However, the module will provide no functionality if these modules are not available.

class cape.attdb.ftypes.xlsfile.XLSFile(fname=None, sheet=None, **kw)

Class for reading .xls and .xlsx files

Call:
>>> db = XLSFile(fname, sheet=0, **kw)
>>> db = XLSFile(wb, sheet=0, **kw)
>>> db = XLSFile(ws, **kw)
Inputs:
fname: str

Name of .xls or .xlsx file to read

sheet: {0} | int | str

Worksheet name or number

wb: xlrd.book.Book

Open workbook (spreadsheet file)

ws: xlrd.sheet.Sheet

Direct access to a worksheet

Outputs:
db: cape.attdb.ftypes.xlsfile.XLSFile

XLS file interface

db.cols: list[str]

List of columns read

db.opts: dict

Options for this interface

db[col]: np.ndarray | list

Numeric array or list of strings for each column

Versions:
  • 2019-12-12 @ddalle: First version

get_autoskip(ws, **kw)

Automatically determine number of rows and columns to skip

Call:
>>> wsopts = db.get_autoskip(ws, **kw)
Inputs:
db: cape.attdb.ftypes.xlsfile.XLSFile

XLS file interface

ws: xlrd.sheet.Sheet

Direct access to a worksheet

skiprows: {None} | int >= 0

Number of rows to skip before reading data

subcols: {0} | int >= 0

Number of cols to skip right of first col (for arrays)

subrows: {0} | int >= 0

Number of rows below header row to skip

skipcols: {None} | int >= 0

Number of columns to skip before first data column

maxcols: {None} | int > skipcols

Maximum column number of data

Outputs:
wsopts: XLSSheetOpts

Options from db.opts_by_sheet[ws.name],

Versions:
get_worksheet_opts(sheet, **kw)

Get or create specific options for each worksheet

Call:
>>> opts = db.get_worksheet_opts(sheet, **kw)
Inputs:
db: cape.attdb.ftypes.xlsfile.XLSFile

XLS file interface

sheet: str

Name of worksheet in question

kw: dict

Keyword options valid to db._optscls

Outputs:
opts: XLSFileOpts | db._optscls

Combined options from db.opts and kw

Effects:
db.opts_by_sheet[sheet]: XLSFileOpts

Set to opts

Versions:
  • 2020-02-06 @ddalle: First version

read_xls(fname, sheet=None, **kw)

Read an .xls or .xlsx file

Call:
>>> db.read_xls(fname, sheet, **kw)
>>> db.read_xls(wb, sheet, **kw)
>>> db.read_xls(wb, **kw)
>>> db.read_xls(ws, **kw)
Inputs:
db: cape.attdb.ftypes.xlsfile.XLSFile

XLS file interface

fname: str

Name of .xls or .xlsx file to read

sheet: {0} | int | str

Worksheet name or number

wb: xlrd.book.Book

Open workbook (spreadsheet file)

ws: xlrd.sheet.Sheet

Direct access to a worksheet

Versions:
  • 2019-12-12 @ddalle: First version

  • 2019-12-26 @ddalle: Support “array” worksheets

  • 2020-02-07 @ddalle: Using XLSSheetOpts

read_xls_coldata(ws, cols, **kw)

Read column data from one .xls or .xlsx worksheet

Call:
>>> db.read_xls_coldata(ws, cols, **kw)
Inputs:
db: cape.attdb.ftypes.xlsfile.XLSFile

XLS file interface

ws: xlrd.sheet.Sheet

Direct access to a worksheet

cols: list[str]

List of column names to process

skiprows: {None} | int >= 0

Number of rows to skip before reading data

subrows: {0} | int > 0

Number of rows below header row to skip

skipcols: {None} | int >= 0

Number of columns to skip before first data column

maxcols: {None} | int > skipcols

Maximum column number of data

Effects:
db[col]: list | np.ndarray

Column of data read from each column

db._n[col]: int

Length of each column

Versions:
  • 2019-12-12 @ddalle: First version

read_xls_header(ws, **kw)

Read header row from a worksheet

Call:
>>> cols = db.read_xls_header(ws, **kw)
Inputs:
db: cape.attdb.ftypes.xlsfile.XLSFile

XLS file interface

ws: xlrd.sheet.Sheet

Direct access to a worksheet

colspec: {None} | list

List of column names col or (col, colwidth); using this option preempts reading column names from header row

skiprows: {None} | int >= 0

Number of rows to skip before reading data

subrows: {None} | int >= 0

Number of rows below header row to skip

skipcols: {None} | int >= 0

Number of columns to skip before first data column

maxcols: {None} | int > skipcols

Maximum column number of data

Outputs:
cols: list[str]

List of column names if read

Versions:
  • 2019-12-12 @ddalle: First version

  • 2020-01-16 @ddalle: Full scalar/array support

  • 2020-02-07 @ddalle: Using XLSSheetOpts

read_xls_workbook(wb, **kw)

Read .xls or .xlsx workbook with multiple worksheets

Call:
>>> db.read_xls_workbook(wb, **kw)
Inputs:
db: cape.attdb.ftypes.xlsfile.XLSFile

XLS file interface

wb: xlrd.Book

Direct access to a workbook

ndim: {0} | 1

Dimensionality of one row of data column(s) to read

skiprows: {None} | int >= 0

Number of rows to skip before reading data

subrows: {None} | int >= 0

Number of rows below header row to skip

skipcols: {None} | int >= 0

Number of columns to skip before first data column

maxrows: {None} | int > skiprows

Maximum row number of data

maxcols: {None} | int > skipcols

Maximum column number of data

Versions:
  • 2020-01-08 @jmeeroff : First version

  • 2020-02-07 @ddalle: Using XLSSheetOpts

read_xls_worksheet(ws, **kw)

Read one worksheet of an .xls or .xlsx file

Call:
>>> db.read_xls_worksheet(ws, **kw)
Inputs:
db: cape.attdb.ftypes.xlsfile.XLSFile

XLS file interface

ws: xlrd.sheet.Sheet

Direct access to a worksheet

Versions:
  • 2019-12-12 @ddalle: First version

  • 2019-12-26 @ddalle: Support “array” worksheets

  • 2020-01-10 @jmeeroff: Array read as fallback

  • 2020-01-16 @ddalle: Unified two worksheet methods

  • 2020-02-07 @ddalle: Using XLSSheetOpts

write_xls(fname, cols=None, **kw)

Write data to one or more worksheets

Call:
>>> db.write_xls(fname, cols=None, **kw)
>>> db.write_xls(wb, cols=None, **kw)
Inputs:
db: cape.attdb.ftypes.xlsfile.XLSFile

XLS file interface

fname: {db.fname} | str

Name of XLS file to create

wb: xlsxwriter.workbook.Workbook

Workbook (write) interface

cols: list[str]

List of data columns to write

SheetNames | sheets: list[str]

List of worksheets to write

SheetCols | sheetcols: dict[list]

List of columns to write to each worksheet

SheetWriters: dict[callable]

Handles to standalone writers for some worksheets

SheetPreWriters: dict[callable]

Functions to write some things to worksheet before also writing data columns

SheetPostWriters: dict[callable]

Functions to write to worksheet after writing data

SheetWritersSelfArg: list | set

List/set of sheets whose Writers take db as an arg

SheetWritersWorksheetArg: list | set

List/set of sheets whose Writers take worksheet instead of workbook as arg

SkipCols: {0} | int > 0

Number of columns to skip before writing

SkipRows: {0} | int > 0

Number of rows to skip before writing

ColMasks: {{}} | dict

Masks (subset indices) for some cols

Translators: {{}} | dict

Column name translators (applied in reverse)

Prefix: {None} | dict | str

Prefix (applied in reverse)

Suffix: {None} | dict | str

Suffix (applied in reverse)

Versions:
  • 2020-05-21 @ddalle: First version

class cape.attdb.ftypes.xlsfile.XLSFileDefn(_optsdict=None, _warnmode=1, **kw)
class cape.attdb.ftypes.xlsfile.XLSFileOpts(_optsdict=None, _warnmode=1, **kw)
class cape.attdb.ftypes.xlsfile.XLSSheetOpts(_optsdict=None, _warnmode=1, **kw)