cape.attdb.ftypes.textdata: Generic textual data interface

This module contains a basic interface in the spirit of cape.attdb.ftypes for standard text data files. It creates a class, TextDataFile, that does not rely on the popular numpy.loadtxt() function and supports more capabilities than the cape.attdb.ftypes.csv.CSVFile class.

For example, the TextDataFile class supports a variety of delimiters, whereas a CSVFile instance must use ',' as the delimiter. The TextDataFile class also remembers its text: the lines read from the file are kept in db.lines.

If possible, the column names (which become keys in the dict-like class) are read from the header row. If the file begins with multiple comment lines, the column names are read from the final comment before the beginning of data.
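
The header rule above can be sketched in a few lines of standalone Python. This is only an illustration and not the module's own code: the function name find_header_cols, the "#" comment character, and the "," delimiter are assumptions made for the example.

```python
def find_header_cols(lines, comment="#", delim=","):
    """Return column names from the last comment line before the data.

    Hypothetical helper; not part of cape.attdb.ftypes.textdata.
    """
    header = None
    for line in lines:
        stripped = line.strip()
        if not stripped:
            # Skip blank lines
            continue
        if stripped.startswith(comment):
            # Remember the most recent comment line seen so far
            header = stripped.lstrip(comment)
            continue
        # First non-comment line: the data section begins here
        break
    if header is None:
        return None
    return [col.strip() for col in header.split(delim)]


lines = [
    "# Sample file",
    "# mach, alpha, CN",
    "0.8, 2.0, 0.15",
]
cols = find_header_cols(lines)
```

Here cols is taken from the second comment line, since it is the final comment before the first data row.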

class cape.attdb.ftypes.textdata.TextDataDefn(_optsdict=None, _warnmode=1, **kw)
class cape.attdb.ftypes.textdata.TextDataFile(fname=None, **kw)

Interface to generic data text files

Call:
>>> db = TextDataFile(fname=None, **kw)
Inputs:
fname: str

Name of file to read

delim, Delimiter: {", "} | str

Delimiter(s) option

Outputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Text data file interface

db.cols: list[str]

List of columns read

db.lines: list[str]

Lines of text from the file that was read

db.opts: TextDataOpts

Options for this instance

db.defns: dict[TextDataDefn]

Definitions for each column

db[col]: np.ndarray | list

Numeric array or list of strings for each column

Versions:
  • 2019-12-02 @ddalle: v1.0

finish_defns()

Process Definitions of column types

Call:
>>> db.finish_defns(**kw)
Inputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Data file interface

Versions:
  • 2014-06-05 @ddalle: v1.0

  • 2014-06-17 @ddalle: Read from defns dict

  • 2019-11-12 @ddalle: Forked from RunMatrix

  • 2020-02-06 @ddalle: Using self.opts

fromtext_boolmap(txt, col)

Convert boolean flag text to dictionary

Call:
>>> v, vmap = db.fromtext_boolmap(txt, col)
Inputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Text data file interface

txt: str

Text of boolean flag(s) to be converted

col: str

Name of flag column, for "boolmap" keys

Outputs:
v: str

Text returned

vmap: dict[True | False]

Flags for each flag in col definition

Versions:
  • 2019-12-02 @ddalle: v1.0
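
As a rough illustration of the conversion described in this entry, the sketch below turns one flag string into a dictionary of True/False values. It is not the method's implementation: text_to_boolmap is a hypothetical function, the map is passed in directly instead of being looked up from the definition of col, and the "PASS" and "ERROR" keys are invented for the example.

```python
def text_to_boolmap(txt, bmap):
    """Map flag text to one boolean per key of *bmap*.

    Hypothetical helper; *bmap* maps each boolean column name to the
    list of abbreviations that set it.
    """
    flag = txt.strip()
    # A key is True if the text matches any of its abbreviations
    vmap = {key: flag in abbrevs for key, abbrevs in bmap.items()}
    return txt, vmap


bmap = {"PASS": ["p", "P"], "ERROR": ["e", "E"]}
v, vmap = text_to_boolmap("p", bmap)
```

With this input, vmap marks "PASS" as True and "ERROR" as False, and v is the unmodified text.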

fromtext_val(txt, clsname, col=None)

Convert a string to appropriate type

Call:
>>> v = db.fromtext_val(txt, clsname, col)
Inputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Text data file interface

txt: str

Text to be converted to the type named by clsname

clsname: {"float64"} | "int32" | str

Valid data type name

col: str

Name of flag column, for "boolmap" keys

Outputs:
v: clsname

Text translated to requested type

Versions:
  • 2019-12-02 @ddalle: v1.0
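
The idea of dispatching on a type name can be sketched as below. This is an assumption-laden stand-in, not the method itself: text_to_value is a hypothetical function, and it returns Python built-in types where the real method may well return NumPy types.

```python
def text_to_value(txt, clsname="float64"):
    """Convert *txt* according to a type name such as "float64".

    Hypothetical helper using built-in types as stand-ins.
    """
    if clsname.startswith("complex"):
        return complex(txt)
    if clsname.startswith("float"):
        return float(txt)
    if "int" in clsname:
        # Covers names such as "int32" and "uint8"
        return int(txt)
    # Anything else is treated as a string
    return txt.strip()


x = text_to_value("2.5")
n = text_to_value("7", "int32")
s = text_to_value(" poweron ", "str")
```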

process_defns_boolmap(col, bmap)

Process definitions for columns of type BoolMap

Call:
>>> db.process_defns_boolmap(col, bmap)
Inputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Data file interface

col: str

Name of column with type "BoolMap"

bmap: dict

Map for abbreviations that set boolean columns

Versions:
  • 2019-12-03 @ddalle: v1.0

read_textdata(fname)

Read an entire text data file

Call:
>>> db.read_textdata(fname)
Inputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Text data file interface

fname: str

Name of file to read

Versions:
  • 2019-12-02 @ddalle: v1.0

read_textdata_data(f)

Read data portion of text data file

Call:
>>> db.read_textdata_data(f)
Inputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Text data file interface

f: file

Open file handle

Effects:
db.cols: list[str]

List of column names

Versions:
  • 2019-11-25 @ddalle: v1.0

read_textdata_firstrowtypes(f)

Get initial guess at data types from first data row

If (and only if) the DefaultType input is an integer type, guessed types can be integers. Otherwise the sequence of possibilities is float, complex, str.

Call:
>>> db.read_textdata_firstrowtypes(f, **kw)
Inputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Text data file interface

f: file

Open file handle

DefaultType: {"float"} | str

Name of default class

Versions:
  • 2019-11-25 @ddalle: v1.0

  • 2019-12-02 @ddalle: Copied from CSVFile
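
The fallback order described in this entry (integers only when the default type is an integer type, otherwise float, then complex, then str) can be sketched as follows. The function guess_type and the type-name strings it returns are assumptions for illustration, not the library's code.

```python
def guess_type(txt, default_type="float"):
    """Guess a type name for one entry of the first data row.

    Hypothetical helper; returns a descriptive type name.
    """
    # Integers are only considered if the default is an integer type
    if default_type.startswith("int"):
        try:
            int(txt)
            return default_type
        except ValueError:
            pass
    # Otherwise try float, then complex, and fall back to str
    try:
        float(txt)
        return "float"
    except ValueError:
        pass
    try:
        complex(txt)
        return "complex"
    except ValueError:
        return "str"


types = [guess_type(part) for part in ["0.8", "2", "1+2j", "poweron"]]
```

Note that "2" is classified as a float here because the default type is not an integer type.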

read_textdata_header(f)

Read column names from beginning of open file

Call:
>>> db.read_textdata_header(f)
Inputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Text data file interface

f: file

Open file handle

Effects:
db.cols: list[str]

List of column names

Versions:
  • 2019-11-12 @ddalle: v1.0

read_textdata_headerdefaultcols(f)

Create column names “col1”, “col2”, etc. if needed

Call:
>>> db.read_textdata_headerdefaultcols(f)
Inputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Text data file interface

f: file

Open file handle

Effects:
db.cols: list[str]

If not previously determined, this becomes ["col1", "col2", ...] based on number of columns in the first data row

Versions:
  • 2019-11-27 @ddalle: v1.0

  • 2019-12-02 @ddalle: Copied from CSVFile
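
A minimal sketch of the default naming scheme, assuming a comma delimiter and a hypothetical helper named default_cols (neither is taken from the module):

```python
def default_cols(first_row, delim=","):
    """Build ["col1", "col2", ...] from the number of entries in a row."""
    ncol = len(first_row.split(delim))
    return ["col%i" % (i + 1) for i in range(ncol)]


cols = default_cols("0.8, 2.0, 0.15")
```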

read_textdata_headerline(f)

Read line and process column names if possible

Call:
>>> db.read_textdata_headerline(f)
Inputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Text data file interface

f: file

Open file handle

Effects:
db.cols: None | list[str]

List of column names if read

db._textdata_header_once: True | False

Set to True if column names are read at all

db._textdata_header_complete: True | False

Set to True if next line is expected to be data

Versions:
  • 2019-11-22 @ddalle: v1.0

  • 2019-12-02 @ddalle: Copied from CSVFile

read_textdata_line(f)

Read a data row from a text data file

Call:
>>> db.read_textdata_line(f)
Inputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Text data file interface

f: file

Open file handle

Versions:
  • 2019-11-25 @ddalle: v1.0

set_regex_linesplitter()

Generate regular expression used to split a line

Call:
>>> db.set_regex_linesplitter()
Inputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Text data file interface

Effects:
db.regex_linesplit: re.SRE_Pattern

Compiled regular expression object

Versions:
  • 2019-12-02 @ddalle: v1.0

split_textdata_line(line)

Split a line into its parts

Splits line of text by specified delimiter and strips whitespace and delimiter from each entry

Call:
>>> parts = db.split_textdata_line(line)
Inputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Text data file interface

line: str

Line of text to be split

Outputs:
parts: list[str]

List of strings

Versions:
  • 2019-12-02 @ddalle: v1.0

  • 2024-01-10 @ddalle: v1.1; allow whitespace in cols
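
One way to split a line on a set of delimiter characters with a compiled regular expression is sketched below. It rests on assumptions and is not the pattern stored in db.regex_linesplit: make_linesplitter and split_line are hypothetical names, every character of delim is treated as a delimiter, and consecutive delimiters are merged, so empty entries and entries containing spaces are not preserved.

```python
import re


def make_linesplitter(delim=", "):
    """Compile a pattern matching one or more delimiter characters."""
    return re.compile("[" + re.escape(delim) + "]+")


def split_line(line, regex):
    """Split *line* using *regex* and drop empty entries."""
    return [part for part in regex.split(line.strip()) if part]


regex = make_linesplitter(", ")
parts = split_line("0.8, 2.0,  0.15\n", regex)
```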

validate_boolmap(boolmap)

Translate free-form Type option into validated code

Call:
>>> bmap = db.validate_boolmap(boolmap)
Inputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Data file interface

boolmap: dict[str | list]

Initial boolean flag map; the keys are names of the boolean coefficients that are set, and the item values are the one or more abbreviations for each key

Outputs:
bmap: dict[list[str]]

Validated map

Versions:
  • 2019-12-03 @ddalle: v1.0
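
The normalization described in this entry (each key ends up with a list of string abbreviations) might look like the sketch below. The function normalize_boolmap, its error messages, and the "PASS" and "ERROR" keys are assumptions for illustration only.

```python
def normalize_boolmap(boolmap):
    """Return a copy of *boolmap* in which every value is a list of str.

    Hypothetical helper; not the module's validation code.
    """
    bmap = {}
    for key, abbrevs in boolmap.items():
        # A single abbreviation becomes a one-element list
        if isinstance(abbrevs, str):
            abbrevs = [abbrevs]
        elif not isinstance(abbrevs, (list, tuple)):
            raise TypeError("Abbreviations for '%s' must be str or list" % key)
        for abbrev in abbrevs:
            if not isinstance(abbrev, str):
                raise TypeError("Abbreviation for '%s' is not a str" % key)
        bmap[key] = list(abbrevs)
    return bmap


bmap = normalize_boolmap({"PASS": "p", "ERROR": ["e", "E"]})
```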

write_textdata(fname=None)

Write text data file based on existing db.lines

No checks are performed to confirm that values in e.g. db[col] are synchronized with the text in db.lines. It is therefore possible to write a file that does not match the values in the database. To avoid this, use set_colval().

Call:
>>> db.write_textdata()
>>> db.write_textdata(fname)
Inputs:
db: cape.attdb.ftypes.textdata.TextDataFile

Text data file interface

fname: {db.fname} | str

Name of file to write

Versions:
  • 2019-12-04 @ddalle: v1.0
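
The synchronization caveat can be illustrated without the library. In the sketch below, the list lines and the dictionary values are hypothetical stand-ins for db.lines and db[col], and an in-memory buffer stands in for the output file.

```python
import io

# Hypothetical stand-ins for db.lines and db[col]
lines = ["# mach, alpha\n", "0.8, 2.0\n"]
values = {"mach": [0.8], "alpha": [2.0]}

# Change a parsed value without touching the stored text
values["alpha"][0] = 4.0

# Writing from the stored lines reproduces the original text
buffer = io.StringIO()
buffer.writelines(lines)
written = buffer.getvalue()
```

The written text still contains the original alpha entry, which is the mismatch the warning above describes.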

class cape.attdb.ftypes.textdata.TextDataOpts(_optsdict=None, _warnmode=1, **kw)