cape.dkit.datakitloader: DataKit collection tools

This module provides the DataKitLoader class, which takes a module's __name__ and __file__ as input and automatically determines a variety of DataKit parameters.

class cape.dkit.datakitloader.DataKitLoader(name: str | None = None, fname: str | None = None, DATAKIT_CLS: type | None = None, **kw)

Tool for reading datakits based on module name and file

Call:
>>> dkl = DataKitLoader(name, fname, **kw)
Inputs:
name: str

Module name, from __name__

fname: str

Absolute path to module file name, from __file__

Outputs:
dkl: DataKitLoader

Tool for reading datakits for a specific module
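
For example, a datakit package might instantiate the loader at import time. A minimal sketch, assuming a hypothetical package and the cape.dkit.rdb.DataKit class:

>>> # Hypothetical datakit module, e.g. sampledb/__init__.py
>>> from cape.dkit.rdb import DataKit
>>> from cape.dkit.datakitloader import DataKitLoader
>>> # Determine datakit parameters from this module's identity
>>> ast = DataKitLoader(__name__, __file__, DATAKIT_CLS=DataKit)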

check_dvcfile(fname: str, f: bool = False) bool

Check if a file exists with an appended .dvc extension

Call:
>>> q = ast.check_dvcfile(fname)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file [optionally relative to MODULE_DIR]

Keys:
  • MODULE_DIR

Outputs:
q: True | False

Whether or not fname or DVC file exists

Versions:
  • 2021-07-19 @ddalle: v1.0

check_file(fname: str, f: bool = False, dvc: bool = True)

Check if a file exists OR a .dvc version

  • If f is True, this always returns False

  • If the file fname exists, this returns True

  • If fname plus a .dvc extension exists, this also returns True

Call:
>>> q = ast.check_file(fname, f=False, dvc=True)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file [optionally relative to MODULE_DIR]

f: True | {False}

Force-overwrite option; if True, the check always returns False

dvc: {True} | False

Option to check for .dvc extension

Keys:
  • MODULE_DIR

Outputs:
q: True | False

Whether or not fname or DVC file exists

Versions:
  • 2021-07-19 @ddalle: v1.0
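
The decision logic can be summarized by the following sketch, where fabs stands for the absolute path derived from fname (the helper name is hypothetical):

>>> import os
>>> def _check_file_sketch(fabs, f=False, dvc=True):
...     # Force mode reports "missing" so callers will rewrite the file
...     if f:
...         return False
...     # Accept the original file if present
...     if os.path.isfile(fabs):
...         return True
...     # Otherwise accept a DVC metadata stub, if that check is enabled
...     return dvc and os.path.isfile(fabs + ".dvc")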

check_modfile(fname: str) bool

Check if a file exists OR a .dvc version

Call:
>>> q = ast.check_modfile(fname)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file [optionally relative to MODULE_DIR]

Keys:
  • MODULE_DIR

Outputs:
q: True | False

Whether or not fname or DVC file exists

Versions:
  • 2021-07-19 @ddalle: v1.0

create_db_name()

Create and save database name from module name

This utilizes the following parameters:

  • MODULE_NAME_REGEX_LIST

  • MODULE_NAME_REGEX_GROUPS

  • DB_NAME_TEMPLATE_LIST

Call:
>>> dbname = ast.create_db_name()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
dbname: str

Prescribed datakit name

Versions:
  • 2021-06-28 @ddalle: v1.0

dvc_add(frel, **kw)

Add (cache) a file using DVC

Call:
>>> ierr = ast.dvc_add(frel, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

frel: str

Name of file relative to MODULE_DIR

Outputs:
ierr: int

Return code

  • 0: success

  • 512: not a git repo

Versions:
  • 2021-09-15 @ddalle: v1.0

dvc_pull(frel, **kw)

Pull a DVC file

Call:
>>> ierr = ast.dvc_pull(frel, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

frel: str

Name of file relative to MODULE_DIR

Outputs:
ierr: int

Return code

  • 0: success

  • 256: no DVC file

  • 512: not a git repo

Versions:
  • 2021-07-19 @ddalle: v1.0

  • 2023-02-21 @ddalle: v2.0; DVC -> LFC

dvc_push(frel, **kw)

Push a DVC file

Call:
>>> ierr = ast.dvc_push(frel, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

frel: str

Name of file relative to MODULE_DIR

Outputs:
ierr: int

Return code

  • 0: success

  • 256: no DVC file

  • 512: not a git repo

Versions:
  • 2021-09-15 @ddalle: v1.0

dvc_status(frel, **kw)

Check the status of a DVC file

Call:
>>> ierr = ast.dvc_status(frel, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

frel: str

Name of file relative to MODULE_DIR

Outputs:
ierr: int

Return code

  • 0: success

  • 1: out-of-date

  • 256: no DVC file

  • 512: not a git repo

Versions:
  • 2021-09-23 @ddalle: v1.0
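
A typical pattern, sketched below using the return codes above, is to pull a file only when its local copy is missing or out of date (the file name is hypothetical):

>>> # Hypothetical data file relative to MODULE_DIR
>>> frel = "db/mat/datakit.mat"
>>> # Nonzero status: out of date, no DVC file, or not a git repo
>>> if ast.dvc_status(frel) != 0:
...     ierr = ast.dvc_pull(frel)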

genr8_db_name(modname: str | None = None) str

Get database name based on first matching regular expression

This utilizes the following parameters:

  • MODULE_NAME

  • MODULE_NAME_REGEX_LIST

  • MODULE_NAME_REGEX_GROUPS

  • DB_NAME_TEMPLATE_LIST

Call:
>>> dbname = ast.genr8_db_name(modname=None)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

modname: {None} | str

Name of module to parse (default: MODULE_NAME)

Outputs:
dbname: str

Prescribed datakit name

Versions:
  • 2021-06-28 @ddalle: v1.0

  • 2021-07-15 @ddalle: v1.1; add modname arg
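
A sketch of how these parameters interact, using hypothetical settings stored dict-style as with dkl["DATAKIT_CLS"]; under these assumptions the regex groups fill the template:

>>> ast["MODULE_NAME_REGEX_LIST"] = [r"db_(?P<grp>[a-z0-9]+)_(?P<num>[0-9]+)"]
>>> ast["DB_NAME_TEMPLATE_LIST"] = ["CAPE-DB-%(grp)s-%(num)s"]
>>> ast.genr8_db_name("db_f3d_001")
'CAPE-DB-f3d-001'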

genr8_dvc_filename(fname: str) str

Produce name of large file stub

Call:
>>> flfc = repo.genr8_dvc_filename(fname)
Inputs:
repo: GitRepo

Interface to git repository

fname: str

Name of file, either original file or metadata stub

ext: {None} | ".dvc" | ".lfc"

Large file metadata stub file extension

Outputs:
flfc: str

Name of large file metadata stub file

Versions:
  • 2022-12-21 @ddalle: v1.0

genr8_dvc_ofilename(fname: str) str

Produce name of original large file

This strips the .dvc extension if necessary.

Call:
>>> forig = repo.genr8_dvc_ofilename(fname)
Inputs:
repo: GitRepo

Interface to git repository

fname: str

Name of file, either original file or metadata stub

Outputs:
forig: str

Name of original large file w/o LFC extension

Versions:
  • 2022-12-21 @ddalle: v1.0

genr8_lfc_filename(fname: str) str

Produce name of large file stub

Call:
>>> flfc = repo.genr8_lfc_filename(fname)
Inputs:
repo: GitRepo

Interface to git repository

fname: str

Name of file, either original file or metadata stub

ext: {None} | ".dvc" | ".lfc"

Large file metadata stub file extension

Outputs:
flfc: str

Name of large file metadata stub file

Versions:
  • 2022-12-21 @ddalle: v1.0

genr8_lfc_ofilename(fname: str) str

Produce name of original large file

This strips the .lfc or .dvc extension if necessary.

Call:
>>> forig = repo.genr8_lfc_ofilename(fname)
Inputs:
repo: GitRepo

Interface to git repository

fname: str

Name of file, either original file or metadata stub

Outputs:
forig: str

Name of original large file w/o LFC extension

Versions:
  • 2022-12-21 @ddalle: v1.0
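
These two functions form a round trip between an original file name and its metadata stub; an illustrative pair of calls (the file name is hypothetical, and whether the stub ends in .lfc or .dvc depends on the ext option and which stub exists):

>>> flfc = repo.genr8_lfc_filename("data/aero.mat")
>>> # e.g. "data/aero.mat.lfc"
>>> repo.genr8_lfc_ofilename(flfc)
'data/aero.mat'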

genr8_modnames(dbname: str | None = None) list

Generate list of candidate module names based on a DB name

This utilizes the following parameters:

  • DB_NAME

  • DB_NAME_REGEX_LIST

  • DB_NAME_REGEX_GROUPS

  • MODULE_NAME_TEMPLATE_LIST

Call:
>>> modnames = ast.genr8_modnames(dbname=None)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

dbname: {None} | str

Database name to parse (default: DB_NAME)

Outputs:
modnames: list[str]

Candidate module names

Versions:
  • 2021-10-22 @ddalle: v1.0

get_abspath(frel: str) str

Get the full filename from path relative to MODULE_DIR

Call:
>>> fabs = ast.get_abspath(frel)
>>> fabs = ast.get_abspath(fabs)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

frel: str

Name of file relative to MODULE_DIR

fabs: str

Existing absolute path

Keys:
  • MODULE_DIR

Outputs:
fabs: str

Absolute path to file

Versions:
  • 2021-07-05 @ddalle: v1.0
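
A sketch of the resolution rule (the helper name is hypothetical): absolute inputs pass through unchanged, while relative inputs are joined to MODULE_DIR:

>>> import os.path
>>> def _get_abspath_sketch(moddir, fname):
...     # Absolute paths are returned as-is
...     if os.path.isabs(fname):
...         return fname
...     # Relative paths are interpreted relative to MODULE_DIR
...     return os.path.join(moddir, fname)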

get_db_filenames_by_type(ext: str) list

Get list of file names for a given data file type

Call:
>>> fnames = ast.get_db_filenames_by_type(ext)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

ext: str

File extension type

Outputs:
fnames: list[str]

List of datakit file names; one for each suffix

Versions:
  • 2021-07-01 @ddalle: v1.0

get_db_suffixes_by_type(ext: str) list

Get list of suffixes for given data file type

Call:
>>> suffixes = ast.get_db_suffixes_by_type(ext)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

ext: str

File extension type

Keys:
  • DB_SUFFIXES_BY_TYPE

Outputs:
suffixes: list[str | None]

List of additional suffixes (if any) for ext type

Versions:
  • 2021-07-01 @ddalle: v1.0
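
For example, with hypothetical settings (stored dict-style as with dkl["DATAKIT_CLS"]), a datakit might write one plain CSV file and one with an "rbf" suffix:

>>> ast["DB_SUFFIXES_BY_TYPE"] = {"csv": [None, "rbf"]}
>>> ast.get_db_suffixes_by_type("csv")
[None, 'rbf']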

get_dbdir(ext)

Get containing folder for specified datakit file type

Call:
>>> fdir = ast.get_dbdir(ext)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

ext: str

File type

Outputs:
fdir: str

Absolute folder to ext datakit folder

Keys:
  • MODULE_DIR

  • DB_DIR

  • DB_DIRS_BY_TYPE

Versions:
  • 2021-07-07 @ddalle: v1.0

get_dbdir_by_type(ext: str) str

Get datakit directory for given file type

Call:
>>> fdir = ast.get_dbdir_by_type(ext)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

ext: str

File extension type

Keys:
  • MODULE_DIR

  • DB_DIR

  • DB_DIRS_BY_TYPE

Outputs:
fdir: str

Absolute path to ext datakit folder

Versions:
  • 2021-06-29 @ddalle: v1.0

get_dbfile(fname, ext)

Get a file name relative to the datakit folder

Call:
>>> fabs = ast.get_dbfile(fname, ext)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: None | str

Name of file relative to DB_DIRS_BY_TYPE for ext

ext: str

File type

Outputs:
fabs: str

Absolute path to file

Keys:
  • MODULE_DIR

  • DB_DIR

  • DB_DIRS_BY_TYPE

Versions:
  • 2021-07-07 @ddalle: v1.0

get_dbfiles(dbname: str, ext: str) list

Get list of datakit filenames for specified type

Call:
>>> fnames = ast.get_dbfiles(dbname, ext)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

dbname: None | str

Database name (DB_NAME is used if None)

ext: str

File type

Outputs:
fnames: list[str]

Absolute path to files for datakit

Keys:
  • MODULE_DIR

  • DB_DIR

  • DB_DIRS_BY_TYPE

  • DB_SUFFIXES_BY_TYPE

Versions:
  • 2021-07-07 @ddalle: v1.0
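
An illustrative call with hypothetical settings, showing how the folder options combine into absolute file names:

>>> ast["DB_DIRS_BY_TYPE"] = {"csv": "db/csv"}
>>> fnames = ast.get_dbfiles("CAPE-DB-F3D-001", "csv")
>>> # e.g. [".../db/csv/CAPE-DB-F3D-001.csv"], under MODULE_DIR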

get_meta_jsonfile() str

Get absolute path to module’s metadata file

Call:
>>> fname = ast.get_meta_jsonfile()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
fname: str

Absolute path to meta.json, if used

Keys:
  • MODULE_DIR

Versions:
  • 2025-06-13 @ddalle: v1.0

get_rawdata_opt(opt: str, remote: str = 'origin', vdef=None)

Get a rawdata/datakit-sources.json setting

Call:
>>> v = ast.get_rawdata_opt(opt, remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

opt: str

Name of option to read

remote: {"origin"} | str

Name of remote from which to read opt

vdef: {None} | any

Default value if opt not present

Outputs:
v: {vdef} | any

Value from JSON file if possible, else vdef

Versions:
  • 2021-09-01 @ddalle: v1.0

  • 2022-01-26 @ddalle: Version 1.1; add substitutions
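
For example, to read the branch setting for the "origin" remote with a fallback (the "main" default is illustrative):

>>> branch = ast.get_rawdata_opt("branch", remote="origin", vdef="main")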

get_rawdata_ref(remote: str = 'origin') str

Get optional SHA-1 hash, tag, or branch for raw data source

Call:
>>> ref = ast.get_rawdata_ref(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

Outputs:
ref: {"HEAD"} | str

Valid git reference name

Versions:
  • 2021-09-01 @ddalle: v1.0

get_rawdata_remotelist()

Get list of remotes from rawdata/datakit-sources.json

Call:
>>> remotes = ast.get_rawdata_remotelist()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
remotes: list[str]

List of remotes

Versions:
  • 2021-09-02 @ddalle: v1.0

get_rawdata_sourcecommit(remote: str = 'origin') str

Get the latest used SHA-1 hash for a remote

Call:
>>> sha1 = ast.get_rawdata_sourcecommit(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote from which to read opt

Outputs:
sha1: None | str

40-character SHA-1 hash if possible from datakit-sources-commit.json

Versions:
  • 2021-09-02 @ddalle: v1.0

get_rawdatadir()

Get absolute path to module’s raw data folder

Call:
>>> fdir = ast.get_rawdatadir()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
fdir: str

Absolute path to raw data folder

Keys:
  • MODULE_DIR

  • RAWDATA_DIR

Versions:
  • 2021-07-08 @ddalle: v1.0

get_rawdatafilename(fname, dvc=False)

Get a file name relative to the datakit folder

Call:
>>> fabs = ast.get_rawdatafilename(fname, dvc=False)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: None | str

Name of file relative to RAWDATA_DIR

dvc: True | {False}

Option to pull the DVC file if fabs does not exist

Outputs:
fabs: str

Absolute path to raw data file

Keys:
  • MODULE_DIR

  • RAWDATA_DIR

Versions:
  • 2021-07-07 @ddalle: v1.0

get_rawdataremote_git(remote: str = 'origin', f: bool = False)

Get full URL and SHA-1 hash for raw data source repo

Call:
>>> url, sha1 = ast.get_rawdataremote_git(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

f: True | {False}

Option to override dkl.rawdata_remotes if present

Outputs:
url: None | str

Full path to valid git repo, if possible

sha1: None | str

40-character hash of specified commit, if possible

Versions:
  • 2021-09-01 @ddalle: v1.0

get_rawdataremote_gitfiles(remote: str = 'origin') list

List all files in candidate raw data remote source

Call:
>>> fnames = ast.get_rawdataremote_gitfiles(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

Outputs:
fnames: list[str]

List of files to be copied from remote repo

Versions:
  • 2021-09-01 @ddalle: v1.0

get_rawdataremote_rsync(remote: str = 'origin') str

Get full URL for rsync raw data source repo

If several options are present, this function returns the first one whose folder exists.

Call:
>>> url = ast.get_rawdataremote_rsync(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

Outputs:
url: None | str

Full path to valid git repo, if possible

Versions:
  • 2021-09-02 @ddalle: v1.0

get_rawdataremote_rsyncfiles(remote: str = 'origin') list

List all files in candidate remote folder

Call:
>>> fnames = ast.get_rawdataremote_rsyncfiles(remote)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

Outputs:
fnames: list[str]

List of files to be copied from remote repo

Versions:
  • 2021-09-02 @ddalle: v1.0

get_requirement(j: int = 0) str

Get numbered requirement, from file or local variable

Call:
>>> reqs = ast.get_requirement(j=0)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

j: {0} | int

Index of requirement to process

Outputs:
req: str

Name for requirement j

Versions:
  • 2025-06-13 @ddalle: v1.0

get_requirements() list

Get list of requirements, from file or local variable

Call:
>>> reqs = ast.get_requirements()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
reqs: list[str]

List of required database names

Versions:
  • 2025-06-13 @ddalle: v1.0

get_requirements_json() list | None

Read list of requirements from JSON file, if applicable

Call:
>>> reqs = ast.get_requirements_json()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
reqs: list[str] | None

Requirements read from file, if any

Versions:
  • 2025-06-13 @ddalle: v1.0
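
A sketch of the workflow, combining this with import_requirement() and read_requirement() (documented below); the requirements.json contents are hypothetical:

>>> # Hypothetical requirements.json: ["CAPE-DB-F3D-001", "CAPE-DB-RMX-002"]
>>> ast.get_requirements_json()
['CAPE-DB-F3D-001', 'CAPE-DB-RMX-002']
>>> mod = ast.import_requirement(0)
>>> db = ast.read_requirement(0)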

get_requirements_jsonfile() str

Get absolute path to module’s requirements.json file

Call:
>>> fname = ast.get_requirements_jsonfile()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
fname: str

Absolute path to requirements.json, if used

Keys:
  • MODULE_DIR

Versions:
  • 2025-06-13 @ddalle: v1.0

get_rootdir() str

Get path to folder containing top-level module

Call:
>>> rootdir = ast.get_rootdir()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
rootdir: str

Absolute path to folder containing top-level module

Versions:
  • 2025-06-13 @ddalle: v1.0

import_db_name(dbname: str | None = None)

Import first available module based on a DB name

This utilizes the following parameters:

  • DB_NAME

  • DB_NAME_REGEX_LIST

  • DB_NAME_REGEX_GROUPS

  • MODULE_NAME_TEMPLATE_LIST

Call:
>>> mod = ast.import_db_name(dbname=None)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

dbname: {None} | str

Database name to parse (default: DB_NAME)

Outputs:
mod: module

Module with DB_NAME equal to dbname

Versions:
  • 2021-07-15 @ddalle: v1.0

import_requirement(j: int = 0)

Import module from numbered requirement

Call:
>>> mod = ast.import_requirement(j=0)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

j: {0} | int

Index of requirement to process

Outputs:
mod: module

Module from requirement j

Versions:
  • 2025-06-13 @ddalle: v1.0

list_rawdataremote_git(remote: str = 'origin') list

List all files in candidate raw data remote source

Call:
>>> ls_files = ast.list_rawdataremote_git(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

Outputs:
ls_files: list[str]

List of all files tracked by remote repo

Versions:
  • 2021-09-01 @ddalle: v1.0

list_rawdataremote_rsync(remote: str = 'origin') list

List all files in candidate raw data remote folder

Call:
>>> ls_files = ast.list_rawdataremote_rsync(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

Outputs:
ls_files: list[str]

List of all files in remote source folder

Versions:
  • 2021-09-02 @ddalle: v1.0

make_db_name()

Retrieve or create database name from module name

This utilizes the following parameters:

  • MODULE_NAME_REGEX_LIST

  • MODULE_NAME_REGEX_GROUPS

  • DB_NAME_TEMPLATE_LIST

Call:
>>> dbname = ast.make_db_name()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
dbname: str

Prescribed datakit name

Versions:
  • 2021-06-28 @ddalle: v1.0

make_rawdatadir()

Ensure raw data folder exists

Call:
>>> ast.make_rawdatadir()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Keys:
  • MODULE_DIR

  • RAWDATA_DIR

Versions:
  • 2025-05-02 @ddalle: v1.0

prep_dirs(frel: str)

Prepare any folders needed for a file

Any folders in frel that don’t exist will be created. For example "db/csv/datakit.csv" will create the folders db/ and db/csv/ if they don’t already exist.

Call:
>>> ast.prep_dirs(frel)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

frel: str

Name of file relative to MODULE_DIR

Keys:
  • MODULE_DIR

Versions:
  • 2021-07-07 @ddalle: v1.0
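
A sketch of the effect (the helper name is hypothetical; MODULE_DIR is passed in explicitly):

>>> import os
>>> def _prep_dirs_sketch(moddir, frel):
...     # Create every missing parent folder of frel within MODULE_DIR
...     fdir = os.path.dirname(os.path.join(moddir, frel))
...     os.makedirs(fdir, exist_ok=True)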

prep_dirs_rawdata(frel: str)

Prepare folders relative to rawdata/ folder

Call:
>>> ast.prep_dirs_rawdata(frel)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

frel: str

Name of file relative to rawdata/ folder

fabs: str

Existing absolute path

Keys:
  • MODULE_DIR

Versions:
  • 2021-09-01 @ddalle: v1.0

read_db_cdb(cls: type | None = None, **kw) DataKit

Read a datakit using .cdb file type

Call:
>>> db = ast.read_db_cdb(cls=None, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-07-03 @ddalle: v1.0

read_db_csv(cls: type | None = None, **kw) DataKit

Read a datakit using .csv file type

Call:
>>> db = ast.read_db_csv(cls=None, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-07-03 @ddalle: v1.0

read_db_mat(cls: type | None = None, **kw) DataKit

Read a datakit using .mat file type

Call:
>>> db = ast.read_db_mat(cls=None, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-07-03 @ddalle: v1.0
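
A minimal usage sketch; by default the file is read with the class stored in dkl["DATAKIT_CLS"]:

>>> db = ast.read_db_mat()
>>> # Or override the reader class (cape.dkit.rdb.DataKit shown)
>>> from cape.dkit.rdb import DataKit
>>> db = ast.read_db_mat(cls=DataKit)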

read_db_name(dbname: str | None = None) DataKit

Read datakit from first available module based on a DB name

This utilizes the following parameters:

  • DB_NAME

  • DB_NAME_REGEX_LIST

  • DB_NAME_REGEX_GROUPS

  • MODULE_NAME_TEMPLATE_LIST

Call:
>>> db = ast.read_db_name(dbname=None)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

dbname: {None} | str

Database name to parse (default: DB_NAME)

Outputs:
db: DataKit

Output of read_db() from module with DB_NAME equal to dbname

Versions:
  • 2021-09-10 @ddalle: v1.0

read_dbfile(fname: str, ext: str, **kw) DataKit

Read a databook file from DB_DIR

Call:
>>> db = ast.read_dbfile(fname, ext, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: None | str

Name of file to read from the datakit folder (DB_DIR)

ext: str

Database file type

ftype: {"mat"} | None | str

Optional specifier to predetermine file type

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

Keys:
  • MODULE_DIR

  • DB_DIR

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-06-25 @ddalle: v1.0

read_dbfile_cdb(fname: str, **kw) DataKit

Read a .cdb file from DB_DIR

Call:
>>> db = ast.read_dbfile_cdb(fname, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file to read from the datakit folder (DB_DIR)

ftype: {"mat"} | None | str

Optional specifier to predetermine file type

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

kw: dict

Additional keyword arguments passed to cls

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-06-25 @ddalle: v1.0

read_dbfile_csv(fname: str, **kw) DataKit

Read a .csv file from DB_DIR

Call:
>>> db = ast.read_dbfile_csv(fname, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file to read from the datakit folder (DB_DIR)

ftype: {"mat"} | None | str

Optional specifier to predetermine file type

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

kw: dict

Additional keyword arguments passed to cls

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-06-25 @ddalle: v1.0

read_dbfile_csv_rbf(fname: str, **kw) DataKit

Read a .csv file of radial basis function (RBF) data from DB_DIR

Call:
>>> db = ast.read_dbfile_csv_rbf(fname, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file to read from the datakit folder (DB_DIR)

ftype: {"mat"} | None | str

Optional specifier to predetermine file type

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

kw: dict

Additional keyword arguments passed to cls

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-06-25 @ddalle: v1.0

read_dbfile_mat(fname: str, **kw) DataKit

Read a .mat file from DB_DIR

Call:
>>> db = ast.read_dbfile_mat(fname, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file to read from the datakit folder (DB_DIR)

ftype: {"mat"} | None | str

Optional specifier to predetermine file type

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

kw: dict

Additional keyword arguments passed to cls

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-06-25 @ddalle: v1.0
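
A usage sketch with a hypothetical file name under MODULE_DIR/DB_DIR:

>>> db = ast.read_dbfile_mat("mat/CAPE-DB-F3D-001.mat")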

read_rawdata_json(fname: str = 'datakit-sources.json', f: bool = False)

Read datakit-sources.json from package’s raw data folder

Call:
>>> ast.read_rawdata_json(fname="datakit-sources.json", f=False)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: {"datakit-sources.json"} | str

Relative or absolute file name (rel. to rawdata/)

f: True | {False}

Reread even if dkl.rawdata_sources is nonempty

Effects:
dkl.rawdata_sources: dict

Settings read from JSON file

Versions:
  • 2021-09-01 @ddalle: v1.0

read_rawdatafile(fname: str, ftype: str | None = None, cls: type | None = None, **kw) DataKit

Read a file from the raw data folder (RAWDATA_DIR)

Call:
>>> db = ast.read_rawdatafile(fname, ftype=None, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file to read from raw data folder

ftype: {None} | str

Optional specifier to predetermine file type

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

kw: dict

Additional keyword arguments passed to cls

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

read_requirement(j: int = 0) DataKit

Read database from numbered requirement

Call:
>>> db = ast.read_requirement(j=0)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

j: {0} | int

Index of requirement to process

Outputs:
db: DataKit

Database from requirement j

Versions:
  • 2025-06-13 @ddalle: v1.0

resolve_dbname(dbname: str) str

Resolve a database name using current DKit’s prefixes

Suppose the current database name is "CAPE-DB-T-F3D-001".

Examples:
>>> ast.resolve_dbname("002")
"CAPE-DB-T-F3D-002"
>>> ast.resolve_dbname("RMX-001")
"CAPE-DB-T-RMX-001"
>>> ast.resolve_dbname("Q-C3D-102")
"CAPE-DB-Q-C3D-102"
Call:
>>> fulldbname = ast.resolve_dbname(dbname)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

dbname: str

(Partial) database name

Outputs:
fulldbname: str

Full database name, using groups from ast.DB_NAME

Versions:
  • 2025-06-13 @ddalle: v1.0

update_rawdata(**kw)

Update raw data using rawdata/datakit-sources.json

The settings for zero or more “remotes” are read from that JSON file in the package’s rawdata/ folder. Example contents of such a file are shown below:

{
    "hub": [
        "/nobackup/user/",
        "pfe:/nobackupp16/user/git",
        "linux252:/nobackup/user/git"
    ],
    "remotes": {
        "origin": {
            "url": "data/datarepo.git",
            "type": "git-show",
            "glob": "aero_STACK*.csv",
            "regex": [
                "aero_CORE_no_[a-z]+\.csv",
                "aero_LSRB_no_[a-z]+\.csv",
                "aero_RSRB_no_[a-z]+\.csv"
            ],
            "commit": null,
            "branch": "main",
            "tag": null,
            "destination": "."
        }
    }
}
Call:
>>> ast.update_rawdata(remote=None, remotes=None)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {None} | str

Name of single remote to update

remotes: {None} | list[str]

Name of multiple remotes to update

Versions:
  • 2021-09-02 @ddalle: v1.0

  • 2022-01-18 @ddalle: Version 1.1; remote(s) kwarg
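
A usage sketch; remote names other than "origin" are hypothetical:

>>> # Update every remote listed in rawdata/datakit-sources.json
>>> ast.update_rawdata()
>>> # Update a single remote, or a selected list of remotes
>>> ast.update_rawdata(remote="origin")
>>> ast.update_rawdata(remotes=["origin", "mirror"])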

update_rawdata_remote(remote: str = 'origin')

Update raw data for one remote

Call:
>>> ast.update_rawdata_remote(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

Versions:
  • 2021-09-02 @ddalle: v1.0

write_db_cdb(readfunc: Callable | None = None, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write (all) canonical CAPE database (.cdb) file(s)

Call:
>>> db = ast.write_db_cdb(readfunc, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

readfunc: {None} | callable

Function to read source datakit if needed

f: {True} | False

Overwrite existing .cdb file(s) if present

db: {None} | DataKit

Existing source datakit to write

cols: {None} | list

If dkl has more than one file, cols must be a list of lists specifying which columns to write to each file

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2021-09-10 @ddalle: v1.0

write_db_csv(readfunc: Callable | None = None, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write (all) canonical db CSV file(s)

Call:
>>> db = ast.write_db_csv(readfunc=None, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

readfunc: {None} | callable

Function to read source datakit if needed

f: {True} | False

Overwrite existing .csv file(s) if present

db: {None} | DataKit

Existing source datakit to write

cols: {None} | list

If dkl has more than one file, cols must be a list of lists specifying which columns to write to each file

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2021-09-10 @ddalle: v1.0

write_db_mat(readfunc: Callable | None = None, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write (all) canonical db MAT file(s)

Call:
>>> db = ast.write_db_mat(readfunc, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

readfunc: {None} | callable

Function to read source datakit if needed

f: {True} | False

Overwrite existing .mat file(s) if present

db: {None} | DataKit

Existing source datakit to write

cols: {None} | list

If dkl has more than one file, cols must be a list of lists specifying which columns to write to each file

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2021-09-10 @ddalle: v1.0

write_db_xlsx(readfunc: Callable | None = None, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write (all) canonical db XLSX file(s)

Call:
>>> db = ast.write_db_xlsx(readfunc, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

readfunc: {None} | callable

Function to read source datakit if needed

f: {True} | False

Overwrite existing .xlsx file(s) if present

db: {None} | DataKit

Existing source datakit to write

cols: {None} | list

If dkl has more than one file, cols must be a list of lists specifying which columns to write to each file

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2022-12-14 @ddalle: v1.0

write_dbfile_cdb(fcdb: str, readfunc: Callable, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write a canonical db CDB (CAPE data binary) file

Call:
>>> db = ast.write_dbfile_cdb(fcdb, readfunc, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fcdb: str

Name of file to write

readfunc: callable

Function to read source datakit if needed

f: {True} | False

Overwrite fcdb if it exists

db: {None} | DataKit

Existing source datakit to write

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2021-09-10 @ddalle: v1.0

  • 2021-09-15 @ddalle: v1.1; check for DVC stub

  • 2021-09-15 @ddalle: v1.2; add dvc option

write_dbfile_csv(fcsv: str, readfunc: Callable, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write a canonical db CSV file

Call:
>>> db = ast.write_dbfile_csv(fcsv, readfunc, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fcsv: str

Name of file to write

readfunc: callable

Function to read source datakit if needed

f: {True} | False

Overwrite fcsv if it exists

db: {None} | DataKit

Existing source datakit to write

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2021-09-10 @ddalle: v1.0

  • 2021-09-15 @ddalle: v1.1; check for DVC stub

  • 2021-09-15 @ddalle: v1.2; add dvc option

write_dbfile_mat(fmat: str, readfunc: Callable, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write a canonical db MAT file

Call:
>>> db = ast.write_dbfile_mat(fmat, readfunc, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fmat: str

Name of file to write

readfunc: callable

Function to read source datakit if needed

f: {True} | False

Overwrite fmat if it exists

db: {None} | DataKit

Existing source datakit to write

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2021-09-10 @ddalle: v1.0

  • 2021-09-15 @ddalle: v1.1; check for DVC stub

  • 2021-09-15 @ddalle: v1.2; add dvc option
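
A sketch of the write workflow; the file name and reader function are hypothetical:

>>> def read_source():
...     # Assemble a datakit from raw data (hypothetical file)
...     return ast.read_rawdatafile("aero.csv", ftype="csv")
>>> # Write (or overwrite, since f=True) the canonical MAT file;
>>> # read_source is called only if a source datakit is needed
>>> db = ast.write_dbfile_mat("mat/CAPE-DB-F3D-001.mat", read_source, f=True)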

write_dbfile_xlsx(fxls: str, readfunc: Callable, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write a canonical db XLSX file

Call:
>>> db = ast.write_dbfile_xlsx(fxls, readfunc, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fxls: str

Name of file to write

readfunc: callable

Function to read source datakit if needed

f: {True} | False

Overwrite fxls if it exists

db: {None} | DataKit

Existing source datakit to write

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2022-12-14 @ddalle: v1.0