cape.dkit.datakitloader: DataKit collection tools

This module provides the DataKitLoader class, which takes a module's __name__ and __file__ as input and automatically determines a variety of DataKit parameters.

class cape.dkit.datakitloader.DataKitLoader(name: str | None = None, fname: str | None = None, DATAKIT_CLS: type | None = None, **kw)

Tool for reading datakits based on module name and file

Call:
>>> dkl = DataKitLoader(name, fname, **kw)
Inputs:
name: str

Module name, from __name__

fname: str

Absolute path to module file name, from __file__

Outputs:
dkl: DataKitLoader

Tool for reading datakits for a specific module
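
For example, a datakit package might instantiate the loader at import time. A minimal sketch, assuming a hypothetical package and the cape.dkit.rdb.DataKit class:

>>> # Hypothetical datakit module, e.g. sampledb/__init__.py
>>> from cape.dkit.rdb import DataKit
>>> from cape.dkit.datakitloader import DataKitLoader
>>> # Determine datakit parameters from this module's identity
>>> ast = DataKitLoader(__name__, __file__, DATAKIT_CLS=DataKit)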

check_dvcfile(fname: str, f: bool = False) bool

Check if a file exists with an appended .dvc extension

Call:
>>> q = ast.check_dvcfile(fname)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file [optionally relative to MODULE_DIR]

Keys:
  • MODULE_DIR

Outputs:
q: True | False

Whether or not fname or DVC file exists

Versions:
  • 2021-07-19 @ddalle: v1.0

check_file(fname: str, f: bool = False, dvc: bool = True)

Check if a file exists OR a .dvc version

  • If f is True, this always returns False

  • If the file fname exists, this returns True

  • If fname plus a .dvc extension exists, this also returns True

Call:
>>> q = ast.check_file(fname, f=False, dvc=True)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file [optionally relative to MODULE_DIR]

f: True | {False}

Force-overwrite option; if True, the check always returns False

dvc: {True} | False

Option to check for .dvc extension

Keys:
  • MODULE_DIR

Outputs:
q: True | False

Whether or not fname or DVC file exists

Versions:
  • 2021-07-19 @ddalle: v1.0
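
The decision logic can be summarized by the following sketch, where fabs stands for the absolute path derived from fname (the helper name is hypothetical):

>>> import os
>>> def _check_file_sketch(fabs, f=False, dvc=True):
...     # Force mode reports "missing" so callers will rewrite the file
...     if f:
...         return False
...     # Accept the original file if present
...     if os.path.isfile(fabs):
...         return True
...     # Otherwise accept a DVC metadata stub, if that check is enabled
...     return dvc and os.path.isfile(fabs + ".dvc")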

check_modfile(fname: str) bool

Check if a file exists OR a .dvc version

Call:
>>> q = ast.check_modfile(fname)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file [optionally relative to MODULE_DIR]

Keys:
  • MODULE_DIR

Outputs:
q: True | False

Whether or not fname or DVC file exists

Versions:
  • 2021-07-19 @ddalle: v1.0

create_db_name()

Create and save database name from module name

This utilizes the following parameters:

  • MODULE_NAME_REGEX_LIST

  • MODULE_NAME_REGEX_GROUPS

  • DB_NAME_TEMPLATE_LIST

Call:
>>> dbname = ast.create_db_name()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
dbname: str

Prescribed datakit name

Versions:
  • 2021-06-28 @ddalle: v1.0

dvc_add(frel, **kw)

Add (cache) a file using DVC

Call:
>>> ierr = ast.dvc_add(frel, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

frel: str

Name of file relative to MODULE_DIR

Outputs:
ierr: int

Return code

  • 0: success

  • 512: not a git repo

Versions:
  • 2021-09-15 @ddalle: v1.0

dvc_pull(frel, **kw)

Pull a DVC file

Call:
>>> ierr = ast.dvc_pull(frel, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

frel: str

Name of file relative to MODULE_DIR

Outputs:
ierr: int

Return code

  • 0: success

  • 256: no DVC file

  • 512: not a git repo

Versions:
  • 2021-07-19 @ddalle: v1.0

  • 2023-02-21 @ddalle: v2.0; DVC -> LFC

dvc_push(frel, **kw)

Push a DVC file

Call:
>>> ierr = ast.dvc_push(frel, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

frel: str

Name of file relative to MODULE_DIR

Outputs:
ierr: int

Return code

  • 0: success

  • 256: no DVC file

  • 512: not a git repo

Versions:
  • 2021-09-15 @ddalle: v1.0

dvc_status(frel, **kw)

Check the status of a DVC file

Call:
>>> ierr = ast.dvc_status(frel, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

frel: str

Name of file relative to MODULE_DIR

Outputs:
ierr: int

Return code

  • 0: success

  • 1: out-of-date

  • 256: no DVC file

  • 512: not a git repo

Versions:
  • 2021-09-23 @ddalle: v1.0
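
A typical pattern, sketched below using the return codes above, is to pull a file only when its local copy is missing or out of date (the file name is hypothetical):

>>> # Hypothetical data file relative to MODULE_DIR
>>> frel = "db/mat/datakit.mat"
>>> # Nonzero status: out of date, no DVC file, or not a git repo
>>> if ast.dvc_status(frel) != 0:
...     ierr = ast.dvc_pull(frel)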

genr8_db_name(modname: str | None = None) str

Get database name based on first matching regular expression

This utilizes the following parameters:

  • MODULE_NAME

  • MODULE_NAME_REGEX_LIST

  • MODULE_NAME_REGEX_GROUPS

  • DB_NAME_TEMPLATE_LIST

Call:
>>> dbname = ast.genr8_db_name(modname=None)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

modname: {None} | str

Name of module to parse (default: MODULE_NAME)

Outputs:
dbname: str

Prescribed datakit name

Versions:
  • 2021-06-28 @ddalle: v1.0

  • 2021-07-15 @ddalle: v1.1; add modname arg
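
A sketch of how these parameters interact, using hypothetical settings stored dict-style as with dkl["DATAKIT_CLS"]; under these assumptions the regex groups fill the template:

>>> ast["MODULE_NAME_REGEX_LIST"] = [r"db_(?P<grp>[a-z0-9]+)_(?P<num>[0-9]+)"]
>>> ast["DB_NAME_TEMPLATE_LIST"] = ["CAPE-DB-%(grp)s-%(num)s"]
>>> ast.genr8_db_name("db_f3d_001")
'CAPE-DB-f3d-001'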

genr8_dvc_filename(fname: str) str

Produce name of large file stub

Call:
>>> flfc = repo.genr8_dvc_filename(fname)
Inputs:
repo: GitRepo

Interface to git repository

fname: str

Name of file, either original file or metadata stub

ext: {None} | ".dvc" | ".lfc"

Large file metadata stub file extension

Outputs:
flfc: str

Name of large file metadata stub file

Versions:
  • 2022-12-21 @ddalle: v1.0

genr8_dvc_ofilename(fname: str) str

Produce name of original large file

This strips the .dvc extension if necessary.

Call:
>>> forig = repo.genr8_dvc_ofilename(fname)
Inputs:
repo: GitRepo

Interface to git repository

fname: str

Name of file, either original file or metadata stub

Outputs:
forig: str

Name of original large file w/o LFC extension

Versions:
  • 2022-12-21 @ddalle: v1.0

genr8_lfc_filename(fname: str) str

Produce name of large file stub

Call:
>>> flfc = repo.genr8_lfc_filename(fname)
Inputs:
repo: GitRepo

Interface to git repository

fname: str

Name of file, either original file or metadata stub

ext: {None} | ".dvc" | ".lfc"

Large file metadata stub file extension

Outputs:
flfc: str

Name of large file metadata stub file

Versions:
  • 2022-12-21 @ddalle: v1.0

genr8_lfc_ofilename(fname: str) str

Produce name of original large file

This strips the .lfc or .dvc extension if necessary.

Call:
>>> forig = repo.genr8_lfc_ofilename(fname)
Inputs:
repo: GitRepo

Interface to git repository

fname: str

Name of file, either original file or metadata stub

Outputs:
forig: str

Name of original large file w/o LFC extension

Versions:
  • 2022-12-21 @ddalle: v1.0
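
These two functions form a round trip between an original file name and its metadata stub; an illustrative pair of calls (the file name is hypothetical, and whether the stub ends in .lfc or .dvc depends on the ext option and which stub exists):

>>> flfc = repo.genr8_lfc_filename("data/aero.mat")
>>> # e.g. "data/aero.mat.lfc"
>>> repo.genr8_lfc_ofilename(flfc)
'data/aero.mat'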

genr8_modnames(dbname: str | None = None) list

Generate list of candidate module names based on a DB name

This utilizes the following parameters:

  • DB_NAME

  • DB_NAME_REGEX_LIST

  • DB_NAME_REGEX_GROUPS

  • MODULE_NAME_TEMPLATE_LIST

Call:
>>> modnames = ast.genr8_modnames(dbname=None)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

dbname: {None} | str

Database name to parse (default: DB_NAME)

Outputs:
modnames: list[str]

Candidate module names

Versions:
  • 2021-10-22 @ddalle: v1.0

get_abspath(frel: str) str

Get the full filename from path relative to MODULE_DIR

Call:
>>> fabs = ast.get_abspath(frel)
>>> fabs = ast.get_abspath(fabs)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

frel: str

Name of file relative to MODULE_DIR

fabs: str

Existing absolute path

Keys:
  • MODULE_DIR

Outputs:
fabs: str

Absolute path to file

Versions:
  • 2021-07-05 @ddalle: v1.0
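
A sketch of the resolution rule (the helper name is hypothetical): absolute inputs pass through unchanged, while relative inputs are joined to MODULE_DIR:

>>> import os.path
>>> def _get_abspath_sketch(moddir, fname):
...     # Absolute paths are returned as-is
...     if os.path.isabs(fname):
...         return fname
...     # Relative paths are interpreted relative to MODULE_DIR
...     return os.path.join(moddir, fname)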

get_db_filenames_by_type(ext: str) list

Get list of file names for a given data file type

Call:
>>> fnames = ast.get_db_filenames_by_type(ext)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

ext: str

File extension type

Outputs:
fnames: list[str]

List of datakit file names; one for each suffix

Versions:
  • 2021-07-01 @ddalle: v1.0

get_db_suffixes_by_type(ext: str) list

Get list of suffixes for given data file type

Call:
>>> suffixes = ast.get_db_suffixes_by_type(ext)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

ext: str

File extension type

Keys:
  • DB_SUFFIXES_BY_TYPE

Outputs:
suffixes: list[str | None]

List of additional suffixes (if any) for ext type

Versions:
  • 2021-07-01 @ddalle: v1.0
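
For example, with hypothetical settings (stored dict-style as with dkl["DATAKIT_CLS"]), a datakit might write one plain CSV file and one with an "rbf" suffix:

>>> ast["DB_SUFFIXES_BY_TYPE"] = {"csv": [None, "rbf"]}
>>> ast.get_db_suffixes_by_type("csv")
[None, 'rbf']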

get_dbdir(ext)

Get containing folder for specified datakit file type

Call:
>>> fdir = ast.get_dbdir(ext)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

ext: str

File type

Outputs:
fdir: str

Absolute folder to ext datakit folder

Keys:
  • MODULE_DIR

  • DB_DIR

  • DB_DIRS_BY_TYPE

Versions:
  • 2021-07-07 @ddalle: v1.0

get_dbdir_by_type(ext: str) str

Get datakit directory for given file type

Call:
>>> fdir = ast.get_dbdir_by_type(ext)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

ext: str

File extension type

Keys:
  • MODULE_DIR

  • DB_DIR

  • DB_DIRS_BY_TYPE

Outputs:
fdir: str

Absolute path to ext datakit folder

Versions:
  • 2021-06-29 @ddalle: v1.0

get_dbfile(fname, ext)

Get a file name relative to the datakit folder

Call:
>>> fabs = ast.get_dbfile(fname, ext)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: None | str

Name of file relative to DB_DIRS_BY_TYPE for ext

ext: str

File type

Outputs:
fabs: str

Absolute path to file

Keys:
  • MODULE_DIR

  • DB_DIR

  • DB_DIRS_BY_TYPE

Versions:
  • 2021-07-07 @ddalle: v1.0

get_dbfiles(dbname: str, ext: str) list

Get list of datakit filenames for specified type

Call:
>>> fnames = ast.get_dbfiles(dbname, ext)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

dbname: None | str

Database name (DB_NAME is used if None)

ext: str

File type

Outputs:
fnames: list[str]

Absolute path to files for datakit

Keys:
  • MODULE_DIR

  • DB_DIR

  • DB_DIRS_BY_TYPE

  • DB_SUFFIXES_BY_TYPE

Versions:
  • 2021-07-07 @ddalle: v1.0
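
An illustrative call with hypothetical settings, showing how the folder options combine into absolute file names:

>>> ast["DB_DIRS_BY_TYPE"] = {"csv": "db/csv"}
>>> fnames = ast.get_dbfiles("CAPE-DB-F3D-001", "csv")
>>> # e.g. [".../db/csv/CAPE-DB-F3D-001.csv"], under MODULE_DIR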

get_meta_jsonfile() str

Get absolute path to module’s metadata file

Call:
>>> fname = ast.get_meta_jsonfile()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
fname: str

Absolute path to meta.json, if used

Keys:
  • MODULE_DIR

Versions:
  • 2025-06-13 @ddalle: v1.0

get_rawdata_opt(opt: str, remote: str = 'origin', vdef=None)

Get a rawdata/datakit-sources.json setting

Call:
>>> v = ast.get_rawdata_opt(opt, remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

opt: str

Name of option to read

remote: {"origin"} | str

Name of remote from which to read opt

vdef: {None} | any

Default value if opt not present

Outputs:
v: {vdef} | any

Value from JSON file if possible, else vdef

Versions:
  • 2021-09-01 @ddalle: v1.0

  • 2022-01-26 @ddalle: Version 1.1; add substitutions
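
For example, to read the branch setting for the "origin" remote with a fallback (the "main" default is illustrative):

>>> branch = ast.get_rawdata_opt("branch", remote="origin", vdef="main")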

get_rawdata_ref(remote: str = 'origin') str

Get optional SHA-1 hash, tag, or branch for raw data source

Call:
>>> ref = ast.get_rawdata_ref(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

Outputs:
ref: {"HEAD"} | str

Valid git reference name

Versions:
  • 2021-09-01 @ddalle: v1.0

get_rawdata_remotelist()

Get list of remotes from rawdata/datakit-sources.json

Call:
>>> remotes = ast.get_rawdata_remotelist()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
remotes: list[str]

List of remotes

Versions:
  • 2021-09-02 @ddalle: v1.0

get_rawdata_sourcecommit(remote: str = 'origin') str

Get the latest used SHA-1 hash for a remote

Call:
>>> sha1 = ast.get_rawdata_sourcecommit(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote from which to read opt

Outputs:
sha1: None | str

40-character SHA-1 hash if possible from datakit-sources-commit.json

Versions:
  • 2021-09-02 @ddalle: v1.0

get_rawdatadir()

Get absolute path to module’s raw data folder

Call:
>>> fdir = ast.get_rawdatadir()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
fdir: str

Absolute path to raw data folder

Keys:
  • MODULE_DIR

  • RAWDATA_DIR

Versions:
  • 2021-07-08 @ddalle: v1.0

get_rawdatafilename(fname, dvc=False)

Get a file name relative to the datakit folder

Call:
>>> fabs = ast.get_rawdatafilename(fname, dvc=False)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: None | str

Name of file relative to RAWDATA_DIR

dvc: True | {False}

Option to pull the DVC file if fabs does not exist

Outputs:
fabs: str

Absolute path to raw data file

Keys:
  • MODULE_DIR

  • RAWDATA_DIR

Versions:
  • 2021-07-07 @ddalle: v1.0

get_rawdataremote_git(remote: str = 'origin', f: bool = False)

Get full URL and SHA-1 hash for raw data source repo

Call:
>>> url, sha1 = ast.get_rawdataremote_git(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

f: True | {False}

Option to override dkl.rawdata_remotes if present

Outputs:
url: None | str

Full path to valid git repo, if possible

sha1: None | str

40-character hash of specified commit, if possible

Versions:
  • 2021-09-01 @ddalle: v1.0

get_rawdataremote_gitfiles(remote: str = 'origin') list

List all files in candidate raw data remote source

Call:
>>> fnames = ast.get_rawdataremote_gitfiles(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

Outputs:
fnames: list[str]

List of files to be copied from remote repo

Versions:
  • 2021-09-01 @ddalle: v1.0

get_rawdataremote_rsync(remote: str = 'origin') str

Get full URL for rsync raw data source repo

If several options are present, this function returns the first one whose folder exists.

Call:
>>> url = ast.get_rawdataremote_rsync(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

Outputs:
url: None | str

Full path to valid git repo, if possible

Versions:
  • 2021-09-02 @ddalle: v1.0

get_rawdataremote_rsyncfiles(remote: str = 'origin') list

List all files in candidate remote folder

Call:
>>> fnames = ast.get_rawdataremote_rsyncfiles(remote)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

Outputs:
fnames: list[str]

List of files to be copied from remote repo

Versions:
  • 2021-09-02 @ddalle: v1.0

get_requirement(j: int = 0) str

Get numbered requirement, from file or local variable

Call:
>>> reqs = ast.get_requirement(j=0)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

j: {0} | int

Index of requirement to process

Outputs:
req: str

Name for requirement j

Versions:
  • 2025-06-13 @ddalle: v1.0

get_requirements() list

Get list of requirements, from file or local variable

Call:
>>> reqs = ast.get_requirements()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
reqs: list[str]

List of required database names

Versions:
  • 2025-06-13 @ddalle: v1.0

get_requirements_json() list | None

Read list of requirements from JSON file, if applicable

Call:
>>> reqs = ast.get_requirements_json()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
reqs: list[str] | None

Requirements read from file, if any

Versions:
  • 2025-06-13 @ddalle: v1.0
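
A sketch of the workflow, combining this with import_requirement() and read_requirement() (documented below); the requirements.json contents are hypothetical:

>>> # Hypothetical requirements.json: ["CAPE-DB-F3D-001", "CAPE-DB-RMX-002"]
>>> ast.get_requirements_json()
['CAPE-DB-F3D-001', 'CAPE-DB-RMX-002']
>>> mod = ast.import_requirement(0)
>>> db = ast.read_requirement(0)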

get_requirements_jsonfile() str

Get absolute path to module’s requirements.json file

Call:
>>> fname = ast.get_requirements_jsonfile()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
fname: str

Absolute path to requirements.json, if used

Keys:
  • MODULE_DIR

Versions:
  • 2025-06-13 @ddalle: v1.0

get_rootdir() str

Get path to folder containing top-level module

Call:
>>> rootdir = ast.get_rootdir()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
rootdir: str

Absolute path to folder containing top-level module

Versions:
  • 2025-06-13 @ddalle: v1.0

import_db_name(dbname: str | None = None)

Import first available module based on a DB name

This utilizes the following parameters:

  • DB_NAME

  • DB_NAME_REGEX_LIST

  • DB_NAME_REGEX_GROUPS

  • MODULE_NAME_TEMPLATE_LIST

Call:
>>> mod = ast.import_db_name(dbname=None)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

dbname: {None} | str

Database name to parse (default: DB_NAME)

Outputs:
mod: module

Module with DB_NAME equal to dbname

Versions:
  • 2021-07-15 @ddalle: v1.0

import_requirement(j: int = 0)

Import module from numbered requirement

Call:
>>> mod = ast.import_requirement(j=0)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

j: {0} | int

Index of requirement to process

Outputs:
mod: module

Module from requirement j

Versions:
  • 2025-06-13 @ddalle: v1.0

list_rawdataremote_git(remote: str = 'origin') list

List all files in candidate raw data remote source

Call:
>>> ls_files = ast.list_rawdataremote_git(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

Outputs:
ls_files: list[str]

List of all files tracked by remote repo

Versions:
  • 2021-09-01 @ddalle: v1.0

list_rawdataremote_rsync(remote: str = 'origin') list

List all files in candidate raw data remote folder

Call:
>>> ls_files = ast.list_rawdataremote_rsync(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

Outputs:
ls_files: list[str]

List of all files in remote source folder

Versions:
  • 2021-09-02 @ddalle: v1.0

make_db_name()

Retrieve or create database name from module name

This utilizes the following parameters:

  • MODULE_NAME_REGEX_LIST

  • MODULE_NAME_REGEX_GROUPS

  • DB_NAME_TEMPLATE_LIST

Call:
>>> dbname = ast.make_db_name()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Outputs:
dbname: str

Prescribed datakit name

Versions:
  • 2021-06-28 @ddalle: v1.0

make_rawdatadir()

Ensure raw data folder exists

Call:
>>> ast.make_rawdatadir()
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

Keys:
  • MODULE_DIR

  • RAWDATA_DIR

Versions:
  • 2025-05-02 @ddalle: v1.0

prep_dirs(frel: str)

Prepare any folders needed for a file

Any folders in frel that don’t exist will be created. For example "db/csv/datakit.csv" will create the folders db/ and db/csv/ if they don’t already exist.

Call:
>>> ast.prep_dirs(frel)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

frel: str

Name of file relative to MODULE_DIR

Keys:
  • MODULE_DIR

Versions:
  • 2021-07-07 @ddalle: v1.0
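
A sketch of the effect (the helper name is hypothetical; MODULE_DIR is passed in explicitly):

>>> import os
>>> def _prep_dirs_sketch(moddir, frel):
...     # Create every missing parent folder of frel within MODULE_DIR
...     fdir = os.path.dirname(os.path.join(moddir, frel))
...     os.makedirs(fdir, exist_ok=True)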

prep_dirs_rawdata(frel: str)

Prepare folders relative to rawdata/ folder

Call:
>>> ast.prep_dirs_rawdata(frel)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

frel: str

Name of file relative to rawdata/ folder

fabs: str

Existing absolute path

Keys:
  • MODULE_DIR

Versions:
  • 2021-09-01 @ddalle: v1.0

read_db_cdb(cls: type | None = None, **kw) DataKit

Read a datakit using .cdb file type

Call:
>>> db = ast.read_db_cdb(cls=None, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-07-03 @ddalle: v1.0

read_db_csv(cls: type | None = None, **kw) DataKit

Read a datakit using .csv file type

Call:
>>> db = ast.read_db_csv(cls=None, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-07-03 @ddalle: v1.0

read_db_mat(cls: type | None = None, **kw) DataKit

Read a datakit using .mat file type

Call:
>>> db = ast.read_db_mat(cls=None, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-07-03 @ddalle: v1.0
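
A minimal usage sketch; by default the file is read with the class stored in dkl["DATAKIT_CLS"]:

>>> db = ast.read_db_mat()
>>> # Or override the reader class (cape.dkit.rdb.DataKit shown)
>>> from cape.dkit.rdb import DataKit
>>> db = ast.read_db_mat(cls=DataKit)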

read_db_name(dbname: str | None = None) DataKit

Read datakit from first available module based on a DB name

This utilizes the following parameters:

  • DB_NAME

  • DB_NAME_REGEX_LIST

  • DB_NAME_REGEX_GROUPS

  • MODULE_NAME_TEMPLATE_LIST

Call:
>>> db = ast.read_db_name(dbname=None)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

dbname: {None} | str

Database name to parse (default: DB_NAME)

Outputs:
db: DataKit

Output of read_db() from module with DB_NAME equal to dbname

Versions:
  • 2021-09-10 @ddalle: v1.0

read_dbfile(fname: str, ext: str, **kw) DataKit

Read a databook file from DB_DIR

Call:
>>> db = ast.read_dbfile(fname, ext, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: None | str

Name of file to read from the datakit folder (DB_DIR)

ext: str

Database file type

ftype: {"mat"} | None | str

Optional specifier to predetermine file type

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

Keys:
  • MODULE_DIR

  • DB_DIR

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-06-25 @ddalle: v1.0

read_dbfile_cdb(fname: str, **kw) DataKit

Read a .cdb file from DB_DIR

Call:
>>> db = ast.read_dbfile_cdb(fname, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file to read from the datakit folder (DB_DIR)

ftype: {"mat"} | None | str

Optional specifier to predetermine file type

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

kw: dict

Additional keyword arguments passed to cls

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-06-25 @ddalle: v1.0

read_dbfile_csv(fname: str, **kw) DataKit

Read a .csv file from DB_DIR

Call:
>>> db = ast.read_dbfile_csv(fname, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file to read from the datakit folder (DB_DIR)

ftype: {"mat"} | None | str

Optional specifier to predetermine file type

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

kw: dict

Additional keyword arguments passed to cls

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-06-25 @ddalle: v1.0

read_dbfile_csv_rbf(fname: str, **kw) DataKit

Read a .csv file of radial basis function (RBF) data from DB_DIR

Call:
>>> db = ast.read_dbfile_csv_rbf(fname, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file to read from the datakit folder (DB_DIR)

ftype: {"mat"} | None | str

Optional specifier to predetermine file type

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

kw: dict

Additional keyword arguments passed to cls

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-06-25 @ddalle: v1.0

read_dbfile_mat(fname: str, **kw) DataKit

Read a .mat file from DB_DIR

Call:
>>> db = ast.read_dbfile_mat(fname, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file to read from the datakit folder (DB_DIR)

ftype: {"mat"} | None | str

Optional specifier to predetermine file type

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

kw: dict

Additional keyword arguments passed to cls

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

Versions:
  • 2021-06-25 @ddalle: v1.0
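
A usage sketch with a hypothetical file name under MODULE_DIR/DB_DIR:

>>> db = ast.read_dbfile_mat("mat/CAPE-DB-F3D-001.mat")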

read_rawdata_json(fname: str = 'datakit-sources.json', f: bool = False)

Read datakit-sources.json from package’s raw data folder

Call:
>>> ast.read_rawdata_json(fname="datakit-sources.json", f=False)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: {"datakit-sources.json"} | str

Relative or absolute file name (rel. to rawdata/)

f: True | {False}

Reread even if dkl.rawdata_sources is nonempty

Effects:
dkl.rawdata_sources: dict

Settings read from JSON file

Versions:
  • 2021-09-01 @ddalle: v1.0

read_rawdatafile(fname: str, ftype: str | None = None, cls: type | None = None, **kw) DataKit

Read a file from the raw data folder (RAWDATA_DIR)

Call:
>>> db = ast.read_rawdatafile(fname, ftype=None, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fname: str

Name of file to read from raw data folder

ftype: {None} | str

Optional specifier to predetermine file type

cls: {None} | type

Class to read fname other than dkl["DATAKIT_CLS"]

kw: dict

Additional keyword arguments passed to cls

Outputs:
db: dkl["DATAKIT_CLS"] | cls

DataKit instance read from fname

read_requirement(j: int = 0) DataKit

Read database from numbered requirement

Call:
>>> db = ast.read_requirement(j=0)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

j: {0} | int

Index of requirement to process

Outputs:
db: DataKit

Database from requirement j

Versions:
  • 2025-06-13 @ddalle: v1.0

resolve_dbname(dbname: str) str

Resolve a database name using current DKit’s prefixes

Suppose the current database name is "CAPE-DB-T-F3D-001".

Examples:
>>> ast.resolve_dbname("002")
"CAPE-DB-T-F3D-002"
>>> ast.resolve_dbname("RMX-001")
"CAPE-DB-T-RMX-001"
>>> ast.resolve_dbname("Q-C3D-102")
"CAPE-DB-Q-C3D-102"
Call:
>>> fulldbname = ast.resolve_dbname(dbname)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

dbname: str

(Partial) database name

Outputs:
fulldbname: str

Full database name, using groups from ast.DB_NAME

Versions:
  • 2025-06-13 @ddalle: v1.0

update_rawdata(**kw)

Update raw data using rawdata/datakit-sources.json

The settings for zero or more “remotes” are read from that JSON file in the package’s rawdata/ folder. Example contents of such a file are shown below:

{
    "hub": [
        "/nobackup/user/",
        "pfe:/nobackupp16/user/git",
        "linux252:/nobackup/user/git"
    ],
    "remotes": {
        "origin": {
            "url": "data/datarepo.git",
            "type": "git-show",
            "glob": "aero_STACK*.csv",
            "regex": [
                "aero_CORE_no_[a-z]+\.csv",
                "aero_LSRB_no_[a-z]+\.csv",
                "aero_RSRB_no_[a-z]+\.csv"
            ],
            "commit": null,
            "branch": "main",
            "tag": null,
            "destination": "."
        }
    }
}
Call:
>>> ast.update_rawdata(remote=None, remotes=None)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {None} | str

Name of single remote to update

remotes: {None} | list[str]

Name of multiple remotes to update

Versions:
  • 2021-09-02 @ddalle: v1.0

  • 2022-01-18 @ddalle: Version 1.1; remote(s) kwarg
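
A usage sketch; remote names other than "origin" are hypothetical:

>>> # Update every remote listed in rawdata/datakit-sources.json
>>> ast.update_rawdata()
>>> # Update a single remote, or a selected list of remotes
>>> ast.update_rawdata(remote="origin")
>>> ast.update_rawdata(remotes=["origin", "mirror"])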

update_rawdata_remote(remote: str = 'origin')

Update raw data for one remote

Call:
>>> ast.update_rawdata_remote(remote="origin")
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

remote: {"origin"} | str

Name of remote

Versions:
  • 2021-09-02 @ddalle: v1.0

write_db_cdb(readfunc: Callable | None = None, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write (all) canonical CAPE database (.cdb) file(s)

Call:
>>> db = ast.write_db_cdb(readfunc, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

readfunc: {None} | callable

Function to read source datakit if needed

f: {True} | False

Overwrite existing .cdb file(s) if present

db: {None} | DataKit

Existing source datakit to write

cols: {None} | list

If dkl has more than one file, cols must be a list of lists specifying which columns to write to each file

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2021-09-10 @ddalle: v1.0

write_db_csv(readfunc: Callable | None = None, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write (all) canonical db CSV file(s)

Call:
>>> db = ast.write_db_csv(readfunc=None, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

readfunc: {None} | callable

Function to read source datakit if needed

f: {True} | False

Overwrite existing .csv file(s) if present

db: {None} | DataKit

Existing source datakit to write

cols: {None} | list

If dkl has more than one file, cols must be a list of lists specifying which columns to write to each file

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2021-09-10 @ddalle: v1.0

write_db_mat(readfunc: Callable | None = None, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write (all) canonical db MAT file(s)

Call:
>>> db = ast.write_db_mat(readfunc, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

readfunc: {None} | callable

Function to read source datakit if needed

f: {True} | False

Overwrite existing .mat file(s) if present

db: {None} | DataKit

Existing source datakit to write

cols: {None} | list

If dkl has more than one file, cols must be a list of lists specifying which columns to write to each file

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2021-09-10 @ddalle: v1.0

write_db_xlsx(readfunc: Callable | None = None, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write (all) canonical db XLSX file(s)

Call:
>>> db = ast.write_db_xlsx(readfunc, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

readfunc: {None} | callable

Function to read source datakit if needed

f: {True} | False

Overwrite existing .xlsx file(s) if present

db: {None} | DataKit

Existing source datakit to write

cols: {None} | list

If dkl has more than one file, cols must be a list of lists specifying which columns to write to each file

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2022-12-14 @ddalle: v1.0

write_dbfile_cdb(fcdb: str, readfunc: Callable, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write a canonical db CDB (CAPE data binary) file

Call:
>>> db = ast.write_dbfile_cdb(fcdb, readfunc, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fcdb: str

Name of file to write

readfunc: callable

Function to read source datakit if needed

f: {True} | False

Overwrite fcdb if it exists

db: {None} | DataKit

Existing source datakit to write

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2021-09-10 @ddalle: v1.0

  • 2021-09-15 @ddalle: v1.1; check for DVC stub

  • 2021-09-15 @ddalle: v1.2; add dvc option

write_dbfile_csv(fcsv: str, readfunc: Callable, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write a canonical db CSV file

Call:
>>> db = ast.write_dbfile_csv(fcsv, readfunc, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fcsv: str

Name of file to write

readfunc: callable

Function to read source datakit if needed

f: {True} | False

Overwrite fcsv if it exists

db: {None} | DataKit

Existing source datakit to write

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2021-09-10 @ddalle: v1.0

  • 2021-09-15 @ddalle: v1.1; check for DVC stub

  • 2021-09-15 @ddalle: v1.2; add dvc option

write_dbfile_mat(fmat: str, readfunc: Callable, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write a canonical db MAT file

Call:
>>> db = ast.write_dbfile_mat(fmat, readfunc, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fmat: str

Name of file to write

readfunc: callable

Function to read source datakit if needed

f: {True} | False

Overwrite fmat if it exists

db: {None} | DataKit

Existing source datakit to write

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2021-09-10 @ddalle: v1.0

  • 2021-09-15 @ddalle: v1.1; check for DVC stub

  • 2021-09-15 @ddalle: v1.2; add dvc option
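
A sketch of the write workflow; the file name and reader function are hypothetical:

>>> def read_source():
...     # Assemble a datakit from raw data (hypothetical file)
...     return ast.read_rawdatafile("aero.csv", ftype="csv")
>>> # Write (or overwrite, since f=True) the canonical MAT file;
>>> # read_source is called only if a source datakit is needed
>>> db = ast.write_dbfile_mat("mat/CAPE-DB-F3D-001.mat", read_source, f=True)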

write_dbfile_xlsx(fxls: str, readfunc: Callable, f: bool = True, db: DataKit | None = None, **kw) DataKit | None

Write a canonical db XLSX file

Call:
>>> db = ast.write_dbfile_xlsx(fxls, readfunc, f=True, **kw)
Inputs:
ast: DataKitAssistant

Tool for reading datakits for a specific module

fxls: str

Name of file to write

readfunc: callable

Function to read source datakit if needed

f: {True} | False

Overwrite fxls if it exists

db: {None} | DataKit

Existing source datakit to write

dvc: True | {False}

Option to add and push data file using dvc

Outputs:
db: None | DataKit

If source datakit is read during execution, return it to be used in other write functions

Versions:
  • 2022-12-14 @ddalle: v1.0