cape.attdb.datakithub: Hub for importing DataKits by name

This module provides the class DataKitHub, a tool that simplifies importing named “datakits” (see cape.attdb.rdb.DataKit). More specifically, it lets users define one or more naming conventions for databases/datakits and then read that data with minimal low-level Python programming.

An instance of the DataKitHub class is created by reading a JSON file that contains the naming conventions. For example:

from cape.attdb.datakithub import DataKitHub

# Create an instance
hub = DataKitHub()

This will look for a file

data/datakithub/datakithub.json

in the current folder and each parent folder.
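
The upward search described here can be sketched with the standard library alone. This is only an illustration of the idea, not the code used by DataKitHub; the function name find_hub_json and its arguments are hypothetical.

```python
import os


def find_hub_json(cwd=None, relpath=None):
    """Search *cwd* and each parent folder for *relpath* (hypothetical helper)"""
    # Default to the relative path named in the text above
    if relpath is None:
        relpath = os.path.join("data", "datakithub", "datakithub.json")
    # Start from the given folder, or from the current working directory
    folder = os.path.abspath(cwd or os.getcwd())
    while True:
        candidate = os.path.join(folder, relpath)
        if os.path.isfile(candidate):
            return candidate
        parent = os.path.dirname(folder)
        # The root folder is its own parent; stop there
        if parent == folder:
            return None
        folder = parent
```

Calling find_hub_json() from any subfolder of a project would return the first matching file found on the way up, or None if no parent folder contains one.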

A simple datakithub.json file might contain the following:

{
    "DB-ATT": {
        "repo": "/home/user/datakit/db",
        "type": "module",
        "module_attribute": "db",
        "module_regex": {
            "DB-ATT-([0-9]+)": "dbatt.db%s"
        }
    }
}

These settings are easier to explain after an example. With this file in place, the DataKitHub instance can read databases by their title, such as "DB-ATT-1" or "DB-ATT-002", as long as the title starts with "DB-ATT" or some other string defined in the JSON file.

from cape.attdb.datakithub import DataKitHub

# Create an instance
hub = DataKitHub("/home/user/datakit/datakithub.json")

# Read the database "DB-ATT-1"
db1 = hub.read_db("DB-ATT-1")

# Read the database "DB-ATT-002"
db2 = hub.read_db("DB-ATT-002")

This is roughly the same as

# Read the database "DB-ATT-1"
import dbatt.db1
db1 = dbatt.db1.db

# Read the database "DB-ATT-002"
import dbatt.db002
db2 = dbatt.db002.db

but without having to deal with either sys.path or the PYTHONPATH environment variable, which can be both tedious and difficult to make work for multiple users on different types of computers.
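
To make the comparison concrete, here is a minimal sketch of importing a module from a "repo" folder without a permanent change to sys.path. It is an assumption about the general approach, not the actual DataKitHub implementation; import_from_repo is a hypothetical name.

```python
import importlib
import sys


def import_from_repo(repo, modname, attr="db"):
    """Import *modname* from folder *repo* and return one of its attributes"""
    # Temporarily put the repo folder at the front of the search path
    sys.path.insert(0, repo)
    try:
        # Make sure recently created files are visible to the import system
        importlib.invalidate_caches()
        mod = importlib.import_module(modname)
    finally:
        # Restore the search path whether or not the import worked
        sys.path.remove(repo)
    # Return the variable that holds the datakit, e.g. "db"
    return getattr(mod, attr)
```

With the sample JSON file, import_from_repo("/home/user/datakit/db", "dbatt.db1") would play the role of the manual "import dbatt.db1" followed by "dbatt.db1.db".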

Here is a description of the JSON parameters:

repo: str

Name of the folder containing the data or modules

module_attribute: str | list | None

Name of variable(s) in imported module to use as datakit

module_function: str | list | None

Name of function(s) from imported module that return datakit

module_regex: dict[str]

Rules for converting a regular expression to module names
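
As an illustration of how a module_regex rule could turn a database name into a module name, the sketch below applies the rule from the sample datakithub.json above using only the re module. The helper dbname_to_modname is hypothetical, and the real class may process templates differently.

```python
import re

# Rule copied from the sample datakithub.json above
module_regex = {"DB-ATT-([0-9]+)": "dbatt.db%s"}


def dbname_to_modname(dbname, rules):
    """Return the module name for *dbname*, or None (hypothetical helper)"""
    for pattern, template in rules.items():
        # The whole database name must match the pattern
        match = re.fullmatch(pattern, dbname)
        if match:
            # Fill the template with the captured groups, in order
            return template % match.groups()
    return None


modname1 = dbname_to_modname("DB-ATT-1", module_regex)
modname2 = dbname_to_modname("DB-ATT-002", module_regex)
```

Here modname1 is "dbatt.db1" and modname2 is "dbatt.db002", which are the module names used in the manual imports shown earlier.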

class cape.attdb.datakithub.DataKitHub(fjson=None, cwd=None)

Load datakits using only the database name

Call:
>>> hub = DataKitHub(fjson)
Inputs:
fjson: {None} | str

Path to JSON file with import rules for one or more db names

cwd: {None} | str

Path from which to begin search

Outputs:
hub: DataKitHub

Instance that implements import rules by name

Versions:
  • 2019-02-17 @ddalle: Version 1.0

  • 2021-08-19 @ddalle: Version 2.0
    • simpler search for JSON file

    • similar to how git finds .git folder

    • better regular expression support

    • can try multiple sections if one matches but fails

abspath(path)

Expand a relative path to an absolute path

Call:
>>> abspath = hub.abspath(path)
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

path: str

Path to some file, relative or absolute

Outputs:
abspath: None | str

Absolute path to path

Versions:
  • 2021-08-18 @ddalle: Version 1.0

expand_regex(regex_template)

Expand a regular expression template

Use defined groups from hub.regex_groups

Call:
>>> regex = hub.expand_regex(regex_template)
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

regex_template: str

Raw template with <grp> or %(grp)s groups

Outputs:
regex: str

Expanded regex with (?P<grp>...) filled in

Versions:
  • 2021-08-17 @ddalle: Version 1.0
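
The following sketch shows one way such a template expansion could work. The group definitions in regex_groups are made up for the example (the real values come from hub.regex_groups), and expand_template is a hypothetical stand-in, not the method itself.

```python
import re

# Hypothetical group definitions, standing in for hub.regex_groups
regex_groups = {"num": "[0-9]+", "name": "[A-Za-z]+"}


def expand_template(regex_template, groups):
    """Replace <grp> and %(grp)s with named capture groups"""
    def _named(match):
        # One of the two alternatives captured the group name
        grp = match.group(1) or match.group(2)
        # Leave unknown group names untouched
        if grp not in groups:
            return match.group(0)
        return "(?P<%s>%s)" % (grp, groups[grp])
    # The lookbehind skips "<grp>" that is already part of "(?P<grp>"
    return re.sub(r"(?<!\?P)<(\w+)>|%\((\w+)\)s", _named, regex_template)


regex = expand_template("DB-<name>-%(num)s", regex_groups)
```

The result is an ordinary regular expression with (?P<name>...) and (?P<num>...) groups that can be passed to re.fullmatch().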

fullmatch(regex_template, dbname)

Match a full string (usually DB name) to a regex template

Call:
>>> groupdict = hub.fullmatch(regex_template, dbname)
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

regex_template: str

Regular expression template for section of datakits

dbname: str

Database name for one datakit

Outputs:
groupdict: None | dict[str]

Augmented dict of groups from regex

Versions:
  • 2021-08-17 @ddalle: Version 1.0
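
The difference between a full match and a prefix match can be sketched with the re module. The way the dict is "augmented" below (adding each group under its number as a string) is an assumption made for illustration; fullmatch_groups is a hypothetical helper.

```python
import re


def fullmatch_groups(regex, dbname):
    """Return a dict of groups if *regex* matches all of *dbname*"""
    match = re.fullmatch(regex, dbname)
    if match is None:
        return None
    # Named groups, if any
    groups = dict(match.groupdict())
    # Assumed augmentation: also store each group under its number
    for j, grp in enumerate(match.groups(), start=1):
        groups[str(j)] = grp
    return groups


full = fullmatch_groups(r"DB-ATT-(?P<num>[0-9]+)", "DB-ATT-002")
partial = fullmatch_groups(r"DB-ATT-([0-9]+)", "DB-ATT-002-rev")
prefix = re.match(r"DB-ATT-([0-9]+)", "DB-ATT-002-rev")
```

In this sketch, partial is None because of the trailing "-rev", while the plain re.match() call still succeeds on the same string.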

genr8_modname(dbname, regex, template)

Determine module name from DB name, regex, and template

Call:
>>> modname = hub.genr8_modname(dbname, regex, template)
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

dbname: str

Database name for one datakit

regex: str

Regular expression template for database names

template: str

Template for module name based on regex match groups

Outputs:
modname: None | str

Name of module according to regex and template

Versions:
  • 2021-08-17 @ddalle: Version 1.0

genr8_modpath(dbname, sec)

Generate $PYTHONPATH for given database name (if any)

Call:
>>> modpath = hub.genr8_modpath(dbname, sec)
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

dbname: str

Database name for one datakit

sec: str

Regular expression template for section of datakits

Outputs:
modpath: None | str

Path to module if not in existing $PYTHONPATH

Versions:
  • 2021-08-18 @ddalle: Version 1.0

get_regex_groups()

Get expanded regular expressions from hub.regex_groups

Call:
>>> regex_dict = hub.get_regex_groups()
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

Outputs:
regex_dict: dict[str]

Expanded regex with (?P<grp>...) for each group

Versions:
  • 2021-08-17 @ddalle: Version 1.0

get_section(sec)

Get options for specified module section

Call:
>>> secopts = hub.get_section(sec)
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

sec: str

Name of datakit section

Outputs:
secopts: dict

Options for sec loaded in hub[sec]

Versions:
  • 2021-08-18 @ddalle: Version 1.0

get_section_opt(sec, opt, vdef=None)

Get an option from a given datakit section

Call:
>>> v = hub.get_section_opt(sec, opt, vdef=None)
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

sec: str

Name of datakit section

opt: str

Name of option to access

vdef: {None} | any

Default value for opt

Outputs:
v: {vdef} | any

Value of hub[sec][opt] or vdef

Versions:
  • 2021-02-18 @ddalle: Version 1.0

  • 2021-08-18 @ddalle: Version 1.1
    • was get_group_opt()

    • add module-level defaults
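
A layered option lookup of this kind might look like the sketch below. The order of the fallbacks, the DEFAULT_OPTS dict, and the name lookup_section_opt are all assumptions for illustration; the actual defaults and their priority are not documented here.

```python
# Hypothetical module-level defaults (not taken from the real module)
DEFAULT_OPTS = {"module_attribute": "db"}


def lookup_section_opt(opts, sec, opt, vdef=None):
    """Look up *opt* in section *sec* with two levels of fallback"""
    # Options for this section, or an empty dict if the section is missing
    secopts = opts.get(sec, {})
    if opt in secopts:
        return secopts[opt]
    # Assumed order: an explicit default wins over module-level defaults
    if vdef is not None:
        return vdef
    return DEFAULT_OPTS.get(opt)


opts = {"DB-ATT": {"repo": "/home/user/datakit/db", "type": "module"}}
sectype = lookup_section_opt(opts, "DB-ATT", "type")
attr = lookup_section_opt(opts, "DB-ATT", "module_attribute")
```

Here sectype comes from the section itself, while attr falls back to the hypothetical module-level default.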

get_section_repo(sec)

Get repo option for section

Call:
>>> repo = hub.get_section_repo(sec)
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

sec: str

Name of datakit section

Outputs:
repo: None | str

Name of folder to add to path

Versions:
  • 2021-08-18 @ddalle: Version 1.0

get_section_type(sec)

Get type option for section

Call:
>>> sectype = hub.get_section_type(sec)
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

sec: str

Name of datakit section

Outputs:
sectype: str

Type of datakit section, for example "module"

Versions:
  • 2021-08-18 @ddalle: Version 1.0

import_dbname(dbname, **kw)

Import a datakit module based on DB name

Call:
>>> mod = hub.import_dbname(dbname, **kw)
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

dbname: str

Database name for one datakit

Keyword Arguments:
v, verbose: True | {False}

Option to report results of matching modules

vv, veryverbose: True | {False}

Option to report all attempts in matching sections

vvv, veryveryverbose: True | {False}

Option to report all attempts

Outputs:
mod: None | module

Imported module if possible

Versions:
  • 2021-02-18 @ddalle: Version 1.0

  • 2021-08-19 @ddalle: Version 2.0
    • forked from load_module()

    • better regular expression support

    • better fallback if more than one section matches
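
The fallback behavior mentioned in the version notes can be illustrated as a loop over candidate module names that moves on when an import fails. This is a sketch of the general pattern only; import_first and its verbose flag are hypothetical and simpler than the v/vv/vvv options above.

```python
import importlib


def import_first(modnames, verbose=False):
    """Import the first module in *modnames* that loads successfully"""
    for modname in modnames:
        try:
            mod = importlib.import_module(modname)
        except ImportError as err:
            # Report the failure if requested, then try the next candidate
            if verbose:
                print("Failed to import '%s': %s" % (modname, err))
            continue
        if verbose:
            print("Imported '%s'" % modname)
        return mod
    # No candidate could be imported
    return None


# The first name does not exist, so the standard "json" module is returned
mod = import_first(["no_such_datakit_module_xyz", "json"])
```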

import_module(dbname, **kw)

Import a datakit module based on DB name

Call:
>>> mod = hub.import_module(dbname, **kw)
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

dbname: str

Database name for one datakit

Keyword Arguments:
v, verbose: True | {False}

Option to report results of matching modules

vv, veryverbose: True | {False}

Option to report all attempts in matching sections

vvv, veryveryverbose: True | {False}

Option to report all attempts

Outputs:
mod: None | module

Imported module if possible

Versions:
  • 2021-02-18 @ddalle: Version 1.0

  • 2021-08-19 @ddalle: Version 2.0
    • forked from load_module()

    • better regular expression support

    • better fallback if more than one section matches

match(regex_template, dbname)

Match a regular expression template to a target string

Call:
>>> groupdict = hub.match(regex_template, dbname)
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

regex_template: str

Regular expression template for section of datakits

dbname: str

Database name for one datakit

Outputs:
groupdict: None | dict[str]

Augmented dict of groups from regex

Versions:
  • 2021-08-17 @ddalle: Version 1.0

match_section(sec, dbname)

Check if a database name matches a given section

Call:
>>> groupdict = hub.match_section(sec, dbname)
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

sec: str

Regular expression template for section of datakits

dbname: str

Database name for one datakit

Outputs:
groupdict: None | dict[str]

Augmented dict of groups from regex

Versions:
  • 2021-08-17 @ddalle: Version 1.0

read_db(dbname, **kw)

Read a datakit based on DB name

Call:
>>> db = hub.read_db(dbname, **kw)
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

dbname: str

Database name for one datakit

Keyword Arguments:
v, verbose: True | {False}

Option to report results of matching modules

vv, veryverbose: True | {False}

Option to report all attempts in matching sections

vvv, veryveryverbose: True | {False}

Option to report all attempts

Outputs:
db: None | DataKit

Data interface if successful

Versions:
  • 2021-02-18 @ddalle: Version 1.0

  • 2021-08-19 @ddalle: Version 2.0
    • better regex and fallback support

    • verbosity options

    • calls read_dbname()

read_dbname(dbname, **kw)

Read a datakit based on DB name

Call:
>>> db = hub.read_dbname(dbname, **kw)
Inputs:
hub: DataKitHub

Instance of datakit-reading hub

dbname: str

Database name for one datakit

Keyword Arguments:
v, verbose: True | {False}

Option to report results of matching modules

vv, veryverbose: True | {False}

Option to report all attempts in matching sections

vvv, veryveryverbose: True | {False}

Option to report all attempts

Outputs:
db: None | DataKit

Data interface if successful

Versions:
  • 2021-08-18 @ddalle: Version 1.0

cape.attdb.datakithub.prepare_template(template)

Expand a string template with some substitutions

The substitutions made include:

  • r"\g<grp>" –> "%(grp)s"

  • r"\l\g<grp>" –> "%(l-grp)s"

  • r"\u\1" –> "%(u-1)s"

  • r"\1" –> "%(1)s"

Call:
>>> fmt = prepare_template(template)
Inputs:
template: str

Initial template, mixing dict string expansion and re.sub() syntax

Outputs:
fmt: str

Template ready for standard string expansion, for example using fmt % grpdict where grpdict is a dict

Versions:
  • 2021-08-18 @ddalle: Version 1.0
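
The four substitutions listed above can be reproduced with a single re.sub() call, as in the sketch below. It is an independent illustration written from the list, not the library's own code, and prepare_template_sketch is a hypothetical name.

```python
import re

# Optional \l or \u prefix, followed by \g<name> or a group number
_GROUP_REF = re.compile(r"\\(?:([lu])\\)?(?:g<(\w+)>|(\d+))")


def prepare_template_sketch(template):
    """Convert re.sub()-style group references to %(key)s form"""
    def _convert(match):
        prefix = match.group(1)
        # Either a group name or a group number was captured
        name = match.group(2) or match.group(3)
        key = "%s-%s" % (prefix, name) if prefix else name
        return "%%(%s)s" % key
    return _GROUP_REF.sub(_convert, template)


fmt = prepare_template_sketch(r"dbatt.db\1")
modname = fmt % {"1": "002"}
```

In this example fmt is "dbatt.db%(1)s", so expanding it with a dict whose "1" entry is "002" gives "dbatt.db002".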