`cape.attdb.datakithub`: Hub for importing DataKits by name

This module provides the class DataKitHub, a tool that simplifies importing named “datakits” (see cape.attdb.rdb.DataKit). More specifically, it lets users define one or more naming conventions for databases/datakits and read that data with minimal low-level Python programming.

An instance of the DataKitHub class is created by reading a JSON file that contains the naming conventions. For example:

from cape.attdb.datakithub import DataKitHub

# Create an instance
hub = DataKitHub()

This will look for a file

data/datakithub/datakithub.json

in the current folder and each parent folder.
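The search described above can be sketched in a few lines. This is only an illustration of the idea of walking up through parent folders, not the code used by DataKitHub; the function name find_hub_json and its arguments are hypothetical.

```python
import os


def find_hub_json(cwd=None, relpath=os.path.join("data", "datakithub", "datakithub.json")):
    """Search *cwd* and each parent folder for *relpath* (hypothetical helper)."""
    # Start from the requested folder, or the current working directory
    folder = os.path.abspath(cwd if cwd is not None else os.getcwd())
    while True:
        candidate = os.path.join(folder, relpath)
        if os.path.isfile(candidate):
            return candidate
        parent = os.path.dirname(folder)
        # Stop at the filesystem root, where dirname() no longer changes
        if parent == folder:
            return None
        folder = parent
```

Calling find_hub_json() from any subfolder of a project returns the nearest matching file above it, or None if no parent folder contains one.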

A simple datakithub.json file might contain the following:

{
    "DB-ATT": {
        "repo": "/home/user/datakit/db",
        "type": "module",
        "module_attribute": "db",
        "module_regex": {
            "DB-ATT-([0-9]+)": "dbatt.db%s"
        }
    }
}

These settings are easier to explain after an example. With this file in place, the DataKitHub instance can read databases by their title, such as "DB-ATT-1" or "DB-ATT-002", as long as the title starts with "DB-ATT" or some other string defined in the JSON file.

from cape.attdb.datakithub import DataKitHub

# Create an instance
hub = DataKitHub("/home/user/datakit/datakithub.json")

# Read the database "DB-ATT-1"
db1 = hub.read_db("DB-ATT-1")

# Read the database "DB-ATT-002"
db2 = hub.read_db("DB-ATT-002")

This is roughly the same as

# Read the database "DB-ATT-1"
import dbatt.db1
db1 = dbatt.db1.db

# Read the database "DB-ATT-002"
import dbatt.db002
db2 = dbatt.db002.db

but without having to deal with either sys.path or the PYTHONPATH environment variable, which can be both tedious and difficult to make work for multiple users on different types of computers.
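To make the role of the module_regex entry concrete, the following sketch applies the rule from the example JSON file using only the standard re module. It is a simplified illustration under the assumption that the database name must match the whole pattern; the helper name modname_from_dbname is hypothetical and DataKitHub performs its own, more general template expansion.

```python
import re

# Rule copied from the example JSON file
regex = "DB-ATT-([0-9]+)"
template = "dbatt.db%s"


def modname_from_dbname(dbname):
    """Return a module name for *dbname*, or None if the rule does not apply."""
    match = re.fullmatch(regex, dbname)
    if match is None:
        return None
    # Fill the template with the captured groups, in order
    return template % match.groups()


print(modname_from_dbname("DB-ATT-1"))    # dbatt.db1
print(modname_from_dbname("DB-ATT-002"))  # dbatt.db002
print(modname_from_dbname("OTHER-7"))     # None
```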

Here is a description of the JSON parameters:

repo: str
Name of the folder containing the data or modules

module_attribute: str | list | None
Name of variable(s) in imported module to use as datakit

module_function: str | list | None
Name of function(s) from imported module that return datakit

module_regex: dict[str]
Rules for converting a regular expression to module names
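As a rough illustration of how module_attribute and module_function could be used once a module has been imported, the sketch below looks up a datakit from a stand-in module. The helper get_datakit, the order in which the two options are tried, and the demo module are all assumptions made for this example and are not taken from the DataKitHub source.

```python
import types


def get_datakit(mod, module_attribute=None, module_function=None):
    """Return the first datakit found in *mod* (hypothetical helper)."""
    # Accept a single name or a list of names, as the JSON options allow
    def as_list(v):
        if v is None:
            return []
        return [v] if isinstance(v, str) else list(v)

    # Assumption: try plain module variables first
    for name in as_list(module_attribute):
        db = getattr(mod, name, None)
        if db is not None:
            return db
    # Assumption: then try functions that return a datakit
    for name in as_list(module_function):
        func = getattr(mod, name, None)
        if callable(func):
            return func()
    return None


# Stand-in module for demonstration; a dict plays the role of a datakit
demo = types.ModuleType("demo")
demo.db = {"title": "DB-ATT-1"}
demo.read_db = lambda: {"title": "from function"}

print(get_datakit(demo, module_attribute="db"))
print(get_datakit(demo, module_function=["missing", "read_db"]))
```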

class cape.attdb.datakithub.DataKitHub(fjson=None, cwd=None)

Load datakits using only the database name

Call:

>>> hub = DataKitHub(fjson)

Inputs:

fjson: {None} | str: Path to JSON file with import rules for one or more db names
cwd: {None} | str: Path from which to begin search

Outputs:

hub: DataKitHub: Instance that implements import rules by name

Versions:

2019-02-17 @ddalle: Version 1.0
2021-08-19 @ddalle: Version 2.0
- simpler search for JSON file
- similar to how git finds .git folder
- better regular expression support
- can try multiple sections if one matches but fails

abspath(path)

Expand a relative path to an absolute path

Call:

>>> abspath = hub.abspath(path)

Inputs:

hub: DataKitHub: Instance of datakit-reading hub
path: str: Path to some file, relative or absolute

Outputs:

abspath: None | str: Absolute path to path

Versions:

2021-08-18 @ddalle: Version 1.0

expand_regex(regex_template)

Expand a regular expression template

Use defined groups from hub.regex_groups

Call:

>>> regex = hub.expand_regex(regex_template)

Inputs:

hub: DataKitHub: Instance of datakit-reading hub
regex_template: str: Raw template with <grp> or %(grp)s groups

Outputs:

regex: str: Expanded regex with (?P<grp>...) filled in

Versions:

2021-08-17 @ddalle: Version 1.0
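The expansion of <grp> and %(grp)s placeholders into named groups can be sketched as follows. This is a minimal illustration, not the DataKitHub implementation: the dictionary regex_groups below is a hypothetical stand-in for hub.regex_groups, and the function name expand_template is made up for this example.

```python
import re

# Hypothetical group definitions, playing the role of hub.regex_groups
regex_groups = {
    "num": "[0-9]+",
    "name": "[A-Za-z]+",
}


def expand_template(regex_template, groups=regex_groups):
    """Replace <grp> and %(grp)s with named groups (?P<grp>...)."""
    def named_group(m):
        grp = m.group(1) or m.group(2)
        if grp not in groups:
            # Leave unknown placeholders untouched
            return m.group(0)
        return "(?P<%s>%s)" % (grp, groups[grp])

    # Match <grp> (but not an existing "(?P<grp>") or %(grp)s
    return re.sub(r"(?<!\?P)<(\w+)>|%\((\w+)\)s", named_group, regex_template)


regex = expand_template("DB-<name>-%(num)s")
print(regex)  # DB-(?P<name>[A-Za-z]+)-(?P<num>[0-9]+)
print(re.fullmatch(regex, "DB-ATT-002").groupdict())  # {'name': 'ATT', 'num': '002'}
```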

fullmatch(regex_template, dbname)

Match a full string (usually DB name) to a regex template

Call:

>>> groupdict = hub.fullmatch(regex_template, dbname)

Inputs:

hub: DataKitHub: Instance of datakit-reading hub
regex_template: str: Regular expression template for section of datakits
dbname: str: Database name for one datakit

Outputs:

groupdict: None | dict[str]: Augmented dict of groups from regex

Versions:

2021-08-17 @ddalle: Version 1.0

genr8_modname(dbname, regex, template)

Determine module name from DB name, regex, and template

Call:

>>> modname = hub.genr8_modname(dbname, regex, template)

Inputs:

hub: DataKitHub: Instance of datakit-reading hub
dbname: str: Database name for one datakit
regex: str: Regular expression template for database names
template: str: Template for module name based on regex match groups

Outputs:

modname: None | str: Name of module according to regex and template

Versions:

2021-08-17 @ddalle: Version 1.0

genr8_modpath(dbname, sec)

Generate $PYTHONPATH for given database name (if any)

Call:

>>> modpath = hub.genr8_modpath(dbname, sec)

Inputs:

hub: DataKitHub: Instance of datakit-reading hub
dbname: str: Database name for one datakit
sec: str: Name of datakit section

Outputs:

modpath: None | str: Path to module if not in existing $PYTHONPATH

Versions:

2021-08-18 @ddalle: Version 1.0

get_regex_groups()

Get expanded regular expressions from hub.regex_groups

Call:

>>> regex_dict = hub.get_regex_groups()

Inputs:

hub: DataKitHub: Instance of datakit-reading hub

Outputs:

regex_dict: dict[str]: Expanded regex with (?P<grp>...) for each group

Versions:

2021-08-17 @ddalle: Version 1.0

get_section(sec)

Get options for specified module section

Call:

>>> secopts = hub.get_section(sec)

Inputs:

hub: DataKitHub: Instance of datakit-reading hub
sec: str: Name of datakit section

Outputs:

secopts: dict: Options for sec loaded in hub[sec]

Versions:

2021-08-18 @ddalle: Version 1.0

get_section_opt(sec, opt, vdef=None)

Get the value of an option from a given datakit section

Call:

>>> v = hub.get_section_opt(sec, opt, vdef=None)

Inputs:

hub: DataKitHub: Instance of datakit-reading hub
sec: str: Name of datakit section
opt: str: Name of option to access
vdef: {None} | any: Default value for opt

Outputs:

v: {vdef} | any: Value of hub[sec][opt] or vdef

Versions:

2021-02-18 @ddalle: Version 1.0
2021-08-18 @ddalle: Version 1.1
- was get_group_opt()
- add module-level defaults
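The lookup with a fallback value can be pictured with a plain dictionary loaded from the example JSON content. This sketch treats the hub as an ordinary dict, which is an assumption made for illustration, and it does not reproduce the module-level defaults mentioned in the version notes; the name get_opt is hypothetical.

```python
import json

# Example settings, as they might appear in datakithub.json
hub_opts = json.loads("""
{
    "DB-ATT": {
        "repo": "/home/user/datakit/db",
        "type": "module",
        "module_attribute": "db",
        "module_regex": {
            "DB-ATT-([0-9]+)": "dbatt.db%s"
        }
    }
}
""")


def get_opt(opts, sec, opt, vdef=None):
    """Return opts[sec][opt] if present, else *vdef* (hypothetical helper)."""
    # A missing section behaves like an empty one
    secopts = opts.get(sec, {})
    return secopts.get(opt, vdef)


print(get_opt(hub_opts, "DB-ATT", "type"))                  # module
print(get_opt(hub_opts, "DB-ATT", "module_function"))       # None
print(get_opt(hub_opts, "DB-XYZ", "type", vdef="module"))   # module
```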

get_section_repo(sec)

Get repo option for section

Call:

>>> repo = hub.get_section_repo(sec)

Inputs:

hub: DataKitHub: Instance of datakit-reading hub
sec: str: Name of datakit section

Outputs:

repo: None | str: Name of folder to add to path

Versions:

2021-08-18 @ddalle: Version 1.0

get_section_type(sec)

Get type option for section

Call:

>>> sectype = hub.get_section_type(sec)

Inputs:

hub: DataKitHub: Instance of datakit-reading hub
sec: str: Name of datakit section

Outputs:

sectype: str: Type of datakit section, for example "module"

Versions:

2021-08-18 @ddalle: Version 1.0

import_dbname(dbname, **kw)

Import a datakit module based on DB name

Call:

>>> mod = hub.import_dbname(dbname, **kw)

Inputs:

hub: DataKitHub: Instance of datakit-reading hub
dbname: str: Database name for one datakit

Keyword Arguments:

v, verbose: True | {False}: Option to report results of matching modules
vv, veryverbose: True | {False}: Option to report all attempts in matching sections
vvv, veryveryverbose: True | {False}: Option to report all attempts

Outputs:

mod: None | module: Imported module if possible

Versions:

2021-02-18 @ddalle: Version 1.0
2021-08-19 @ddalle: Version 2.0
- forked from load_module()
- better regular expression support
- better fallback if more than one section matches
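The general technique of importing a module by name from a repo folder, without permanently changing sys.path, can be sketched with the standard importlib module. This is an illustrative sketch only; the helper import_from_repo and the throwaway package dbatt_demo are hypothetical and are not part of DataKitHub.

```python
import importlib
import os
import sys
import tempfile


def import_from_repo(modname, repo):
    """Import *modname*, searching *repo* first (hypothetical helper)."""
    sys.path.insert(0, repo)
    try:
        # Make sure recently created files are visible to the import system
        importlib.invalidate_caches()
        return importlib.import_module(modname)
    except ImportError:
        return None
    finally:
        # Restore sys.path whether or not the import worked
        sys.path.remove(repo)


# Demonstration with a throwaway package in a temporary folder
repo = tempfile.mkdtemp()
os.makedirs(os.path.join(repo, "dbatt_demo"))
with open(os.path.join(repo, "dbatt_demo", "__init__.py"), "w") as fp:
    fp.write("")
with open(os.path.join(repo, "dbatt_demo", "db1.py"), "w") as fp:
    fp.write("db = {'title': 'DB-ATT-1'}\n")

mod = import_from_repo("dbatt_demo.db1", repo)
print(mod.db)  # {'title': 'DB-ATT-1'}
```

Returning None on failure mirrors the documented output of "None | module", so a caller can move on to the next matching section.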

import_module(dbname, **kw)

Import a datakit module based on DB name

Call:

>>> mod = hub.import_module(dbname, **kw)

Inputs:

hub: DataKitHub: Instance of datakit-reading hub
dbname: str: Database name for one datakit

Keyword Arguments:

v, verbose: True | {False}: Option to report results of matching modules
vv, veryverbose: True | {False}: Option to report all attempts in matching sections
vvv, veryveryverbose: True | {False}: Option to report all attempts

Outputs:

mod: None | module: Imported module if possible

Versions:

2021-02-18 @ddalle: Version 1.0
2021-08-19 @ddalle: Version 2.0
- forked from load_module()
- better regular expression support
- better fallback if more than one section matches

match(regex_template, dbname)

Match a regular expression template to a target string

Call:

>>> groupdict = hub.match(regex_template, dbname)

Inputs:

hub: DataKitHub: Instance of datakit-reading hub
regex_template: str: Regular expression template for section of datakits
dbname: str: Database name for one datakit

Outputs:

groupdict: None | dict[str]: Augmented dict of groups from regex

Versions:

2021-08-17 @ddalle: Version 1.0
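The difference between matching the start of a string and matching the full string can be seen with the standard re module. The sketch below assumes that match() and fullmatch() follow the semantics of re.match() and re.fullmatch() respectively, which their names suggest but which is not confirmed here; the helper match_groups is hypothetical and returns only the plain named groups, not the augmented dictionary.

```python
import re


def match_groups(regex, dbname, full=False):
    """Return the named groups of a match, or None (hypothetical helper)."""
    matcher = re.fullmatch if full else re.match
    m = matcher(regex, dbname)
    return None if m is None else m.groupdict()


regex = r"DB-ATT-(?P<num>[0-9]+)"

# A prefix match succeeds even with trailing characters
print(match_groups(regex, "DB-ATT-002-rev1"))             # {'num': '002'}
# A full match requires the whole name to fit the pattern
print(match_groups(regex, "DB-ATT-002-rev1", full=True))  # None
print(match_groups(regex, "DB-ATT-002", full=True))       # {'num': '002'}
```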

match_section(sec, dbname)

Check if a database name matches a given section

Call:

>>> groupdict = hub.match_section(sec, dbname)

Inputs:

hub: DataKitHub: Instance of datakit-reading hub
sec: str: Name of datakit section
dbname: str: Database name for one datakit

Outputs:

groupdict: None | dict[str]: Augmented dict of groups from regex

Versions:

2021-08-17 @ddalle: Version 1.0

read_db(dbname, **kw)

Read a datakit based on DB name

Call:

>>> db = hub.read_db(dbname, **kw)

Inputs:

hub: DataKitHub: Instance of datakit-reading hub
dbname: str: Database name for one datakit

Keyword Arguments:

v, verbose: True | {False}: Option to report results of matching modules
vv, veryverbose: True | {False}: Option to report all attempts in matching sections
vvv, veryveryverbose: True | {False}: Option to report all attempts

Outputs:

db: None | DataKit: Data interface if successful

Versions:

2021-02-18 @ddalle: Version 1.0
2021-08-19 @ddalle: Version 2.0
- better regex and fallback support
- verbosity options
- calls read_dbname()

read_dbname(dbname, **kw)

Read a datakit based on DB name

Call:

>>> db = hub.read_dbname(dbname, **kw)

Inputs:

hub: DataKitHub: Instance of datakit-reading hub
dbname: str: Database name for one datakit

Keyword Arguments:

v, verbose: True | {False}: Option to report results of matching modules
vv, veryverbose: True | {False}: Option to report all attempts in matching sections
vvv, veryveryverbose: True | {False}: Option to report all attempts

Outputs:

db: None | DataKit: Data interface if successful

Versions:

2021-08-18 @ddalle: Version 1.0

cape.attdb.datakithub.prepare_template(template)

Expand a string template with some substitutions

The substitutions made include:

r"\g<grp>" -> "%(grp)s"

r"\l\g<grp>" -> "%(l-grp)s"

r"\u\1" -> "%(u-1)s"

r"\1" -> "%(1)s"

Call:

>>> fmt = prepare_template(template)

Inputs:

template: str: Initial template, mixing dict string expansion and re.sub() syntax

Outputs:

fmt: str: Template ready for standard string expansion, for example using fmt % grpdict where grpdict is a dict

Versions:

2021-08-18 @ddalle: Version 1.0
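The four substitutions listed above can be reproduced with a single regular expression. The following is a sketch written for illustration and is not the source of prepare_template(); the name convert_template is hypothetical, and the contents of grpdict (including the lower-case value stored under "l-name") are assumed example values.

```python
import re

# Optional \u or \l prefix, then either \g<name> or a numbered reference
PATTERN = r"(?:\\(?P<case>[ul]))?\\(?:g<(?P<name>\w+)>|(?P<num>[0-9]+))"


def convert_template(template):
    """Convert re.sub()-style references to %(key)s placeholders."""
    def repl(m):
        case = m.group("case")
        # "u-" or "l-" prefix when \u or \l precedes the reference
        prefix = (case + "-") if case else ""
        key = m.group("name") or m.group("num")
        return "%(" + prefix + key + ")s"

    return re.sub(PATTERN, repl, template)


print(convert_template(r"\g<grp>"))    # %(grp)s
print(convert_template(r"\l\g<grp>"))  # %(l-grp)s
print(convert_template(r"\u\1"))       # %(u-1)s
print(convert_template(r"\1"))         # %(1)s

# Standard string expansion with a hand-made dict of groups
fmt = convert_template(r"\l\g<name>.db\2")
grpdict = {"name": "ATT", "l-name": "att", "2": "002"}
modname = fmt % grpdict
print(modname)  # att.db002
```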
