lfcrepo: Interface to git repos with large-file control

This module provides the LFCRepo class, an interface to Git repositories that use large-file control. It includes actions to hash, store, and transfer large files tracked with lfc.

class lfc.lfcrepo.LFCRepo(where=None)

LFC interface to individual repositories

Call:
>>> repo = LFCRepo(where=None)
Inputs:
where: {None} | str

Location of repo (None -> os.getcwd())

bare

True | False – Whether this instance is in a bare repository

check_cache(flfc: str)

Check if large file is in local cache

Call:
>>> status = repo.check_cache(flfc)
Inputs:
repo: GitRepo

Interface to git repository

flfc: str

Name of large file metadata stub file

Outputs:
status: True | False

Whether file is present in local cache

Versions:
  • 2022-12-28 @ddalle: v1.0

close_lfc_portal(remote=None)

Close large file transfer portal, if any

Call:
>>> repo.close_lfc_portal(remote=None)
Inputs:
repo: GitRepo

Interface to git repository

remote: {None} | str

Name of remote, or default

Versions:
  • 2022-12-20 @ddalle: v1.0

  • 2023-10-26 @ddalle: v1.1; multiple portals

find_lfc_files(pattern=None, ext=None, mode=None, **kw) list

Find all large file stubs

Call:
>>> lfcfiles = repo.find_lfc_files(pattern=None, ext=None)
Inputs:
repo: GitRepo

Interface to git repository

pattern: {None} | str

Pattern to restrict search of large file stubs

ext: {None} | ".lfc" | ".dvc"

Optional manual working stub extension to use

Outputs:
lfcfiles: list[str]

List of large file stubs, each ending with ext

Versions:
  • 2022-12-20 @ddalle: v1.0

  • 2022-12-28 @ddalle: v1.1; bug fix for empty result

  • 2023-10-26 @ddalle: v2.0
    • use ls_tree() instead of calling git ls-files

    • works with lfc add data/ or similar

  • 2023-11-08 @ddalle: v2.1; add mode
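
The filtering described above can be illustrated with a small standalone sketch. It assumes the list of tracked file names is already available and that pattern is a shell-style wildcard applied to the stub name; both are assumptions made for illustration. The helper filter_stubs() is hypothetical and is not part of LFCRepo, which (per the version notes) uses ls_tree().

```python
import fnmatch


def filter_stubs(tracked, pattern=None, ext=".lfc"):
    """Select large file stubs from a list of tracked file names

    Hypothetical helper; not the implementation of find_lfc_files().
    """
    # Keep only names ending with the working stub extension
    stubs = [fname for fname in tracked if fname.endswith(ext)]
    # Optionally restrict to names matching a shell-style pattern
    if pattern is not None:
        stubs = [fname for fname in stubs if fnmatch.fnmatch(fname, pattern)]
    return stubs


# Made-up list of tracked files
tracked_files = ["a.dat.lfc", "README.md", "data/c.bin.lfc", "old.dat.dvc"]
lfcfiles = filter_stubs(tracked_files, pattern="data/*")
```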

genr8_hash(fname: str)

Calculate SHA-256 hex digest of a file

Call:
>>> hexhash = repo.genr8_hash(fname)
Inputs:
repo: GitRepo

Interface to git repository

fname: str

Name of file to hash

Outputs:
hexhash: str

SHA-256 hex digest of file’s bytes

Versions:
  • 2022-12-28 @ddalle: v1.0
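
As a point of reference, a SHA-256 hex digest of a file's bytes can be computed with the standard library as sketched below. The helper name sha256_hexdigest() and the chunk size are illustrative assumptions; this is not necessarily how genr8_hash() is implemented.

```python
import hashlib


def sha256_hexdigest(fname: str, chunk_size: int = 1 << 20) -> str:
    """Return SHA-256 hex digest of the bytes in *fname* (hypothetical helper)"""
    hasher = hashlib.sha256()
    # Read in chunks so that large files need not fit in memory
    with open(fname, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Reading in fixed-size chunks keeps memory use bounded, which matters for the large files this module is meant to track.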

genr8_lfc_filename(fname: str, ext=None) str

Produce name of large file stub

Call:
>>> flfc = repo.genr8_lfc_filename(fname)
Inputs:
repo: GitRepo

Interface to git repository

fname: str

Name of file, either original file or metadata stub

ext: {None} | ".dvc" | ".lfc"

Large file metadata stub file extension

Outputs:
flfc: str

Name of large file metadata stub file

Versions:
  • 2022-12-21 @ddalle: v1.0

genr8_lfc_glob(*fnames, mode=None)

Generate list of .lfc files matching one or more patterns

Call:
>>> lfcfiles = repo.genr8_lfc_glob(*fnames, mode=None)
Inputs:
repo: GitRepo

Interface to git repository

fnames: tuple[str]

List of file name patterns to search for

mode: {None} | 1 | 2

LFC file mode to search for

Outputs:
lfcfiles: list[str]

List of matching .lfc files

genr8_lfc_ofilename(fname: str) str

Produce name of original large file

This strips the .lfc or .dvc extension if necessary.

Call:
>>> forig = repo.genr8_lfc_ofilename(fname)
Inputs:
repo: GitRepo

Interface to git repository

fname: str

Name of file, either original file or metadata stub

Outputs:
forig: str

Name of original large file w/o LFC extension

Versions:
  • 2022-12-21 @ddalle: v1.0
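
The relationship between an original file name and its stub name can be sketched as follows. The helpers original_name() and stub_name() are hypothetical stand-ins written for this example; they only assume that a stub is the original name plus a .lfc or .dvc extension, as described above.

```python
import os

# Stub extensions named in this documentation
LFC_EXTS = (".lfc", ".dvc")


def original_name(fname: str) -> str:
    """Strip a trailing .lfc or .dvc extension, if any (hypothetical helper)"""
    base, ext = os.path.splitext(fname)
    return base if ext in LFC_EXTS else fname


def stub_name(fname: str, ext: str = ".lfc") -> str:
    """Name of the metadata stub for *fname* (hypothetical helper)"""
    # Normalize to the original name first so stubs are not double-suffixed
    return original_name(fname) + ext
```

Either form of the name (original or stub) can be passed to both helpers, mirroring the "either original file or metadata stub" inputs above.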

get_cachedir()

Get name of large file cache folder

Call:
>>> fdir = repo.get_cachedir()
Inputs:
repo: GitRepo

Interface to git repository

Outputs:
fdir: str

Absolute path to large file cache

Versions:
  • 2022-12-19 @ddalle: v1.0

  • 2022-12-22 @ddalle: v1.1; bare repo omits “.lfc”

get_lfc_autopull() int

Get the LFC mode for auto-pull

  • 0: do not auto-pull files

  • 1: auto-pull all files

  • 2: auto-pull all mode-2 files (default)

Call:
>>> mode = repo.get_lfc_autopull()
Inputs:
repo: GitRepo

Interface to git repository

Outputs:
mode: int

Mode of LFC files to automatically pull

get_lfc_autopush() int

Get the LFC mode for auto-push

  • 0: do not auto-push files

  • 1: auto-push all files

  • 2: auto-push all mode-2 files (default)

Call:
>>> mode = repo.get_lfc_autopush()
Inputs:
repo: GitRepo

Interface to git repository

Outputs:
mode: int

Mode of LFC files to automatically push
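
The three auto-pull and auto-push settings listed above combine with the per-file LFC mode (1 or 2) as in the sketch below. The function is_auto_transferred() is a hypothetical helper for illustration, not part of LFCRepo.

```python
def is_auto_transferred(auto_mode: int, file_mode: int) -> bool:
    """Decide whether a file with *file_mode* is pushed/pulled automatically

    Hypothetical helper following the 0 | 1 | 2 settings listed above.
    """
    if auto_mode == 0:
        # 0: never transfer automatically
        return False
    if auto_mode == 1:
        # 1: transfer all files automatically
        return True
    if auto_mode == 2:
        # 2: transfer only mode-2 files automatically
        return file_mode == 2
    raise ValueError(f"unknown auto-push/auto-pull mode {auto_mode!r}")
```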

get_lfc_configfile(ext=None)

Get name of LFC configuration file

In a bare repo, this will return the path relative to the root, e.g. ".lfc/config". In a working repo, it will return the absolute path.

Call:
>>> fcfg = repo.get_lfc_configfile(ext=None)
Inputs:
repo: GitRepo

Interface to git repository

ext: {None} | ".lfc" | ".dvc"

Optional manual override for file extension

Outputs:
fcfg: str

Name of large file client configuration

Versions:
  • 2022-12-20 @ddalle: v1.0

  • 2022-12-28 @ddalle: v1.1; optional ext input

get_lfc_ext(vdef='.lfc')

Get file extension of the large file utility in use

Call:
>>> ext = repo.get_lfc_ext(vdef=".lfc")
Inputs:
repo: GitRepo

Interface to git repository

vdef: {".lfc"} | ".dvc"

Preferred default if neither is present

Outputs:
ext: ".dvc" | ".lfc"

Working extension to use for large file stubs

Versions:
  • 2022-12-19 @ddalle: v1.0

  • 2022-12-22 @ddalle: v2.0; valid for bare repos

get_lfc_hash(fname: str, ref=None)

Get hash code used by LFC for a large file

Call:
>>> hashcode = repo.get_lfc_hash(fname, ref=None)
Inputs:
repo: GitRepo

Interface to git repository

fname: str

Name of original file or large file stub

ref: {None} | str

Optional git reference (default HEAD on bare repo)

Outputs:
hashcode: str

SHA-256 hash code from LFC file (MD-5 if added by DVC)

Versions:
  • 2011-12-22 @ddalle: v1.0

get_lfc_remote_url(remote=None)

Get URL for a large file client remote

Call:
>>> url = repo.get_lfc_remote_url(remote)
Inputs:
repo: GitRepo

Interface to git repository

remote: {None} | str

Optional explicit remote name

Outputs:
url: str

Path to LFC remote, either local or SSH

Versions:
  • 2022-12-22 @ddalle: v1.0

  • 2023-03-17 @ddalle: v1.1; remote -> local url check

get_lfcdir()

Get path to large file root dir

Call:
>>> fdir = repo.get_lfcdir()
Inputs:
repo: GitRepo

Interface to git repository

Outputs:
fdir: str

Path to LFC/DVC settings dir, .lfc or .dvc

Versions:
  • 2022-12-19 @ddalle: v1.0

gitdir

str – Absolute path to root directory

lfc_add(*fnames, **kw)

Add one or more large files

Call:
>>> repo.lfc_add(*fnames, **kw)
Inputs:
repo: GitRepo

Interface to git repository

fnames: tuple[str]

Names or wildcard patterns of files to add using LFC

Versions:
  • 2022-12-28 @ddalle: v1.0

lfc_checkout(fname: str, *fnames, **kw)

Checkout one or more large files from current .lfc stub

Call:
>>> repo.lfc_checkout(*fnames, **kw)
Inputs:
repo: GitRepo

Interface to git repository

fnames: tuple[str]

Names or wildcard patterns of files

f, force: True | {False}

Delete uncached working file if present

Versions:
  • 2023-10-24 @ddalle: v1.0

lfc_config_get(fullopt: str, vdef=None) str

Get an option from the large file client configuration

Call:
>>> val = repo.lfc_config_get(fullopt, vdef=None)
Inputs:
repo: GitRepo

Interface to git repository

fullopt: str

Full name of LFC config option, combining the section and the option to query (e.g. core.remote)

vdef: {None} | object

Default value

Outputs:
val: str

Raw value of LFC config option

Raises:
  • GitutilsKeyError if either section or opt is missing from the LFC config (unless vdef is set)

Versions:
  • 2022-12-27 @ddalle: v1.0
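
The lookup-with-default behavior can be sketched with configparser, which this module uses for the LFC configuration. The sketch assumes that fullopt has the form "section.opt" (for example "core.remote") and raises a plain KeyError in place of GitutilsKeyError; config_get() is a hypothetical helper, not the method itself.

```python
import configparser


def config_get(config: configparser.ConfigParser, fullopt: str, vdef=None):
    """Look up "section.opt" in *config*, with optional default (hypothetical)"""
    # Assumed convention: split on the last dot into section and option
    section, _, opt = fullopt.rpartition(".")
    if config.has_option(section, opt):
        return config.get(section, opt)
    # Fall back to the default value, if one was given
    if vdef is not None:
        return vdef
    raise KeyError(f"no LFC config option {fullopt!r}")


# Made-up configuration contents for illustration
config = configparser.ConfigParser()
config.read_string("[core]\nremote = hub\n")
val = config_get(config, "core.remote")
```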

lfc_config_set(fullopt: str, val)

Set an LFC configuration setting

Call:
>>> repo.lfc_config_set(fullopt, val)
Inputs:
repo: GitRepo

Interface to git repository

fullopt: str

Full name of LFC config option, combining the section and the option to set (e.g. core.remote)

val: object

Value to set (converted to str)

Versions:
  • 2022-12-28 @ddalle: v1.0

lfc_init(**kw)

Initialize a git repo for Large File Control

Call:
>>> repo.lfc_init()
Inputs:
repo: GitRepo

Interface to git repository

Versions:
  • 2022-12-28 @ddalle: v1.0

  • 2023-10-25 @ddalle: v1.1; better double-call behavior

lfc_install_hooks(*a, **kw)

Install full set of git-hooks for this repo

Call:
>>> repo.lfc_install_hooks()
Inputs:
repo: GitRepo

Interface to git repository

lfc_install_post_merge(*a, **kw)

Install post-merge hook to auto-pull mode=2 files

Call:
>>> repo.lfc_install_post_merge()
Inputs:
repo: GitRepo

Interface to git repository

lfc_install_pre_push(*a, **kw)

Install pre-push hook to auto-push some files

Call:
>>> repo.lfc_install_pre_push()
Inputs:
repo: GitRepo

Interface to git repository

lfc_pull(*fnames, **kw)

Pull one or more large files from remote cache

Call:
>>> repo.lfc_pull(*fnames, **kw)
Inputs:
repo: GitRepo

Interface to git repository

fnames: tuple[str]

Names or wildcard patterns of files

mode: {None} | 1 | 2

LFC file mode to pull

f, force: True | {False}

Delete uncached working file if present

Versions:
  • 2022-12-28 @ddalle: v1.0

  • 2023-11-08 @ddalle: v1.1; add mode

lfc_push(*fnames, **kw)

Push one or more large files to remote cache

Call:
>>> repo.lfc_push(*fnames, **kw)
Inputs:
repo: GitRepo

Interface to git repository

fnames: tuple[str]

Names or wildcard patterns of files

Versions:
  • 2022-12-28 @ddalle: v1.0

lfc_replace_dvc()

Fully substitute local large file control in place of DVC

This command will move all .dvc metadata stub files to the same name but with .lfc as the extension. It will also move the .dvc/ folder to .lfc/ and remove the .dvc/plots/ folder and .dvcignore file.

If both .dvc/ and .lfc/ exist, this function will merge the caches so that any files in .dvc/cache/ are copied into .lfc/cache/.

This function does not recompute any hashes, because LFC can work with existing MD-5 hashes. LFC does not compute new MD-5 hashes, but it can still use the old ones, so MD-5 and SHA-256 hashes may be intermixed in one repo.

Call:
>>> repo.lfc_replace_dvc()
Inputs:
repo: GitRepo

Interface to git repository

Versions:
  • 2022-12-28 @ddalle: v1.0

  • 2023-03-17 @ddalle: v1.1; delete .dvc/plots first

  • 2023-10-27 @ddalle: v1.2; merge caches
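
The cache merge described above (copying files from .dvc/cache/ into .lfc/cache/) can be sketched with the standard library. The helper merge_cache() is hypothetical; it makes no assumption about the folder layout inside the cache and skips files that already exist in the destination, which is an assumed policy rather than documented behavior.

```python
import os
import shutil


def merge_cache(src: str, dst: str) -> int:
    """Copy files under *src* into *dst* unless present; return number copied

    Hypothetical helper; not the implementation of lfc_replace_dvc().
    """
    ncopied = 0
    for root, _dirs, files in os.walk(src):
        # Mirror the relative folder layout of the source cache
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            target = os.path.join(target_dir, name)
            # Assumed policy: never overwrite a file already in *dst*
            if not os.path.exists(target):
                shutil.copy2(os.path.join(root, name), target)
                ncopied += 1
    return ncopied
```

A call such as merge_cache(".dvc/cache", ".lfc/cache") would then leave the source cache untouched while filling in any missing files in the destination.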

lfc_set_mode(*fnames, **kw)

Set LFC mode for one or more files

Call:
>>> repo.lfc_set_mode(*fnames, **kw)
Inputs:
repo: GitRepo

Interface to git repository

fnames: tuple[str]

Names or wildcard patterns of files whose LFC mode to set

lfc_show(fname: str, ref=None, **kw)

Show the contents of an LFC file from a local cache

Call:
>>> contents = repo.lfc_show(fname)
Inputs:
repo: GitRepo

Interface to git repository

fname: str

Name of original file or large file stub

ref: {None} | str

Optional git reference (default HEAD on bare repo)

Outputs:
contents: bytes

Contents of large file read from LFC cache

Versions:
  • 2011-12-22 @ddalle: v1.0

list_lfc_remotes() list

List all large file remote names

Call:
>>> remotenames = repo.list_lfc_remotes()
Inputs:
repo: GitRepo

Interface to git repository

Outputs:
remotenames: list[str]

List of large file remote names

Versions:
  • 2022-12-25 @ddalle: v1.0

make_cachedir()

Create large file cache folder if necessary

Call:
>>> repo.make_cachedir()
Inputs:
repo: GitRepo

Interface to git repository

Versions:
  • 2022-12-19 @ddalle: v1.0

make_lfc_config()

Read large file client config file, or access current

Call:
>>> config = repo.make_lfc_config()
Inputs:
repo: GitRepo

Interface to git repository

Outputs:
config: configparser.ConfigParser

Python interface to LFC configuration

Versions:
  • 2022-12-22 @ddalle: v1.0

make_lfc_portal(remote=None) SSHPortal

Open SSH/SFTP portal for large files

Call:
>>> portal = repo.make_lfc_portal(remote=None)
Inputs:
repo: GitRepo

Interface to git repository

remote: {None} | str

Name of remote, or default

Outputs:
portal: None | shellutils.SSHPortal

Persistent file transfer portal

Versions:
  • 2022-12-20 @ddalle: v1.0

  • 2023-10-26 @ddalle: v1.1; multiple portals

read_lfc_config()

Read large file client config file, even on bare repo

Call:
>>> config = repo.read_lfc_config()
Inputs:
repo: GitRepo

Interface to git repository

Outputs:
config: configparser.ConfigParser

Python interface to LFC configuration

Versions:
  • 2022-12-22 @ddalle: v1.0

read_lfc_file(fname: str, ref=None, ext=None)

Read status information from large file stub

Call:
>>> info = repo.read_lfc_file(fname, ref=None, ext=None)
Inputs:
repo: GitRepo

Interface to git repository

fname: str

Name of original file or large file stub

ref: {None} | str

Optional git reference (default HEAD on bare repo)

ext: {None} | ".dvc" | ".lfc"

Large file metadata stub file extension

Outputs:
info: dict

Dictionary of information about large file

info["sha256"]: str

Hex string of SHA-256 hash of file

info["md5"]: str

Hex string of MD-5 hash of file added by dvc

info["size"]: str

String of integer of number of bytes in large file

info["path"]: str

Name of original file

Versions:
  • 2011-12-20 @ddalle: v1.0
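
Given an info dictionary with the keys listed above, a hash code can be selected by preferring "sha256" and falling back to "md5" for files added by DVC. The helper pick_hash() and the example values below are made up for illustration; this is a sketch of the idea, not the implementation of get_lfc_hash().

```python
def pick_hash(info: dict) -> str:
    """Select the hash code from a stub info dictionary (hypothetical helper)"""
    # Prefer the SHA-256 digest; fall back to an MD-5 digest written by DVC
    for key in ("sha256", "md5"):
        if key in info:
            return info[key]
    raise KeyError("stub info has neither 'sha256' nor 'md5'")


# Example dictionaries with the documented keys (values are made up)
lfc_info = {"sha256": "a" * 64, "size": "1024", "path": "mesh.dat"}
dvc_info = {"md5": "b" * 32, "size": "2048", "path": "old.dat"}

# Note that "size" is stored as a string and must be converted for arithmetic
total_bytes = int(lfc_info["size"]) + int(dvc_info["size"])
```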

read_lfc_mode(fname: str, ref=None, ext=None) int

Read LFC file mode for a tracked file

Call:
>>> mode = repo.read_lfc_mode(fname, ref=None)
Inputs:
repo: GitRepo

Interface to git repository

fname: str

Name of original file or large file stub

ref: {None} | str

Optional git reference (default HEAD on bare repo)

Outputs:
mode: 1 | 2

LFC file mode:

  • 1: Only push/pull file on-demand

  • 2: Automatically push/pull most recent version

Versions:
  • 2023-11-08 @ddalle: v1.0

resolve_lfc_remote_name(remote=None)

Resolve default LFC remote, if necessary

Call:
>>> remotename = repo.resolve_lfc_remote_name(remote)
Inputs:
repo: GitRepo

Interface to git repository

remote: {None} | str

Optional explicit remote name

Outputs:
remotename: str

Either remote or LFC setting for core.remote

Versions:
  • 2022-12-22 @ddalle: v1.0
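
The fallback to the core.remote setting can be sketched with configparser. The sketch assumes that core.remote means option "remote" in section "core" of the LFC configuration; resolve_remote() is a hypothetical helper and the remote names are made up.

```python
import configparser


def resolve_remote(config: configparser.ConfigParser, remote=None) -> str:
    """Return *remote* if given, else the configured default (hypothetical)"""
    # An explicit remote name always wins
    if remote is not None:
        return remote
    # Otherwise use the assumed [core] remote setting
    return config.get("core", "remote")


# Made-up configuration contents for illustration
config = configparser.ConfigParser()
config.read_string("[core]\nremote = hub\n")
remotename = resolve_remote(config)
```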

rm_lfc_remote(remote: str)

Remove a large file client remote, if possible

Call:
>>> repo.rm_lfc_remote(remote)
Inputs:
repo: GitRepo

Interface to git repository

remote: str

Name of large file remote

Outputs:

remotenames: list[str]

Versions:
  • 2022-12-27 @ddalle: v1.0

set_lfc_remote(remote: str, url: str, **kw)

Add or set URL of an LFC remote

Call:
>>> repo.set_lfc_remote(remote, url, **kw)
Inputs:
repo: GitRepo

Interface to git repository

remote: str

Name of LFC remote

url: str

Path for LFC remote to point to

d, default: True | {False}

Also set remote as the default LFC remote

Versions:
  • 2022-12-25 @ddalle: v1.0

write_lfc_config(config)

Write current large file configuration to file

Call:
>>> repo.write_lfc_config(config)
Inputs:
repo: GitRepo

Interface to git repository

config: configparser.ConfigParser

Python interface to LFC configuration

Versions:
  • 2022-12-27 @ddalle: v1.0