API Reference

reprotrail.provenance

Capture portable provenance metadata for data-processing outputs.

class reprotrail.provenance.GitState(repo_root, commit, branch, remote_url, dirty, dirty_marker, status_short, diff_hash=None)

Snapshot of a Git repository at one point in time.

Parameters:
  • repo_root (Path)

  • commit (str | None)

  • branch (str | None)

  • remote_url (str | None)

  • dirty (bool)

  • dirty_marker (str)

  • status_short (str)

  • diff_hash (str | None)

class reprotrail.provenance.InputPathState(path, exists, kind, backend, metadata, git_state=None, git_path=None, git_status='', error=None)

Snapshot of an input path and metadata that identifies it.

Parameters:
  • path (Path)

  • exists (bool)

  • kind (str)

  • backend (Literal['git-lfs', 'dvc', 'git', 'filesystem', 'unknown'])

  • metadata (dict[str, Any])

  • git_state (GitState | None)

  • git_path (str | None)

  • git_status (str)

  • error (str | None)

reprotrail.provenance.canonicalize_remote_url(remote_url)

Return a portable, reader-friendly form of a Git remote URL.

Parameters:

remote_url (str | None)

Return type:

str | None
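The reference does not say which rewrites canonicalize_remote_url applies. A minimal sketch, assuming "portable, reader-friendly" means turning SSH and credential-bearing remotes into plain https URLs without a trailing .git; canonicalize_remote_url_sketch is a hypothetical stand-in, not reprotrail's implementation.

```python
from __future__ import annotations

import re
from urllib.parse import urlsplit, urlunsplit


def canonicalize_remote_url_sketch(remote_url: str | None) -> str | None:
    """Hypothetical stand-in: rewrite common Git remote forms as https URLs."""
    if not remote_url:
        return None
    url = remote_url.strip()
    # scp-like SSH syntax: git@host:owner/repo.git -> https://host/owner/repo.git
    scp = re.fullmatch(r"[\w.-]+@([\w.-]+):(?!//)(.+)", url)
    if scp:
        url = f"https://{scp.group(1)}/{scp.group(2)}"
    parts = urlsplit(url)
    if parts.scheme in ("ssh", "git", "http", "https"):
        # Rebuild the authority from the host only, dropping user names and tokens.
        host = parts.hostname or ""
        if parts.port and parts.scheme in ("http", "https"):
            host = f"{host}:{parts.port}"  # SSH ports are meaningless over https
        path = parts.path.removesuffix(".git")
        url = urlunsplit(("https", host, path, "", ""))
    # Anything else (local paths, unknown schemes) is returned unchanged.
    return url
```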

reprotrail.provenance.run_git(args, cwd)

Run Git and return (ok, stdout, error) without raising.

Parameters:
  • args (Sequence[str])

  • cwd (Path | str)

Return type:

tuple[bool, str, str]
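run_git returns failures as values instead of raising. A sketch of that contract with subprocess, assuming "without raising" covers both a non-zero exit status and Git being absent from PATH; run_git_sketch is hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path
from typing import Sequence


def run_git_sketch(args: Sequence[str], cwd: Path | str) -> tuple[bool, str, str]:
    """Run `git <args>` in cwd and report failures as (False, stdout, error)."""
    try:
        proc = subprocess.run(
            ["git", *args],
            cwd=cwd,
            capture_output=True,
            text=True,
            check=False,  # inspect the return code ourselves instead of raising
        )
    except (OSError, subprocess.SubprocessError) as exc:
        # Git missing from PATH, cwd does not exist, and similar launch failures.
        return False, "", str(exc)
    if proc.returncode != 0:
        error = proc.stderr.strip() or f"git exited with status {proc.returncode}"
        return False, proc.stdout.strip(), error
    return True, proc.stdout.strip(), ""
```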

reprotrail.provenance.discover_repo_root(repo_dir=None, *, max_parent_levels=3)

Find the Git repository root for a directory or one of its parents.

Parameters:
  • repo_dir (Path | str | None)

  • max_parent_levels (int)

Return type:

Path | None
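discover_repo_root bounds its upward search with max_parent_levels. A sketch of that search, assuming a directory counts as a repository root when it contains a .git entry (a directory in a normal clone, a file in worktrees and submodules); discover_repo_root_sketch is hypothetical.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def discover_repo_root_sketch(
    repo_dir: Path | str | None = None, *, max_parent_levels: int = 3
) -> Path | None:
    """Check repo_dir, then at most max_parent_levels parents, for a .git entry."""
    start = Path(repo_dir if repo_dir is not None else Path.cwd()).resolve()
    candidates = [start, *start.parents][: max_parent_levels + 1]
    for candidate in candidates:
        if (candidate / ".git").exists():
            return candidate
    return None


# Usage: a root two levels above the start is found only if the limit allows it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".git").mkdir()
    nested = root / "a" / "b"
    nested.mkdir(parents=True)
    found = discover_repo_root_sketch(nested)
    too_shallow = discover_repo_root_sketch(nested, max_parent_levels=1)
```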

reprotrail.provenance.get_git_state(repo_dir='.', *, remote='origin', include_diff_hash=True)

Capture the current Git state for a repository.

Parameters:
  • repo_dir (Path | str)

  • remote (str)

  • include_diff_hash (bool)

Return type:

GitState

reprotrail.provenance.public_git_state(state)

Return a portable Git state record suitable for public metadata.

Parameters:

state (GitState | Mapping[str, Any])

Return type:

dict[str, Any]
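public_git_state and public_input_path_state both produce records meant for publication. A sketch assuming the main change is dropping repo_root, an absolute path that only makes sense on the machine that produced it; GitStateSketch mirrors the documented GitState fields, and the key filter is an assumption.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, is_dataclass
from pathlib import Path
from typing import Any, Mapping


@dataclass
class GitStateSketch:
    # Field names copied from the GitState signature; the class itself is hypothetical.
    repo_root: Path
    commit: str | None
    branch: str | None
    remote_url: str | None
    dirty: bool
    dirty_marker: str
    status_short: str
    diff_hash: str | None = None


LOCAL_ONLY_KEYS = {"repo_root"}  # assumption: absolute local paths are not portable


def public_git_state_sketch(state: GitStateSketch | Mapping[str, Any]) -> dict[str, Any]:
    """Accept a dataclass or a mapping, as the documented signature does."""
    record = asdict(state) if is_dataclass(state) else dict(state)
    return {key: value for key, value in record.items() if key not in LOCAL_ONLY_KEYS}
```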

reprotrail.provenance.summarize_directory(path, *, max_entries=20000)

Summarize a directory without embedding a full file listing.

Parameters:
  • path (Path | str)

  • max_entries (int)

Return type:

dict[str, Any]
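summarize_directory reports aggregate facts instead of a file listing, and max_entries caps the work on very large trees. A sketch under assumed output keys (files, directories, bytes, entries_scanned, truncated); the keys of the real summary are not documented here.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Any


def summarize_directory_sketch(path: Path | str, *, max_entries: int = 20000) -> dict[str, Any]:
    """Count files, directories, and bytes, stopping after max_entries entries."""
    files = dirs = total_bytes = seen = 0
    truncated = False
    for entry in Path(path).rglob("*"):
        if seen >= max_entries:
            truncated = True  # stop early instead of walking the whole tree
            break
        seen += 1
        if entry.is_dir():
            dirs += 1
        elif entry.is_file():
            files += 1
            total_bytes += entry.stat().st_size
    return {
        "files": files,
        "directories": dirs,
        "bytes": total_bytes,
        "entries_scanned": seen,
        "truncated": truncated,
    }


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "sub").mkdir()
    Path(tmp, "sub", "a.txt").write_text("hello")  # 5 bytes
    Path(tmp, "b.txt").write_text("hi")  # 2 bytes
    summary = summarize_directory_sketch(tmp)
    capped = summarize_directory_sketch(tmp, max_entries=2)
```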

reprotrail.provenance.get_input_path_state(path)

Inspect one input path and classify its provenance backend.

Parameters:

path (Path | str)

Return type:

InputPathState

reprotrail.provenance.get_input_path_states(paths)

Inspect multiple input paths, preserving input order.

Parameters:

paths (Iterable[Path | str])

Return type:

list[InputPathState]

reprotrail.provenance.public_input_path_state(state)

Return a compact input path record without local-only repository roots.

Parameters:

state (InputPathState | Mapping[str, Any])

Return type:

dict[str, Any]

reprotrail.provenance.public_provenance(value)

Return provenance metadata intended to be written into public outputs.

Parameters:

value (Any)

Return type:

Any

reprotrail.provenance.clean_command_parts(parts)

Remove reprotrail/provenance sidecar flags from recorded commands.

Parameters:

parts (Sequence[str])

Return type:

list[str]
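clean_command_parts keeps recorded commands free of flags that only control provenance capture, so the history shows the command a reader would rerun. The flag names below are hypothetical placeholders, since this reference does not list reprotrail's sidecar flags; the sketch handles both the "--flag value" and "--flag=value" forms.

```python
from __future__ import annotations

from typing import Sequence

# Hypothetical flag names, for illustration only.
FLAGS_WITH_VALUE = {"--provenance-json", "--provenance-log"}
BARE_FLAGS = {"--allow-dirty"}


def clean_command_parts_sketch(parts: Sequence[str]) -> list[str]:
    cleaned: list[str] = []
    skip_next = False
    for part in parts:
        if skip_next:
            skip_next = False  # drop the value that followed a sidecar flag
            continue
        flag = part.split("=", 1)[0]
        if part in FLAGS_WITH_VALUE:
            skip_next = True  # "--flag value": also drop the next token
        elif flag in FLAGS_WITH_VALUE or part in BARE_FLAGS:
            continue  # "--flag=value" form, or a flag that takes no value
        else:
            cleaned.append(part)
    return cleaned
```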

reprotrail.provenance.build_cf_history_entry(command=None, *, git_state=None, git_states=(), input_states=(), timestamp=None, include_inputs=False)

Build a timestamped history line suitable for CF/xarray attrs.

Parameters:
  • command (str | Sequence[str] | None)

  • git_state (GitState | Mapping[str, Any] | None)

  • git_states (Sequence[GitState | Mapping[str, Any]])

  • input_states (Sequence[InputPathState | Mapping[str, Any]])

  • timestamp (datetime | None)

  • include_inputs (bool)

Return type:

str

reprotrail.provenance.append_cf_history(existing, entry)

Prepend a new entry to existing CF history text.

Parameters:
  • existing (str | None)

  • entry (str)

Return type:

str
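build_cf_history_entry and append_cf_history together maintain a CF-style history attribute. The entry format is not documented here; this sketch assumes an ISO 8601 UTC timestamp, a shell-quoted command, and a short commit hash, with entries separated by newlines and the newest first, matching the "prepend" behavior described for append_cf_history. Both helpers are hypothetical.

```python
from __future__ import annotations

import shlex
from datetime import datetime, timezone


def build_history_entry_sketch(command: list[str], commit: str | None, timestamp: datetime) -> str:
    """One history line: UTC timestamp, shell-quoted command, short commit."""
    stamp = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    line = f"{stamp}: {shlex.join(command)}"  # quoting keeps the command rerunnable
    if commit:
        line += f" (git {commit[:12]})"
    return line


def prepend_history_sketch(existing: str | None, entry: str) -> str:
    """Put the new entry first, one entry per line."""
    if not existing or not existing.strip():
        return entry
    return f"{entry}\n{existing}"
```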

reprotrail.provenance.append_xarray_history(obj, entry, *, copy=False)

Prepend a history entry to an xarray-like object’s attrs.

Parameters:
  • obj (Any)

  • entry (str)

  • copy (bool)

Return type:

Any

reprotrail.provenance.enforce_clean_repos(repos, *, allow_dirty=False, missing_ok=True)

Validate that repositories are clean unless dirty state is allowed.

Parameters:
  • repos (Iterable[Path | str])

  • allow_dirty (bool)

  • missing_ok (bool)

Return type:

list[GitState]
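enforce_clean_repos refuses dirty repositories unless allow_dirty is set. A simplified sketch of that policy check; RepoStatus and DirtyRepositoryError are hypothetical (the exception reprotrail raises is not named in this reference), and missing_ok handling is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RepoStatus:
    # Hypothetical minimal state; the documented GitState carries more fields.
    name: str
    dirty: bool


class DirtyRepositoryError(RuntimeError):
    """Hypothetical error type for this sketch."""


def enforce_clean_sketch(states: list[RepoStatus], *, allow_dirty: bool = False) -> list[RepoStatus]:
    dirty = [s.name for s in states if s.dirty]
    if dirty and not allow_dirty:
        # Report every dirty repository at once, not just the first.
        raise DirtyRepositoryError("uncommitted changes in: " + ", ".join(dirty))
    return states


states = [RepoStatus("proj", dirty=False), RepoStatus("lib", dirty=True)]
try:
    enforce_clean_sketch(states)
except DirtyRepositoryError as exc:
    message = str(exc)
```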

reprotrail.provenance.env_allows_dirty(var='REPROTRAIL_ALLOW_DIRTY')

Return whether an environment variable opts into dirty repositories.

Parameters:

var (str)

Return type:

bool
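env_allows_dirty lets a shell session or CI job opt in without changing code. A sketch assuming common truthy spellings; the values reprotrail actually accepts are not documented here, and the environ parameter exists only to make the sketch testable.

```python
from __future__ import annotations

import os
from typing import Mapping

TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings, not confirmed by the reference


def env_allows_dirty_sketch(
    var: str = "REPROTRAIL_ALLOW_DIRTY", environ: Mapping[str, str] | None = None
) -> bool:
    env = os.environ if environ is None else environ
    # Unset, empty, or any other value means "do not allow dirty repositories".
    return env.get(var, "").strip().lower() in TRUTHY
```

Given the signatures above, the two real helpers presumably combine as enforce_clean_repos(repos, allow_dirty=env_allows_dirty()).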

reprotrail.pixi

Pixi environment and editable dependency helpers.

exception reprotrail.pixi.PixiGitFreshnessError

Raised when Pixi Git freshness cannot be checked safely.

reprotrail.pixi.pixi_environment_block(lock_text, environment)

Return the environment block from a Pixi lockfile.

Parameters:
  • lock_text (str)

  • environment (str | None)

Return type:

str
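pixi_environment_block extracts one environment's section from pixi.lock text. A line-scanning sketch over a simplified, assumed lockfile layout (environments keyed by name, indented two spaces); it treats environment=None as Pixi's default environment, which is also an assumption. A YAML parser is the sturdier choice for real lockfiles.

```python
from __future__ import annotations

SAMPLE_LOCK = """\
version: 6
environments:
  default:
    packages:
      linux-64:
      - pypi: ./
  test:
    packages:
      linux-64:
      - pypi: ../shared-lib
packages:
- pypi: ./
  name: myproj
"""


def environment_block_sketch(lock_text: str, environment: str | None) -> str:
    """Return the lines nested under environments.<name>."""
    name = environment or "default"
    in_envs = collecting = False
    block: list[str] = []
    for line in lock_text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent == 0:
            # A new top-level key ends any environment block.
            in_envs = line.rstrip() == "environments:"
            collecting = False
        elif in_envs and indent == 2:
            collecting = line.strip() == f"{name}:"  # start or stop at each env name
        elif collecting:
            block.append(line)
    if not block:
        raise KeyError(f"environment {name!r} not found in lockfile")
    return "\n".join(block)
```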

reprotrail.pixi.is_local_pixi_ref(value)

Return whether a Pixi pypi reference points at a local path.

Parameters:

value (str)

Return type:

bool
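is_local_pixi_ref distinguishes path references from remote ones. A sketch assuming that scheme-less paths (./, ../, absolute) and file:// URLs count as local, while https:// and git+https:// references do not; is_local_ref_sketch is hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def is_local_ref_sketch(value: str) -> bool:
    ref = value.strip()
    if not ref:
        return False
    # "./pkg", "../lib", and "/abs/path" parse with an empty scheme.
    return urlsplit(ref).scheme in ("", "file")
```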

reprotrail.pixi.pixi_local_path_dependencies(lock_text, environment)

List local path dependencies in one Pixi environment.

Parameters:
  • lock_text (str)

  • environment (str | None)

Return type:

list[str]

reprotrail.pixi.pixi_package_names_by_pypi(lock_text)

Return package names keyed by Pixi pypi source reference.

Parameters:

lock_text (str)

Return type:

dict[str, str]

reprotrail.pixi.pixi_dependency_records(lock_text, environment, project_root)

Classify local Pixi dependencies as project-self or external-editable.

Parameters:
  • lock_text (str)

  • environment (str | None)

  • project_root (Path)

Return type:

list[dict[str, Any]]

reprotrail.pixi.public_dependency_records(records)

Remove private resolved-path fields from dependency records.

Parameters:

records (list[dict[str, Any]])

Return type:

list[dict[str, Any]]

reprotrail.pixi.editable_dependency_failures(dependency_records, *, allow_editable)

Return policy failures for external editable/path dependencies.

Parameters:
  • dependency_records (list[dict[str, Any]])

  • allow_editable (bool)

Return type:

list[str]
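editable_dependency_failures turns the project-self / external-editable classification into policy messages. Presumably a project installing itself from its own checkout is expected, while a sibling checkout cannot be rebuilt from the lockfile alone. The record keys (name, kind, source) and the message text below are hypothetical.

```python
from __future__ import annotations

from typing import Any

records: list[dict[str, Any]] = [
    {"name": "myproj", "kind": "project-self", "source": "./"},
    {"name": "shared-lib", "kind": "external-editable", "source": "../shared-lib"},
]


def editable_failures_sketch(dependency_records: list[dict[str, Any]], *, allow_editable: bool) -> list[str]:
    if allow_editable:
        return []  # the caller explicitly accepts local/editable installs
    return [
        f"{rec['name']}: external editable dependency at {rec['source']}"
        for rec in dependency_records
        if rec.get("kind") == "external-editable"  # project-self is always allowed
    ]
```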

reprotrail.pixi.repo_paths_with_dependencies(repos, dependency_records)

Append resolved editable dependency Git repos to an inspected repo list.

Parameters:
  • repos (list[str])

  • dependency_records (list[dict[str, Any]])

Return type:

list[str]

reprotrail.pixi.normalize_package_name(name)

Return a normalized Python distribution name.

Parameters:

name (str)

Return type:

str
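normalize_package_name most likely follows the PEP 503 normalization rule that packaging.utils.canonicalize_name also implements; that is an assumption, sketched here with a single regular expression.

```python
import re


def normalize_name_sketch(name: str) -> str:
    # PEP 503: collapse each run of "-", "_", or "." into one "-", then lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()
```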

reprotrail.pixi.package_records(package_names)

Return installed package records with sanitized source identity.

Parameters:

package_names (list[str] | tuple[str, ...])

Return type:

list[dict[str, Any]]

reprotrail.pixi.package_versions(package_names)

Return installed versions for package names that can be resolved.

Parameters:

package_names (list[str] | tuple[str, ...])

Return type:

dict[str, str]

reprotrail.pixi.pixi_package_license_records(project_root, pixi_environment)

Return package license metadata from the local Pixi environment.

Parameters:
  • project_root (Path)

  • pixi_environment (str | None)

Return type:

list[dict[str, Any]]

reprotrail.pixi.check_pixi_git_freshness(project_root, environment, packages, *, manifest_path=None)

Check whether selected Git-backed Pixi packages would move on update.

Parameters:
  • project_root (Path)

  • environment (str)

  • packages (tuple[str, ...])

  • manifest_path (Path | None)

Return type:

dict[str, Any]

reprotrail.pixi.infer_pixi_environment(project_root, value=None)

Infer the active Pixi environment from an explicit value, an environment variable, or the running Python interpreter.

Parameters:
  • project_root (Path)

  • value (str | None)

Return type:

str | None
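infer_pixi_environment checks three sources in order. A sketch assuming the environment variable is PIXI_ENVIRONMENT_NAME (which pixi run and pixi shell set) and that the interpreter fallback looks for sys.prefix under <project>/.pixi/envs/<name>; the environ and prefix parameters are hypothetical additions for testing.

```python
from __future__ import annotations

import os
import sys
from pathlib import Path
from typing import Mapping


def infer_environment_sketch(
    project_root: Path,
    value: str | None = None,
    *,
    environ: Mapping[str, str] | None = None,
    prefix: str | None = None,
) -> str | None:
    if value:
        return value  # 1. an explicit choice always wins
    env = os.environ if environ is None else environ
    if env.get("PIXI_ENVIRONMENT_NAME"):
        return env["PIXI_ENVIRONMENT_NAME"]  # 2. set inside Pixi-activated processes
    # 3. fall back to where the interpreter lives: <project>/.pixi/envs/<name>/...
    envs_dir = (project_root / ".pixi" / "envs").resolve()
    interpreter_prefix = Path(prefix if prefix is not None else sys.prefix).resolve()
    try:
        relative = interpreter_prefix.relative_to(envs_dir)
    except ValueError:
        return None  # not running from this project's Pixi environments
    return relative.parts[0] if relative.parts else None
```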

reprotrail.pixi.environment_summary(*, project_root, pixi_environment, dependency_records, allow_editable, package_names, env_var_whitelist)

Build a portable summary of the active runtime environment.

Parameters:
  • project_root (Path)

  • pixi_environment (str | None)

  • dependency_records (list[dict[str, Any]])

  • allow_editable (bool)

  • package_names (tuple[str, ...])

  • env_var_whitelist (tuple[str, ...])

Return type:

dict[str, Any]

reprotrail.pixi.write_environment_bundle(*, run_root, project_root, lockfile, pixi_environment, dependency_records, allow_editable, package_names, env_var_whitelist)

Copy a Pixi lockfile and environment summary into provenance artifacts.

Parameters:
  • run_root (Path)

  • project_root (Path)

  • lockfile (Path)

  • pixi_environment (str | None)

  • dependency_records (list[dict[str, Any]])

  • allow_editable (bool)

  • package_names (tuple[str, ...])

  • env_var_whitelist (tuple[str, ...])

Return type:

dict[str, Any]

reprotrail.runner

Command runner that records provenance and runtime status.

exception reprotrail.runner.RunError

Raised when the runner cannot start or finish safely.

reprotrail.runner.diagnostic_software_states(*, candidates, trusted_states, log)

Return configured repo states that are not trusted runtime repos.

Parameters:
  • candidates (list[str])

  • trusted_states (list[dict[str, Any]])

  • log (Path)

Return type:

list[dict[str, Any]]

reprotrail.runner.run_with_provenance(*, command, log, repos=None, allow_dirty=False, allow_editable=False, allow_partial_metadata=False, provenance_json=None, product_output=None, settings=None)

Run a command while recording v1 reprotrail provenance.

Parameters:
  • command (list[str])

  • log (str | Path)

  • repos (list[str] | None)

  • allow_dirty (bool)

  • allow_editable (bool)

  • allow_partial_metadata (bool)

  • provenance_json (str | Path | None)

  • product_output (str | Path | None)

  • settings (ReprotrailSettings | None)

Return type:

dict[str, Any]

reprotrail.epochs

Dependency snapshot contracts and product environment audits.

reprotrail.epochs.build_dependency_snapshot(*, project_root, lockfile=None, pixi_environment=None, package_versions_payload=None, runtime_packages_payload=None, dependency_records=None, software_states=None, package_names=('reprotrail',))

Build a stable dependency/runtime snapshot.

Parameters:
  • project_root (Path)

  • lockfile (Path | None)

  • pixi_environment (str | None)

  • package_versions_payload (dict[str, str] | None)

  • runtime_packages_payload (list[dict[str, Any]] | None)

  • dependency_records (list[dict[str, Any]] | None)

  • software_states (list[dict[str, Any]] | None)

  • package_names (tuple[str, ...])

Return type:

dict[str, Any]

reprotrail.epochs.check_dependency_contract(*, run_root, project_root, acceptance_reason=None, dry_run=False, pixi_environment=None, snapshot=None, package_names=('reprotrail',))

Check or accept the dependency epoch for a run root.

Parameters:
  • run_root (Path)

  • project_root (Path)

  • acceptance_reason (str | None)

  • dry_run (bool)

  • pixi_environment (str | None)

  • snapshot (dict[str, Any] | None)

  • package_names (tuple[str, ...])

Return type:

dict[str, Any]

reprotrail.epochs.annotate_product_environment_consistency(provenance_path, *, run_root, output_snapshot, output_epoch, append_readme=False)

Annotate a product with input/output dependency-epoch consistency.

Parameters:
  • provenance_path (Path | None)

  • run_root (Path | None)

  • output_snapshot (dict[str, Any])

  • output_epoch (dict[str, Any] | None)

  • append_readme (bool)

Return type:

dict[str, Any] | None

reprotrail.epochs.audit_dependency_epochs(*, run_root, output, product_provenance=None, product_root_markers=())

Scan product provenance files and summarize accepted dependency epochs.

Parameters:
  • run_root (Path)

  • output (Path)

  • product_provenance (list[Path] | None)

  • product_root_markers (tuple[str, ...])

Return type:

dict[str, Any]

reprotrail.products

Product sidecar, checksum, and pointer-attribute helpers.

class reprotrail.products.ProductSidecars(data, package, stem, readme, license, ro_crate, provenance, provenance_sha256)

Paths that travel with one durable data product.

Parameters:
  • data (Path)

  • package (Path)

  • stem (str)

  • readme (Path)

  • license (Path)

  • ro_crate (Path)

  • provenance (Path)

  • provenance_sha256 (Path)

reprotrail.products.product_sidecars(data_path)

Return sidecar paths for a durable data product.

Parameters:

data_path (str | Path)

Return type:

ProductSidecars

reprotrail.products.product_record(data_path, *, provenance_path=None, metadata=None)

Build generic product metadata for a provenance record.

Parameters:
  • data_path (str | Path)

  • provenance_path (str | Path | None)

  • metadata (Mapping[str, Any] | None)

Return type:

dict[str, Any]

reprotrail.products.public_license(license_payload)

Validate and normalize required product license metadata.

Parameters:

license_payload (str | Mapping[str, Any] | None)

Return type:

dict[str, str]

reprotrail.products.default_readme_template_text()

Return the bundled product README template.

Return type:

str

reprotrail.products.copy_readme_template(output, *, force=False)

Copy the bundled product README template for project customization.

Parameters:
  • output (str | Path)

  • force (bool)

Return type:

Path

reprotrail.products.write_json_with_provenance(path, payload, *, provenance=None)

Write JSON metadata, embedding public provenance when supplied.

Parameters:
  • path (str | Path)

  • payload (dict[str, Any])

  • provenance (dict[str, Any] | None)

Return type:

None
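write_json_with_provenance keeps product metadata and its provenance in one file. A sketch assuming the provenance is stored under a top-level "provenance" key with sorted, indented output; the real function embeds public provenance, whereas this sketch stores the supplied mapping unfiltered.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Any


def write_json_sketch(
    path: Path | str, payload: dict[str, Any], *, provenance: dict[str, Any] | None = None
) -> None:
    record = dict(payload)  # leave the caller's dict untouched
    if provenance is not None:
        record["provenance"] = provenance  # hypothetical key name
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Sorted keys and a trailing newline keep the file diff-friendly.
    target.write_text(json.dumps(record, indent=2, sort_keys=True) + "\n", encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp, "meta", "summary.json")
    write_json_sketch(out, {"rows": 3}, provenance={"commit": "abc123"})
    loaded = json.loads(out.read_text(encoding="utf-8"))
```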

reprotrail.products.stamp_dataset_provenance(obj, provenance)

Stamp lightweight provenance pointer attrs on an xarray-like object.

Parameters:
  • obj (Any)

  • provenance (dict[str, Any] | None)

Return type:

Any

reprotrail.products.finalize_product_provenance(provenance_path, *, project_root=None, pixi_environment=None, product_metadata_file='reprotrail.products.toml', license=None, allow_partial_metadata=False, stamp=True)

Finalize a product sidecar checksum and lightweight product attrs.

Parameters:
  • provenance_path (str | Path)

  • project_root (str | Path | None)

  • pixi_environment (str | None)

  • product_metadata_file (str)

  • license (str | Mapping[str, Any] | None)

  • allow_partial_metadata (bool)

  • stamp (bool)

Return type:

str | None
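finalize_product_provenance maintains the provenance_sha256 sidecar listed in ProductSidecars. A sketch of writing such a checksum, assuming a "<name>.sha256" file in the two-space "<hex>  <filename>" layout that sha256sum -c accepts; the sidecar naming is hypothetical.

```python
from __future__ import annotations

import hashlib
import tempfile
from pathlib import Path


def write_sha256_sidecar_sketch(target: Path) -> Path:
    digest = hashlib.sha256()
    with target.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)  # stream in 1 MiB chunks so large files fit in memory
    sidecar = target.with_name(target.name + ".sha256")  # hypothetical naming
    sidecar.write_text(f"{digest.hexdigest()}  {target.name}\n", encoding="utf-8")
    return sidecar


with tempfile.TemporaryDirectory() as tmp:
    prov = Path(tmp, "product.provenance.json")
    prov.write_bytes(b"{}")
    line = write_sha256_sidecar_sketch(prov).read_text(encoding="utf-8")
```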

reprotrail.product_metadata

Product license, attribution, and packaging metadata helpers.

exception reprotrail.product_metadata.ProductMetadataError

Raised when product metadata cannot be resolved safely.

exception reprotrail.product_metadata.ProductMetadataDependencyError

Raised when optional product metadata dependencies are unavailable.

class reprotrail.product_metadata.ProductInput(name=None, producer=None, path=None, license=None, url=None, marginal=False)

Attribution and license metadata for one product input.

Parameters:
  • name (str | None)

  • producer (str | None)

  • path (str | None)

  • license (str | None)

  • url (str | None)

  • marginal (bool)

class reprotrail.product_metadata.SoftwareLicenseOverride(name, kind='package', license=None, url=None)

Manual license metadata for one software package or repository.

Parameters:
  • name (str)

  • kind (str)

  • license (str | None)

  • url (str | None)

class reprotrail.product_metadata.ProductMetadata(output, license=None, readme_template=None, inputs=(), software=(), source=None)

Metadata selected for one product output.

reprotrail.product_metadata.load_product_index(project_root, *, metadata_file='reprotrail.products.toml')

Load project-root product metadata entries if the index exists.

Parameters:
  • project_root (str | Path)

  • metadata_file (str)

Return type:

tuple[ProductMetadata, …]

reprotrail.product_metadata.product_match_value(path, project_root)

Return the project-relative path string used for product metadata matching.

Parameters:
  • path (str | Path)

  • project_root (str | Path)

Return type:

str
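product_match_value reduces an output path to a stable project-relative string. A sketch assuming relative inputs are interpreted against project_root and that separators are normalized to "/" so the same index entry matches on every platform; here a path outside the project raises ValueError, which may differ from reprotrail.

```python
from __future__ import annotations

from pathlib import Path


def product_match_value_sketch(path: str | Path, project_root: str | Path) -> str:
    root = Path(project_root).resolve()
    candidate = Path(path)
    if not candidate.is_absolute():
        candidate = root / candidate  # relative paths are taken as project-relative
    # resolve() collapses ".." segments; as_posix() gives "/" separators everywhere.
    return candidate.resolve().relative_to(root).as_posix()
```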

reprotrail.product_metadata.match_product_metadata(product_output, project_root, *, metadata_file='reprotrail.products.toml')

Return the single product metadata entry matching one product output.

Parameters:
  • product_output (str | Path)

  • project_root (str | Path)

  • metadata_file (str)

Return type:

ProductMetadata | None

reprotrail.product_metadata.require_product_metadata_tools()

Fail clearly when product license/RO-Crate dependencies are unavailable.

Return type:

None

reprotrail.product_metadata.normalize_spdx_expression(expression)

Validate and normalize an SPDX expression using the SPDX toolchain.

Parameters:

expression (str)

Return type:

str

reprotrail.product_metadata.spdx_expression_symbols(expression)

Return SPDX license symbols from a validated expression.

Parameters:

expression (str)

Return type:

list[str]

reprotrail.product_metadata.product_license_summary(value)

Return the short public license summary stored in provenance.

Parameters:

value (str | Mapping[str, Any] | None)

Return type:

dict[str, str] | None

reprotrail.product_metadata.input_license_records(inputs, *, project_root, package_dir)

Build attribution/license evidence records for product-index inputs.

Parameters:
  • inputs (tuple[ProductInput, ...])

  • project_root (str | Path)

  • package_dir (str | Path)

Return type:

list[dict[str, Any]]

reprotrail.product_metadata.software_license_records(*, project_root, pixi_environment, overrides)

Build software/dependency license evidence records.

Return type:

tuple[list[dict[str, Any]], list[str]]

reprotrail.reproduce

Reproduce a recorded provenance step in a fresh workspace.

exception reprotrail.reproduce.ReproductionError

Raised when a reproduction setup cannot proceed.

reprotrail.reproduce.reproduce_from_provenance(*, provenance, workspace, execute=False, strict=False, env=None, project_repo=None, repo_sources=None, input_maps=None, resume=False, force=False, install=True)

Create a reproduction workspace from a product provenance sidecar.

Parameters:
  • provenance (str | Path)

  • workspace (str | Path)

  • execute (bool)

  • strict (bool)

  • env (str | None)

  • project_repo (str | None)

  • repo_sources (dict[str, str] | None)

  • input_maps (dict[str, str] | None)

  • resume (bool)

  • force (bool)

  • install (bool)

Return type:

dict[str, Any]

Command-line interface for reprotrail.