API Reference
Capture portable provenance metadata for data-processing outputs.
- class reprotrail.provenance.GitState(repo_root, commit, branch, remote_url, dirty, dirty_marker, status_short, diff_hash=None)
Snapshot of a Git repository at one point in time.
- Parameters:
repo_root (Path)
commit (str | None)
branch (str | None)
remote_url (str | None)
dirty (bool)
dirty_marker (str)
status_short (str)
diff_hash (str | None)
- class reprotrail.provenance.InputPathState(path, exists, kind, backend, metadata, git_state=None, git_path=None, git_status='', error=None)
Snapshot of an input path and metadata that identifies it.
- Parameters:
path (Path)
exists (bool)
kind (str)
backend (Literal['git-lfs', 'dvc', 'git', 'filesystem', 'unknown'])
metadata (dict[str, Any])
git_state (GitState | None)
git_path (str | None)
git_status (str)
error (str | None)
- reprotrail.provenance.canonicalize_remote_url(remote_url)
Return a portable, reader-friendly form of a Git remote URL.
- Parameters:
remote_url (str | None)
- Return type:
str | None
- reprotrail.provenance.run_git(args, cwd)
Run Git and return (ok, stdout, error) without raising.
- Parameters:
args (Sequence[str])
cwd (Path | str)
- Return type:
tuple[bool, str, str]
- reprotrail.provenance.discover_repo_root(repo_dir=None, *, max_parent_levels=3)
Find the Git repository root for a directory or one of its parents.
- Parameters:
repo_dir (Path | str | None)
max_parent_levels (int)
- Return type:
Path | None
- reprotrail.provenance.get_git_state(repo_dir='.', *, remote='origin', include_diff_hash=True)
Capture the current Git state for a repository.
- Parameters:
repo_dir (Path | str)
remote (str)
include_diff_hash (bool)
- Return type:
GitState
- reprotrail.provenance.public_git_state(state)
Return a portable Git state record suitable for public metadata.
- Parameters:
state (GitState | Mapping[str, Any])
- Return type:
dict[str, Any]
- reprotrail.provenance.summarize_directory(path, *, max_entries=20000)
Summarize a directory without embedding a full file listing.
- Parameters:
path (Path | str)
max_entries (int)
- Return type:
dict[str, Any]
- reprotrail.provenance.get_input_path_state(path)
Inspect one input path and classify its provenance backend.
- Parameters:
path (Path | str)
- Return type:
InputPathState
- reprotrail.provenance.get_input_path_states(paths)
Inspect multiple input paths, preserving input order.
- Parameters:
paths (Iterable[Path | str])
- Return type:
list[InputPathState]
- reprotrail.provenance.public_input_path_state(state)
Return a compact input path record without local-only repository roots.
- Parameters:
state (InputPathState | Mapping[str, Any])
- Return type:
dict[str, Any]
- reprotrail.provenance.public_provenance(value)
Return provenance metadata intended to be written into public outputs.
- Parameters:
value (Any)
- Return type:
Any
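Because public_provenance accepts and returns Any, it presumably walks nested structures. A minimal sketch of that pattern follows: it drops local-only keys recursively and turns paths into POSIX strings so the result is JSON-friendly. The key names in LOCAL_ONLY_KEYS and the helper itself are hypothetical.

```python
from collections.abc import Mapping
from pathlib import PurePath
from typing import Any

# Hypothetical key names; the real set of local-only fields is reprotrail's choice.
LOCAL_ONLY_KEYS = frozenset({"repo_root", "resolved_path"})


def public_provenance_sketch(value: Any) -> Any:
    """Recursively drop local-only keys and make values JSON-friendly."""
    if isinstance(value, Mapping):
        return {
            str(key): public_provenance_sketch(item)
            for key, item in value.items()
            if key not in LOCAL_ONLY_KEYS
        }
    if isinstance(value, (list, tuple)):
        return [public_provenance_sketch(item) for item in value]
    if isinstance(value, PurePath):
        return value.as_posix()
    return value
```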
- reprotrail.provenance.clean_command_parts(parts)
Remove reprotrail/provenance sidecar flags from recorded commands.
- Parameters:
parts (Sequence[str])
- Return type:
list[str]
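Stripping sidecar flags has to handle both "--flag value" and "--flag=value" forms, as well as boolean switches. The flag names below are hypothetical placeholders and are not reprotrail's actual flag set. The sketch only demonstrates the token-skipping logic.

```python
from collections.abc import Sequence

# Hypothetical flag names, chosen only for illustration.
VALUE_FLAGS = frozenset({"--provenance-json", "--log"})
BOOL_FLAGS = frozenset({"--allow-dirty", "--allow-editable"})


def clean_command_parts_sketch(parts: Sequence[str]) -> list[str]:
    cleaned: list[str] = []
    skip_next = False
    for part in parts:
        if skip_next:
            skip_next = False  # this token was the previous flag's value
            continue
        if part in BOOL_FLAGS:
            continue
        if part.split("=", 1)[0] in VALUE_FLAGS:
            # "--flag value" consumes the next token; "--flag=value" does not.
            skip_next = "=" not in part
            continue
        cleaned.append(part)
    return cleaned
```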
- reprotrail.provenance.build_cf_history_entry(command=None, *, git_state=None, git_states=(), input_states=(), timestamp=None, include_inputs=False)
Build a timestamped history line suitable for CF/xarray attrs.
- Parameters:
command (str | Sequence[str] | None)
git_state (GitState | Mapping[str, Any] | None)
git_states (Sequence[GitState | Mapping[str, Any]])
input_states (Sequence[InputPathState | Mapping[str, Any]])
timestamp (datetime | None)
include_inputs (bool)
- Return type:
str
- reprotrail.provenance.append_cf_history(existing, entry)
Prepend a new entry to existing CF history text.
- Parameters:
existing (str | None)
entry (str)
- Return type:
str
- reprotrail.provenance.append_xarray_history(obj, entry, *, copy=False)
Prepend a history entry to an xarray-like object’s attrs.
- Parameters:
obj (Any)
entry (str)
copy (bool)
- Return type:
Any
- reprotrail.provenance.enforce_clean_repos(repos, *, allow_dirty=False, missing_ok=True)
Validate that repositories are clean unless dirty state is allowed.
- Parameters:
repos (Iterable[Path | str])
allow_dirty (bool)
missing_ok (bool)
- Return type:
list[GitState]
- reprotrail.provenance.env_allows_dirty(var='REPROTRAIL_ALLOW_DIRTY')
Return whether an environment variable opts into dirty repositories.
- Parameters:
var (str)
- Return type:
bool
Pixi environment and editable dependency helpers.
- exception reprotrail.pixi.PixiGitFreshnessError
Raised when Pixi Git freshness cannot be checked safely.
- reprotrail.pixi.pixi_environment_block(lock_text, environment)
Return the environment block from a Pixi lockfile.
- Parameters:
lock_text (str)
environment (str | None)
- Return type:
str
- reprotrail.pixi.is_local_pixi_ref(value)
Return whether a Pixi pypi reference points at a local path.
- Parameters:
value (str)
- Return type:
bool
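This sketch assumes that Pixi lockfile pypi references are either URLs (package index downloads or git+ URLs) or scheme-less relative paths such as "." or "./packages/mylib". Under that assumption, a simple heuristic treats anything without a scheme separator, plus file:// URLs, as local. It is a hypothetical helper and not reprotrail's classification logic.

```python
def is_local_ref_sketch(value: str) -> bool:
    """Heuristic: scheme-less references and file:// URLs count as local paths."""
    ref = value.strip()
    if ref.startswith("file://"):
        return True
    # Remote references (https://..., git+https://...) carry a scheme separator.
    return "://" not in ref
```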
- reprotrail.pixi.pixi_local_path_dependencies(lock_text, environment)
List local path dependencies in one Pixi environment.
- Parameters:
lock_text (str)
environment (str | None)
- Return type:
list[str]
- reprotrail.pixi.pixi_package_names_by_pypi(lock_text)
Return package names keyed by Pixi pypi source reference.
- Parameters:
lock_text (str)
- Return type:
dict[str, str]
- reprotrail.pixi.pixi_dependency_records(lock_text, environment, project_root)
Classify local Pixi dependencies as project-self or external-editable.
- Parameters:
lock_text (str)
environment (str | None)
project_root (Path)
- Return type:
list[dict[str, Any]]
- reprotrail.pixi.public_dependency_records(records)
Remove private resolved-path fields from dependency records.
- Parameters:
records (list[dict[str, Any]])
- Return type:
list[dict[str, Any]]
- reprotrail.pixi.editable_dependency_failures(dependency_records, *, allow_editable)
Return policy failures for external editable/path dependencies.
- Parameters:
dependency_records (list[dict[str, Any]])
allow_editable (bool)
- Return type:
list[str]
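pixi_dependency_records classifies local dependencies as project-self or external-editable, and this function turns that classification into policy failures. A minimal sketch follows, assuming records carry hypothetical "name" and "kind" keys. An empty list means the policy passes.

```python
from typing import Any


def editable_failures_sketch(
    dependency_records: list[dict[str, Any]], *, allow_editable: bool
) -> list[str]:
    """Hypothetical record keys: 'name' and 'kind'."""
    if allow_editable:
        return []  # policy explicitly permits external editable/path deps
    return [
        f"external editable dependency not allowed: {record.get('name', '?')}"
        for record in dependency_records
        if record.get("kind") == "external-editable"
    ]
```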
- reprotrail.pixi.repo_paths_with_dependencies(repos, dependency_records)
Append resolved editable dependency Git repos to an inspected repo list.
- Parameters:
repos (list[str])
dependency_records (list[dict[str, Any]])
- Return type:
list[str]
- reprotrail.pixi.normalize_package_name(name)
Return a normalized Python distribution name.
- Parameters:
name (str)
- Return type:
str
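The standard normalization for Python distribution names comes from PEP 503: runs of "-", "_" and "." collapse to a single "-", and the result is lowercased. Assuming normalize_package_name follows that rule, it is equivalent to this sketch.

```python
import re


def normalize_name_pep503(name: str) -> str:
    # Collapse runs of separators to "-", then compare case-insensitively.
    return re.sub(r"[-_.]+", "-", name).lower()
```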
- reprotrail.pixi.package_records(package_names)
Return installed package records with sanitized source identity.
- Parameters:
package_names (list[str] | tuple[str, ...])
- Return type:
list[dict[str, Any]]
- reprotrail.pixi.package_versions(package_names)
Return installed versions for package names that can be resolved.
- Parameters:
package_names (list[str] | tuple[str, ...])
- Return type:
dict[str, str]
- reprotrail.pixi.pixi_package_license_records(project_root, pixi_environment)
Return package license metadata from the local Pixi environment.
- Parameters:
project_root (Path)
pixi_environment (str | None)
- Return type:
list[dict[str, Any]]
- reprotrail.pixi.check_pixi_git_freshness(project_root, environment, packages, *, manifest_path=None)
Check whether selected Git-backed Pixi packages would move on update.
- Parameters:
project_root (Path)
environment (str)
packages (tuple[str, ...])
manifest_path (Path | None)
- Return type:
dict[str, Any]
- reprotrail.pixi.infer_pixi_environment(project_root, value=None)
Infer the active Pixi environment from an explicit value, env var, or Python.
- Parameters:
project_root (Path)
value (str | None)
- Return type:
str | None
- reprotrail.pixi.environment_summary(*, project_root, pixi_environment, dependency_records, allow_editable, package_names, env_var_whitelist)
Build a portable summary of the active runtime environment.
- Parameters:
project_root (Path)
pixi_environment (str | None)
dependency_records (list[dict[str, Any]])
allow_editable (bool)
package_names (tuple[str, ...])
env_var_whitelist (tuple[str, ...])
- Return type:
dict[str, Any]
- reprotrail.pixi.write_environment_bundle(*, run_root, project_root, lockfile, pixi_environment, dependency_records, allow_editable, package_names, env_var_whitelist)
Copy a Pixi lockfile and environment summary into provenance artifacts.
- Parameters:
run_root (Path)
project_root (Path)
lockfile (Path)
pixi_environment (str | None)
dependency_records (list[dict[str, Any]])
allow_editable (bool)
package_names (tuple[str, ...])
env_var_whitelist (tuple[str, ...])
- Return type:
dict[str, Any]
Command runner that records provenance and runtime status.
- reprotrail.runner.diagnostic_software_states(*, candidates, trusted_states, log)
Return configured repo states that are not trusted runtime repos.
- Parameters:
candidates (list[str])
trusted_states (list[dict[str, Any]])
log (Path)
- Return type:
list[dict[str, Any]]
- reprotrail.runner.run_with_provenance(*, command, log, repos=None, allow_dirty=False, allow_editable=False, allow_partial_metadata=False, provenance_json=None, product_output=None, settings=None)
Run a command while recording v1 reprotrail provenance.
- Parameters:
command (list[str])
log (str | Path)
repos (list[str] | None)
allow_dirty (bool)
allow_editable (bool)
allow_partial_metadata (bool)
provenance_json (str | Path | None)
product_output (str | Path | None)
settings (ReprotrailSettings | None)
- Return type:
dict[str, Any]
Dependency snapshot contracts and product environment audits.
- reprotrail.epochs.build_dependency_snapshot(*, project_root, lockfile=None, pixi_environment=None, package_versions_payload=None, runtime_packages_payload=None, dependency_records=None, software_states=None, package_names=('reprotrail',))
Build a stable dependency/runtime snapshot.
- Parameters:
project_root (Path)
lockfile (Path | None)
pixi_environment (str | None)
package_versions_payload (dict[str, str] | None)
runtime_packages_payload (list[dict[str, Any]] | None)
dependency_records (list[dict[str, Any]] | None)
software_states (list[dict[str, Any]] | None)
package_names (tuple[str, ...])
- Return type:
dict[str, Any]
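A "stable" snapshot serializes identically whenever its content is identical, which is what makes epoch comparisons meaningful. One common technique is to hash a canonical JSON encoding with sorted keys and fixed separators. The sketch below shows that technique. The reference does not say whether reprotrail derives its epoch identifiers this way.

```python
import hashlib
import json
from typing import Any


def snapshot_fingerprint_sketch(snapshot: dict[str, Any]) -> str:
    """Hash a canonical JSON form so key order never changes the result."""
    canonical = json.dumps(snapshot, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```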
- reprotrail.epochs.check_dependency_contract(*, run_root, project_root, acceptance_reason=None, dry_run=False, pixi_environment=None, snapshot=None, package_names=('reprotrail',))
Check or accept the dependency epoch for a run root.
- Parameters:
run_root (Path)
project_root (Path)
acceptance_reason (str | None)
dry_run (bool)
pixi_environment (str | None)
snapshot (dict[str, Any] | None)
package_names (tuple[str, ...])
- Return type:
dict[str, Any]
- reprotrail.epochs.annotate_product_environment_consistency(provenance_path, *, run_root, output_snapshot, output_epoch, append_readme=False)
Annotate a product with input/output dependency-epoch consistency.
- Parameters:
provenance_path (Path | None)
run_root (Path | None)
output_snapshot (dict[str, Any])
output_epoch (dict[str, Any] | None)
append_readme (bool)
- Return type:
dict[str, Any] | None
- reprotrail.epochs.audit_dependency_epochs(*, run_root, output, product_provenance=None, product_root_markers=())
Scan product provenance files and summarize accepted dependency epochs.
- Parameters:
run_root (Path)
output (Path)
product_provenance (list[Path] | None)
product_root_markers (tuple[str, ...])
- Return type:
dict[str, Any]
Product sidecar, checksum, and pointer-attribute helpers.
- class reprotrail.products.ProductSidecars(data, package, stem, readme, license, ro_crate, provenance, provenance_sha256)
Paths that travel with one durable data product.
- Parameters:
data (Path)
package (Path)
stem (str)
readme (Path)
license (Path)
ro_crate (Path)
provenance (Path)
provenance_sha256 (Path)
- reprotrail.products.product_sidecars(data_path)
Return sidecar paths for a durable data product.
- Parameters:
data_path (str | Path)
- Return type:
ProductSidecars
- reprotrail.products.product_record(data_path, *, provenance_path=None, metadata=None)
Build generic product metadata for a provenance record.
- Parameters:
data_path (str | Path)
provenance_path (str | Path | None)
metadata (Mapping[str, Any] | None)
- Return type:
dict[str, Any]
- reprotrail.products.public_license(license_payload)
Validate and normalize required product license metadata.
- Parameters:
license_payload (str | Mapping[str, Any] | None)
- Return type:
dict[str, str]
- reprotrail.products.default_readme_template_text()
Return the bundled product README template.
- Return type:
str
- reprotrail.products.copy_readme_template(output, *, force=False)
Copy the bundled product README template for project customization.
- Parameters:
output (str | Path)
force (bool)
- Return type:
Path
- reprotrail.products.write_json_with_provenance(path, payload, *, provenance=None)
Write JSON metadata, embedding public provenance when supplied.
- Parameters:
path (str | Path)
payload (dict[str, Any])
provenance (dict[str, Any] | None)
- Return type:
None
- reprotrail.products.stamp_dataset_provenance(obj, provenance)
Stamp lightweight provenance pointer attrs on an xarray-like object.
- Parameters:
obj (Any)
provenance (dict[str, Any] | None)
- Return type:
Any
- reprotrail.products.finalize_product_provenance(provenance_path, *, project_root=None, pixi_environment=None, product_metadata_file='reprotrail.products.toml', license=None, allow_partial_metadata=False, stamp=True)
Finalize a product sidecar checksum and lightweight product attrs.
- Parameters:
provenance_path (str | Path)
project_root (str | Path | None)
pixi_environment (str | None)
product_metadata_file (str)
license (str | Mapping[str, Any] | None)
allow_partial_metadata (bool)
stamp (bool)
- Return type:
str | None
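ProductSidecars includes a provenance_sha256 path, so finalizing a product involves writing a checksum of the provenance sidecar. The sketch below streams the file through SHA-256 and writes the digest in the familiar "<hex>  <filename>" layout used by sha256sum. That layout and the helper name are assumptions, not reprotrail's documented format.

```python
import hashlib
from pathlib import Path


def write_sha256_sidecar_sketch(target: Path, sidecar: Path) -> str:
    """Write '<hex>  <filename>' (sha256sum text layout); format is an assumption."""
    digest = hashlib.sha256()
    with open(target, "rb") as handle:
        # Read in 1 MiB chunks so large sidecars never load fully into memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    hexdigest = digest.hexdigest()
    sidecar.write_text(f"{hexdigest}  {target.name}\n", encoding="utf-8")
    return hexdigest
```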
Product license, attribution, and packaging metadata helpers.
- exception reprotrail.product_metadata.ProductMetadataError
Raised when product metadata cannot be resolved safely.
- exception reprotrail.product_metadata.ProductMetadataDependencyError
Raised when optional product metadata dependencies are unavailable.
- class reprotrail.product_metadata.ProductInput(name=None, producer=None, path=None, license=None, url=None, marginal=False)
Attribution and license metadata for one product input.
- Parameters:
name (str | None)
producer (str | None)
path (str | None)
license (str | None)
url (str | None)
marginal (bool)
- class reprotrail.product_metadata.SoftwareLicenseOverride(name, kind='package', license=None, url=None)
Manual license metadata for one software package or repository.
- Parameters:
name (str)
kind (str)
license (str | None)
url (str | None)
- class reprotrail.product_metadata.ProductMetadata(output, license=None, readme_template=None, inputs=(), software=(), source=None)
Metadata selected for one product output.
- Parameters:
output (str)
license (str | None)
readme_template (str | None)
inputs (tuple[ProductInput, ...])
software (tuple[SoftwareLicenseOverride, ...])
source (Path | None)
- reprotrail.product_metadata.load_product_index(project_root, *, metadata_file='reprotrail.products.toml')
Load project-root product metadata entries if the index exists.
- Parameters:
project_root (str | Path)
metadata_file (str)
- Return type:
tuple[ProductMetadata, ...]
- reprotrail.product_metadata.product_match_value(path, project_root)
Return the project-relative path string used for product metadata matching.
- Parameters:
path (str | Path)
project_root (str | Path)
- Return type:
str
- reprotrail.product_metadata.match_product_metadata(product_output, project_root, *, metadata_file='reprotrail.products.toml')
Return the single product metadata entry matching one product output.
- Parameters:
product_output (str | Path)
project_root (str | Path)
metadata_file (str)
- Return type:
ProductMetadata | None
- reprotrail.product_metadata.require_product_metadata_tools()
Fail clearly when product license/RO-Crate dependencies are unavailable.
- Return type:
None
- reprotrail.product_metadata.normalize_spdx_expression(expression)
Validate and normalize an SPDX expression using the SPDX toolchain.
- Parameters:
expression (str)
- Return type:
str
- reprotrail.product_metadata.spdx_expression_symbols(expression)
Return SPDX license symbols from a validated expression.
- Parameters:
expression (str)
- Return type:
list[str]
- reprotrail.product_metadata.product_license_summary(value)
Return the short public license summary stored in provenance.
- Parameters:
value (str | Mapping[str, Any] | None)
- Return type:
dict[str, str] | None
- reprotrail.product_metadata.input_license_records(inputs, *, project_root, package_dir)
Build attribution/license evidence records for product-index inputs.
- Parameters:
inputs (tuple[ProductInput, ...])
project_root (str | Path)
package_dir (str | Path)
- Return type:
list[dict[str, Any]]
- reprotrail.product_metadata.software_license_records(*, project_root, pixi_environment, overrides)
Build software/dependency license evidence records.
- Parameters:
project_root (str | Path)
pixi_environment (str | None)
overrides (tuple[SoftwareLicenseOverride, ...])
- Return type:
tuple[list[dict[str, Any]], list[str]]
Reproduce a recorded provenance step in a fresh workspace.
- exception reprotrail.reproduce.ReproductionError
Raised when a reproduction setup cannot proceed.
- reprotrail.reproduce.reproduce_from_provenance(*, provenance, workspace, execute=False, strict=False, env=None, project_repo=None, repo_sources=None, input_maps=None, resume=False, force=False, install=True)
Create a reproduction workspace from a product provenance sidecar.
- Parameters:
provenance (str | Path)
workspace (str | Path)
execute (bool)
strict (bool)
env (str | None)
project_repo (str | None)
repo_sources (dict[str, str] | None)
input_maps (dict[str, str] | None)
resume (bool)
force (bool)
install (bool)
- Return type:
dict[str, Any]
Command-line interface for reprotrail.