API Design

This document describes the public Python SDK surface for artifact registration and materialization, along with the contracts and invariants callers can rely on.

This page intentionally prioritizes What/Why (semantics and rationale) over How (internal mechanics). Internal flows are documented separately.

Goals

The SDK aims to provide a small, stable surface that can scale from “single process cache” to “cluster-wide durable artifact distribution” without forcing applications to rewrite I/O paths.

Why the API is shaped the way it is:

  • Artifact handles over ad-hoc getters: An Artifact is a stable unit of identity + metadata + fallbacks; it allows the SDK to evolve internal materialization strategies without breaking callers.
  • Explicit policy: StorePolicy is the single durability and placement declaration, so callers can reason about “where the bytes must end up” rather than “which RPC to call”.
  • Daemon-owned data plane: materialization and disk reads remain daemon-owned so that transport locks, verification, and P2P orchestration stay consistent.

Core Concepts

These terms show up across docs and APIs:

  • Artifact: a logical collection of named tensors with a canonical index describing dtypes/shapes/strides and a canonical byte layout.
  • Artifact ID: a content-addressed identifier (e.g. mi2:...) returned by registration, used for retrieval and persistence.
  • Key: a human-friendly string name mapped to an artifact ID (key mapping lives outside the caller process).
  • Replica: a concrete materialization of an artifact on a specific device (local VRAM/DRAM, remote VRAM/DRAM, disk-backed, etc.).
  • Lease / LIP (Lease-In-Place): a daemon-tracked lifetime for a replica that is backed by client-owned VRAM; the daemon exports it via CUDA IPC handles.
  • Fallback: a caller-provided hint for preferred source selection (local, p2p, disk) and optional disk path hints.
  • Plan: a programmable orchestration IR for control-plane actions targeting workers (daemon_id) and instances (instance_id), executed via node agents.

Entry Points

TensorCast exposes a process-wide store session and module-level helpers.

  • tensorcast.init(...): establishes a runtime (connect to an existing daemon or launch services). In mode="auto", concurrent callers under the same runtime root coordinate so one process launches and the rest connect to the same daemon. In mode="create", you can also start a local Global Store by setting global_store_mode="start"; this requires that no healthy local Global Store is already recorded under the current runtime root, otherwise startup fails. Use global_store_config_path=... (or $TENSORCAST_GLOBAL_STORE_CONFIG) to pick the Global Store YAML. Implementation: tensorcast/startup.py.
  • tensorcast.store(...): returns the process-wide Store (lazy initialization).
  • tensorcast.plan(ctx): builds a programmable plan that binds a single CallContext to all step executions.
  • Module-level helpers (tensorcast.register, tensorcast.put, tensorcast.register_view, tensorcast.artifact, tensorcast.from_disk) are thin wrappers around the process Store.
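
The "process-wide Store with lazy initialization, plus thin module-level wrappers" shape can be sketched in plain Python. This is only an illustration of the pattern: DemoStore, get_store, and the register helper below are hypothetical names and are not the tensorcast implementation.

```python
import threading
from typing import Optional


class DemoStore:
    """Hypothetical stand-in for a store session; not the real tensorcast Store."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint


_lock = threading.Lock()
_process_store: Optional[DemoStore] = None


def get_store(endpoint: str = "127.0.0.1:50051") -> DemoStore:
    """Return the process-wide store, creating it on first use."""
    global _process_store
    with _lock:
        # Only the first caller constructs the store; later callers reuse it.
        if _process_store is None:
            _process_store = DemoStore(endpoint)
        return _process_store


def register(tensors: dict) -> str:
    """Module-level helper: a thin wrapper that delegates to the process store."""
    store = get_store()
    return f"registered {len(tensors)} tensor(s) via {store.endpoint}"
```

Because every helper goes through the same accessor, call sites never hold their own connection state, which is what lets the helpers stay thin.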

Advanced usage: directly construct a store instance if you want multiple stores in one process (uncommon).

  • tensorcast.api.store.Store(daemon_endpoint, opts=StoreOptions)

StoreOptions (process/store configuration)

StoreOptions configures client-side behavior such as default fallbacks and retry policy overrides.

Supported fields:

| Field | Type | Default | What it does | Why it exists |
| --- | --- | --- | --- | --- |
| get | GetArtifactOptions \| None | None | Default execution-scoped retrieval options applied by the store runtime. | Provide consistent source/topology defaults for a whole process. |
| retry_overrides | Mapping[str, RetryPolicy] \| None | None | Override the built-in retry policies per verb (keys: register, put, get, get_into). | Tune latency vs. resilience for different deployments. |

Example:

import tensorcast
from tensorcast.api.store.types import RetryPolicy

tensorcast.init(mode="connect", address="127.0.0.1:50051")
store = tensorcast.store(
    opts=tensorcast.StoreOptions(
        get=tensorcast.GetArtifactOptions(source="local_only"),
        retry_overrides={"get": RetryPolicy(20.0, 2, 0.1, 2.0, 0.5)},
    )
)

Store And Entry Points

TensorCast exposes a process-wide Store session and module-level helpers that bind to it.

This section is kept for compatibility with older links; see Entry Points.

Registration And Upload APIs

Primary registration verbs:

  • Store.register(...) registers existing GPU memory using lease-in-place (LIP) when supported.
  • Store.put(...) uploads tensors into daemon-owned stable DRAM.
  • Store.register_view(...) registers a slice or transpose view and allows the daemon to rebuild the canonical artifact from a partial upload.

Why there are multiple verbs:

  • register is the fastest path when tensors already live in GPU memory and you want to export them without a deep copy; it requires careful lifetime management (leases/regions).
  • put trades some overhead for stability by uploading into daemon-owned stable memory (less coupling to client process lifetime).
  • register_view is for structural reuse (views/slices/transposes) and can reduce redundant data movement when the canonical artifact can be derived from a view payload.

register and put accept two optional arguments, policy and options:

  • policy: StorePolicy | str | None is the preferred surface.
  • options.policy is an advanced escape hatch.
  • If both are provided they must normalize to the same policy.
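
The "both must normalize to the same policy" rule can be sketched as follows. This is a minimal illustration that models policies as profile strings only; normalize_policy and resolve_policy are hypothetical helpers, and the real normalization of StorePolicy objects may differ.

```python
from typing import Optional


def normalize_policy(policy: Optional[str]) -> Optional[str]:
    """Normalize a profile name; a hypothetical stand-in for real policy parsing."""
    if policy is None:
        return None
    return policy.strip().lower()


def resolve_policy(policy: Optional[str], options_policy: Optional[str]) -> Optional[str]:
    """Pick the effective policy, rejecting conflicting declarations."""
    top = normalize_policy(policy)
    nested = normalize_policy(options_policy)
    # Both given: they must agree after normalization.
    if top is not None and nested is not None and top != nested:
        raise ValueError(f"policy={top!r} conflicts with options.policy={nested!r}")
    # Otherwise use whichever one was provided (or None if neither was).
    return top if top is not None else nested
```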

register_view does not currently expose a first-class policy= parameter; use options=RegisterArtifactOptions(policy=...) when you need durability/placement control for view registrations.

RegisterArtifactOptions (advanced registration tuning)

RegisterArtifactOptions controls plan selection and payload sizing. It is an advanced surface: most applications should start with policy= only.

  • Type: tensorcast/api/_config.py (RegisterArtifactOptions, PlanType)
  • Where to pass: Store.register(..., options=...), Store.put(..., options=...), Store.register_view(..., options=...)

Key fields (selected; see source for full list):

| Field | Default | What it does | Typical use |
| --- | --- | --- | --- |
| plan | DRAM_STABLE | Selects daemon registration plan (coalesced, lease, stable_dram, etc.). | Force a plan when debugging performance or compatibility. |
| max_inflight_bytes | 512 MiB | Upper-bounds coalesced upload inflight bytes. | Prevent large registrations from monopolizing pinned buffers. |
| lease_in_place | False | Opt-in to LIP flows (client-owned VRAM exported via lease). | Register already-loaded model weights without copying. |
| min_tensor_bytes | 64 KiB | LIP segmentation threshold. | Reduce per-tensor overhead for tiny tensors. |
| max_tensor_count | 8192 | Safety cap for pathological tensor dicts. | Guard against high fan-out metadata. |
| stage_on_gpu | True | For stable DRAM plans, stage uploads via GPU buffers. | Improve throughput when GPU→DRAM path is faster. |
| disk_path | None | Optional local disk path to validate the canonical index during registration. | Sanity-check on-disk artifacts; not used for fallback. |

Example (LIP opt-in):

import tensorcast

artifact = tensorcast.register(
    {"w": some_cuda_tensor},
    options=tensorcast.RegisterArtifactOptions(lease_in_place=True),
    policy="cache",
)

Store.register (register tensors already in VRAM)

Signature (module-level is identical): tensorcast.register(tensors, *, artifact_id=None, key=None, policy=None, options=None, ttl_ms=None)

| Parameter | What it means | Why/when to use it |
| --- | --- | --- |
| tensors | Mapping[str, torch.Tensor] of CUDA tensors to register. | The logical tensor dict that forms the artifact. |
| artifact_id | Optional client-provided identifier (not the content-addressed id). | Useful for diagnostics/idempotency tagging; the daemon still returns the canonical mi2:... id. |
| key | Optional human-friendly name to publish for later lookup. | Use when you want consumers to fetch by name rather than by artifact_id. |
| policy | StorePolicy \| str \| None. | Declare durability/placement; see Policy & Persistence. |
| options | RegisterArtifactOptions \| None. | Advanced tuning (plan selection, inflight sizing). |
| ttl_ms | Optional lease TTL override (ms). | Mainly relevant for lease/LIP flows. |

Typical scenarios:

  • “I already have model weights on GPU, export them without copying”: use register (often with region-backed LIP).
  • “I want fast local caching but don’t need durability”: policy="cache".
  • “I need durability after registration”: policy="durable" or policy="ha" (starts persistence).

Key mapping (why key exists and how to use it)

key is a human-friendly name that resolves to an artifact_id. Use it when:

  • producers and consumers are decoupled (different processes/nodes)
  • you want a stable name (“model:v7”) but don’t want to pass around content ids

Key mapping is resolved by the daemon so SDK clients do not need direct Global Store knowledge:

  • Resolve: ResolveKeyMapping RPC (daemon → mapping store)
  • Publish: PublishReplicaKey RPC during/after registration

Contracts:

  • A key maps to at most one active artifact id at a time.
  • Publishing a key that already maps to a different artifact id fails with FAILED_PRECONDITION.
  • Prefer artifact_id for fully deterministic reads; use key when you want indirection/versioning.
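
The first two contracts can be modeled with a small in-memory mapping. This is a sketch of the semantics only: DemoKeyMap and FailedPrecondition are hypothetical names, the ids are placeholders, and the real mapping lives behind the daemon RPCs listed above. Treating a re-publish of the same id as a no-op is an assumption.

```python
from typing import Dict


class FailedPrecondition(Exception):
    """Stand-in for a FAILED_PRECONDITION status."""


class DemoKeyMap:
    """Hypothetical in-memory key mapping: one active artifact id per key."""

    def __init__(self) -> None:
        self._active: Dict[str, str] = {}

    def publish(self, key: str, artifact_id: str) -> None:
        current = self._active.get(key)
        # A key already bound to a different id must not be silently rebound.
        if current is not None and current != artifact_id:
            raise FailedPrecondition(f"{key!r} already maps to {current!r}")
        # Assumption: re-publishing the same id is accepted as a no-op.
        self._active[key] = artifact_id

    def resolve(self, key: str) -> str:
        """Return the active artifact id for a key (KeyError if unpublished)."""
        return self._active[key]
```

For example, publishing "model:v7" twice with the same placeholder id succeeds, while publishing it with a different id raises FailedPrecondition and leaves the original binding in place.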

Store.put (upload into daemon-owned stable DRAM)

Signature: tensorcast.put(tensors, *, artifact_id=None, key=None, policy=None, options=None, device=None)

put uploads tensors and commits a stable DRAM-backed replica. This reduces coupling to client process lifetime at the cost of an upload.

Key parameter:

  • device: optional target device selection for upload planning (e.g. pin to cuda:1).

Store.register_view (register a view-derived artifact)

Signature: tensorcast.register_view(tensors, *, artifact_id=None, key=None, slices=None, transpose=None, view_id=None, placement=None, ttl_ms=None, options=None, canonical_index_bytes=None, registration_kind=None)

register_view is for cases where the canonical artifact can be derived from a view/slice/transpose of an existing tensor dict.

View inputs:

  • slices: mapping of tensor name → a single narrow slice spec.
      • Supported forms: a slice (defaults to dim=0), or (dim, slice).
      • Only one narrow op per tensor; slice.step must be 1.
  • transpose: mapping of tensor name → non-empty sequence of (dim0, dim1) swaps.
  • view_id: optional deterministic identity. When omitted, the daemon computes it from the canonical index bytes + view spec; when provided, it must match the view spec. Do not supply view_id alongside conflicting slices/transpose.
  • placement: "SERVER" or "CLIENT".
      • Defaults: registration chooses "CLIENT" when transpose is present; otherwise "SERVER".
      • If the daemon rejects "SERVER" placement for a view, the SDK surfaces a FAILED_PRECONDITION with guidance to retry "CLIENT".
  • registration_kind: "canonical" (default) or "piece". Piece registration is selection-only, rejects transpose, requires server placement, and must be partial coverage (full canonical coverage should use "canonical").
  • canonical_index_bytes: optional bootstrap path for new assemblies; required to register the first piece without prior Global Store state.
Example:

import tensorcast

reg = tensorcast.register_view(
    {"w": w, "proj": proj},
    slices={"w": (0, slice(0, 1024))},  # one (dim, slice) narrow spec per tensor
    transpose={"proj": [(0, 1)]},
    placement="CLIENT",
    options=tensorcast.RegisterArtifactOptions(policy="cache"),
)
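
The slice forms and the placement default described under View inputs can be sketched as plain validation helpers. This is an illustration of the documented rules, not the SDK's own validation code; normalize_slice and default_placement are hypothetical names.

```python
from typing import Dict, Optional, Sequence, Tuple, Union

SliceSpec = Union[slice, Tuple[int, slice]]


def normalize_slice(spec: SliceSpec) -> Tuple[int, slice]:
    """Return (dim, slice) for either supported form; dim defaults to 0."""
    if isinstance(spec, slice):
        dim, sl = 0, spec
    elif isinstance(spec, tuple) and len(spec) == 2 and isinstance(spec[1], slice):
        dim, sl = spec
    else:
        raise TypeError("expected a slice or a (dim, slice) tuple")
    # Narrow ops are contiguous, so only an implicit or explicit step of 1 is valid.
    if sl.step not in (None, 1):
        raise ValueError("slice.step must be 1")
    return int(dim), sl


def default_placement(
    transpose: Optional[Dict[str, Sequence[Tuple[int, int]]]],
) -> str:
    """CLIENT when any transpose is requested, otherwise SERVER."""
    return "CLIENT" if transpose else "SERVER"
```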

Store.register_piece (register a dense view piece)

Signature: tensorcast.register_piece(tensors, *, assembly_id, key=None, slices=None, canonical_index_bytes=None, placement=None, ttl_ms=None, options=None)

register_piece uploads dense view bytes under an assembly id (cgid:) and records canonical coverage ranges. Pieces are selection-only (narrow only), reject transpose, require server placement, and must not fully cover the canonical byte space; for full coverage, use register_view with registration_kind="canonical". Provide canonical_index_bytes to bootstrap the first piece when the assembly does not exist yet.

Store.seal_assembly (seal an assembly to MI2)

Signature: tensorcast.seal_assembly(assembly_id, *, publish_canonical=True, timeout_s=120.0)

seal_assembly assembles the canonical byte stream from pieces, computes the MI2 data hash, persists the assembly → MI2 binding, and optionally publishes a canonical replica for durability.
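
The piece/seal relationship can be illustrated with a toy model. Several things here are assumptions made for the sketch: DemoAssembly is a hypothetical class, pieces are modeled as (offset, payload) byte ranges, sealing is assumed to require gap-free full coverage, and SHA-256 stands in for the MI2 data hash, whose actual algorithm is not described on this page.

```python
import hashlib
from typing import Dict


class DemoAssembly:
    """Hypothetical assembly: collects partial byte ranges, then seals to a hash."""

    def __init__(self, total_bytes: int) -> None:
        self.total_bytes = total_bytes
        self._pieces: Dict[int, bytes] = {}  # offset -> payload

    def register_piece(self, offset: int, payload: bytes) -> None:
        # A piece must be partial coverage; full coverage is a canonical registration.
        if len(payload) >= self.total_bytes:
            raise ValueError("a piece must not cover the whole canonical byte space")
        if offset < 0 or offset + len(payload) > self.total_bytes:
            raise ValueError("piece falls outside the canonical byte space")
        self._pieces[offset] = payload

    def seal(self) -> str:
        # Stitch pieces in offset order and require contiguous, complete coverage.
        stream = bytearray()
        for offset in sorted(self._pieces):
            if offset != len(stream):
                raise ValueError("pieces leave a gap or overlap")
            stream += self._pieces[offset]
        if len(stream) != self.total_bytes:
            raise ValueError("coverage is incomplete")
        # SHA-256 is a stand-in for the real data hash.
        return hashlib.sha256(bytes(stream)).hexdigest()
```

Pieces may arrive in any order; the digest depends only on the assembled byte stream.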

Artifact Handles And Materialization

Retrieval is centered on Artifact handles, not on Store.get or Store.get_into.

  • tensorcast.artifact(...) and Store.artifact(...) return an Artifact handle that exposes metadata and materialization helpers.
  • Artifact.tensor_dict(...) and Artifact.tensor_dict_into(...) provide the primary read surface.
  • tensorcast.from_disk(path) resolves an artifact id and canonical index from a disk directory (explicit import) and seeds the metadata cache. It does not attach retrieval policy to the handle; later reads still use GetArtifactOptions.

Why handles?

Handles separate identity (artifact id/key) from execution (materialize local/P2P/disk, batch, prefetch, verify). This makes it possible to:

  • add new materialization sources without changing call sites
  • keep retrieval policy execution-scoped on GetArtifactOptions
  • preserve cached canonical index metadata across calls
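
The identity/execution split can be sketched with a toy handle. DemoHandle, fetch_index, and describe_read are hypothetical names used only for illustration; the point is that the handle stores identity and cached index metadata, while the source choice is passed per call.

```python
from typing import Callable, Dict, Optional


class DemoHandle:
    """Hypothetical handle: carries identity and cached metadata, nothing else."""

    def __init__(self, artifact_id: str) -> None:
        self.artifact_id = artifact_id
        self._index: Optional[Dict[str, dict]] = None
        self.index_fetches = 0  # counts how often the index was actually fetched

    def index(self, fetch_index: Callable[[str], Dict[str, dict]]) -> Dict[str, dict]:
        # The canonical index is fetched once and reused across reads.
        if self._index is None:
            self._index = fetch_index(self.artifact_id)
            self.index_fetches += 1
        return self._index

    def describe_read(
        self, fetch_index: Callable[[str], Dict[str, dict]], source: str = "auto"
    ) -> dict:
        # Retrieval policy is an argument of the call, not state on the handle.
        index = self.index(fetch_index)
        return {
            "artifact_id": self.artifact_id,
            "source": source,
            "tensors": sorted(index),
        }
```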

Retrieval Selection

Materialization behavior is controlled by GetArtifactOptions:

  • source accepts a structured RetrievalPolicy or preset sugar such as auto, local_only, disk_first, or disk_only.
  • replica_uuid hints the daemon to reuse a prefetched replica.
  • verify_checksums controls descriptor validation on disk reads.
  • execution_topology carries collective/source-sharing hints separately from retrieval policy.

Disk paths are not supplied by the SDK; when disk fallback is enabled the daemon resolves managed disk locations via Global Store.

GetArtifactOptions is the execution-only contract for materialization, including source selection, topology, region-backed get_into, pinned allocation timeouts, and “wait for completion”.

Examples:

import tensorcast

# Import metadata from disk (explicit import).
handle = tensorcast.from_disk("/shared/tensorcast/models/model_a")
weights = handle.tensor_dict(device="cuda:0")  # requires managed disk locations or existing replicas

# Local-only reads (no P2P, no disk fallback).
handle = tensorcast.artifact(artifact_id="mi2:...")
weights = handle.tensor_dict(
    device="cuda:0",
    options=tensorcast.GetArtifactOptions(source="local_only"),
)

# Disk-first reads.
weights = handle.tensor_dict(
    device="cuda:0",
    options=tensorcast.GetArtifactOptions(source="disk_first"),
)
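
One way to picture preset sugar is as a lookup from preset name to an ordered list of permitted sources. The orderings below are assumptions chosen for illustration (only "local_only means no P2P and no disk" is stated on this page), and PRESETS and materialize are hypothetical names, not SDK symbols.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Assumed expansions; the daemon's real source ordering may differ.
PRESETS: Dict[str, List[str]] = {
    "auto": ["local", "p2p", "disk"],
    "local_only": ["local"],
    "disk_first": ["disk", "local", "p2p"],
    "disk_only": ["disk"],
}


def materialize(
    preset: str, readers: Dict[str, Callable[[], Optional[bytes]]]
) -> Tuple[str, bytes]:
    """Try each permitted source in order; return (source, payload) from the first hit."""
    for source in PRESETS[preset]:
        reader = readers.get(source)
        payload = reader() if reader is not None else None
        if payload is not None:
            return source, payload
    raise LookupError(f"no source in preset {preset!r} could serve the artifact")
```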

Store.artifact / from_disk (build handles)

Signature: tensorcast.artifact(*, artifact_id=None, key=None)

Use artifact(...) to build a reusable handle that carries identity and metadata only. You typically provide exactly one of:

  • artifact_id: content-addressed id (preferred)
  • key: mapped to an artifact id via key mapping

Disk paths are not accepted in artifact(...); use from_disk(...) for explicit imports and rely on managed disk locations for disk fallback.

StorePolicy And Persistence Hooks

StorePolicy is the single durability and placement declaration for register and put. It supports:

  • Profiles (cache, durable, ha, cold, warm, pinned).
  • Explicit tiers with must, should, may.
  • overflow_policy and layout overrides.

When policy includes shared disk or remote stable tiers, the SDK triggers StartPersistence after registration and stores the returned persistence_task_id on RegisteredArtifact.

Store.query_persistence_status(...) exposes daemon task state by task id or artifact id.

For the full policy model and examples, see Policy & Persistence.
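
The persistence hook described above can be sketched as a post-registration check. The tier names, DemoPolicy, DemoRegistered, and the start_persistence callback are all hypothetical, and treating a persistent tier in any of must/should/may as a trigger is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical tier names standing in for "shared disk" and "remote stable".
PERSISTENT_TIERS = {"shared_disk", "remote_stable"}


@dataclass
class DemoPolicy:
    must: List[str] = field(default_factory=list)
    should: List[str] = field(default_factory=list)
    may: List[str] = field(default_factory=list)

    def needs_persistence(self) -> bool:
        # Assumption: a persistent tier at any strength triggers persistence.
        tiers = self.must + self.should + self.may
        return any(tier in PERSISTENT_TIERS for tier in tiers)


@dataclass
class DemoRegistered:
    artifact_id: str
    persistence_task_id: Optional[str] = None


def register(
    artifact_id: str, policy: DemoPolicy, start_persistence: Callable[[str], str]
) -> DemoRegistered:
    """Register, then start persistence only when the policy calls for it."""
    result = DemoRegistered(artifact_id)
    if policy.needs_persistence():
        result.persistence_task_id = start_persistence(artifact_id)
    return result
```

A caller can then check persistence_task_id for None to tell whether there is a task to poll.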

Region APIs

For region-backed registration and quiesced cleanup:

  • Store.register_region(...) / Store.unregister_region(...) are the daemon region APIs. Store.register_vram_region(...) and Store.unregister_vram_region(...) remain SDK conveniences that lower to the same unified RegisterRegion / UnregisterRegion RPCs.
  • Store.deregister_artifact(...) quiesces and drains active exports, revokes the lease, and performs best-effort Global Store cleanup. By default it also deletes any managed shared-disk persistence for that artifact; pass keep_shared_disk_copy=True to retain the disk bytes.

Region-backed APIs are primarily about making LIP safe and reducing CUDA IPC churn by reusing stable region handles. See Region-Backed for request/response fields and lifecycle details.

Contracts And Invariants

  • The SDK owns the process Store and its runtime caches.
  • All public methods raise ArtifactError with structured status codes.
  • Registration and materialization errors map to retryable or non-retryable categories; the SDK applies bounded retries for transient errors.
  • Policy resolution is authoritative on the daemon; the SDK only validates the policy shape.
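
The "bounded retries for transient errors" contract can be sketched generically. DemoArtifactError, the RETRYABLE set, and call_with_retries are hypothetical; which status codes the SDK actually treats as retryable, and its backoff parameters, are not specified on this page.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed set of transient codes, for illustration only.
RETRYABLE = {"UNAVAILABLE", "DEADLINE_EXCEEDED"}


class DemoArtifactError(Exception):
    """Hypothetical error carrying a structured status code."""

    def __init__(self, code: str, message: str = "") -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient failures with exponential backoff, up to max_attempts."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except DemoArtifactError as err:
            # Non-retryable codes and the final attempt propagate to the caller.
            if err.code not in RETRYABLE or attempt == max_attempts:
                raise
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```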

Code Map