API Design¶
This document describes the public Python SDK surface for artifact registration and materialization, along with the contracts and invariants callers can rely on.
This page intentionally prioritizes What/Why (semantics and rationale) over How (internal mechanics). For internal flows, see:
- Registration Flow
- Materialization Flow
- Policy & Persistence
- Region-Backed
- Error, Retry, Observability
- Artifact Views and Retrieval
Navigation¶
- Goals
- Core Concepts
- Entry Points
- Registration APIs
- Artifact Handles and Retrieval APIs
- Policy and Persistence Hooks
- Region APIs
- Contracts and Invariants
- Code Map
Goals¶
The SDK aims to provide a small, stable surface that can scale from “single process cache” to “cluster-wide durable artifact distribution” without forcing applications to rewrite I/O paths.
Why the API is shaped the way it is:
- Artifact handles over ad-hoc getters: an `Artifact` is a stable unit of identity + metadata + fallbacks; it lets the SDK evolve internal materialization strategies without breaking callers.
- Explicit policy: `StorePolicy` is the single durability and placement declaration, so callers can reason about “where the bytes must end up” rather than “which RPC to call”.
- Daemon-owned data plane: materialization and disk reads remain daemon-owned so that transport locks, verification, and P2P orchestration stay consistent.
Core Concepts¶
These terms show up across docs and APIs:
- Artifact: a logical collection of named tensors with a canonical index describing dtypes/shapes/strides and a canonical byte layout.
- Artifact ID: a content-addressed identifier (e.g. `mi2:...`) returned by registration, used for retrieval and persistence.
- Key: a human-friendly string name mapped to an artifact ID (key mapping lives outside the caller process).
- Replica: a concrete materialization of an artifact on a specific device (local VRAM/DRAM, remote VRAM/DRAM, disk-backed, etc.).
- Lease / LIP (Lease-In-Place): a daemon-tracked lifetime for a replica that is backed by client-owned VRAM; the daemon exports it via CUDA IPC handles.
- Fallback: a caller-provided hint for preferred source selection (`local`, `p2p`, `disk`) and optional disk path hints.
- Plan: a programmable orchestration IR for control-plane actions targeting workers (`daemon_id`) and instances (`instance_id`), executed via node agents.
Entry Points¶
TensorCast exposes a process-wide store session and module-level helpers.
- `tensorcast.init(...)`: establishes a runtime (connects to an existing daemon or launches services). In `mode="auto"`, concurrent callers under the same runtime root coordinate so that one process launches the daemon and the rest connect to it. In `mode="create"`, you can also start a local Global Store by setting `global_store_mode="start"`; this requires that no healthy local Global Store is already recorded under the current runtime root, otherwise startup fails. Use `global_store_config_path=...` (or `$TENSORCAST_GLOBAL_STORE_CONFIG`) to pick the Global Store YAML. Implementation: `tensorcast/startup.py`.
- `tensorcast.store(...)`: returns the process-wide `Store` (lazily initialized).
- `tensorcast.plan(ctx)`: builds a programmable plan that binds a single `CallContext` to all step executions.
- Module-level helpers (`tensorcast.register`, `tensorcast.put`, `tensorcast.register_view`, `tensorcast.artifact`, `tensorcast.from_disk`) are thin wrappers around the process `Store`.
Advanced usage: construct a store instance directly if you need multiple stores in one process (uncommon):

`tensorcast.api.store.Store(daemon_endpoint, opts=StoreOptions(...))`
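To make the "process-wide `Store` plus thin module-level wrappers" shape concrete, here is a minimal sketch of a lazily created singleton guarded by a lock. The `Store` class below is a hypothetical stand-in (it never contacts a daemon and returns fake ids); it illustrates the pattern, not TensorCast's implementation.

```python
import threading
from typing import Dict, Optional


class Store:
    """Hypothetical stand-in for tensorcast.api.store.Store (no daemon involved)."""

    def __init__(self, daemon_endpoint: str) -> None:
        self.daemon_endpoint = daemon_endpoint
        self.registered: Dict[str, dict] = {}

    def register(self, tensors: dict) -> str:
        # Fake id; a real daemon returns a content-addressed mi2:... id.
        artifact_id = f"mi2:fake-{len(self.registered)}"
        self.registered[artifact_id] = tensors
        return artifact_id


_store: Optional[Store] = None
_store_lock = threading.Lock()


def store(daemon_endpoint: str = "127.0.0.1:50051") -> Store:
    """Return the process-wide Store, creating it on first use."""
    global _store
    if _store is None:
        with _store_lock:
            # Double-checked: another thread may have created it while we waited.
            if _store is None:
                _store = Store(daemon_endpoint)
    return _store


def register(tensors: dict, **kwargs) -> str:
    """Thin module-level wrapper that forwards to the process Store."""
    return store().register(tensors, **kwargs)
```

Double-checked locking keeps the common path (store already created) lock-free, while the lock ensures that concurrent first callers in one process still share a single instance.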
StoreOptions (process/store configuration)¶
StoreOptions configures client-side behavior such as default fallbacks and
retry policy overrides.
- Type: `tensorcast/api/store/types.py` (`StoreOptions`, `RetryPolicy`)
- Where to pass: `tensorcast.store(opts=...)` or `Store(..., opts=...)`
Supported fields:
| Field | Type | Default | What it does | Why it exists |
|---|---|---|---|---|
| `get` | `GetArtifactOptions \| None` | `None` | Default execution-scoped retrieval options applied by the store runtime. | Provide consistent source/topology defaults for a whole process. |
| `retry_overrides` | `Mapping[str, RetryPolicy] \| None` | `None` | Override the built-in retry policies per verb (keys: `register`, `put`, `get`, `get_into`). | Tune latency vs. resilience for different deployments. |
Example:
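```python
import tensorcast
from tensorcast.api.store.types import RetryPolicy

# Connect to an already-running daemon.
tensorcast.init(mode="connect", address="127.0.0.1:50051")

store = tensorcast.store(
    opts=tensorcast.StoreOptions(
        # Default every read in this process to local replicas only.
        get=tensorcast.GetArtifactOptions(source="local_only"),
        # Per-verb override; see RetryPolicy in types.py for the positional field order.
        retry_overrides={"get": RetryPolicy(20.0, 2, 0.1, 2.0, 0.5)},
    )
)
```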
Store And Entry Points¶
TensorCast exposes a process-wide Store session and module-level helpers that bind to it.
This section is kept for compatibility with older links; see Entry Points.
Registration And Upload APIs¶
Primary registration verbs:
- `Store.register(...)` registers existing GPU memory using lease-in-place (LIP) when supported.
- `Store.put(...)` uploads tensors into daemon-owned stable DRAM.
- `Store.register_view(...)` registers a slice or transpose view and allows the daemon to rebuild the canonical artifact from a partial upload.
Why there are multiple verbs:
- `register` is the fastest path when tensors already live in GPU memory and you want to export them without a deep copy; it requires careful lifetime management (leases/regions).
- `put` trades some overhead for stability by uploading into daemon-owned stable memory (less coupling to client process lifetime).
- `register_view` is for structural reuse (views/slices/transposes) and can reduce redundant data movement when the canonical artifact can be derived from a view payload.
`register` and `put` accept an optional `policy` and an optional `options`:

- `policy: StorePolicy | str | None` is the preferred surface.
- `options.policy` is an advanced escape hatch.
- If both are provided, they must normalize to the same policy.
`register_view` does not currently expose a first-class `policy=` parameter; use
`options=RegisterArtifactOptions(policy=...)` when you need durability/placement
control for view registrations.
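The rule that `policy=` and `options.policy` must agree can be sketched as a small merge function. This is a hypothetical model: `StorePolicy` here carries only a profile name, and lowercasing strings is an assumed normalization; the real policy model (tiers, overflow, layout) and its normalization live in the SDK.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class StorePolicy:
    """Hypothetical, simplified policy: a profile name only (the real model has tiers)."""
    profile: str


PolicyLike = Union[StorePolicy, str, None]


def normalize(policy: PolicyLike) -> Optional[StorePolicy]:
    """Turn a profile string into a StorePolicy; pass objects and None through."""
    if policy is None:
        return None
    if isinstance(policy, str):
        return StorePolicy(profile=policy.strip().lower())
    return policy


def resolve_policy(policy: PolicyLike, options_policy: PolicyLike) -> Optional[StorePolicy]:
    """Merge policy= with options.policy, rejecting values that normalize differently."""
    a, b = normalize(policy), normalize(options_policy)
    if a is not None and b is not None and a != b:
        raise ValueError(f"policy={a.profile!r} conflicts with options.policy={b.profile!r}")
    return a if a is not None else b
```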
RegisterArtifactOptions (advanced registration tuning)¶
RegisterArtifactOptions controls plan selection and payload sizing. It is an
advanced surface: most applications should start with policy= only.
- Type: `tensorcast/api/_config.py` (`RegisterArtifactOptions`, `PlanType`)
- Where to pass: `Store.register(..., options=...)`, `Store.put(..., options=...)`, `Store.register_view(..., options=...)`
Key fields (selected; see source for full list):
| Field | Default | What it does | Typical use |
|---|---|---|---|
| `plan` | `DRAM_STABLE` | Selects the daemon registration plan (coalesced, lease, stable_dram, etc.). | Force a plan when debugging performance or compatibility. |
| `max_inflight_bytes` | 512 MiB | Upper-bounds inflight bytes for coalesced uploads. | Prevent large registrations from monopolizing pinned buffers. |
| `lease_in_place` | `False` | Opts in to LIP flows (client-owned VRAM exported via lease). | Register already-loaded model weights without copying. |
| `min_tensor_bytes` | 64 KiB | LIP segmentation threshold. | Reduce per-tensor overhead for tiny tensors. |
| `max_tensor_count` | 8192 | Safety cap for pathological tensor dicts. | Guard against high fan-out metadata. |
| `stage_on_gpu` | `True` | For stable DRAM plans, stages uploads via GPU buffers. | Improve throughput when the GPU→DRAM path is faster. |
| `disk_path` | `None` | Optional local disk path used to validate the canonical index during registration. | Sanity-check on-disk artifacts; not used for fallback. |
Example (LIP opt-in):
```python
import tensorcast

# some_cuda_tensor is an existing CUDA tensor defined elsewhere.
artifact = tensorcast.register(
    {"w": some_cuda_tensor},
    options=tensorcast.RegisterArtifactOptions(lease_in_place=True),
    policy="cache",
)
```
Store.register (register tensors already in VRAM)¶
Signature (module-level is identical): tensorcast.register(tensors, *, artifact_id=None, key=None, policy=None, options=None, ttl_ms=None)
| Parameter | What it means | Why/when to use it |
|---|---|---|
| `tensors` | `Mapping[str, torch.Tensor]` of CUDA tensors to register. | The logical tensor dict that forms the artifact. |
| `artifact_id` | Optional client-provided identifier (not the content-addressed id). | Useful for diagnostics/idempotency tagging; the daemon still returns the canonical `mi2:...` id. |
| `key` | Optional human-friendly name to publish for later lookup. | Use when you want consumers to fetch by name rather than by `artifact_id`. |
| `policy` | `StorePolicy \| str \| None`. | Declare durability/placement; see Policy & Persistence. |
| `options` | `RegisterArtifactOptions \| None`. | Advanced tuning (plan selection, inflight sizing). |
| `ttl_ms` | Optional lease TTL override, in milliseconds. | Mainly relevant for lease/LIP flows. |
Typical scenarios:
- “I already have model weights on GPU and want to export them without copying”: use `register` (often with region-backed LIP).
- “I want fast local caching but don’t need durability”: `policy="cache"`.
- “I need durability after registration”: `policy="durable"` or `policy="ha"` (starts persistence).
Key mapping (why key exists and how to use it)¶
key is a human-friendly name that resolves to an artifact_id. Use it when:
- producers and consumers are decoupled (different processes/nodes)
- you want a stable name (“model:v7”) but don’t want to pass around content ids
Key mapping is resolved by the daemon so SDK clients do not need direct Global Store knowledge:
- Resolve: `ResolveKeyMapping` RPC (daemon → mapping store)
- Publish: `PublishReplicaKey` RPC during/after registration
Contracts:
- A key maps to at most one active artifact id at a time.
- Publishing a key that already maps to a different artifact id fails with `FAILED_PRECONDITION`.
- Prefer `artifact_id` for fully deterministic reads; use `key` when you want indirection/versioning.
See:
- Key materialization: Materialization Flow
- Proto: proto/tensorcast/daemon/v2/store_daemon.proto
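The key-mapping contract can be modeled with a toy in-memory mapping. Everything here is hypothetical: `KeyMapping` and `KeyConflict` are illustrative names, the real mapping is resolved by the daemon over RPC, and treating a re-publish of the same artifact id as a no-op is an assumption this page does not state.

```python
import threading
from typing import Dict, Optional


class KeyConflict(Exception):
    """Stands in for a FAILED_PRECONDITION status from the daemon."""


class KeyMapping:
    """Toy model of key -> artifact id: at most one active id per key."""

    def __init__(self) -> None:
        self._keys: Dict[str, str] = {}
        self._lock = threading.Lock()

    def publish(self, key: str, artifact_id: str) -> None:
        with self._lock:
            current = self._keys.get(key)
            if current is not None and current != artifact_id:
                raise KeyConflict(f"FAILED_PRECONDITION: {key!r} already maps to {current}")
            # Assumption: re-publishing the same id is idempotent in this sketch.
            self._keys[key] = artifact_id

    def resolve(self, key: str) -> Optional[str]:
        return self._keys.get(key)
```

A versioned naming scheme such as `model:v7`, `model:v8` fits this contract naturally: each new version gets a new key instead of re-pointing an existing one.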
Store.put (upload into daemon-owned stable DRAM)¶
Signature: tensorcast.put(tensors, *, artifact_id=None, key=None, policy=None, options=None, device=None)
put uploads tensors and commits a stable DRAM-backed replica. This reduces
coupling to client process lifetime at the cost of an upload.
Key parameter:

- `device`: optional target device selection for upload planning (e.g. pin to `cuda:1`).
Store.register_view (register a view-derived artifact)¶
Signature: tensorcast.register_view(tensors, *, artifact_id=None, key=None, slices=None, transpose=None, view_id=None, placement=None, ttl_ms=None, options=None, canonical_index_bytes=None, registration_kind=None)
register_view is for cases where the canonical artifact can be derived from a
view/slice/transpose of an existing tensor dict.
View inputs:
- `slices`: mapping of tensor name → a single narrow slice spec.
  - Supported forms: a `slice` (defaults to `dim=0`) or `(dim, slice)`.
  - Only one narrow op per tensor; `slice.step` must be `1`.
- `transpose`: mapping of tensor name → non-empty sequence of `(dim0, dim1)` swaps.
- `view_id`: optional deterministic identity. When omitted, the daemon computes it from the canonical index bytes + view spec; when provided, it must match the view spec. Do not supply `view_id` alongside conflicting `slices`/`transpose`.
- `placement`: `"SERVER"` or `"CLIENT"`.
  - Defaults: registration chooses `"CLIENT"` when `transpose` is present; otherwise `"SERVER"`.
  - If the daemon rejects `"SERVER"` placement for a view, the SDK surfaces a `FAILED_PRECONDITION` with guidance to retry with `"CLIENT"`.
- `registration_kind`: `"canonical"` (default) or `"piece"`. Piece registration is selection-only, rejects transpose, requires server placement, and must be partial coverage (full canonical coverage should use `"canonical"`).
- `canonical_index_bytes`: optional bootstrap input for new assemblies; required to register the first piece when no Global Store state exists yet.
Example:
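```python
import tensorcast

# w and proj are tensors defined elsewhere.
reg = tensorcast.register_view(
    {"w": w, "proj": proj},
    # One narrow op per tensor: rows [0, 1024) along dim 0.
    slices={"w": (0, slice(0, 1024))},
    # Swap dims 0 and 1 of proj.
    transpose={"proj": [(0, 1)]},
    # Transposes default to CLIENT placement; set explicitly for clarity.
    placement="CLIENT",
    # register_view has no policy= parameter; pass the policy through options.
    options=tensorcast.RegisterArtifactOptions(policy="cache"),
)
```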
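The `slices` and `placement` rules listed for `register_view` can be expressed as two small validation helpers. This is a sketch of the stated rules only; `normalize_slice_spec` and `default_placement` are hypothetical names, not SDK functions, and the daemon remains authoritative.

```python
from typing import Mapping, Optional, Sequence, Tuple, Union

SliceSpec = Union[slice, Tuple[int, slice]]


def normalize_slice_spec(spec: SliceSpec) -> Tuple[int, slice]:
    """Return (dim, slice); a bare slice means dim=0. Rejects non-unit steps."""
    if isinstance(spec, slice):
        dim, sl = 0, spec
    elif isinstance(spec, tuple) and len(spec) == 2 and isinstance(spec[1], slice):
        dim, sl = spec
    else:
        # Lists or multiple narrow ops per tensor are not a single slice spec.
        raise TypeError(f"expected slice or (dim, slice), got {spec!r}")
    if sl.step not in (None, 1):
        raise ValueError(f"slice.step must be 1, got {sl.step}")
    return int(dim), sl


def default_placement(transpose: Optional[Mapping[str, Sequence[Tuple[int, int]]]]) -> str:
    """CLIENT when any transpose is requested, SERVER otherwise."""
    return "CLIENT" if transpose else "SERVER"
```

Anything other than a bare `slice` or a `(dim, slice)` pair raises `TypeError` in this sketch.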
Store.register_piece (register a dense view piece)¶
Signature: tensorcast.register_piece(tensors, *, assembly_id, key=None, slices=None, canonical_index_bytes=None, placement=None, ttl_ms=None, options=None)
`register_piece` uploads dense view bytes under an assembly id (`cgid:`) and
records canonical coverage ranges. Pieces are selection-only (narrow only),
reject transpose, require server placement, and must not fully cover the
canonical byte space (for full coverage, use `register_view` with
`registration_kind="canonical"`). Provide `canonical_index_bytes` to bootstrap
the first piece when the assembly does not exist yet.
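Because a single piece must be partial, a client-side pre-check can compute how many canonical bytes a piece's coverage ranges touch. The sketch assumes coverage is expressed as half-open `[start, end)` byte ranges; the helper names are hypothetical and the daemon remains the authority on validation.

```python
from typing import List, Optional, Tuple


def union_length(ranges: List[Tuple[int, int]]) -> int:
    """Bytes covered by half-open [start, end) ranges, counting overlaps once."""
    total = 0
    cur_start: Optional[int] = None
    cur_end: Optional[int] = None
    for start, end in sorted(ranges):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it out and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def validate_piece_coverage(ranges: List[Tuple[int, int]], canonical_size: int) -> None:
    """Raise if ranges are out of bounds, empty, or cover the whole canonical byte space."""
    for start, end in ranges:
        if not 0 <= start < end <= canonical_size:
            raise ValueError(f"range [{start}, {end}) outside [0, {canonical_size})")
    covered = union_length(ranges)
    if covered == 0:
        raise ValueError("piece covers no bytes")
    if covered == canonical_size:
        raise ValueError('full coverage: use registration_kind="canonical" instead')
```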
Store.seal_assembly (seal an assembly to MI2)¶
Signature: tensorcast.seal_assembly(assembly_id, *, publish_canonical=True, timeout_s=120.0)
seal_assembly assembles the canonical byte stream from pieces, computes the
MI2 data hash, persists the assembly → MI2 binding, and optionally publishes a
canonical replica for durability.
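The assembly step of `seal_assembly` can be pictured as ordering pieces by offset and concatenating them into one canonical byte stream before hashing. This sketch simplifies by requiring the pieces to tile `[0, canonical_size)` exactly, and it uses SHA-256 from `hashlib` purely as a stand-in: the actual MI2 data hash is not described on this page.

```python
import hashlib
from typing import Dict, Tuple


def assemble_canonical(pieces: Dict[Tuple[int, int], bytes], canonical_size: int) -> bytes:
    """Concatenate piece bytes keyed by [start, end) into one canonical stream.

    Simplification: ranges must tile [0, canonical_size) with no gaps or overlaps.
    """
    out = bytearray()
    expected = 0
    for (start, end), data in sorted(pieces.items()):
        if start != expected:
            raise ValueError(f"gap or overlap at byte {expected} (next piece starts at {start})")
        if len(data) != end - start:
            raise ValueError(f"piece [{start}, {end}) has {len(data)} bytes")
        out += data
        expected = end
    if expected != canonical_size:
        raise ValueError(f"assembly ends at {expected}, expected {canonical_size}")
    return bytes(out)


def stand_in_digest(canonical: bytes) -> str:
    # SHA-256 is a stand-in; the real MI2 data hash is not specified here.
    return "sha256:" + hashlib.sha256(canonical).hexdigest()
```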
Artifact Handles And Materialization¶
Retrieval is centered on Artifact handles, not on Store.get or Store.get_into.
- `tensorcast.artifact(...)` and `Store.artifact(...)` return an `Artifact` handle that exposes metadata and materialization helpers.
- `Artifact.tensor_dict(...)` and `Artifact.tensor_dict_into(...)` provide the primary read surface.
- `tensorcast.from_disk(path)` resolves an artifact id and canonical index from a disk directory (explicit import) and seeds the metadata cache. It does not attach retrieval policy to the handle; later reads still use `GetArtifactOptions`.
Why handles?¶
Handles separate identity (artifact id/key) from execution (materialize local/P2P/disk, batch, prefetch, verify). This makes it possible to:
- add new materialization sources without changing call sites
- keep retrieval policy execution-scoped on `GetArtifactOptions`
- preserve cached canonical index metadata across calls
Retrieval Selection¶
Materialization behavior is controlled by GetArtifactOptions:
- `source` accepts a structured `RetrievalPolicy` or preset sugar such as `auto`, `local_only`, `disk_first`, or `disk_only`.
- `replica_uuid` hints the daemon to reuse a prefetched replica.
- `verify_checksums` controls descriptor validation on disk reads.
- `execution_topology` carries collective/source-sharing hints separately from retrieval policy.
Disk paths are not supplied by the SDK; when disk fallback is enabled the daemon resolves managed disk locations via Global Store.
GetArtifactOptions is the execution-only contract for materialization,
including source selection, topology, region-backed get_into, pinned
allocation timeouts, and “wait for completion”.
- Type: `tensorcast/api/_config.py` (`GetArtifactOptions`, `RegionBackedMode`)
Examples:
```python
import tensorcast

# Import metadata from disk (explicit import).
handle = tensorcast.from_disk("/shared/tensorcast/models/model_a")
# Requires managed disk locations or existing replicas.
weights = handle.tensor_dict(device="cuda:0")

# Local-only reads (no P2P, no disk fallback).
handle = tensorcast.artifact(artifact_id="mi2:...")
weights = handle.tensor_dict(
    device="cuda:0",
    options=tensorcast.GetArtifactOptions(source="local_only"),
)

# Disk-first reads.
weights = handle.tensor_dict(
    device="cuda:0",
    options=tensorcast.GetArtifactOptions(source="disk_first"),
)
```
Store.artifact / from_disk (build handles)¶
Signature: tensorcast.artifact(*, artifact_id=None, key=None)
Use `artifact(...)` to build a reusable handle that carries identity and
metadata only. You typically provide exactly one of:

- `artifact_id`: content-addressed id (preferred)
- `key`: mapped to an artifact id via key mapping

Disk paths are not accepted in `artifact(...)`; use `from_disk(...)` for explicit imports and rely on managed disk locations for disk fallback.
StorePolicy And Persistence Hooks¶
StorePolicy is the single durability and placement declaration for register
and put. It supports:
- Profiles (`cache`, `durable`, `ha`, `cold`, `warm`, `pinned`).
- Explicit tiers with `must`, `should`, `may`.
- `overflow_policy` and `layout` overrides.
When policy includes shared disk or remote stable tiers, the SDK triggers
StartPersistence after registration and stores the returned
persistence_task_id on RegisteredArtifact.
Store.query_persistence_status(...) exposes daemon task state by task id or
artifact id.
For the full policy model and examples, see Policy & Persistence.
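The persistence trigger amounts to "start persistence only when the policy reaches shared disk or remote stable storage, and record the task id". The sketch below is hypothetical throughout: the tier names `shared_disk` and `remote_stable`, the `TierRule` shape, and the injected `start_persistence` callable stand in for the real `StorePolicy` tiers and the `StartPersistence` RPC.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical tier names; the real tier vocabulary is defined by StorePolicy.
PERSISTENT_TIERS = {"shared_disk", "remote_stable"}


@dataclass
class TierRule:
    tier: str
    level: str  # "must", "should", or "may"


@dataclass
class RegisteredArtifact:
    artifact_id: str
    persistence_task_id: Optional[str] = None


def finish_registration(
    artifact_id: str,
    tiers: List[TierRule],
    start_persistence: Callable[[str], str],
) -> RegisteredArtifact:
    """Call start_persistence only if some tier needs shared disk or remote stable storage."""
    result = RegisteredArtifact(artifact_id)
    if any(rule.tier in PERSISTENT_TIERS for rule in tiers):
        result.persistence_task_id = start_persistence(artifact_id)
    return result
```

Keeping the returned task id on the registration result is what lets callers later poll `Store.query_persistence_status(...)` by task id.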
Region APIs¶
For region-backed registration and quiesced cleanup:
- `Store.register_region(...)` / `Store.unregister_region(...)` are the daemon region APIs.
- `Store.register_vram_region(...)` and `Store.unregister_vram_region(...)` remain SDK conveniences that lower to the same unified `RegisterRegion`/`UnregisterRegion` RPCs.
- `Store.deregister_artifact(...)` quiesces and drains active exports, then revokes the lease, performs best-effort Global Store cleanup, and (by default) also deletes any managed shared-disk persistence for that artifact (`keep_shared_disk_copy=True` retains the disk bytes).
Region-backed APIs are primarily about making LIP safe and reducing CUDA IPC churn by reusing stable region handles. See Region-Backed for request/response fields and lifecycle details.
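The ordering of `Store.deregister_artifact(...)` matters: exports are drained before the lease is revoked, and Global Store cleanup is best-effort. The sketch below records that order against a hypothetical daemon stub; method names such as `drain_exports` are illustrative, not real RPCs.

```python
from typing import List


class FakeDaemon:
    """Hypothetical daemon stub that records the order of teardown steps."""

    def __init__(self, gs_fails: bool = False) -> None:
        self.log: List[str] = []
        self.gs_fails = gs_fails

    def quiesce(self, artifact_id: str) -> None:
        self.log.append("quiesce")  # in this sketch: stop new exports

    def drain_exports(self, artifact_id: str) -> None:
        self.log.append("drain")  # wait for active exports to finish

    def revoke_lease(self, artifact_id: str) -> None:
        self.log.append("revoke")

    def cleanup_global_store(self, artifact_id: str) -> None:
        if self.gs_fails:
            raise RuntimeError("global store unavailable")
        self.log.append("gs_cleanup")

    def delete_shared_disk(self, artifact_id: str) -> None:
        self.log.append("delete_disk")


def deregister_artifact(daemon: FakeDaemon, artifact_id: str, *, keep_shared_disk_copy: bool = False) -> None:
    daemon.quiesce(artifact_id)
    daemon.drain_exports(artifact_id)
    daemon.revoke_lease(artifact_id)  # only after no export can still use the memory
    try:
        daemon.cleanup_global_store(artifact_id)
    except Exception:
        pass  # best-effort: a Global Store failure does not undo the revoke
    if not keep_shared_disk_copy:
        daemon.delete_shared_disk(artifact_id)
```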
Contracts And Invariants¶
- The SDK owns the process `Store` and its runtime caches.
- All public methods raise `ArtifactError` with structured status codes.
- Registration and materialization errors map to retryable or non-retryable categories; the SDK applies bounded retries for transient errors.
- Policy resolution is authoritative on the daemon; the SDK only validates the policy shape.
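Bounded retries for transient errors usually mean a capped number of attempts with exponential backoff and jitter. The sketch below shows that technique with a hypothetical `ArtifactError` that carries a `retryable` flag; the parameter names are assumptions and are not a mapping of `RetryPolicy`'s positional fields, which this page does not document.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ArtifactError(Exception):
    """Hypothetical stand-in for the SDK's ArtifactError."""

    def __init__(self, code: str, retryable: bool) -> None:
        super().__init__(code)
        self.code = code
        self.retryable = retryable


def call_with_retries(
    fn: Callable[[], T],
    *,
    max_attempts: int = 3,
    initial_backoff_s: float = 0.1,
    multiplier: float = 2.0,
    jitter: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry fn on retryable ArtifactErrors, making at most max_attempts calls."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    backoff = initial_backoff_s
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ArtifactError as err:
            if not err.retryable or attempt == max_attempts:
                raise
            # Randomize within backoff * [1 - jitter, 1 + jitter] to avoid thundering herds.
            sleep(backoff * random.uniform(1 - jitter, 1 + jitter))
            backoff *= multiplier
    raise AssertionError("unreachable")
```

Injecting `sleep` keeps the helper testable without real delays; non-retryable errors surface on the first attempt.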
Code Map¶
- Public store facade: tensorcast/api/store/__init__.py
- Store runtime: tensorcast/api/store/runtime.py
- Registration pipeline: tensorcast/api/store/registration.py
- Materialization pipeline: tensorcast/api/store/materialization.py
- Policy + options model: tensorcast/api/_config.py