Registration Flow¶
This document describes the internal registration and upload flows implemented by the SDK, the daemon, and the StoreEngine.
Related docs:
- Public surface and caller contracts: API Design
- Policy and what happens after commit: Policy & Persistence
- Region registration and teardown: Region-Backed
- View semantics and piece assembly: Artifact Views and Retrieval
- Failure modes and retry guidance: Error, Retry, Observability
What is “registration”?¶
Registration turns a caller-provided tensor dictionary into a daemon-tracked artifact with:
- a canonical index (names → dtype/shape/stride → canonical byte layout)
- a content-addressed artifact id (`mi2:...`)
- optionally, an initial replica (stable DRAM) or an exported lease (LIP)
- optionally, a background persistence task (shared disk / remote stable)
Why it is split into a multi-step lifecycle:
- registration can involve large payloads (streaming is required)
- the daemon may need to allocate resources up-front (e.g. coalesced buffers)
- the system needs a clean cancellation/retry boundary (`AbortRegisteredArtifact`)
Registration Inputs And Canonicalization¶
- The SDK builds a tensor storage graph that de-duplicates storages and produces tensor aliases. See tensorcast/api/_tensor_graph.py and tensorcast/api/_register.py.
- Canonical index bytes and layout metadata are derived from the storage graph and used to build plans for registration.
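The de-duplication step can be pictured with a minimal sketch. It is not the code in tensorcast/api/_tensor_graph.py: `TensorRef`, `build_storage_graph`, and the dictionary keys are hypothetical, and each tensor is reduced to the identity of its backing storage plus a byte offset and length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorRef:
    """Hypothetical stand-in for a tensor: which storage backs it, and where."""
    name: str
    storage_key: int      # identity of the backing allocation (assumption)
    storage_nbytes: int   # total size of that allocation in bytes
    offset: int           # byte offset of the tensor inside the storage
    nbytes: int           # logical byte length of the tensor


def build_storage_graph(tensors):
    """Return (storages, aliases): one storage per distinct key, one alias per tensor."""
    storage_ids = {}   # storage_key -> storage_id, assigned in first-seen order
    storages = []
    aliases = []
    for t in tensors:
        if t.storage_key not in storage_ids:
            new_id = len(storages)
            storage_ids[t.storage_key] = new_id
            storages.append({"storage_id": new_id, "storage_length": t.storage_nbytes})
        aliases.append({
            "name": t.name,
            "storage_id": storage_ids[t.storage_key],
            "storage_offset": t.offset,
            "logical_length": t.nbytes,
        })
    return storages, aliases


# Two tensors share one allocation; a third has its own.
tensors = [
    TensorRef("wq", storage_key=0xA, storage_nbytes=1024, offset=0, nbytes=512),
    TensorRef("wk", storage_key=0xA, storage_nbytes=1024, offset=512, nbytes=512),
    TensorRef("bias", storage_key=0xB, storage_nbytes=64, offset=0, nbytes=64),
]
storages, aliases = build_storage_graph(tensors)
```

Three tensors collapse to two storages, so any per-storage metadata is described once and referenced by id from each alias.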
Begin, Feed, Commit¶
All registration paths use the same RPC lifecycle (unary begin, streaming feed, unary commit):
- `BeginRegisterArtifact`
- `FeedRegisterArtifactStream`
- `CommitRegisteredArtifact`
The plan controls how the daemon interprets the payload and which memory tier is committed.
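The lifecycle, including the abort boundary, can be sketched against an in-memory stand-in. `FakeDaemon` and its method signatures are hypothetical and only mirror the RPC names; the real calls go through the gRPC service defined in the proto. The size check at commit is an assumption about how `total_size` might be enforced.

```python
class FakeDaemon:
    """In-memory stand-in for the daemon; method names mirror the RPCs only."""

    def __init__(self):
        self.sessions = {}
        self.aborted = []
        self._next_id = 0

    def begin_register_artifact(self, total_size):
        registration_id = f"reg-{self._next_id}"
        self._next_id += 1
        self.sessions[registration_id] = {"total_size": total_size, "received": 0}
        return registration_id

    def feed_register_artifact_stream(self, registration_id, chunks):
        session = self.sessions[registration_id]
        for chunk in chunks:
            session["received"] += len(chunk)

    def commit_registered_artifact(self, registration_id):
        session = self.sessions[registration_id]
        if session["received"] != session["total_size"]:
            raise ValueError("payload size does not match total_size")
        del self.sessions[registration_id]
        return {"registration_id": registration_id, "committed": True}

    def abort_registered_artifact(self, registration_id):
        self.sessions.pop(registration_id, None)
        self.aborted.append(registration_id)


def register(daemon, chunks, total_size):
    """Unary begin, streaming feed, unary commit; abort on any failure after begin."""
    registration_id = daemon.begin_register_artifact(total_size)
    try:
        daemon.feed_register_artifact_stream(registration_id, iter(chunks))
        return daemon.commit_registered_artifact(registration_id)
    except Exception:
        daemon.abort_registered_artifact(registration_id)
        raise


daemon = FakeDaemon()
ok = register(daemon, [b"abcd", b"efgh"], total_size=8)
try:
    register(daemon, [b"abcd"], total_size=8)  # short payload: commit fails, session is aborted
    failed = False
except ValueError:
    failed = True
```

Because every failure after begin funnels through the abort call, a retry always starts from a fresh session rather than a half-fed one.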
BeginRegisterArtifactRequest (what/why of each field)¶
Proto: proto/tensorcast/daemon/v2/store_daemon.proto
| Field | What it means | Why it exists |
|---|---|---|
| `device_id` | Target GPU device ordinal for the registration plan. | Tie allocations/handles to a specific device. |
| `total_size` | Total canonical bytes to register (aligned). | Allocate/validate buffers and enforce size invariants. |
| `ttl_ms` | Optional TTL for lease-based lifecycles. | Prevent leaked registrations/leases. |
| `owner_pid` | Required client PID for lease lifecycle. | Safety: ensure only the owner can keep-alive/revoke. |
| `client_artifact_id` | Optional client-provided identity. | Debugging / idempotency hooks; daemon remains authoritative. |
| `index` (`tensor_index_key` or `tensor_index_data`) | Canonical index bytes or a hash key referencing them. | Avoid resending large indices when deduplicated by hash. |
| `plan` (coalesced/lease/stable_dram) | Oneof selecting the realization plan. | Same high-level API, multiple data-plane strategies. |
| `policy` | `StorePolicy` declaration. | The daemon resolves placement/durability at commit time. |
| `view` | Optional view registration parameters. | Support `register_view` without a separate RPC surface. |
FeedRegisterArtifactStreamRequest¶
The feed stream carries plan-specific payloads plus optional deduplicated metadata tables.
| Field | Used by | What it does |
|---|---|---|
| `registration_id` | all plans | Correlates the stream with the begin session. |
| `lease_segments` | lease/LIP | Streams lease segments (handles + ranges) to build canonical bytes. |
| `view_chunk` | view registration | Streams view payload chunks into canonical offsets. |
| `storage_entries` | lease/LIP | Deduplicated storage table for handles/regions. |
| `tensor_aliases` | lease/LIP | Logical tensor metadata mapping names to storages/offsets. |
The storage_entries + tensor_aliases mechanism is what lets the SDK register
complex tensor dicts without repeating per-tensor CUDA IPC handle metadata.
StorageEntry / TensorAlias (LIP metadata tables)¶
Proto: proto/tensorcast/daemon/v2/store_daemon.proto
StorageEntry describes a backing storage segment (typically a CUDA allocation):
| Field | What it means | Notes |
|---|---|---|
| `storage_id` | Client-chosen identifier used for deduplication. | Must be unique within the registration stream. |
| `device_id` | GPU ordinal that owns this storage. | Used for validation and handle resolution. |
| `cuda_ipc_handle` | Inline CUDA IPC handle for the storage. | Mutually exclusive with `vram_region_id`. |
| `vram_region_id` | Reference to a previously registered VRAM region. | Used with `mapping_base_offset`. |
| `storage_length` | Length in bytes of the storage. | Bounds checks for aliases/segments. |
| `mapping_base_offset` | Base offset from the mapped handle to the start of this storage window (bytes). | For `cuda_ipc_handle`, this is the CUDA allocation offset (sub-allocation safe). For `vram_region_id`, this is the offset into the region mapping. |
TensorAlias maps logical tensors to storages and offsets:
| Field | What it means |
|---|---|
| `name` | Logical tensor name. |
| `storage_id` | Which `StorageEntry` backs the tensor. |
| `storage_offset` | Offset into the storage (bytes). |
| `logical_length` | Logical byte length for this tensor slice. |
| `shape`, `stride`, `dtype` | Tensor metadata used to reconstruct PyTorch tensors. |
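The constraints stated in the two tables (unique `storage_id`, exactly one of `cuda_ipc_handle` / `vram_region_id`, aliases bounded by `storage_length`) can be checked with a small validator. This is a sketch over plain dictionaries, not the daemon's validation code; `validate_tables` and the message strings are hypothetical.

```python
def validate_tables(storage_entries, tensor_aliases):
    """Return a list of problems; an empty list means the tables are consistent."""
    problems = []
    by_id = {}
    for entry in storage_entries:
        sid = entry["storage_id"]
        if sid in by_id:
            problems.append(f"duplicate storage_id {sid}")
        by_id[sid] = entry
        has_handle = entry.get("cuda_ipc_handle") is not None
        has_region = entry.get("vram_region_id") is not None
        if has_handle == has_region:  # both set, or neither set
            problems.append(f"storage {sid}: need exactly one of cuda_ipc_handle / vram_region_id")
    for alias in tensor_aliases:
        entry = by_id.get(alias["storage_id"])
        if entry is None:
            problems.append(f"alias {alias['name']}: unknown storage_id {alias['storage_id']}")
            continue
        end = alias["storage_offset"] + alias["logical_length"]
        if alias["storage_offset"] < 0 or end > entry["storage_length"]:
            problems.append(f"alias {alias['name']}: range exceeds storage_length")
    return problems


entries = [
    {"storage_id": 0, "storage_length": 1024, "cuda_ipc_handle": b"\x00" * 64},
    {"storage_id": 1, "storage_length": 64, "vram_region_id": "region-a"},
]
good = [{"name": "wq", "storage_id": 0, "storage_offset": 0, "logical_length": 512}]
bad = [{"name": "wk", "storage_id": 0, "storage_offset": 768, "logical_length": 512}]

good_problems = validate_tables(entries, good)
bad_problems = validate_tables(entries, bad)
```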
LeaseSegments / LeasedSegment (LIP segment streaming)¶
LeasedSegment specifies how to populate the canonical coalesced layout:
| Field | What it means | Why it exists |
|---|---|---|
| `storage_id` | Reference to a `StorageEntry`. | Required: segments never inline CUDA IPC handles. |
| `storage_offset` | Offset into the referenced storage window (bytes). | Allows slicing a storage window (usually 0). |
| `artifact_offset` | Destination offset in the canonical artifact layout (bytes). | Defines where the bytes land in the artifact. |
| `length` | Segment length (bytes). | Must match the referenced storage length for full-storage registrations. |
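For the full-storage case, segment construction can be sketched as laying storages out one after another. The real destination offsets come from the canonical index; the back-to-back placement, the 256-byte `ALIGNMENT`, and the helper names here are assumptions made only for illustration.

```python
ALIGNMENT = 256  # assumption for illustration; the actual alignment is not stated in this document


def align_up(value, alignment=ALIGNMENT):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def full_storage_segments(storage_entries):
    """One segment per storage: whole window, placed at the next aligned artifact offset."""
    segments = []
    cursor = 0
    for entry in storage_entries:
        segments.append({
            "storage_id": entry["storage_id"],
            "storage_offset": 0,                  # full-storage registration starts at 0
            "artifact_offset": cursor,
            "length": entry["storage_length"],    # matches the referenced storage length
        })
        cursor = align_up(cursor + entry["storage_length"])
    return segments, cursor


entries = [
    {"storage_id": 0, "storage_length": 1000},
    {"storage_id": 1, "storage_length": 64},
    {"storage_id": 2, "storage_length": 512},
]
segments, total_size = full_storage_segments(entries)
```

In this sketch the final cursor doubles as an aligned total, which is the kind of value `total_size` in the begin request would carry.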
CommitRegisteredArtifactResponse (caller-visible outcomes)¶
The commit response is the boundary where the artifact becomes addressable:
- `artifact_descriptor` contains the content-addressed artifact id and related metadata.
- `existed=true` indicates idempotent join of an existing local replica/lease.
- `local_stable_tier` reports whether synchronous local stable admission succeeded (see Local Stable Tier).
- view fields (`view_id`, `canonical_ranges`, `registration_kind`) apply to view registrations.
Lease In Place Path¶
Store.register uses the LIP plan and streams storage metadata plus lease
segments.
- Storage entries include `storage_id`, `storage_length`, and either a CUDA IPC handle or a region reference.
- Tensor aliases map logical tensors to storage entries.
- Lease segments reference storage entries and specify destination offsets.
Region-backed LIP is preferred when a storage is fully covered by a registered
VRAM region. The SDK emits vram_region_id and mapping_base_offset in
StorageEntry and does not send per-storage CUDA handles in that case.
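The "fully covered" test and the resulting offset can be sketched as follows. The region records (`region_id`, `base_addr`, `length`), the function name, and the `allocation_offset` parameter are hypothetical; only the emitted field names come from `StorageEntry`.

```python
def storage_entry_for(storage_id, storage_addr, storage_length, regions,
                      cuda_ipc_handle, allocation_offset):
    """Prefer a region reference when a single region fully covers the storage."""
    for region in regions:
        start = region["base_addr"]
        end = start + region["length"]
        if start <= storage_addr and storage_addr + storage_length <= end:
            return {
                "storage_id": storage_id,
                "storage_length": storage_length,
                "vram_region_id": region["region_id"],
                "mapping_base_offset": storage_addr - start,  # offset into the region mapping
            }
    # No covering region: fall back to an inline per-storage handle.
    return {
        "storage_id": storage_id,
        "storage_length": storage_length,
        "cuda_ipc_handle": cuda_ipc_handle,
        "mapping_base_offset": allocation_offset,  # offset within the CUDA allocation
    }


regions = [{"region_id": "r0", "base_addr": 0x1000, "length": 0x1000}]
inside = storage_entry_for("s0", 0x1400, 0x200, regions, b"handle", 0)
straddling = storage_entry_for("s1", 0x1F00, 0x200, regions, b"handle", 0)
```

A storage that straddles the end of the region is only partially covered, so the sketch falls back to the handle path for it.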
Region Referenced LIP Storage¶
This is the critical “why” behind region-backed registration:
- Per-storage CUDA IPC handles are relatively expensive to create/track.
- Many workloads register multiple artifacts that live inside a few long-lived CUDA allocations (e.g. model weight slabs).
- A region handle lets the daemon refer to stable CUDA IPC metadata once, then use cheap offsets for each storage entry.
See Region-Backed for RegisterRegion(memory_kind=VRAM)
and teardown.
Coalesced And Stable DRAM Paths¶
Store.put commits a stable DRAM replica. The daemon performs a coalesced or
stable DRAM commit and returns the descriptor and canonical hashes.
View Registration¶
Store.register_view attaches a view spec and upload ranges. The daemon
rebuilds the canonical artifact from the view payload and returns canonical
coverage ranges in the commit response.
Local Stable Tier¶
After commit, the daemon resolves StorePolicy and may satisfy the local stable
DRAM tier synchronously:
- `must` local stable failures fail the commit RPC.
- `should` local stable failures return a `local_stable_tier` result with `DEGRADED` and a message.
- `may` does not trigger admission.
Stable DRAM retention and overflow rules are enforced by
StableDramCacheManager in the StoreEngine.
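The three requirement levels map to commit outcomes roughly as in the sketch below. It is an illustration of the rules above, not daemon code: `Requirement`, `LocalStableAdmissionError`, `resolve_local_stable`, and the `"OK"` status string are hypothetical, and `try_admit` stands in for whatever admission call the StoreEngine exposes. Only `DEGRADED` is taken from this document.

```python
from enum import Enum


class Requirement(Enum):
    MUST = "must"
    SHOULD = "should"
    MAY = "may"


class LocalStableAdmissionError(Exception):
    """Hypothetical error standing in for a failed commit RPC."""


def resolve_local_stable(requirement, try_admit):
    """Map a requirement level and an admission attempt to a commit outcome."""
    if requirement is Requirement.MAY:
        return None  # no admission is attempted, so there is no result to report
    admitted, message = try_admit()
    if admitted:
        return {"status": "OK", "message": ""}
    if requirement is Requirement.MUST:
        raise LocalStableAdmissionError(message)  # the commit itself fails
    return {"status": "DEGRADED", "message": message}  # commit succeeds, tier degraded


def cache_full():
    return (False, "stable DRAM cache full")


should_result = resolve_local_stable(Requirement.SHOULD, cache_full)
may_result = resolve_local_stable(Requirement.MAY, cache_full)
try:
    resolve_local_stable(Requirement.MUST, cache_full)
    must_failed = False
except LocalStableAdmissionError:
    must_failed = True
```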
Why this is part of commit:
- local stable admission is a purely local decision (no GS dependency)
- callers often want “ready-to-use locally” semantics (fail fast if `must`)
- it provides a clean degraded vs failed signal when local memory is contended
Outputs¶
The SDK returns RegisteredArtifact containing:
- `artifact_id` and canonical index
- `replica` info (plan, device, size)
- `lease` when LIP is used
- `local_stable_tier` result when policy requests local stable
- `persistence_task_id` when persistence is started
Registration Sequence¶
```mermaid
sequenceDiagram
    participant SDK as SDK Store
    participant DM as Daemon
    participant SE as StoreEngine
    participant GS as GlobalStore
    SDK->>DM: BeginRegisterArtifact
    SDK->>DM: FeedRegisterArtifactStream
    SDK->>DM: CommitRegisteredArtifact
    DM->>SE: commit registration plan
    opt local stable tier
        DM->>SE: admit stable DRAM policy
    end
    DM-->>SDK: CommitRegisteredArtifactResponse
    opt persistence required
        SDK->>DM: StartPersistence
        DM->>GS: PlanPlacement
    end
```
Code Map¶
- SDK registration: tensorcast/api/store/registration.py
- Storage graph and LIP upload: tensorcast/api/_register.py
- Daemon registration controller: daemon/service/controllers/registration_controller.cc
- Policy resolution: daemon/state/store_policy_resolver.cc
- Stable cache admission: core/store/components/stable_dram_cache_manager.h