TensorCast Startup and Integration User Guide

This guide explains how users should bootstrap TensorCast in real deployments. It focuses on:

tensorcast.init(...) behavior and mode selection
CLI-managed vs SDK-managed daemon lifecycle
API/SDK integration patterns
TP or other multi-process workloads sharing one daemon

1. Startup Model at a Glance

tensorcast.init (defined at tensorcast.startup.init) is the main startup entry point. It supports three modes:

Mode	What it does	Daemon owner	Typical usage	Main risk
`connect`	Connects to an existing daemon only	External (CLI/operator/other process)	Production services, TP workers	Fails if no reachable daemon
`create`	Starts a local daemon and connects to it	Current process	Single-process tools, local dev	Process exit may stop daemon
`auto`	Singleflight connect-or-create under the same runtime root	Leader process selected at runtime	Concurrent local workers booting together	Config mismatch across processes causes startup failure

2. Recommended Decision Order

Environment	Recommended pattern	Why
Production service / long-lived inference	Start daemon via CLI, app uses `init(mode="connect")`	Clean lifecycle boundaries, easier ops
Local development or notebook	`init(mode="create")`	Fast self-contained bootstrap
Many local processes starting at the same time	`init(mode="auto")`	Prevents duplicate daemon launches
TP/forked workers	Pre-start daemon, each worker `connect`	Most stable and predictable

3. Configuration Resolution

3.1 Daemon config path (`create` / `auto`)

Priority	Source
1	`daemon_config_path` parameter
2	`TENSORCAST_DAEMON_CONFIG`
3	`examples/config/store_daemon_config.yaml` (repo or packaged wheel)

If none is found, startup fails.
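
The priority order above can be sketched as a small lookup function. This is an illustration of the documented order only, not TensorCast's actual implementation: the function name `resolve_daemon_config_path` and the `default_candidates` argument are hypothetical, and whether the real code checks that the first two sources exist on disk is an assumption left open here.

```python
from pathlib import Path
from typing import Mapping, Optional, Sequence


def resolve_daemon_config_path(
    daemon_config_path: Optional[str],
    env: Mapping[str, str],
    default_candidates: Sequence[str],
) -> Path:
    """Return the first config path found, in priority order (illustrative only)."""
    # Priority 1: explicit parameter passed by the caller.
    if daemon_config_path:
        return Path(daemon_config_path)
    # Priority 2: the TENSORCAST_DAEMON_CONFIG environment variable.
    env_value = env.get("TENSORCAST_DAEMON_CONFIG")
    if env_value:
        return Path(env_value)
    # Priority 3: a bundled default file, if one of the candidates exists.
    for candidate in default_candidates:
        if Path(candidate).is_file():
            return Path(candidate)
    # Nothing found: mirror the documented behavior and fail.
    raise FileNotFoundError("no daemon config found")
```

Passing `os.environ` as `env` reproduces the lookup for the current process; taking the mapping as an argument keeps the function easy to test.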

3.2 Global Store orchestration

`global_store_mode`	Behavior	When to use
`none`	No Global Store orchestration	Local-only or minimal setups
`connect`	Connect to an existing Global Store	Production clusters with a managed Global Store
`start`	Start a new local Global Store first, then daemon; fail if one already exists locally	Local all-in-one workflows

3.3 Optional port overrides (`create` / `auto`)

SDK-managed launches accept a structured port_config object, so callers can override daemon and Global Store ports without writing ad hoc config files.

Field	Meaning
`daemon_listen_port`	Daemon gRPC port
`daemon_p2p_port`	Daemon P2P/data-plane port
`global_store_listen_port`	Global Store gRPC port
`global_store_metrics_port`	Global Store Prometheus metrics port

Rules:

Rule	Behavior
Port value `0`	Auto-pick a free port at launch
`connect` mode	`port_config` is ignored because no local process is started
`global_store_mode!="start"`	Global Store port overrides are ignored
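
The rules above can be sketched with the standard library. Binding a socket to port 0 is the usual way to ask the operating system for a free port; whether TensorCast picks ports this way is an assumption. Both `pick_port` and `effective_overrides` are hypothetical helper names, and the plain dict stands in for the port_config fields.

```python
import socket


def pick_port(requested: int, host: str = "127.0.0.1") -> int:
    """Return `requested`, or a currently free TCP port when `requested` is 0."""
    if requested != 0:
        return requested
    # Binding to port 0 asks the OS to assign an unused ephemeral port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


def effective_overrides(mode: str, global_store_mode: str, ports: dict) -> dict:
    """Apply the documented rules about which overrides take effect."""
    if mode == "connect":
        return {}  # no local process is started, so nothing applies
    kept = dict(ports)
    if global_store_mode != "start":
        # Global Store ports only matter when this launch starts the Global Store.
        kept = {k: v for k, v in kept.items() if not k.startswith("global_store_")}
    return {name: pick_port(value) for name, value in kept.items()}
```

A port chosen this way can in principle be taken by another process before the daemon binds it, so a launcher that relies on this pattern should be prepared to retry.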

4. Integration Patterns

Pattern A (Recommended): CLI-managed services + SDK connect

Use the CLI to manage the daemon lifecycle, and keep application processes free of daemon ownership.

# 1) Start Global Store (if needed)
uv run tensorcast-cli global start --config=examples/config/global_store_config.yaml

# 2) Start Store Daemon
uv run tensorcast-cli daemon start \
  --config=examples/config/store_daemon_config.yaml \
  --global-store-mode connect \
  --global-store-address 127.0.0.1:50051

# 3) In the application (Python), connect to the daemon started above
import tensorcast as tc

tc.init(mode="connect", address="127.0.0.1:50052", show_daemon_logs=False)

artifact = tc.artifact(key="model:latest")
tensors = artifact.tensor_dict(device="cuda:0")

tc.shutdown()

Pattern B: SDK self-managed launch (`create`)

Suitable for local development and simple scripts.

import tensorcast as tc

tc.init(
    mode="create",
    daemon_config_path="examples/config/store_daemon_config.yaml",
    global_store_mode="start",
    global_store_config_path="examples/config/global_store_config.yaml",
    show_daemon_logs=False,
)

# register / get / artifact operations...

tc.shutdown()

Pattern B1: SDK self-managed launch with explicit ports

import tensorcast as tc

tc.init(
    mode="create",
    daemon_config_path="examples/config/store_daemon_config.yaml",
    global_store_mode="start",
    global_store_config_path="examples/config/global_store_config.yaml",
    port_config=tc.PortConfig(
        daemon_listen_port=50052,
        daemon_p2p_port=0,
        global_store_listen_port=50051,
        global_store_metrics_port=18008,
    ),
    show_daemon_logs=False,
)

global_store_mode="start" is exclusive within the current runtime root. If a healthy local Global Store is already recorded under the same TENSORCAST_HOME, startup fails instead of borrowing that instance. Stop the existing Global Store first, or switch to global_store_mode="connect".

Pattern C: Concurrent local startup (`auto`)

auto is useful when many local processes may start at once and should converge to one daemon.

import tensorcast as tc

tc.init(
    mode="auto",
    daemon_config_path="examples/config/store_daemon_config.yaml",
    global_store_mode="connect",
    global_store_address="127.0.0.1:50051",
    show_daemon_logs=False,
)

Best practices for auto:

Rule	Reason
Keep init parameters identical across participating processes	`auto` validates a config hash and rejects mismatches
Do not pass different explicit `session_id` values per process	Breaks process-group singleflight expectations
Prefer `connect` for long-lived production worker pools	Owner process semantics in `auto` are harder to operate
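
To see why identical parameters matter, consider a minimal sketch of a configuration fingerprint. The hashing scheme TensorCast actually uses is not described in this guide; `config_fingerprint` is a hypothetical helper that assumes the arguments are JSON-serializable.

```python
import hashlib
import json


def config_fingerprint(**init_kwargs) -> str:
    """Hash init arguments in a key-order-independent way (illustrative only)."""
    # sort_keys makes the serialized form independent of argument order.
    canonical = json.dumps(init_kwargs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this scheme, two workers that pass the same values in a different order get the same fingerprint, while a single differing value (for example a different global_store_address) produces a different one. Logging such a fingerprint at startup can help track down which process diverged.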

5. API/SDK Usage Patterns

5.1 Module-level API (simple and common)

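import tensorcast as tc
import torch

tc.init(mode="connect", address="127.0.0.1:50052")

# Placeholder weights for illustration; any CUDA tensor works here.
some_cuda_tensor = torch.zeros(4, 4, device="cuda:0")

tc.register({"w": some_cuda_tensor}, key="model:v1")  # register tensors under a key
art = tc.artifact(key="model:v1")                     # look the artifact up by key
weights = art.tensor_dict(device="cuda:0")            # materialize on the target device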

Note: init(mode="connect") only attaches to an existing daemon; passing global_store_mode, global_store_address, or global_store_config_path does not reconfigure that daemon. Configure the Global Store when the daemon is created or started (via init(mode="create"|"auto", ...) or uv run tensorcast-cli daemon start ...).

5.2 Explicit `Store` object (advanced tuning)

import tensorcast as tc
from tensorcast.api.store.types import RetryPolicy

tc.init(mode="connect", address="127.0.0.1:50052")

store = tc.store(
    opts=tc.StoreOptions(
        get=tc.GetArtifactOptions(source="local_only"),
        retry_overrides={"get": RetryPolicy(20.0, 2, 0.1, 2.0, 0.5)},
    )
)

art = store.artifact(key="model:latest")

6. TP / Multi-Process / Fork Best Practices

For tensor-parallel (TP) or any other multi-process setup, treat the daemon as shared infrastructure.

6.1 Recommended architecture

Step	Recommendation
1	Start daemon once (CLI/system service)
2	Spawn/fork worker processes
3	In each worker process, call `tc.init(mode="connect", address=...)`
4	Build per-rank views (`artifact.view(slices=...)`) and materialize locally

6.2 Do / Don’t

Do	Don’t
Initialize TensorCast inside each worker process	Call `tc.init()` in parent before `fork`
Use explicit daemon address in distributed launches	Rely on implicit local discovery across hosts
Use `connect` for stable long-running TP services	Use `create` in every rank
Keep one daemon endpoint per process	Attempt to switch daemon address after client is initialized
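
The first row of the table is the one most easily violated by accident. A generic way to catch it is to record the process ID when a handle is created and to check it before use. The sketch below is not part of the TensorCast API; `ProcessLocalHandle` is a hypothetical class showing the pattern an application could wrap around its own per-process state.

```python
import os


class ProcessLocalHandle:
    """Remembers which process created it (hypothetical helper)."""

    def __init__(self) -> None:
        self.owner_pid = os.getpid()

    def ensure_same_process(self) -> None:
        # After fork, os.getpid() in the child differs from the recorded owner.
        if os.getpid() != self.owner_pid:
            raise RuntimeError(
                f"handle created in pid {self.owner_pid}, used in pid {os.getpid()}; "
                "initialize inside the worker instead"
            )
```

Calling `ensure_same_process()` at the top of worker code turns a silent use of parent-initialized state into an immediate, descriptive error.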

6.3 Can forked workers call `auto` directly?

Yes. You do not need to pre-start the daemon if each forked worker calls tc.init(mode="auto") after the fork. One worker becomes the leader and starts the daemon; the others wait and then connect.

Required conditions:

Condition	Why
Call `auto` in child process (after fork)	Avoid inheriting parent-initialized runtime/client state
Keep startup args identical across workers	`auto` enforces config-hash consistency
Share the same runtime root (`TENSORCAST_HOME`)	Singleflight election happens under one runtime root
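
The role of the shared runtime root can be illustrated with a generic election over an atomically created file. This is one common technique and only an assumption about how a singleflight election might work; the guide does not describe TensorCast's mechanism. The helper name `try_become_leader` and the file name `startup.lock` are hypothetical.

```python
import os
from pathlib import Path


def try_become_leader(runtime_root: Path, lock_name: str = "startup.lock") -> bool:
    """Return True for exactly one caller per runtime root (illustrative only)."""
    runtime_root.mkdir(parents=True, exist_ok=True)
    lock_path = runtime_root / lock_name
    try:
        # O_CREAT | O_EXCL makes creation atomic: only one process can succeed.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # another process is the leader; wait and connect instead
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))  # record the owner for diagnostics
    return True
```

Workers that point at different directories would each win their own election, which is why all participants need the same runtime root. A real implementation also has to handle stale lock files left behind by a crashed leader, which this sketch does not attempt.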

Operational caveat:

Caveat	Impact
Leader process is daemon owner	If owner exits early, daemon can be stopped and other workers are affected

For long-running TP services, prefer a dedicated daemon process and have each worker use connect.

6.4 Worker template

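def worker_main(rank: int, daemon_addr: str) -> None:
    # Import inside the worker so nothing is initialized in the parent process.
    import tensorcast as tc
    import torch

    torch.cuda.set_device(rank)
    tc.init(mode="connect", address=daemon_addr, show_daemon_logs=False)

    # build_rank_slices is a user-supplied helper that returns this rank's slices.
    artifact = tc.artifact(key="model:latest").view(slices=build_rank_slices(rank))
    _ = artifact.tensor_dict(device=f"cuda:{rank}")  # materialize on this rank's GPU

    tc.shutdown()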
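
The template leaves build_rank_slices to the application. One possible shape is sketched below, assuming an even split of each tensor's first dimension across ranks. The structure that artifact.view(slices=...) actually expects is not specified in this guide, so the returned dict of Python slice objects is an assumption, as are the extra `world_size` and `dim_sizes` parameters (the template calls the helper with `rank` only, so they would need to be bound beforehand, for example with functools.partial).

```python
from typing import Dict, Tuple


def shard_range(dim_size: int, rank: int, world_size: int) -> Tuple[int, int]:
    """Return the [start, stop) rows owned by `rank` in an even split."""
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    base, extra = divmod(dim_size, world_size)
    # The first `extra` ranks each take one additional row.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def build_rank_slices(rank: int, world_size: int, dim_sizes: Dict[str, int]) -> Dict[str, slice]:
    """Map each tensor name to the slice of its first dimension for this rank."""
    return {
        name: slice(*shard_range(size, rank, world_size))
        for name, size in dim_sizes.items()
    }
```

With 10 rows and 4 ranks, the ranges are (0, 3), (3, 6), (6, 8), and (8, 10), so every row is assigned to exactly one rank.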

7. Troubleshooting

Symptom	Likely cause	Action
`No local daemon session found` in `connect`	No running daemon, or no discovered local session	Start daemon via CLI or pass explicit `address`
`AUTO_CONFIG_MISMATCH` in `auto`	Different init/config params across processes	Make all `auto` startup args identical
`Materialization requires a Global Store connection`	Daemon not connected to Global Store for requested operation	Configure Global Store at daemon startup (`init(mode="create"|"auto", global_store_...)` or `tensorcast-cli daemon start --global-store-...`)
`client already initialized for address ... refusing second client`	Same process tried to bind to another daemon address	Use one daemon endpoint per process; restart process if switching is needed
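
When diagnosing connect failures, it can help to check whether anything is listening at the daemon address before calling init. The sketch below uses only the standard library; `is_reachable` is a hypothetical helper, not a TensorCast function. A successful TCP connection shows that the port is open, not that the daemon behind it is healthy.

```python
import socket


def is_reachable(address: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to 'host:port' succeeds within `timeout`."""
    host, _, port = address.rpartition(":")
    if not host:
        return False  # no 'host:port' separator found
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError a bad port.
        return False
```

For example, `is_reachable("127.0.0.1:50052")` returning False before a connect attempt points at a missing or misaddressed daemon rather than at an SDK-level problem.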

8. Production Checklist

Item	Status
Daemon lifecycle managed outside app process (CLI/system)	☐
App uses `init(mode="connect", address=...)`	☐
Global Store mode selected intentionally (`none`/`connect`/`start`)	☐
TP workers initialize TensorCast after process start	☐
All startup configs are deterministic and version-controlled	☐