Migration Path: Inference Blueprint → `tangle-inference-core`

This document is the canonical recipe for migrating an inference blueprint to depend on tangle-inference-core. It's based on the llm-inference-blueprint migration, which serves as the reference implementation.

Reference Migration

The reference implementation is llm-inference-blueprint (formerly vllm-inference-blueprint). Study it before migrating your own blueprint.

Before migration:

billing.rs (428 LOC), metrics.rs (508 LOC), health.rs (135 LOC) — full copies of shared infrastructure
server.rs (1070 LOC) — Axum handlers, nonce store, SpendAuth validation, x402 headers
config.rs (520 LOC) — BillingConfig, ServerConfig, GpuConfig, TangleConfig all inlined
Total: ~3,640 LOC

After migration:

Those three files deleted
server.rs shrunk to 631 LOC (−439)
config.rs shrunk to 221 LOC (−299)
Total: ~1,819 LOC, a 50% reduction

Tests: 5 lib unit tests + 26 server integration tests passing. Clippy clean.

The Pattern

Every inference blueprint shares the same operator-side infrastructure:

ShieldedCredits billing (authorizeSpend / claimPayment)
EIP-712 SpendAuth verification
x402 HTTP payment headers + 402 Payment Required responses
Nonce replay protection
Prometheus metrics registry + RequestGuard RAII
GPU detection via nvidia-smi
Config loading (TOML + env var overrides)

Blueprints differ only in:

The backend they call (vLLM subprocess, HTTP proxy to Modal, ComfyUI, TEI, etc.)
The cost model (per-token, per-char, per-second, per-image, task-type)
The job schema (what's in the request/response)
The HTTP endpoints they expose

This migration moves the shared parts to tangle-inference-core and keeps only the blueprint-specific parts in the blueprint repo.
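
To make one of the shared pieces listed above concrete, the RAII idea behind RequestGuard can be sketched with only the standard library. This is an illustrative stand-in, not core's implementation: `InFlightGuard`, `handle`, and the bare atomic counter are hypothetical, whereas the shared guard records Prometheus metrics.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a metrics guard: tracks in-flight requests.
pub struct InFlightGuard {
    counter: Arc<AtomicUsize>,
}

impl InFlightGuard {
    /// Increments the counter when a request starts.
    pub fn new(counter: Arc<AtomicUsize>) -> Self {
        counter.fetch_add(1, Ordering::SeqCst);
        Self { counter }
    }
}

impl Drop for InFlightGuard {
    /// Decrements on every exit path, including early returns.
    fn drop(&mut self) {
        self.counter.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Simulates a handler that may return early; the guard still cleans up.
pub fn handle(counter: Arc<AtomicUsize>, fail: bool) -> Result<usize, &'static str> {
    let _guard = InFlightGuard::new(counter.clone());
    if fail {
        return Err("backend error");
    }
    // Read while the guard is still alive, so this request is counted.
    Ok(counter.load(Ordering::SeqCst))
}
```

Because cleanup lives in `Drop`, a handler cannot forget to record the end of a request, which is the property that makes this pattern worth sharing across blueprints.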

Step-by-Step Recipe

Step 1: Add the dependency

# operator/Cargo.toml
[dependencies]
tangle-inference-core = { path = "../../tangle-inference-core" }

Step 2: Identify the four categories

Walk through each file in operator/src/ and classify:

| Category | Action | Examples |
|---|---|---|
| Shared infrastructure | Delete the file, import from core | `billing.rs`, `metrics.rs`, `health.rs` |
| Shared config | Delete those config structs, re-use core's | `BillingConfig`, `ServerConfig`, `GpuConfig`, `TangleConfig` |
| Backend glue | Keep: this is the unique value of your blueprint | `vllm.rs`, `modal_proxy.rs`, `voice_engine.rs`, `diffusion.rs`, `video.rs`, `embedding.rs`, `pipeline.rs` |
| Blueprint-specific logic | Keep: routes, handlers, job schemas | `server.rs`'s HTTP handlers and Axum routes, `lib.rs`'s job handler for on-chain calls |

Step 3: Delete the shared files

rm operator/src/billing.rs
rm operator/src/metrics.rs
rm operator/src/health.rs

If your blueprint never had a billing.rs (embedding, image-gen, video-gen, distributed), you are adding billing via core rather than refactoring it out. The same steps apply.

Step 4: Rewrite `config.rs`

Before (typical):

pub struct OperatorConfig {
    pub tangle: TangleConfig,
    pub server: ServerConfig,
    pub billing: BillingConfig,
    pub gpu: GpuConfig,
    pub vllm: VllmConfig,
    // ... dozens of fields inlined
}

#[derive(Deserialize)]
pub struct BillingConfig {
    pub price_per_input_token: u64,
    pub price_per_output_token: u64,
    pub max_gas_price_gwei: u64,
    // ... many fields
}

#[derive(Deserialize)]
pub struct ServerConfig { /* ... */ }
#[derive(Deserialize)]
pub struct GpuConfig { /* ... */ }
#[derive(Deserialize)]
pub struct TangleConfig { /* ... */ }

After:

use tangle_inference_core::{BillingConfig, GpuConfig, ServerConfig, TangleConfig};

#[derive(Debug, Clone, serde::Deserialize)]
pub struct OperatorConfig {
    pub tangle: TangleConfig,
    pub server: ServerConfig,
    pub billing: BillingConfig,
    pub gpu: GpuConfig,
    // Your backend-specific config section (the only thing you define)
    pub vllm: VllmConfig,  // or modal, voice, embedding, etc.
}

#[derive(Debug, Clone, serde::Deserialize)]
pub struct VllmConfig {
    pub model: String,
    pub host: String,
    pub port: u16,
    // Pricing lives with the backend now (the blueprint's unique value)
    pub price_per_input_token: u64,
    pub price_per_output_token: u64,
    // ... blueprint-specific fields
}

Wire-format note: this is a breaking config change. If you had billing.price_per_input_token in your existing deployed configs, update them to <backend>.price_per_input_token.
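
The rename can be applied mechanically to existing configs. A minimal sketch, assuming the config has already been flattened into dotted keys; `migrate_pricing_keys` and the flattened-map representation are hypothetical helpers for illustration, not part of core:

```rust
use std::collections::BTreeMap;

/// Moves `billing.price_per_*` keys under `<backend>.`, leaving all other
/// keys untouched. `backend` is the blueprint's config section, e.g. "vllm".
pub fn migrate_pricing_keys(
    old: &BTreeMap<String, String>,
    backend: &str,
) -> BTreeMap<String, String> {
    old.iter()
        .map(|(key, value)| {
            let new_key = match key.strip_prefix("billing.") {
                // Only pricing fields move; infrastructure params stay in `billing`.
                Some(field) if field.starts_with("price_per_") => {
                    format!("{backend}.{field}")
                }
                _ => key.clone(),
            };
            (new_key, value.clone())
        })
        .collect()
}
```

With `backend = "vllm"`, a `billing.price_per_input_token` entry becomes `vllm.price_per_input_token`, while `billing.max_gas_price_gwei` stays where it is.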

Step 5: Define a `Backend` struct

Each blueprint defines its own backend struct holding runtime state plus a CostModel. This gets attached to the shared AppState via AppStateBuilder.

// operator/src/server.rs
use std::sync::Arc;
use tangle_inference_core::{
    AppState, AppStateBuilder, BillingClient, NonceStore,
    PerTokenCostModel, CostModel, CostParams,
};

pub struct VllmBackend {
    pub process: Arc<VllmProcess>,
    pub config: Arc<OperatorConfig>,
    pub cost_model: Arc<PerTokenCostModel>,
}

impl VllmBackend {
    pub fn new(config: Arc<OperatorConfig>, process: Arc<VllmProcess>) -> Self {
        Self {
            cost_model: Arc::new(PerTokenCostModel {
                price_per_input_token: config.vllm.price_per_input_token,
                price_per_output_token: config.vllm.price_per_output_token,
            }),
            process,
            config,
        }
    }
}

Step 6: Build `AppState` via `AppStateBuilder`

// In your server startup code
use std::sync::Arc;
use axum::{routing::{get, post}, Router};
use tangle_inference_core::{AppStateBuilder, BillingClient, NonceStore};

pub async fn start_server(config: Arc<OperatorConfig>, process: Arc<VllmProcess>) -> Result<()> {
    let billing_client = BillingClient::new(&config.tangle, &config.billing).await?;
    let nonce_store = Arc::new(NonceStore::load(config.billing.nonce_store_path.clone()));
    let operator_addr = billing_client.operator_address();

    let state = AppStateBuilder::new()
        .billing(Arc::new(billing_client))
        .nonce_store(nonce_store)
        .server_config(Arc::new(config.server.clone()))
        .billing_config(Arc::new(config.billing.clone()))
        .operator_address(operator_addr)
        .max_concurrent(config.server.max_concurrent_requests)
        .backend(VllmBackend::new(config, process))
        .build()?;

    let app = Router::new()
        .route("/v1/chat/completions", post(chat_completions))
        .route("/v1/models", get(list_models))
        // ... your routes
        .with_state(state);

    // ... serve
}

Step 7: Retrieve backend in handlers

use axum::{extract::State, http::HeaderMap, response::IntoResponse, Json};
use tangle_inference_core::{
    AppState, CostModel, CostParams, RequestGuard,
    validate_spend_auth, extract_x402_spend_auth,
    payment_required, error_response,
};

pub async fn chat_completions(
    State(state): State<AppState>,
    headers: HeaderMap,
    Json(req): Json<ChatCompletionRequest>,
) -> impl IntoResponse {
    let backend = state
        .backend::<VllmBackend>()
        .expect("VllmBackend attached to AppState");

    // 1. Extract SpendAuth from x402 headers (shared helper)
    let spend_auth = match extract_x402_spend_auth(&headers) {
        Some(a) => a,
        None => return payment_required(
            &state.billing_config,
            &backend.config.tangle,
            state.operator_address,
            /*estimated=*/ 1000,
        ),
    };

    // 2. Validate via shared helper (nonce, balance, signature, expiry)
    if let Err(e) = validate_spend_auth(&state, &spend_auth).await {
        return error_response(e);
    }

    // 3. Record metrics via shared RequestGuard
    let mut guard = RequestGuard::new(&req.model);

    // 4. Call backend (blueprint-specific)
    let result = backend.process.chat_completion(&req).await;

    match result {
        Ok(response) => {
            // 5. Calculate cost via shared CostModel
            let cost = backend.cost_model.calculate_cost(&CostParams {
                prompt_tokens: response.usage.prompt_tokens,
                completion_tokens: response.usage.completion_tokens,
                ..Default::default()
            });

            guard.record_tokens(
                response.usage.prompt_tokens,
                response.usage.completion_tokens,
            );

            // 6. Settle via shared helper
            let _ = tangle_inference_core::settle_billing(
                &state.billing,
                &spend_auth,
                /*preauth=*/ spend_auth.amount,
                /*actual=*/ cost,
            ).await;

            Json(response).into_response()
        }
        Err(e) => error_response(e),
    }
}

Step 8: Delete duplicated server boilerplate

The following code is now in core and can be deleted from your server.rs:

NonceStore struct and impl → tangle_inference_core::NonceStore
SpendAuthPayload struct → tangle_inference_core::SpendAuthPayload
AccountGuard RAII → tangle_inference_core::server::AccountGuard
AppState struct → tangle_inference_core::AppState
x402_payment_required() helper → tangle_inference_core::payment_required
validate_spend_auth() helper → tangle_inference_core::validate_spend_auth
X402_* header constants → tangle_inference_core::server::X402_*
EIP-712 signature recovery → handled internally by validate_spend_auth

Step 9: Pick the right `CostModel` for your blueprint

| Blueprint | Cost Model | Why |
|---|---|---|
| LLM chat (llm-inference-blueprint, distributed-inference-blueprint) | `PerTokenCostModel` | Per-input-token + per-output-token pricing matches OpenAI-style billing |
| Text-to-speech (voice-inference-blueprint) | `PerCharCostModel` | TTS output is measured in characters, not tokens |
| Embeddings (embedding-inference-blueprint) | `PerTokenCostModel` | Per-1K-token pricing (set `price_per_input_token` for input tokens and `price_per_output_token` to 0) |
| Image generation (image-gen-inference-blueprint) | `PerImageCostModel` | Flat per-image pricing |
| Video generation (video-gen-inference-blueprint) | `PerSecondCostModel` | Per-second-of-output-video pricing |
| Multi-task (modal-inference-blueprint) | `TaskTypeCostModel` | Composes multiple models by task type; Modal serves TTS, STT, image, video, and fixed-price tasks from one operator |

Example of TaskTypeCostModel composition for Modal:

use tangle_inference_core::{
    TaskTypeCostModel, PerTokenCostModel, PerCharCostModel,
    PerSecondCostModel, PerImageCostModel, FlatRequestCostModel,
    CostModel, CostParams,
};
use std::collections::HashMap;

let mut per_task: HashMap<String, Box<dyn CostModel>> = HashMap::new();
per_task.insert("chat".into(), Box::new(PerTokenCostModel {
    price_per_input_token: 1,
    price_per_output_token: 3,
}));
per_task.insert("tts".into(), Box::new(PerCharCostModel {
    price_per_1k_chars: 15_000,
}));
per_task.insert("stt".into(), Box::new(PerSecondCostModel {
    price_per_second: 300,
}));
per_task.insert("image".into(), Box::new(PerImageCostModel {
    price_per_image: 50_000,
}));
per_task.insert("video".into(), Box::new(PerSecondCostModel {
    price_per_second: 1_000_000,
}));
per_task.insert("music".into(), Box::new(PerSecondCostModel {
    price_per_second: 500,
}));

let model = TaskTypeCostModel {
    default: Box::new(FlatRequestCostModel { price_per_request: 1000 }),
    per_task,
};

// In the handler:
let cost = model.calculate_cost(&CostParams {
    task_type: Some("tts".into()),
    extra: HashMap::from([("characters".into(), 500)]),
    ..Default::default()
});
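
To show what the task-type dispatch above amounts to, here is a self-contained sketch using local stand-ins for the core types. The trait shape, the `characters` lookup in `extra`, the `u64` value type, and the round-down per-1K arithmetic are all assumptions made for illustration; check core's source for the actual rules.

```rust
use std::collections::HashMap;

/// Local stand-in for core's `CostParams` (fields assumed from the example above).
#[derive(Default)]
pub struct Params {
    pub task_type: Option<String>,
    pub extra: HashMap<String, u64>,
}

/// Local stand-in for core's `CostModel` trait.
pub trait Cost {
    fn calculate_cost(&self, params: &Params) -> u64;
}

pub struct PerChar {
    pub price_per_1k_chars: u64,
}

pub struct Flat {
    pub price_per_request: u64,
}

impl Cost for PerChar {
    fn calculate_cost(&self, params: &Params) -> u64 {
        // Assumed rule: characters * price / 1000, rounded down.
        let chars = params.extra.get("characters").copied().unwrap_or(0);
        chars.saturating_mul(self.price_per_1k_chars) / 1000
    }
}

impl Cost for Flat {
    fn calculate_cost(&self, _params: &Params) -> u64 {
        self.price_per_request
    }
}

/// Dispatches on `task_type`, falling back to `default` for unknown tasks.
pub struct ByTask {
    pub default: Box<dyn Cost>,
    pub per_task: HashMap<String, Box<dyn Cost>>,
}

impl Cost for ByTask {
    fn calculate_cost(&self, params: &Params) -> u64 {
        params
            .task_type
            .as_deref()
            .and_then(|task| self.per_task.get(task))
            .unwrap_or(&self.default)
            .calculate_cost(params)
    }
}
```

Under these assumptions, the `tts` call from the example (500 characters at 15,000 per 1K characters) would cost 7,500 units, and a request with a missing or unrecognized task type would fall through to the flat 1,000.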

Step 10: Update docs

Update your CLAUDE.md, PLAN.md, and README.md:

## Architecture

Depends on [`tangle-inference-core`](../tangle-inference-core/) for all shared
inference-operator infrastructure (billing, metrics, health, nonce store,
spend-auth validation, x402 payment headers, AppState builder). See
[`../tangle-inference-core/MIGRATION.md`](../tangle-inference-core/MIGRATION.md)
for the migration pattern.

The only truly <blueprint-name>-specific code is:
- `operator/src/<backend>.rs` — the backend subprocess/HTTP proxy
- `operator/src/server.rs` — the OpenAI-compatible HTTP handlers
- `operator/src/lib.rs` — the on-chain job handler (TangleArg/TangleResult)
- `contracts/` — the BSM contract and registration schema

Step 11: Verify

cargo check -p <blueprint-name>    # must compile
cargo test -p <blueprint-name>     # must pass
cargo clippy -p <blueprint-name> -- -D warnings   # must be clean

Measure the LOC reduction:

wc -l operator/src/*.rs  # before vs after

Common Pitfalls

1. `AppState` is no longer generic

tangle-inference-core had an earlier design where AppState<B> was generic over the backend type. This was removed. The current AppState is concrete, and the backend is attached via Arc<dyn Any>. Use state.backend::<YourBackend>() to retrieve it.
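
The type-erased attachment can be illustrated with the standard library alone. This is a sketch of the general `Arc<dyn Any>` technique, not core's code: `SketchState`, `with_backend`, and the two backend structs are hypothetical names.

```rust
use std::any::Any;
use std::sync::Arc;

/// Stand-in for a concrete (non-generic) state holding a type-erased backend.
#[derive(Clone)]
pub struct SketchState {
    backend: Arc<dyn Any + Send + Sync>,
}

impl SketchState {
    /// Erases the backend's concrete type when attaching it.
    pub fn with_backend<B: Any + Send + Sync>(backend: B) -> Self {
        Self { backend: Arc::new(backend) }
    }

    /// Recovers the concrete type; returns `None` if the attached backend is not a `B`.
    pub fn backend<B: Any + Send + Sync>(&self) -> Option<&B> {
        self.backend.downcast_ref::<B>()
    }
}

pub struct VllmBackend {
    pub model: String,
}

pub struct OtherBackend;
```

Asking for the wrong type yields `None` at runtime instead of a compile error, which is the trade-off of dropping the generic parameter and the reason a handler has to handle (or `expect`) the missing case.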

2. `NonceStore` is now async

The nonce store's lock changed from std::sync::RwLock to tokio::sync::RwLock, so all nonce_store.check_replay() and nonce_store.insert() calls now need .await.

3. `BillingClient::new` has two forms

new(&TangleConfig, &BillingConfig) — convenience wrapper, use this if you already have the config structs
new_with_params(rpc_url, operator_key_hex, shielded_credits_address, service_id, max_gas_price_gwei) — for blueprints that don't hold a unified config

4. Pricing moves from `billing.` to `<backend>.`

This is a deliberate wire-format break. The blueprint's unique value is its backend; pricing is part of the backend's contract. The BillingConfig in core only carries billing infrastructure params (contract address, gas cap, nonce store path), not per-token prices.

5. SpendAuth lives in x402 HTTP headers, not the request body

If your blueprint previously read spend_auth from the JSON body, update callers to use x402 headers:

X-Payment-Commitment: <hex>
X-Payment-Service-Id: <u64>
X-Payment-Amount: <u256>
X-Payment-Nonce: <u64>
X-Payment-Expiry: <u64>
X-Payment-Operator: <address>
X-Payment-Signature: <hex>

Use extract_x402_spend_auth(&headers) to parse.
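
For callers and tests, it can help to see what parsing these headers involves. A minimal sketch over a plain string map, assuming lowercase header names and decimal numeric values; `SpendAuthSketch` and `parse_spend_auth` are hypothetical, and core's `SpendAuthPayload` and `extract_x402_spend_auth` may differ in field names and types:

```rust
use std::collections::HashMap;

/// Hypothetical parsed form of the x402 headers listed above.
#[derive(Debug, PartialEq)]
pub struct SpendAuthSketch {
    pub commitment: String,
    pub service_id: u64,
    /// u256 kept as a string here: std has no 256-bit integer type.
    pub amount: String,
    pub nonce: u64,
    pub expiry: u64,
    pub operator: String,
    pub signature: String,
}

/// Returns `None` if any header is missing or a numeric field fails to parse.
pub fn parse_spend_auth(headers: &HashMap<String, String>) -> Option<SpendAuthSketch> {
    // Look up a header by its lowercase name and trim surrounding whitespace.
    let get = |name: &str| headers.get(name).map(|v| v.trim().to_string());
    Some(SpendAuthSketch {
        commitment: get("x-payment-commitment")?,
        service_id: get("x-payment-service-id")?.parse().ok()?,
        amount: get("x-payment-amount")?,
        nonce: get("x-payment-nonce")?.parse().ok()?,
        expiry: get("x-payment-expiry")?.parse().ok()?,
        operator: get("x-payment-operator")?,
        signature: get("x-payment-signature")?,
    })
}
```

Treating any missing or malformed header as "no SpendAuth" matches the handler flow in Step 7, where a `None` result leads to a 402 Payment Required response.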

6. Metric names are shared

Metrics like tangle_operator_requests_total are declared in core and shared across all blueprints. Prometheus dashboards that query tangle_operator_* continue to work, but any custom metric names your blueprint used need to be renamed.

Migration Checklist

Expected LOC Reduction

Based on the llm-inference-blueprint reference migration:

| Blueprint | Before | After (estimated) | Reduction |
|---|---|---|---|
| llm-inference-blueprint (done) | 3,640 | 1,819 | −1,821 (50%) |
| voice-inference-blueprint | ~2,553 | ~1,300 | −1,253 (49%) |
| embedding-inference-blueprint | ~2,169 | ~1,100 | −1,069 (49%) |
| modal-inference-blueprint | ~3,011 | ~1,700 | −1,311 (44%) |
| image-gen-inference-blueprint | ~1,999 | ~1,100 | −899 (45%) |
| video-gen-inference-blueprint | ~2,606 | ~1,400 | −1,206 (46%) |
| distributed-inference-blueprint | ~2,166 | ~1,200 | −966 (45%) |
| Total | ~18,144 | ~9,619 | −8,525 (47%) |
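
The Reduction column is plain arithmetic on the two LOC counts. A small sketch for recomputing it after your own migration; `loc_reduction` is a hypothetical helper, and rounding to the nearest whole percent is an assumption that happens to reproduce the rows above:

```rust
/// Returns (lines removed, percentage reduction rounded to the nearest whole percent).
pub fn loc_reduction(before: u64, after: u64) -> (u64, u64) {
    let removed = before.saturating_sub(after);
    if before == 0 {
        return (0, 0);
    }
    // Integer rounding: add half the divisor before dividing.
    let percent = (removed * 100 + before / 2) / before;
    (removed, percent)
}
```

Feed it the `wc -l` totals from before and after the migration to get a row comparable to the table.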
