versioned; less chatty ocf#24

Merged: robmsmt merged 1 commit into main from robmsmt/update-ocf-version on Mar 23, 2026

Conversation

@robmsmt robmsmt commented Mar 20, 2026

No description provided.

Copilot AI left a comment

Pull request overview

Updates container environment configuration to use a shared, infrastructure-managed OCF mount path (instead of a user-specific bin path), aligning both current and legacy serving configs.

Changes:

  • Replaced /iopsstor/.../a09/xyao/bin:/ocfbin with /capstor/.../infra01/ocf-share:/ocfbin across vLLM and sglang env TOMLs.
  • Applied the same mount change consistently in both src/swiss_ai_model_launch and legacy/serving environments.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/swiss_ai_model_launch/assets/envs/vllm.toml | Switches /ocfbin mount to infra-managed ocf-share path |
| src/swiss_ai_model_launch/assets/envs/sglang.toml | Switches /ocfbin mount to infra-managed ocf-share path |
| src/swiss_ai_model_launch/assets/envs/sglang_kimi.toml | Switches /ocfbin mount to infra-managed ocf-share path |
| legacy/serving/envs/vllm.toml | Switches /ocfbin mount to infra-managed ocf-share path |
| legacy/serving/envs/sglang.toml | Switches /ocfbin mount to infra-managed ocf-share path |
| legacy/serving/envs/sglang_kimi.toml | Switches /ocfbin mount to infra-managed ocf-share path |
| legacy/serving/envs/sglang_glm.toml | Switches /ocfbin mount to infra-managed ocf-share path |

@AryanAhadinia AryanAhadinia left a comment

I tested your new PR and it works great, with one small issue. Partway through the process, I think we are busy-waiting: we check a table for peers in a loop without sleeping. As a result, dozens of log lines are emitted every millisecond. Look at the timestamps in the logs below.

2026-03-20T18:02:26.178+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.178+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.178+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.178+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.178+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.178+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.178+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.178+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.178+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.178+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.179+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.180+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.180+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.180+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.180+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.180+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.180+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.180+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table
2026-03-20T18:02:26.180+0100	ERROR	provider	provider/reprovider.go:510	reproviding failed failed to find any peer in table

The rest is great! Thanks!
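
For reference, the usual alternative to a sleepless loop is to poll with a delay that grows between attempts. The sketch below is illustrative only: `wait_for_peers`, its parameters, and the `find_peers` callback are hypothetical names, and it does not reflect how the Go library involved is actually structured.

```python
import time
from typing import Callable, Sequence


def wait_for_peers(
    find_peers: Callable[[], Sequence[str]],
    initial_delay: float = 0.1,
    max_delay: float = 5.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> Sequence[str]:
    """Poll find_peers until it returns peers, sleeping between attempts."""
    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        peers = find_peers()
        if peers:
            return peers
        if attempt < max_attempts:
            # Sleep instead of spinning, and double the wait up to max_delay
            # so an empty table produces a handful of checks, not a flood.
            sleep(delay)
            delay = min(delay * 2, max_delay)
    raise TimeoutError(f"no peers found after {max_attempts} attempts")
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real waiting.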

robmsmt commented Mar 23, 2026

Good find. The busy loop isn't in our code but inside the boxo provider/reprovider library (a dependency of ipfs-lite). I think it is OK to leave it for now, since the process is only in this state for a very short time and changing it might require a lot more testing. In the interim, I've suppressed that library's logs so that only FATAL ones are emitted.
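
The idea of raising one library's log threshold while leaving everything else untouched can be illustrated with Python's standard `logging` module. This is an analogy only: the actual change lives in the Go code of the linked OpenTela PR, and the logger names `provider` and `app` below are used purely for illustration.

```python
import logging

logging.basicConfig(level=logging.INFO)

# "provider" mirrors the logger name seen in the log lines above; using it
# as a Python logger name is an assumption made for this sketch.
noisy = logging.getLogger("provider")
noisy.setLevel(logging.CRITICAL)  # CRITICAL is Python's closest level to FATAL

app = logging.getLogger("app")  # hypothetical application logger

noisy.error("reproviding failed")   # dropped: below the logger's threshold
app.error("still visible")          # emitted: inherits the root logger's level
```

Only the named logger is affected, so genuine errors from the rest of the process remain visible.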

Steps:

  1. Merge this PR: reduce boxo ext lib loglevel OpenTela#2
  2. Create OpenTela release v0.0.3
  3. Download binaries from releases page then SCP them to /capstor/store/cscs/swissai/infra01/ocf-share
  4. Retest

These are complete. Can you retest?
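
When retesting, one quick way to confirm the flood is gone is to count matching lines per timestamp in the captured job output. A minimal sketch, assuming the tab-separated log layout shown earlier in this thread; `flood_rate` is a hypothetical helper name.

```python
from collections import Counter

NEEDLE = "reproviding failed"


def flood_rate(log_text: str, needle: str = NEEDLE) -> Counter:
    """Count lines containing `needle`, keyed by the leading timestamp field."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        if needle in line:
            # The timestamp is the first tab-separated field of each line.
            timestamp = line.split("\t", 1)[0]
            counts[timestamp] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "2026-03-20T18:02:26.178+0100\tERROR\tprovider\t"
        "provider/reprovider.go:510\treproviding failed\n"
    )
    print(flood_rate(sample))
```

An empty counter after the new binaries are deployed would indicate the suppression took effect; many hits on a single millisecond timestamp would indicate it did not.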

I launched this on DEV with:

python serving/submit_job.py \
  --slurm-nodes 1 \
  --slurm-time 6:00:00 \
  --serving-framework sglang \
  --slurm-environment $(pwd)/serving/envs/sglang_kimi.toml \
  --ocf-bootstrap-addr /ip4/148.187.108.177/tcp/43905/p2p/QmbUKJkCfotDzbFE5uoTsXD4GRyPHjzZC1f2yAGLoeBMn9 \
  --framework-args "--model-path /capstor/store/cscs/swissai/infra01/hf_models/models/meta-llama/Llama-3.3-70B-Instruct --host 0.0.0.0 --port 8080 --tensor-parallel-size 4 --served-model-name meta-llama/Llama-3.3-70B-Instruct-$(whoami) --grammar-backend llguidance"

@robmsmt robmsmt merged commit 8c3a3a0 into main Mar 23, 2026
7 checks passed
@robmsmt robmsmt deleted the robmsmt/update-ocf-version branch March 23, 2026 09:05

3 participants