GradientHQ
diff --git a/README.md b/README.md
Lines changed: 2 additions & 0 deletions
diff --git a/docs/user_guide/work_with_openclaw.md b/docs/user_guide/work_with_openclaw.md
Lines changed: 135 additions & 0 deletions
diff --git a/pyproject.toml b/pyproject.toml
Lines changed: 3 additions & 2 deletions
diff --git a/src/backend/server/request_handler.py b/src/backend/server/request_handler.py
Lines changed: 2 additions & 4 deletions
diff --git a/src/parallax/launch.py b/src/parallax/launch.py
Lines changed: 32 additions & 26 deletions
diff --git a/src/parallax/p2p/server.py b/src/parallax/p2p/server.py
Lines changed: 2 additions & 3 deletions
diff --git a/src/parallax/server/executor/sglang_executor.py b/src/parallax/server/executor/sglang_executor.py
Lines changed: 1 addition & 3 deletions
diff --git a/src/parallax/server/executor/vllm_executor.py b/src/parallax/server/executor/vllm_executor.py
Lines changed: 11 additions & 14 deletions
@@ -29,6 +29,7 @@
 | [**Arxiv**](https://arxiv.org/pdf/2509.26182v1)
 
 ## News
+- [2026/2] 🦞 Parallax now supports OpenClaw integration! See [Docs](./docs/user_guide/work_with_openclaw.md)
 - [2025/10] 🔥 Parallax won #1 Product of The Day on Product Hunt!
 - [2025/10] 🔥 Parallax version 0.0.1 has been released!
 
@@ -51,6 +52,7 @@ The backend architecture:
 
 - [Installation](./docs/user_guide/install.md)
 - [Getting Started](./docs/user_guide/quick_start.md)
+- [Working with OpenClaw 🦞](./docs/user_guide/work_with_openclaw.md)
 
 ## Contributing
 
 
@@ -0,0 +1,135 @@
+## Work with OpenClaw 🦞
+
+### What is OpenClaw 🦞?
+
+[OpenClaw](https://openclaw.ai/) is an open-source personal AI assistant that runs on your own machine. Unlike cloud-based AI services, OpenClaw gives you full control over your data and infrastructure.
+
+Key features include:
+
+- **Multi-platform chat integration**: Interact via WhatsApp, Telegram, Discord, Slack, Signal, or iMessage
+- **Persistent memory**: Remembers your preferences and context across sessions
+- **Full system access**: Read/write files, run shell commands, and control your browser
+- **Extensible skills**: Use community-built skills or create your own
+- **Model flexibility**: Works with Anthropic, OpenAI, or local models
+
+OpenClaw GitHub repository: https://github.com/openclaw/openclaw
+
+### Prerequisites
+
+To integrate Parallax with OpenClaw, you need to meet the prerequisites for both projects:
+
+- **Node.js**: >= 22 (required by OpenClaw)
+- **Python**: >= 3.11 (required by Parallax)
+
+This guide assumes you have already deployed Parallax on your AI cluster. For deployment instructions, refer to:
+
+- [Installation (Parallax)](./install.md)
+- [Quick Start (Parallax)](./quick_start.md)
+
+
+### Start your Parallax Service
+
+**Step 1: Start the Scheduler**
+
+On your scheduler machine, run:
+
+```bash
+parallax run --host 0.0.0.0
+```
+
+**Step 2: Select Model**
+
+Open your browser and navigate to `localhost:3001` on the scheduler machine. Select your model and click **Continue**.
+
+**Step 3: Start Edge Nodes**
+
+On your edge nodes, run:
+
+```bash
+parallax join --max-sequence-length 65536 --max-num-tokens-per-batch 65536 --enable-prefix-cache
+```
+
+**Step 4: Test the Model**
+
+On the scheduler machine, open your browser and navigate to `localhost:3001`. Use the chat interface to verify that the model responds correctly.
+
+### Onboard OpenClaw
+
+**Step 1: Install OpenClaw**
+
+Use the official install script to install OpenClaw, skipping the onboarding wizard:
+
+```bash
+curl -fsSL https://openclaw.ai/install.sh | bash -s -- --no-onboard
+```
+
+**Step 2: Create Configuration File**
+
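+Create the configuration file at `~/.openclaw/openclaw.json` with the following content. Replace `your-model-name` with the name of the model you selected when starting Parallax, and if OpenClaw does not run on the scheduler machine, replace `localhost` in `baseUrl` with the scheduler's address: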
+
+```json
+{
+  "agents": {
+    "defaults": {
+      "model": {
+        "primary": "parallax/your-model-name"
+      }
+    }
+  },
+  "models": {
+    "providers": {
+      "parallax": {
+        "baseUrl": "http://localhost:3001/v1",
+        "apiKey": "placeholder",
+        "api": "openai-completions",
+        "models": [
+          {
+            "id": "your-model-name",
+            "name": "Parallax Model"
+          }
+        ]
+      }
+    }
+  }
+}
+```
+
+**Step 3: Run the Onboarding Wizard**
+
+```bash
+openclaw onboard --install-daemon
+```
+
+During onboarding, answer the prompts as follows:
+
+1. Read and accept the OpenClaw risk disclaimer
+2. When prompted for **onboarding mode**, select `Quick Start`
+3. When prompted for **config handling**, select `Use existing values`
+4. When prompted for **Model/auth provider**, select `Skip for now`
+5. When prompted for **Filter models by provider**, select `All providers`
+6. When prompted for **Default model**, select `Keep current (parallax/your-model-name)`
+7. When prompted for **Select channel**, configure the channel based on your needs, or select `Skip for now`
+8. When prompted for **Select skills**, configure the skills based on your needs, or select `Skip for now`
+9. When prompted for **Enable hooks**, configure the hooks based on your needs, or select `Skip for now`
+10. Wait for the Gateway service to finish installing.
+11. When prompted for **How do you want to hatch your bot**, choose the option that suits your needs.
+
+### Try It in the Browser
+
+Open your browser and navigate to `http://127.0.0.1:18789/` on the machine where OpenClaw is installed. Start sending messages to OpenClaw and enjoy!
+
+### Q&A
+
+**Q: How do I fix the following Metal out-of-memory (OOM) error?**
+
+```
+libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
+```
+
+**A:** Add the `--kv-cache-memory-fraction` parameter when starting Parallax on edge nodes:
+
+```bash
+parallax join --max-sequence-length 65536 --max-num-tokens-per-batch 65536 --enable-prefix-cache --kv-cache-memory-fraction 0.5
+```
+
+If OOM errors persist, try using a smaller value for `--kv-cache-memory-fraction`.
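The OOM answer above hinges on how much memory the KV cache needs at `--max-sequence-length 65536`. Below is a back-of-the-envelope sketch, assuming a standard transformer KV cache that stores one key vector and one value vector per layer, per KV head, per token. The model shape is hypothetical, and Parallax's actual allocator (prefix cache, paging, and however `--kv-cache-memory-fraction` is applied) may size memory differently.

```python
def kv_cache_bytes(num_tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Estimate the bytes needed to cache keys and values for `num_tokens` tokens.

    The leading factor of 2 accounts for storing both K and V in every layer.
    bytes_per_elem=2 assumes a 16-bit (fp16/bf16) cache.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens


if __name__ == "__main__":
    # Hypothetical model shape (not a specific Parallax model): 36 layers,
    # 8 KV heads (grouped-query attention), head_dim 128, 16-bit cache.
    size = kv_cache_bytes(65536, num_layers=36, num_kv_heads=8, head_dim=128)
    print(f"{size / 2**30:.1f} GiB for one 65536-token sequence")  # prints 9.0 GiB ...
```

With these made-up dimensions, a single 65536-token sequence needs 9.0 GiB of cache, which illustrates why a long `--max-sequence-length` can exhaust GPU memory on an edge node unless less memory is reserved for the cache.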
@@ -43,9 +43,10 @@ parallax = "parallax.cli:main"
 [project.optional-dependencies]
 
 mac = [
+  "nanobind==2.10.2",
   "torch==2.8.0",
-  "mlx-lm==0.30.0",
-  "mlx==0.30.1",
+  "mlx-lm==0.30.5",
+  "mlx==0.30.4",
 ]
 
 gpu = [
 
@@ -1,10 +1,9 @@
 import asyncio
-import json
 import time
 from typing import Dict
 
 import aiohttp
-from fastapi.responses import JSONResponse, StreamingResponse
+from fastapi.responses import JSONResponse, Response, StreamingResponse
 from starlette.concurrency import iterate_in_threadpool
 
 from backend.server.constants import NODE_STATUS_AVAILABLE
@@ -152,8 +151,7 @@ async def stream_generator():
                     response = stub.chat_completion(request_data)
                     content = (await anext(iterate_in_threadpool(response))).decode()
                     logger.debug(f"Non-stream response completed for {request_id}")
-                    # response is a JSON string; parse to Python object before returning
-                    return JSONResponse(content=json.loads(content))
+                    return Response(content=content, media_type="application/json")
             except Exception as e:
                 forward_attempts += 1
                 if forward_attempts < self.MAX_FORWARD_RETRY:
 
@@ -120,36 +120,42 @@ def _wait_executors_check_layer_change(shared_state: SharedState, executor_subpr
             check_latest_release()
 
             config = fetch_model_from_hf(args.model_path, local_files_only=args.use_hfcache)
+            if args.start_layer is None:
+                args.start_layer = 0
+            if args.end_layer is None:
+                args.end_layer = config.get("num_hidden_layers")
+
             # only launch http server on head node
             if args.start_layer == 0:
                 http_server_process = launch_http_server(args)
             # Launch P2P server as subprocess
-            p2p_server_process = launch_p2p_server_process(
-                initial_peers=args.initial_peers,
-                scheduler_addr=args.scheduler_addr,
-                relay_servers=args.relay_servers,
-                pp_start_layer=args.start_layer,
-                pp_end_layer=args.end_layer,
-                hidden_layers=config.get("num_hidden_layers"),
-                tp_size=args.tp_size,
-                dp_size=args.dp_size,
-                tcp_port=args.tcp_port,
-                udp_port=args.udp_port,
-                dht_prefix=args.dht_prefix,
-                announce_maddrs=args.announce_maddrs,
-                http_port=args.port,
-                notify_url=args.notify_url,
-                recv_from_peer_addr=args.recv_from_peer_addr,
-                send_to_peer_addr=args.send_to_peer_addr,
-                model_name=args.model_path,
-                max_batch_size=args.max_batch_size,
-                max_sequence_length=args.max_sequence_length,
-                param_mem_ratio=args.param_mem_ratio,
-                kvcache_mem_ratio=args.kvcache_mem_ratio,
-                shared_state=shared_state.dict,
-                log_level=args.log_level,
-                conn=conn_main,
-            )
+            if not (args.start_layer == 0 and args.end_layer == config.get("num_hidden_layers")):
+                p2p_server_process = launch_p2p_server_process(
+                    initial_peers=args.initial_peers,
+                    scheduler_addr=args.scheduler_addr,
+                    relay_servers=args.relay_servers,
+                    pp_start_layer=args.start_layer,
+                    pp_end_layer=args.end_layer,
+                    hidden_layers=config.get("num_hidden_layers"),
+                    tp_size=args.tp_size,
+                    dp_size=args.dp_size,
+                    tcp_port=args.tcp_port,
+                    udp_port=args.udp_port,
+                    dht_prefix=args.dht_prefix,
+                    announce_maddrs=args.announce_maddrs,
+                    http_port=args.port,
+                    notify_url=args.notify_url,
+                    recv_from_peer_addr=args.recv_from_peer_addr,
+                    send_to_peer_addr=args.send_to_peer_addr,
+                    model_name=args.model_path,
+                    max_batch_size=args.max_batch_size,
+                    max_sequence_length=args.max_sequence_length,
+                    param_mem_ratio=args.param_mem_ratio,
+                    kvcache_mem_ratio=args.kvcache_mem_ratio,
+                    shared_state=shared_state.dict,
+                    log_level=args.log_level,
+                    conn=conn_main,
+                )
 
             # Build connectors for tp communication
             conn_tp_0 = [conn_refit]
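The `launch.py` hunk defaults an unset `args.start_layer` to 0 and an unset `args.end_layer` to `num_hidden_layers`, then spawns the P2P server only when the node holds a strict subset of the model's layers. Below is a minimal sketch of that decision as two pure functions. `resolve_layer_range` and `needs_p2p_server` are hypothetical names, not part of Parallax.

```python
from typing import Optional, Tuple


def resolve_layer_range(start_layer: Optional[int], end_layer: Optional[int],
                        num_hidden_layers: int) -> Tuple[int, int]:
    """Fill in missing bounds so that an unset range means "all layers"."""
    if start_layer is None:
        start_layer = 0
    if end_layer is None:
        end_layer = num_hidden_layers
    return start_layer, end_layer


def needs_p2p_server(start_layer: int, end_layer: int, num_hidden_layers: int) -> bool:
    """A node that holds every layer can serve alone, so it needs no P2P server."""
    return not (start_layer == 0 and end_layer == num_hidden_layers)


if __name__ == "__main__":
    for start, end in [(None, None), (0, 14), (14, None)]:
        s, e = resolve_layer_range(start, end, num_hidden_layers=28)
        print(s, e, needs_p2p_server(s, e, 28))
```

Because a node covering all layers no longer gets a P2P subprocess, any later code in the function that uses `p2p_server_process` must handle the case where it was never assigned, unless the variable is initialized outside the lines shown here.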
 
@@ -9,7 +9,6 @@
 
 import dataclasses
 import enum
-import json
 import multiprocessing
 import os
 import random
@@ -203,8 +202,8 @@ def chat_completion(
                 else:
                     response = client.post(
                         f"http://localhost:{self.http_port}/v1/chat/completions", json=request
-                    ).json()
-                    yield json.dumps(response).encode()
+                    )
+                    yield response.content
         except Exception as e:
             logger.exception(f"Error in chat completion: {e}")
             yield b"internal server error"
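`p2p/server.py` and `request_handler.py` now forward the upstream chat-completion body as raw bytes instead of decoding it (`.json()` / `json.loads`) and re-encoding it. The sketch below uses only the standard `json` module to show one observable difference. With its default `ensure_ascii=True`, `json.dumps` rewrites non-ASCII text as `\uXXXX` escapes, so the re-serialized bytes differ from what the upstream server sent even though both parse to the same object. The pass-through version also skips one parse/serialize round trip per response. The function names are illustrative only.

```python
import json


def reserialize(raw: bytes) -> bytes:
    """Old pattern: parse the upstream body, then dump it again."""
    return json.dumps(json.loads(raw)).encode()


def passthrough(raw: bytes) -> bytes:
    """New pattern: forward the upstream bytes unchanged."""
    return raw


if __name__ == "__main__":
    upstream = '{"choices": [{"message": {"content": "你好"}}]}'.encode("utf-8")
    print(reserialize(upstream))  # non-ASCII text comes back as \u escapes
    print(passthrough(upstream))  # identical to what upstream sent
```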
 
@@ -557,9 +557,7 @@ def process_batch(self, prepared_inputs: Dict[str, Any], return_decoded_tokens:
             # Extract probs for the sampled tokens only if needed
             if needs_probs and hasattr(logits_output, "next_token_logits"):
                 # Get probs for sampled tokens (next_token_logits contains probabilities)
-                real_probs = logits_output.next_token_logits[
-                    torch.arange(len(next_token_ids)), next_token_ids
-                ]
+                real_probs = torch.gather(logits_output.next_token_logits, 1, next_token_ids.long().unsqueeze(1)).squeeze(1)
                 token_probs = real_probs.cpu().float().tolist()
 
             # Return dict with token_ids and optional probs
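The `sglang_executor.py` hunk swaps advanced indexing `x[torch.arange(n), ids]` for `torch.gather`. The two agree only when the index passed to `gather` has the same number of dimensions as the input. For a `[batch, vocab]` tensor and a 1-D `[batch]` id tensor (the shape the replaced `torch.arange` indexing implies), the ids must be reshaped to `[batch, 1]`, and `gather` requires `int64` indices. Below is a minimal sketch under those assumptions. `pick_by_index` is a hypothetical helper.

```python
import torch


def pick_by_index(scores: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
    """Return scores[i, token_ids[i]] for each row i of a [batch, vocab] tensor.

    torch.gather needs an index with the same number of dimensions as the
    input, so a 1-D [batch] id tensor is reshaped to [batch, 1] and the
    singleton dimension is dropped afterwards. gather also needs int64 ids.
    """
    index = token_ids.long().unsqueeze(1)             # [batch] -> [batch, 1]
    return torch.gather(scores, 1, index).squeeze(1)  # [batch, 1] -> [batch]


if __name__ == "__main__":
    scores = torch.softmax(torch.randn(4, 10), dim=-1)
    ids = torch.tensor([3, 0, 9, 5], dtype=torch.int32)
    fancy = scores[torch.arange(len(ids)), ids.long()]  # indexing used before the change
    print(torch.equal(pick_by_index(scores, ids), fancy))
```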
 
@@ -9,6 +9,7 @@
 
 import numpy as np
 import torch
+import torch.nn.functional as F
 from vllm.sequence import IntermediateTensors
 
 from parallax.server.executor.base_executor import BaseExecutor
@@ -315,7 +316,7 @@ def process_batch(self, prepared_inputs: Dict[str, Any], return_decoded_tokens:
         requests = prepared_inputs.get("requests", [])
 
         # Execute model with vLLM
-        execute_model_state, sampled_token_ids, sampler_output, logits = (
+        execute_model_state, sampled_token_ids, sampled_token_ids_cpu, sampler_output, logits = (
             self.model_runner.execute_model(
                 scheduler_output=scheduler_output,
                 intermediate_tensors=intermediate_tensors,
@@ -335,25 +336,21 @@ def process_batch(self, prepared_inputs: Dict[str, Any], return_decoded_tokens:
             if needs_probs and logits is not None and isinstance(logits, torch.Tensor):
 
                 if logits.ndim == 3:
-                    logits = logits[:, -1, :]
+                    logits = logits[:, -1, :]  # [batch, seq, vocab] -> [batch, vocab]
                 elif logits.ndim != 2:
                     logger.warning(f"Unexpected logits shape: {logits.shape}")
                     logits = None
 
                 if logits is not None:
-                    probs = torch.softmax(logits, dim=-1)
+                    probs = F.log_softmax(logits, dim=-1)  # log-probabilities, not probabilities
                     if isinstance(sampled_token_ids, torch.Tensor):
                         sampled_ids = sampled_token_ids
                     else:
                         sampled_ids = torch.tensor(
                             sampled_token_ids, device=logits.device, dtype=torch.long
                         )
-                    token_probs = (
-                        probs[torch.arange(len(sampled_ids), device=logits.device), sampled_ids]
-                        .cpu()
-                        .float()
-                        .tolist()
-                    )
+                    probs = torch.gather(probs, 1, sampled_ids)
+                    token_probs = probs.cpu().float().tolist()
 
             # Align outputs to request order if vLLM reorders the batch internally.
             input_batch = getattr(self.model_runner, "input_batch", None)
@@ -362,14 +359,14 @@ def process_batch(self, prepared_inputs: Dict[str, Any], return_decoded_tokens:
                 request_ids = [req.request_id for req in requests]
                 if all(rid in req_id_to_index for rid in request_ids):
                     order = [req_id_to_index[rid] for rid in request_ids]
-                    if isinstance(sampled_token_ids, torch.Tensor):
-                        sampled_token_ids = sampled_token_ids[order]
-                    elif isinstance(sampled_token_ids, list):
-                        sampled_token_ids = [sampled_token_ids[i] for i in order]
+                    if isinstance(sampled_token_ids_cpu, torch.Tensor):
+                        sampled_token_ids_cpu = sampled_token_ids_cpu[order]
+                    elif isinstance(sampled_token_ids_cpu, list):
+                        sampled_token_ids_cpu = [sampled_token_ids_cpu[i] for i in order]
                     if token_probs is not None:
                         token_probs = [token_probs[i] for i in order]
 
-            return {"hidden_states": sampled_token_ids, "probs": token_probs}
+            return {"hidden_states": sampled_token_ids_cpu, "probs": token_probs}
         else:
             # Intermediate peer: return hidden states for next peer
             return {"hidden_states": execute_model_state.hidden_states, "probs": None}
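When vLLM reorders the batch internally, the hunk looks up each request's row in a `req_id_to_index` mapping from the runner's input batch. It then applies the same permutation to `sampled_token_ids_cpu` and `token_probs`, so token ids and their (log-)probabilities stay paired. If any request id is missing from the mapping, both lists are left unchanged. Below is a pure-Python sketch of that step. `align_to_request_order` is a hypothetical name.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def align_to_request_order(values: Sequence[T], req_id_to_index: Dict[str, int],
                           request_ids: Sequence[str]) -> List[T]:
    """Reorder per-row outputs from the runner's row order into request order.

    values[j] belongs to the request whose row index is j. If any request id
    is unknown, the values are returned in their original order.
    """
    if not all(rid in req_id_to_index for rid in request_ids):
        return list(values)
    order = [req_id_to_index[rid] for rid in request_ids]
    return [values[i] for i in order]


if __name__ == "__main__":
    rows = [101, 202, 303]  # token ids in the runner's internal row order
    mapping = {"req-b": 0, "req-c": 1, "req-a": 2}
    print(align_to_request_order(rows, mapping, ["req-a", "req-b", "req-c"]))  # [303, 101, 202]
```

Applying one `order` list to every per-row output, rather than sorting each output separately, is what keeps the ids and probabilities aligned.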