Commit 3a2cb3f

Merge branch 'main' into tangcong/mac_chunk_v3

Author: wasamtc
2 parents: eac6229 + 1b99726

17 files changed

Lines changed: 386 additions & 63 deletions

README.md

Lines changed: 2 additions & 0 deletions
@@ -29,6 +29,7 @@
 | [**Arxiv**](https://arxiv.org/pdf/2509.26182v1)

 ## News
+- [2026/2] 🦞 Parallax now supports OpenClaw integration! See [Docs](./docs/user_guide/work_with_openclaw.md)
 - [2025/10] 🔥 Parallax won #1 Product of The Day on Product Hunt!
 - [2025/10] 🔥 Parallax version 0.0.1 has been released!

@@ -51,6 +52,7 @@ The backend architecture:

 - [Installation](./docs/user_guide/install.md)
 - [Getting Started](./docs/user_guide/quick_start.md)
+- [Working with OpenClaw 🦞](./docs/user_guide/work_with_openclaw.md)

 ## Contributing

Lines changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
+## Work with OpenClaw 🦞
+
+### What is OpenClaw 🦞?
+
+[OpenClaw](https://openclaw.ai/) is an open-source personal AI assistant that runs on your own machine. Unlike cloud-based AI services, OpenClaw gives you full control over your data and infrastructure.
+
+Key features include:
+
+- **Multi-platform chat integration**: Interact via WhatsApp, Telegram, Discord, Slack, Signal, or iMessage
+- **Persistent memory**: Remembers your preferences and context across sessions
+- **Full system access**: Read/write files, run shell commands, and control your browser
+- **Extensible skills**: Use community-built skills or create your own
+- **Model flexibility**: Works with Anthropic, OpenAI, or local models
+
+GitHub repository for OpenClaw: https://github.com/openclaw/openclaw
+
+### Prerequisites
+
+To integrate Parallax with OpenClaw, you need to meet the prerequisites for both projects:
+
+- **Node.js**: >= 22 (required by OpenClaw)
+- **Python**: >= 3.11 (required by Parallax)
+
+Before proceeding, we assume you have already deployed Parallax on your AI cluster. For deployment instructions, please refer to:
+
+- [Installation (Parallax)](./install.md)
+- [Quick Start (Parallax)](./quick_start.md)
+
+
+### Start your Parallax Service
+
+**Step 1: Start the Scheduler**
+
+On your scheduler machine, run:
+
+```bash
+parallax run --host 0.0.0.0
+```
+
+**Step 2: Select Model**
+
+Open your browser and navigate to `localhost:3001` on the scheduler machine. Select your model and click **Continue**.
+
+**Step 3: Start Edge Nodes**
+
+On your edge nodes, run:
+
+```bash
+parallax join --max-sequence-length 65536 --max-num-tokens-per-batch 65536 --enable-prefix-cache
+```
+
+**Step 4: Test the Model**
+
+On the scheduler machine, open your browser and navigate to `localhost:3001`. Use the chat interface to test if the model is working properly.
+
+### Onboard OpenClaw
+
+**Step 1: Install OpenClaw**
+
+Use the official install script to install OpenClaw, skipping the onboard wizard:
+
+```bash
+curl -fsSL https://openclaw.ai/install.sh | bash -s -- --no-onboard
+```
+
+**Step 2: Create Configuration File**
+
+Create the configuration file at `~/.openclaw/openclaw.json` with the following content:
+
+```json
+{
+  "agents": {
+    "defaults": {
+      "model": {
+        "primary": "parallax/your-model-name"
+      }
+    }
+  },
+  "models": {
+    "providers": {
+      "parallax": {
+        "baseUrl": "http://localhost:3001/v1",
+        "apiKey": "placeholder",
+        "api": "openai-completions",
+        "models": [
+          {
+            "id": "your-model-name",
+            "name": "Parallax Model"
+          }
+        ]
+      }
+    }
+  }
+}
+```
+
+**Step 3: Run Onboard**
+
+```bash
+openclaw onboard --install-daemon
+```
+
+During the onboard process:
+
+1. Read and accept the OpenClaw risk disclaimer
+2. When prompted for **onboarding mode**, select `Quick Start`
+3. When prompted for **config handling**, select `Use existing values`
+4. When prompted for **Model/auth provider**, select `Skip for now`
+5. When prompted for **Filter models by provider**, select `All providers`
+6. When prompted for **Default model**, select `Keep current (parallax/your-model-name)`
+7. When prompted for **Select channel**, configure the channel based on your needs, or select `Skip for now`
+8. When prompted for **Select skills**, configure the skills based on your needs, or select `Skip for now`
+9. When prompted for **Enable hooks**, configure the hooks based on your needs, or select `Skip for now`
+10. Wait a moment while the Gateway services are installed.
+11. When prompted for **How do you want to hatch your bot**, choose the option that fits your needs.
+
+### Try on Browser
+
+Open your browser and navigate to http://127.0.0.1:18789/. Start sending messages to OpenClaw and enjoy!
+
+### Q&A
+
+**Q: OOM Error**
+
+```
+libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory)
+```
+
+**A:** Add the `--kv-cache-memory-fraction` parameter when starting Parallax on edge nodes:
+
+```bash
+parallax join --max-sequence-length 65536 --max-num-tokens-per-batch 65536 --enable-prefix-cache --kv-cache-memory-fraction 0.5
+```
+
+If OOM errors persist, try using a smaller value for `--kv-cache-memory-fraction`.
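
Note on the `openclaw.json` from Step 2 of the guide: the placeholder `your-model-name` appears twice, once in `agents.defaults.model.primary` (prefixed with `parallax/`) and once as the model `id` under the `parallax` provider, and the two presumably need to agree. A minimal Python sketch that generates the file content from a single model name, assuming only the keys shown in the guide; the helper name `build_openclaw_config` is hypothetical and not part of OpenClaw or Parallax:

```python
import json


def build_openclaw_config(model_name: str, base_url: str = "http://localhost:3001/v1") -> dict:
    """Return a dict mirroring the openclaw.json shown in the guide.

    Hypothetical helper: only the keys that appear in the guide are emitted.
    """
    return {
        "agents": {"defaults": {"model": {"primary": f"parallax/{model_name}"}}},
        "models": {
            "providers": {
                "parallax": {
                    "baseUrl": base_url,
                    "apiKey": "placeholder",
                    "api": "openai-completions",
                    "models": [{"id": model_name, "name": "Parallax Model"}],
                }
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON for review before saving it as ~/.openclaw/openclaw.json.
    print(json.dumps(build_openclaw_config("your-model-name"), indent=2))
```

Deriving both fields from one argument avoids the two copies of the model name drifting apart when the model is changed later.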

pyproject.toml

Lines changed: 3 additions & 2 deletions
@@ -43,9 +43,10 @@ parallax = "parallax.cli:main"
 [project.optional-dependencies]

 mac = [
+    "nanobind==2.10.2",
     "torch==2.8.0",
-    "mlx-lm==0.30.0",
-    "mlx==0.30.1",
+    "mlx-lm==0.30.5",
+    "mlx==0.30.4",
 ]

 gpu = [

src/backend/server/request_handler.py

Lines changed: 2 additions & 4 deletions
@@ -1,10 +1,9 @@
 import asyncio
-import json
 import time
 from typing import Dict

 import aiohttp
-from fastapi.responses import JSONResponse, StreamingResponse
+from fastapi.responses import JSONResponse, Response, StreamingResponse
 from starlette.concurrency import iterate_in_threadpool

 from backend.server.constants import NODE_STATUS_AVAILABLE
@@ -152,8 +151,7 @@ async def stream_generator():
     response = stub.chat_completion(request_data)
     content = (await anext(iterate_in_threadpool(response))).decode()
     logger.debug(f"Non-stream response completed for {request_id}")
-    # response is a JSON string; parse to Python object before returning
-    return JSONResponse(content=json.loads(content))
+    return Response(content=content, media_type="application/json")
 except Exception as e:
     forward_attempts += 1
     if forward_attempts < self.MAX_FORWARD_RETRY:
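
Note on the hunk above: the non-stream path now forwards the upstream body as-is with an `application/json` media type instead of parsing and re-serializing it. The following standard-library sketch illustrates the difference. `reserialize` only approximates the old parse-then-`JSONResponse` path (the compact separators and `ensure_ascii=False` are assumptions about the framework's encoder), and all function names here are illustrative:

```python
import json


def reserialize(body: bytes) -> bytes:
    """Parse and re-encode: roughly what the removed code path did."""
    parsed = json.loads(body)
    return json.dumps(parsed, ensure_ascii=False, separators=(",", ":")).encode("utf-8")


def passthrough(body: bytes) -> bytes:
    """Forward the upstream bytes untouched: what the new code path does."""
    return body


def is_valid_json(body: bytes) -> bool:
    """Cheap validity check that a passthrough no longer performs implicitly."""
    try:
        json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


# Example upstream payload with whitespace, a trailing zero and an escape.
upstream = b'{"id": "req-1", "score": 1.50, "text": "caf\\u00e9"}'

print(passthrough(upstream) == upstream)   # bytes are preserved exactly
print(reserialize(upstream) == upstream)   # formatting differs after a round trip
print(is_valid_json(b"internal server error"))
```

One trade-off that may be worth checking: the old `json.loads` call raised on a non-JSON body, which fell into the `except` branch and its retry logic. With the passthrough, a body such as the `b"internal server error"` that the P2P server yields on failure would apparently be returned to the client as-is.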

src/parallax/launch.py

Lines changed: 32 additions & 26 deletions
@@ -120,36 +120,42 @@ def _wait_executors_check_layer_change(shared_state: SharedState, executor_subpr
 check_latest_release()

 config = fetch_model_from_hf(args.model_path, local_files_only=args.use_hfcache)
+if args.start_layer is None:
+    args.start_layer = 0
+if args.end_layer is None:
+    args.end_layer = config.get("num_hidden_layers")
+
 # only launch http server on head node
 if args.start_layer == 0:
     http_server_process = launch_http_server(args)
 # Launch P2P server as subprocess
-p2p_server_process = launch_p2p_server_process(
-    initial_peers=args.initial_peers,
-    scheduler_addr=args.scheduler_addr,
-    relay_servers=args.relay_servers,
-    pp_start_layer=args.start_layer,
-    pp_end_layer=args.end_layer,
-    hidden_layers=config.get("num_hidden_layers"),
-    tp_size=args.tp_size,
-    dp_size=args.dp_size,
-    tcp_port=args.tcp_port,
-    udp_port=args.udp_port,
-    dht_prefix=args.dht_prefix,
-    announce_maddrs=args.announce_maddrs,
-    http_port=args.port,
-    notify_url=args.notify_url,
-    recv_from_peer_addr=args.recv_from_peer_addr,
-    send_to_peer_addr=args.send_to_peer_addr,
-    model_name=args.model_path,
-    max_batch_size=args.max_batch_size,
-    max_sequence_length=args.max_sequence_length,
-    param_mem_ratio=args.param_mem_ratio,
-    kvcache_mem_ratio=args.kvcache_mem_ratio,
-    shared_state=shared_state.dict,
-    log_level=args.log_level,
-    conn=conn_main,
-)
+if not (args.start_layer == 0 and args.end_layer == config.get("num_hidden_layers")):
+    p2p_server_process = launch_p2p_server_process(
+        initial_peers=args.initial_peers,
+        scheduler_addr=args.scheduler_addr,
+        relay_servers=args.relay_servers,
+        pp_start_layer=args.start_layer,
+        pp_end_layer=args.end_layer,
+        hidden_layers=config.get("num_hidden_layers"),
+        tp_size=args.tp_size,
+        dp_size=args.dp_size,
+        tcp_port=args.tcp_port,
+        udp_port=args.udp_port,
+        dht_prefix=args.dht_prefix,
+        announce_maddrs=args.announce_maddrs,
+        http_port=args.port,
+        notify_url=args.notify_url,
+        recv_from_peer_addr=args.recv_from_peer_addr,
+        send_to_peer_addr=args.send_to_peer_addr,
+        model_name=args.model_path,
+        max_batch_size=args.max_batch_size,
+        max_sequence_length=args.max_sequence_length,
+        param_mem_ratio=args.param_mem_ratio,
+        kvcache_mem_ratio=args.kvcache_mem_ratio,
+        shared_state=shared_state.dict,
+        log_level=args.log_level,
+        conn=conn_main,
+    )

 # Build connectors for tp communication
 conn_tp_0 = [conn_refit]
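
Note on the hunk above: unset layer bounds now default to the full model, and the P2P server is skipped when a single node hosts every layer. A minimal sketch of that decision logic as two pure functions; the names `resolve_layer_range` and `needs_p2p_server` are hypothetical and do not appear in `launch.py`:

```python
from typing import Optional, Tuple


def resolve_layer_range(
    start_layer: Optional[int], end_layer: Optional[int], num_hidden_layers: int
) -> Tuple[int, int]:
    """Fill in missing bounds so that an unset range means 'the whole model'."""
    start = 0 if start_layer is None else start_layer
    end = num_hidden_layers if end_layer is None else end_layer
    return start, end


def needs_p2p_server(start: int, end: int, num_hidden_layers: int) -> bool:
    """A node holding every layer has no pipeline peers to talk to."""
    return not (start == 0 and end == num_hidden_layers)


if __name__ == "__main__":
    layers = 32  # illustrative value, normally read from the model config
    for bounds in [(None, None), (0, 16), (16, None)]:
        start, end = resolve_layer_range(*bounds, layers)
        print(bounds, "->", (start, end), "p2p:", needs_p2p_server(start, end, layers))
```

Because `p2p_server_process` is only assigned inside the new `if`, any later code that references it (not visible in this hunk) would need a guard for the single-node case.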

src/parallax/p2p/server.py

Lines changed: 2 additions & 3 deletions
@@ -9,7 +9,6 @@

 import dataclasses
 import enum
-import json
 import multiprocessing
 import os
 import random
@@ -203,8 +202,8 @@ def chat_completion(
     else:
         response = client.post(
             f"http://localhost:{self.http_port}/v1/chat/completions", json=request
-        ).json()
-        yield json.dumps(response).encode()
+        )
+        yield response.content
 except Exception as e:
     logger.exception(f"Error in chat completion: {e}")
     yield b"internal server error"

src/parallax/server/executor/sglang_executor.py

Lines changed: 1 addition & 3 deletions
@@ -557,9 +557,7 @@ def process_batch(self, prepared_inputs: Dict[str, Any], return_decoded_tokens:
 # Extract probs for the sampled tokens only if needed
 if needs_probs and hasattr(logits_output, "next_token_logits"):
     # Get probs for sampled tokens (next_token_logits contains probabilities)
-    real_probs = logits_output.next_token_logits[
-        torch.arange(len(next_token_ids)), next_token_ids
-    ]
+    real_probs = torch.gather(logits_output.next_token_logits, 1, next_token_ids)
     token_probs = real_probs.cpu().float().tolist()

 # Return dict with token_ids and optional probs
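
Note on the hunk above: `torch.gather(input, 1, index)` requires `index` to have the same number of dimensions as `input`, so the new call presumably relies on `next_token_ids` having shape `[batch, 1]` rather than `[batch]`. The plain-Python emulation below (no PyTorch required; `gather_dim1` and `index_pairs` are illustrative names) shows how the two selection styles differ in output shape:

```python
from typing import List


def index_pairs(matrix: List[List[float]], ids: List[int]) -> List[float]:
    """Emulates matrix[arange(n), ids]: one value per row, flat result."""
    return [row[j] for row, j in zip(matrix, ids)]


def gather_dim1(matrix: List[List[float]], index: List[List[int]]) -> List[List[float]]:
    """Emulates gather along dim 1: out[i][j] = matrix[i][index[i][j]]."""
    return [[row[j] for j in idx_row] for row, idx_row in zip(matrix, index)]


probs = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
ids = [1, 0]

flat = index_pairs(probs, ids)                    # shape [batch]
nested = gather_dim1(probs, [[i] for i in ids])   # shape [batch, 1]

print(flat)
print(nested)
```

With a `[batch, 1]` index the gathered result keeps the trailing dimension, so a subsequent `.tolist()` produces one single-element list per request rather than a flat list of floats. Downstream consumers of `token_probs` may need to account for that.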

src/parallax/server/executor/vllm_executor.py

Lines changed: 11 additions & 14 deletions
@@ -9,6 +9,7 @@

 import numpy as np
 import torch
+import torch.nn.functional as F
 from vllm.sequence import IntermediateTensors

 from parallax.server.executor.base_executor import BaseExecutor
@@ -315,7 +316,7 @@ def process_batch(self, prepared_inputs: Dict[str, Any], return_decoded_tokens:
 requests = prepared_inputs.get("requests", [])

 # Execute model with vLLM
-execute_model_state, sampled_token_ids, sampler_output, logits = (
+execute_model_state, sampled_token_ids, sampled_token_ids_cpu, sampler_output, logits = (
     self.model_runner.execute_model(
         scheduler_output=scheduler_output,
         intermediate_tensors=intermediate_tensors,
@@ -335,25 +336,21 @@ def process_batch(self, prepared_inputs: Dict[str, Any], return_decoded_tokens:
 if needs_probs and logits is not None and isinstance(logits, torch.Tensor):

     if logits.ndim == 3:
-        logits = logits[:, -1, :]
+        logits = logits[:, -1, :]  # [batch, seq, vocab_size]
     elif logits.ndim != 2:
         logger.warning(f"Unexpected logits shape: {logits.shape}")
         logits = None

     if logits is not None:
-        probs = torch.softmax(logits, dim=-1)
+        probs = F.log_softmax(logits, dim=-1)
         if isinstance(sampled_token_ids, torch.Tensor):
             sampled_ids = sampled_token_ids
         else:
             sampled_ids = torch.tensor(
                 sampled_token_ids, device=logits.device, dtype=torch.long
             )
-        token_probs = (
-            probs[torch.arange(len(sampled_ids), device=logits.device), sampled_ids]
-            .cpu()
-            .float()
-            .tolist()
-        )
+        probs = torch.gather(probs, 1, sampled_ids)
+        token_probs = probs.cpu().float().tolist()

 # Align outputs to request order if vLLM reorders the batch internally.
 input_batch = getattr(self.model_runner, "input_batch", None)
@@ -362,14 +359,14 @@ def process_batch(self, prepared_inputs: Dict[str, Any], return_decoded_tokens:
         request_ids = [req.request_id for req in requests]
         if all(rid in req_id_to_index for rid in request_ids):
             order = [req_id_to_index[rid] for rid in request_ids]
-            if isinstance(sampled_token_ids, torch.Tensor):
-                sampled_token_ids = sampled_token_ids[order]
-            elif isinstance(sampled_token_ids, list):
-                sampled_token_ids = [sampled_token_ids[i] for i in order]
+            if isinstance(sampled_token_ids_cpu, torch.Tensor):
+                sampled_token_ids_cpu = sampled_token_ids_cpu[order]
+            elif isinstance(sampled_token_ids_cpu, list):
+                sampled_token_ids_cpu = [sampled_token_ids_cpu[i] for i in order]
             if token_probs is not None:
                 token_probs = [token_probs[i] for i in order]

-    return {"hidden_states": sampled_token_ids, "probs": token_probs}
+    return {"hidden_states": sampled_token_ids_cpu, "probs": token_probs}
 else:
     # Intermediate peer: return hidden states for next peer
     return {"hidden_states": execute_model_state.hidden_states, "probs": None}
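
Note on the hunks above: replacing `torch.softmax` with `F.log_softmax` means the values stored under the `"probs"` key are now log-probabilities (non-positive numbers) rather than probabilities in [0, 1], even though the variable name is unchanged. A standard-library sketch of the relationship, written with plain lists instead of tensors; the function names are illustrative and the logits are made-up example values:

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax: x_i - logsumexp(x)."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_sum for x in logits]


def softmax(logits: List[float]) -> List[float]:
    """Softmax recovered by exponentiating the log-softmax values."""
    return [math.exp(v) for v in log_softmax(logits)]


logits = [2.0, 1.0, 0.0]  # example values for a 3-token vocabulary
sampled = 0               # index of the sampled token

logprob = log_softmax(logits)[sampled]
prob = softmax(logits)[sampled]

# exp(logprob) recovers the probability the old code would have reported.
print(logprob, prob, math.exp(logprob))
```

Any consumer that still expects a probability would need to apply `exp` to the returned value. Whether downstream code was updated accordingly is not visible in this diff.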
