We will walk you through the easiest way to quickly set up your own AI cluster.
If you have not installed Prakasa yet, please refer to the installation guide and follow the instructions.
First, launch the scheduler on the main node. We recommend using your most convenient computer for this.
- For Linux/macOS:

```
prakasa run
```

- For Windows, start a PowerShell console as administrator and run:

```
prakasa run
```

To make the API accessible from other machines, add the argument `--host 0.0.0.0` when launching the scheduler:

```
prakasa run --host 0.0.0.0
```

When running `prakasa run` for the first time or after an update, code version info might be sent to help improve the project. To disable this, use the `-u` flag:

```
prakasa run -u
```

Or use a config file to set up the scheduler directly:

```
cp config.template.yaml my_config.yaml
# edit my_config.yaml to set your desired options
prakasa run -c {my-config-file-path}
```

Open http://localhost:3001 and you should see the setup interface.
Select your desired node and model config and click continue.
Note: When running in remote mode, Prakasa will use a public relay server to help establish connections between the scheduler and nodes. The public relay server will receive the IP information of both the scheduler and the nodes in order to facilitate this connection.
Copy the generated join command to your node and run it. For remote connections, you can find your scheduler address in the scheduler logs.
```
# local area network env
prakasa join
# public network env
prakasa join -s {scheduler-address}
# example
prakasa join -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu
```

You should see your nodes start to show up with their status. Wait until all nodes are successfully connected, and you will automatically be directed to the chat interface.
When running `prakasa join` for the first time or after an update, code version info might be sent to help improve the project. To disable this, use the `-u` flag:

```
prakasa join -u
```

Or use a config file to set up the worker directly:

```
prakasa join -c {my-config-file-path}
```

Done! You now have your own AI cluster.
You can access the chat interface from any non-scheduler computer, not just those running a node server. Simply start the chat server with:
```
# local area network env
prakasa chat
# public network env
prakasa chat -s {scheduler-address}
# example
prakasa chat -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu
```

After launching, visit http://localhost:3002 in your browser to use the chat interface.
To make the API accessible from other machines, add `--host 0.0.0.0` when launching the chat interface:

```
prakasa chat --host 0.0.0.0
```

Or use a config file to set up chat directly:

```
prakasa chat -c {my-config-file-path}
```

First, launch the scheduler on the main node:

```
prakasa run -m {model-name} -n {number-of-worker-nodes}
```

For example:

```
prakasa run -m Qwen/Qwen3-0.6B -n 2
```

Note and record the scheduler address printed in the terminal.
For each distributed node, including the main node, open a terminal and join the server with the scheduler address.
```
# local area network env
prakasa join
# public network env
prakasa join -s {scheduler-address}
```

For example:

```
# first node
prakasa join -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu
# second node
prakasa join -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu
```

Call the chat API:

```
curl --location 'http://localhost:3001/v1/chat/completions' --header 'Content-Type: application/json' --data '{
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": "hello"
        }
    ],
    "stream": true
}'
```

Note: For models such as Qwen3 and gpt-oss, the "reasoning" (or "thinking") feature is enabled by default. To disable it, add

```
"chat_template_kwargs": {"enable_thinking": false}
```

to your request payload.
Developers can start the Prakasa backend engine without a scheduler. Pipeline-parallel start/end layers must be set manually. An example of serving Qwen3-0.6B with 2 nodes:
- First node:
```
python3 ./prakasa/src/prakasa/launch.py \
    --model-path Qwen/Qwen3-0.6B \
    --port 3000 \
    --max-batch-size 8 \
    --start-layer 0 \
    --end-layer 14
```

- Second node:
```
python3 ./prakasa/src/prakasa/launch.py \
    --model-path Qwen/Qwen3-0.6B \
    --port 3000 \
    --max-batch-size 8 \
    --start-layer 14 \
    --end-layer 28
```

Call the chat API on one of the nodes:
```
curl --location 'http://localhost:3000/v1/chat/completions' --header 'Content-Type: application/json' --data '{
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": "hello"
        }
    ],
    "stream": true
}'
```
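Choosing `--start-layer`/`--end-layer` by hand gets error-prone with more nodes. As a small sketch (assuming layer ranges are half-open `[start, end)`, matching the example above where the node boundary meets at layer 14), here is a helper that splits a model's layer count into contiguous per-node ranges:

```python
def split_layers(num_layers, num_nodes):
    """Split num_layers into contiguous [start, end) ranges, one per node.

    Earlier nodes absorb the remainder when the split is uneven.
    """
    base, extra = divmod(num_layers, num_nodes)
    ranges, start = [], 0
    for i in range(num_nodes):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

# Qwen3-0.6B has 28 transformer layers; two nodes reproduce the example above.
print(split_layers(28, 2))  # [(0, 14), (14, 28)]
```

Each tuple maps directly to one node's `--start-layer` and `--end-layer` arguments.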
Q: When deploying on cloud servers, I encounter an error like "lattica RPC call failed". What does this mean and how can I resolve it?
A: This error typically occurs when the necessary network ports for communication between the scheduler and nodes are blocked—most often due to firewall or security group settings on your cloud platform.
How to fix:
- Ensure that the relevant TCP/UDP ports for both the scheduler and nodes are open and accessible between all machines in your cluster.
- By default, the scheduler uses HTTP port 3001 and nodes use HTTP port 3000. You can change these with the `--port` argument (e.g., `prakasa run --port <your_port>` or `prakasa join --port <your_port>`).
- For Lattica (node-to-node) communication, random ports are used by default. It is best to explicitly specify which TCP and UDP ports to use (e.g., `--tcp-port <your_tcp_port> --udp-port <your_udp_port>`), and then open those ports for inbound and outbound traffic in your cloud provider's security settings.
- Check your cloud provider's firewall or network security group configurations:
- Open inbound rules for the ports mentioned above on all scheduler and node machines.
- Make sure that ports are open to the desired sources (e.g., to all cluster instances, or to your public IPs if required).
After updating the firewall/security group settings to allow these ports, restart your scheduler and nodes.
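To sanity-check connectivity before restarting the cluster, you can probe a port with a plain TCP connect from another machine. This is a generic sketch, not a Prakasa command, and it only covers TCP — UDP reachability needs a separate check:

```python
import socket

def is_tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a throwaway local listener on an OS-assigned port:
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
print(is_tcp_port_open("127.0.0.1", port))  # True
server.close()
```

Run it from a node machine against the scheduler's host and port (e.g., 3001) to confirm the firewall rules took effect.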
Q: When running on macOS, I encounter the error `error sending packet on iface address No route to host (os error 65) address=192.168.xxx.xxx`. What does this mean and how can I fix it?
A: On macOS, you need to allow your terminal or IDE (such as Terminal, iTerm2, VS Code, Cursor, etc.) access to the local network in order for Prakasa to work correctly. If the application prompts you for network access the first time you run Prakasa, click "Allow." If you have already denied access, follow these steps to enable it:
- Open System Settings from the Apple menu.
- Click on Privacy & Security in the sidebar.
- Click on Local Network.
- For each app listed (your terminal or IDE), use the toggle switch to turn local network access on.
This will ensure Prakasa has the proper network permissions for local communication.
Q: When running the scheduler on Windows, nodes on other PCs cannot detect the scheduler ID over the local network. Why can't other machines join the cluster?
A: If you are running Prakasa in WSL (Windows Subsystem for Linux), make sure you are using the "Mirrored" networking mode. By default, WSL uses "NAT" (Network Address Translation) mode, which isolates your WSL environment behind a virtual network. As a result, services running inside WSL (such as Prakasa scheduler) are not directly accessible from other devices on the LAN.
To ensure that other machines on your network can connect to your WSL instance, change the WSL networking mode to "Mirrored" (supported on Windows 11 version 22H2 or later). In "Mirrored" mode, your WSL environment will share the same network as your host, allowing local network discovery and seamless joining of nodes to your Prakasa cluster.
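To switch WSL to mirrored networking, add the following to `%UserProfile%\.wslconfig` on the Windows host, then restart WSL with `wsl --shutdown`:

```ini
[wsl2]
networkingMode=mirrored
```

After WSL restarts, services started inside it (such as the Prakasa scheduler) should be reachable from other machines on the LAN.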


