We will walk you through the easiest way to quickly set up your own AI cluster.
If you have not installed Prakasa yet, please refer to the installation guide and follow the instructions.
First, launch the scheduler on the main node. We recommend using your most convenient computer for this.
- For Linux/macOS:

```
prakasa run
```

- For Windows, start a PowerShell console as administrator and run:

```
prakasa run
```

To make the API accessible from other machines, add the argument `--host 0.0.0.0` when launching the scheduler:

```
prakasa run --host 0.0.0.0
```

When running `prakasa run` for the first time or after an update, code version info might be sent to help improve the project. To disable this, use the `-u` flag:

```
prakasa run -u
```

Or use a config file to set up the scheduler directly:

```
cp config.template.yaml my_config.yaml
# edit my_config.yaml to set your desired options
prakasa run -c {my-config-file-path}
```

Open http://localhost:3001 and you should see the setup interface.
Select your desired node and model config and click continue.
Note: When running in remote mode, Prakasa will use a public relay server to help establish connections between the scheduler and nodes. The public relay server will receive the IP information of both the scheduler and the nodes in order to facilitate this connection.
Copy the generated join command to your node and run it. For remote connections, you can find your scheduler address in the scheduler logs.
```
# local area network env
prakasa join
# public network env
prakasa join -s {scheduler-address}
# example
prakasa join -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu
```

You should see your nodes start to show up with their status. Wait until all nodes are successfully connected, and you will automatically be directed to the chat interface.
When running `prakasa join` for the first time or after an update, code version info might be sent to help improve the project. To disable this, use the `-u` flag:

```
prakasa join -u
```

Or use a config file to set up the worker directly:

```
prakasa join -c {my-config-file-path}
```

Done! You now have your own AI cluster.
You can access the chat interface from any non-scheduler computer, not just those running a node server. Simply start the chat server with:
```
# local area network env
prakasa chat
# public network env
prakasa chat -s {scheduler-address}
# example
prakasa chat -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu
```

After launching, visit http://localhost:3002 in your browser to use the chat interface.
To make the API accessible from other machines, add `--host 0.0.0.0` when launching the chat interface:

```
prakasa chat --host 0.0.0.0
```

Or use a config file to set up chat directly:

```
prakasa chat -c {my-config-file-path}
```

First, launch the scheduler on the main node:

```
prakasa run -m {model-name} -n {number-of-worker-nodes}
```

For example:

```
prakasa run -m Qwen/Qwen3-0.6B -n 2
```

Note and record the scheduler address printed in the terminal.
For each distributed node, including the main node, open a terminal and join the server with the scheduler address.
```
# local area network env
prakasa join
# public network env
prakasa join -s {scheduler-address}
```

For example:

```
# first node
prakasa join -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu
# second node
prakasa join -s 12D3KooWLX7MWuzi1Txa5LyZS4eTQ2tPaJijheH8faHggB9SxnBu
```

Call the chat API:

```
curl --location 'http://localhost:3001/v1/chat/completions' --header 'Content-Type: application/json' --data '{
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": "hello"
        }
    ],
    "stream": true
}'
```

Note: For models such as Qwen3 and gpt-oss, the "reasoning" (or "thinking") feature is enabled by default. To disable it, add

```
"chat_template_kwargs": {"enable_thinking": false}
```

to your request payload.
Developers can start the Prakasa backend engine without a scheduler. Pipeline-parallel start/end layers must be set manually. An example of serving Qwen3-0.6B with 2 nodes:
- First node:
```
python3 ./prakasa/src/prakasa/launch.py \
    --model-path Qwen/Qwen3-0.6B \
    --port 3000 \
    --max-batch-size 8 \
    --start-layer 0 \
    --end-layer 14
```

- Second node:
```
python3 ./prakasa/src/prakasa/launch.py \
    --model-path Qwen/Qwen3-0.6B \
    --port 3000 \
    --max-batch-size 8 \
    --start-layer 14 \
    --end-layer 28
```

Call the chat API on one of the nodes:
```
curl --location 'http://localhost:3000/v1/chat/completions' --header 'Content-Type: application/json' --data '{
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": "hello"
        }
    ],
    "stream": true
}'
```
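Choosing `--start-layer`/`--end-layer` by hand gets error-prone with more nodes. As a small sketch (assuming layer ranges are half-open `[start, end)`, matching the example above where the node boundary meets at layer 14), here is a helper that splits a model's layer count into contiguous per-node ranges:

```python
def split_layers(num_layers, num_nodes):
    """Split num_layers into contiguous [start, end) ranges, one per node.

    Earlier nodes absorb the remainder when the split is uneven.
    """
    base, extra = divmod(num_layers, num_nodes)
    ranges, start = [], 0
    for i in range(num_nodes):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

# Qwen3-0.6B has 28 transformer layers; two nodes reproduce the example above.
print(split_layers(28, 2))  # [(0, 14), (14, 28)]
```

Each tuple maps directly to one node's `--start-layer` and `--end-layer` arguments.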
Q: When deploying on cloud servers, I encounter an error like "lattica RPC call failed". What does this mean and how can I resolve it?
A: This error typically occurs when the necessary network ports for communication between the scheduler and nodes are blocked—most often due to firewall or security group settings on your cloud platform.
How to fix:
- Ensure that the relevant TCP/UDP ports for both the scheduler and nodes are open and accessible between all machines in your cluster.
- By default, the scheduler uses HTTP port 3001 and nodes use HTTP port 3000. You can change these with the `--port` argument (e.g., `prakasa run --port <your_port>` or `prakasa join --port <your_port>`).
- For Lattica (node-to-node) communication, random ports are used by default. It is best to explicitly specify which TCP and UDP ports to use (e.g., `--tcp-port <your_tcp_port> --udp-port <your_udp_port>`), and then open those ports for inbound and outbound traffic in your cloud provider's security settings.
- Check your cloud provider's firewall or network security group configurations:
- Open inbound rules for the ports mentioned above on all scheduler and node machines.
- Make sure that ports are open to the desired sources (e.g., to all cluster instances, or to your public IPs if required).
After updating the firewall/security group settings to allow these ports, restart your scheduler and nodes.
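To sanity-check connectivity before restarting the cluster, you can probe a port with a plain TCP connect from another machine. This is a generic sketch, not a Prakasa command, and it only covers TCP — UDP reachability needs a separate check:

```python
import socket

def is_tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a throwaway local listener on an OS-assigned port:
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]
print(is_tcp_port_open("127.0.0.1", port))  # True
server.close()
```

Run it from a node machine against the scheduler's host and port (e.g., 3001) to confirm the firewall rules took effect.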
Q: When running on macOS, I encounter the error `error sending packet on iface address No route to host (os error 65) address=192.168.xxx.xxx`. What does this mean and how can I fix it?
A: On macOS, you need to allow your terminal or IDE (such as Terminal, iTerm2, VS Code, Cursor, etc.) access to the local network in order for Prakasa to work correctly. If the application prompts you for network access the first time you run Prakasa, click "Allow." If you have already denied access, follow these steps to enable it:
- Open System Settings from the Apple menu.
- Click on Privacy & Security in the sidebar.
- Click on Local Network.
- For each app listed (your terminal or IDE), use the toggle switch to turn local network access on.
This will ensure Prakasa has the proper network permissions for local communication.
Q: When running the scheduler on Windows, nodes on other PCs cannot detect the scheduler ID over the local network. Why can't other machines join the cluster?
A: If you are running Prakasa in WSL (Windows Subsystem for Linux), make sure you are using the "Mirrored" networking mode. By default, WSL uses "NAT" (Network Address Translation) mode, which isolates your WSL environment behind a virtual network. As a result, services running inside WSL (such as Prakasa scheduler) are not directly accessible from other devices on the LAN.
To ensure that other machines on your network can connect to your WSL instance, change the WSL networking mode to "Mirrored" (supported on Windows 11 version 22H2 or later). In "Mirrored" mode, your WSL environment will share the same network as your host, allowing local network discovery and seamless joining of nodes to your Prakasa cluster.
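To switch WSL to mirrored networking, add the following to `%UserProfile%\.wslconfig` on the Windows host, then restart WSL with `wsl --shutdown`:

```ini
[wsl2]
networkingMode=mirrored
```

After WSL restarts, services started inside it (such as the Prakasa scheduler) should be reachable from other machines on the LAN.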


