This Zabbix template enables full monitoring of a Proxmox VE environment via the official REST API (Proxmox VE 7.0+). No Zabbix agent is required inside VMs or on the PVE host. It collects host and cluster metrics, VM and LXC container data, backup jobs, storage status, tasks, network interfaces, HA resources, disk health, and user accounts.
Works on standalone single-node setups as well as full clusters.
- Zabbix Server 7.0 or higher
- Proxmox VE 7.0 or higher
- API token with read permissions (see setup below)
1. **Create a user** (skip if using `root@pam`)
   - Datacenter → Permissions → Users → Add
   - User: `zabbix@pam`, set a password → Add
2. **Assign the read-only role to the user**
   - Datacenter → Permissions → Add → User Permission
   - Path: `/` · User: `zabbix@pam` · Role: `PVEAuditor` · Propagate: ✓ → Add
3. **Create the API token**
   - Datacenter → Permissions → API Tokens → Add
   - User: `zabbix@pam` · Token ID: `Zabbix` · Privilege Separation: disabled → Add
   - Copy the token secret — it is shown only once.

The token inherits all permissions from the user. Header format: `PVEAPIToken=zabbix@pam!Zabbix=<token-secret>`
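The same setup can also be done from a PVE node's shell with `pveum`. A minimal sketch, assuming current `pveum` syntax; the password is a placeholder:

```sh
# Create the monitoring user (skip if using root@pam)
pveum user add zabbix@pam --password '<choose-a-password>'

# Grant the read-only PVEAuditor role on / (propagates by default)
pveum acl modify / --users zabbix@pam --roles PVEAuditor

# Create the token with privilege separation disabled;
# the secret is printed once, copy it now
pveum user token add zabbix@pam Zabbix --privsep 0
```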
1. Follow steps 1–2 from Option A.
2. **Create the API token**
   - Datacenter → Permissions → API Tokens → Add
   - User: `zabbix@pam` · Token ID: `Zabbix` · Privilege Separation: enabled → Add
3. **Grant permission to the token explicitly**
   - Datacenter → Permissions → Add → API Token Permission
   - Path: `/` · Token: `zabbix@pam!Zabbix` · Role: `PVEAuditor` · Propagate: ✓ → Add
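The CLI equivalent, again as a sketch under the same assumptions:

```sh
# Privilege-separated token: it inherits nothing from the user
# and needs its own ACL entry
pveum user token add zabbix@pam Zabbix --privsep 1
pveum acl modify / --tokens 'zabbix@pam!Zabbix' --roles PVEAuditor
```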
**Note for disk monitoring:** `/nodes/{node}/disks/list` requires the `Sys.Audit` privilege, which `PVEAuditor` includes. If disk items show "not supported", verify that the role is applied with Propagate enabled and that the token has permission on path `/`.
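You can check the privilege directly from any shell; a sketch using the placeholder values from the steps above:

```sh
# Should return a JSON array of physical disks; a 403/permission
# error means Sys.Audit is missing for the token
curl -k -s \
  -H 'Authorization: PVEAPIToken=zabbix@pam!Zabbix=<token-secret>' \
  'https://192.168.1.10:8006/api2/json/nodes/pve/disks/list'
```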
1. Download `template_proxmox-ve-rest-api.yaml`
2. In Zabbix: Data collection → Templates → Import
3. Create a new host:
   - Data collection → Hosts → Create host
   - Host name: e.g. `proxmox01`
   - Template: `Template Proxmox VE REST API`
   - Group: e.g. `Virtual machines`
   - Interfaces: leave empty (the template uses HTTP agent items; no Zabbix agent is needed)
4. Set the required macros on the host (see below)
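Before waiting for Zabbix to poll, you can verify connectivity and the token with a one-off request; a sketch using the example macro values from the table below:

```sh
# A working setup returns the PVE version as JSON,
# e.g. {"data":{"version":"8.1",...}}; anything else points
# to a network, TLS, or token problem
curl -k -s \
  -H 'Authorization: PVEAPIToken=zabbix@pam!Zabbix=<token-secret>' \
  'https://192.168.1.10:8006/api2/json/version'
```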
| Macro | Example | Description |
|---|---|---|
| `{$PVE_IP}` | `192.168.1.10` | IP address or hostname of the PVE server |
| `{$PVE_PORT}` | `8006` | API port (default: 8006) |
| `{$PVE_NODE}` | `pve` | Node name as shown in PVE (Datacenter → Node) |
| `{$PVE_API_USER}` | `zabbix@pam` | API user including realm |
| `{$PVE_API_TOKEN_ID}` | `Zabbix` | Token ID |
| `{$PVE_API_TOKEN}` | (secret) | Token secret — set as Secret text macro type |
| Macro | Default | Description |
|---|---|---|
| `{$CPU_USAGE_AVERAGE}` | `85` | CPU warning threshold (%) |
| `{$CPU_USAGE_HIGH}` | `99` | CPU critical threshold (%) |
| `{$LXC.CPU.WARN}` | `85` | LXC CPU warning threshold (%) |
| `{$LXC.CPU.HIGH}` | `99` | LXC CPU critical threshold (%) |
| `{$MEMORY.UTIL.MAX}` | `90` | Memory warning threshold (%) |
| `{$ROOTFS.UTIL.WARN}` | `90` | Root filesystem warning threshold (%) |
| `{$ROOTFS.UTIL.CRIT}` | `95` | Root filesystem critical threshold (%) |
| `{$STORAGE.UTIL.WARN}` | `80` | Storage pool warning threshold (%) |
| `{$STORAGE.UTIL.CRIT}` | `90` | Storage pool critical threshold (%) |
| `{$CLUSTER.NODES.OFFLINE.MAX}` | `0` | Max. tolerated offline nodes (raise during maintenance) |
| `{$DISK.WEAROUT.MIN}` | `20` | Min. SSD wearout remaining before warning (%) |
| `{$PVE.USER.EXPIRE.TIME}` | `172800` | Seconds before user expiry to warn (172800 = 2 days) |
Set a macro to `0` to suppress the corresponding trigger globally. Context macros are supported for per-instance suppression (see the example after the table).
| Macro | Default | Description |
|---|---|---|
| `{$ENABLE_BACKUP_ALERT}` | `1` | Backup failure trigger |
| `{$ENABLE_NODE_STATUS_ALERT}` | `1` | Node offline trigger |
| `{$ENABLE_STORAGE_AVAILABLE_ALERT}` | `1` | Storage high usage trigger |
| `{$ENABLE_STORAGE_INACTIVE_ALERT}` | `1` | Storage inactive trigger |
| `{$ENABLE_TASK_ALERT}` | `1` | Task failure trigger |
| `{$ENABLE_VM_STOP_ALERT}` | `1` | VM/LXC stopped trigger |
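For example, to keep stopped-VM alerting on globally but silence it for one intentionally stopped guest, a context macro can be set on the host. The VMID `105` is a placeholder, and this assumes the template uses the guest ID as the macro context, which is the usual Zabbix pattern:

```
{$ENABLE_VM_STOP_ALERT}       = 1   (global default stays on)
{$ENABLE_VM_STOP_ALERT:"105"} = 0   (suppressed only for guest 105)
```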
| Rule | Source | Discovers |
|---|---|---|
| `discover.lxc` | `/nodes/{node}/lxc` | LXC containers with CPU, memory, disk, network metrics |
| `discover.qemu` | `/nodes/{node}/qemu` | QEMU/KVM VMs with CPU, memory, disk, network metrics |
| `discover.nodes` | `/nodes` | Cluster nodes with status and uptime |
| `discover.storage` | `/nodes/{node}/storage` | Storage pools with capacity and active status |
| `discover.backup` | `/nodes/{node}/tasks` | Backup jobs (vzdump/PBS), grouped by VM, most recent run |
| `discover.tasks` | `/nodes/{node}/tasks` | Non-backup tasks, deduplicated per type |
| `discover.users` | `/access/users` | PVE user accounts with expiration monitoring |
| `discover.network` | `/nodes/{node}/network` | Host network interfaces (bridge, bond, eth, vlan) |
| `discover.ha.resources` | `/cluster/ha/resources` | HA-protected VMs and containers |
| `discover.disks` | `/nodes/{node}/disks/list` | Physical disks with SMART health and wearout |
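Each rule polls its source endpoint through an HTTP agent item, so you can inspect exactly what a rule sees. A sketch for `discover.lxc`, with the same placeholder host and token as above:

```sh
# Returns one JSON object per container (vmid, name, status,
# cpu, mem, ...), from which the rule derives its LLD macros
curl -k -s \
  -H 'Authorization: PVEAPIToken=zabbix@pam!Zabbix=<token-secret>' \
  'https://192.168.1.10:8006/api2/json/nodes/pve/lxc' | python3 -m json.tool
```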
| Trigger | Severity | Description |
|---|---|---|
| PVE API not reachable | Average | No data from API for 5 minutes |
| High CPU usage (>90%) | Average | PVE host CPU sustained high |
| High load average | Average | Load average ≥ number of CPUs |
| High memory usage | Average | Configurable via `{$MEMORY.UTIL.MAX}` |
| High root filesystem usage | Average / High | Two-level: warn and critical |
| Cluster lost quorum | Disaster | Only fires on actual clusters, not standalone nodes |
| Cluster nodes offline | High | Configurable tolerance via `{$CLUSTER.NODES.OFFLINE.MAX}` |
| VMs/LXC not all running | Info | Cluster-wide: running count < total count |
| Trigger | Severity |
|---|---|
| CPU over threshold for 5 minutes | Average / High |
| Memory utilization over threshold | Warning |
| VM/LXC stopped | High |
| VM/LXC restarted (uptime < 10 min) | Info |
| Trigger | Severity |
|---|---|
| Storage inactive/unavailable | Average |
| Storage usage over warning threshold | Average |
| Storage usage over critical threshold | High |
| Trigger | Severity |
|---|---|
| Backup failed | High |
| Task failed | Warning |
| User account expiring within 2 days | Warning |
| Node offline | High |
| Network interface down | Warning |
| HA resource in error state | High |
| Disk SMART health not PASSED | High |
| SSD wearout below threshold | Warning |
The template includes a pre-built dashboard "Proxmox VE – Monitoring Dashboard" with the following pages:
| Page | Contents |
|---|---|
| Overview | Version, Uptime, CPU%, Memory%, Cluster status, VMs running/total, active Problems |
| PVE | RootFS graph, Load Average (time-series), CPU and Memory graphs |
| Storage | Utilization pie charts, usage % trend, active status |
| QEMU/KVM-VMs | CPU, memory, disk I/O, network, status per VM |
| LXC - Container | CPU, memory, swap, disk I/O, network, status per container |
| Backup | Backup status per VM |
| Nodes | Node status and uptime |
| Cluster | Cluster name, quorum, nodes online/total, VMs running/total, problems |
| HA & Disks | Network interface status, HA resource states |
| Tasks | Task status per type |
| Network | VM and LXC network I/O (current and cumulative) |
- **Single-node without cluster:** Fully supported. `pve.cluster.quorum` returns `1` and `pve.cluster.name` returns `standalone` — the quorum-lost trigger will not fire.
- **Disk monitoring:** Requires the `Sys.Audit` privilege. If disk items show "not supported", check that the API token role is applied with Propagate enabled at path `/`.
- **HA monitoring:** Only relevant if PVE HA is configured. If no HA resources exist, discovery returns nothing.
- **CPU temperatures:** Not available through the PVE REST API; requires an agent or custom script (see the sketch below).
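A minimal sketch of the agent route, assuming `lm-sensors` is installed on the PVE host and a standard Zabbix agent; the item key is hypothetical and not part of this template:

```sh
# /etc/zabbix/zabbix_agentd.d/pve_temp.conf (hypothetical key name)
# "sensors -j" emits readings as JSON; pair this with a dependent
# item in Zabbix that extracts a value via JSONPath preprocessing,
# e.g. $['coretemp-isa-0000']['Package id 0']['temp1_input']
UserParameter=pve.cpu.temp.raw,sensors -j
```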