Support for Heterogeneous (Mixed AMD + NVIDIA) GPU Performance Monitoring #803

### Description of the Feature / Problem
In a mixed-vendor GPU setup (e.g., a system running an AMD Radeon AI Pro R9700 and an NVIDIA GeForce RTX 3060 Ti simultaneously), the experimental performance monitor tracks and graphs GPUs from only one vendor.

Gemini 3.5 Flash and I investigated the codebase and discovered two main blockers:

1. **First-Match-Wins Sequential Discovery:**
   The Unix GPU discovery in `internal/perf/monitor_unix.go` tries each tool in order (LACT -> `nvidia-smi` -> `rocm-smi` -> `sysfs`) and returns as soon as the first one succeeds. In a heterogeneous setup, `tryRocmSmi` is never called because `tryNvidiaSmi` succeeds first.
2. **Device ID Overlap Collision:**
   Both `nvidia-smi` and `rocm-smi` number devices starting at `0`. When metrics from both tools are gathered, each tool's first GPU arrives with `id: 0`. The Svelte frontend (`buildGpuDatasets`) groups stats strictly by `g.id` into a `Map`. Because the two cards share an ID, their data points are interleaved into a single dataset: the RTX 3060 Ti chart line zig-zags between NVIDIA and AMD metrics, and the AMD card is missing from the legend.
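
The collision in blocker 2 can be reproduced outside the frontend. The following is a minimal Go sketch, not the actual `buildGpuDatasets` code: `GpuStat` here is a simplified stand-in, `groupByID` is a hypothetical helper, and the utilization values are made up purely for illustration.

```go
package main

import "fmt"

// GpuStat is a simplified, hypothetical stand-in for the real struct;
// only the fields needed for this illustration are included.
type GpuStat struct {
	ID      int
	Name    string
	UtilPct float64
}

// groupByID groups samples by numeric ID alone, mirroring the grouping
// behaviour this issue describes for the frontend.
func groupByID(samples []GpuStat) map[int][]GpuStat {
	groups := make(map[int][]GpuStat)
	for _, s := range samples {
		groups[s.ID] = append(groups[s.ID], s)
	}
	return groups
}

func main() {
	// Each tool reports its first device as ID 0 (values are invented).
	samples := []GpuStat{
		{ID: 0, Name: "NVIDIA GeForce RTX 3060 Ti", UtilPct: 12},
		{ID: 0, Name: "AMD Radeon AI Pro R9700", UtilPct: 87},
		{ID: 0, Name: "NVIDIA GeForce RTX 3060 Ti", UtilPct: 14},
		{ID: 0, Name: "AMD Radeon AI Pro R9700", UtilPct: 85},
	}

	groups := groupByID(samples)
	fmt.Println("datasets:", len(groups)) // a single dataset, not two

	// The one dataset alternates between the two cards, which is what
	// shows up as a zig-zag line on the chart.
	for _, s := range groups[0] {
		fmt.Printf("%-28s %5.1f%%\n", s.Name, s.UtilPct)
	}
}
```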

---

### Proposed Solution

#### 1. Dynamic WaitGroup Channel Multiplexer
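We updated `getGpuStats` to collect the channel from every tool that initializes successfully instead of returning on the first success. When only one tool succeeds, its channel is returned directly. When more than one is active, one goroutine per channel forwards its output into a single merged channel, and a `sync.WaitGroup` closes the merged channel once all sources have finished. This allows the monitoring loops to run in parallel and stream their metrics concurrently.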

#### 2. AMD GPU ID Offsetting
To prevent ID collisions in the frontend's dataset map, we offset the AMD GPU ID by `+100` during line parsing in `parseRocmSmiLine` (mapping `card0` to ID `100`, `card1` to `101`, etc.).

*Note: The static `+100` offset is a pragmatic local workaround rather than a long-term solution (which should probably assign stable UUIDs or derive vendor-qualified indices). It does isolate the datasets for the Svelte frontend, so both GPUs are plotted as separate, independent lines in real time.*
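
As a rough idea of what the longer-term alternative in the note could look like, the sketch below builds a vendor-qualified key instead of shifting numeric IDs. Everything in it is an assumption: the `Vendor` and `UUID` fields and the `datasetKey` helper are hypothetical, and the frontend would also need to group by this key rather than by `g.id`.

```go
package main

import "fmt"

// GpuStat is a simplified, hypothetical stand-in for the real struct.
// Vendor and UUID are assumed fields added for this illustration.
type GpuStat struct {
	ID     int
	Vendor string // e.g. "nvidia" or "amd"
	UUID   string // hardware identifier, if the tool reports one
}

// datasetKey returns a string intended to be unique across vendors.
// It prefers the hardware UUID and falls back to "vendor:index".
func datasetKey(s GpuStat) string {
	if s.UUID != "" {
		return s.UUID
	}
	return fmt.Sprintf("%s:%d", s.Vendor, s.ID)
}

func main() {
	stats := []GpuStat{
		{ID: 0, Vendor: "nvidia"},
		{ID: 0, Vendor: "amd"},
		{ID: 1, Vendor: "amd", UUID: "GPU-example-uuid"}, // placeholder value
	}
	for _, s := range stats {
		fmt.Println(datasetKey(s))
	}
}
```

Unlike a fixed offset, a key of this shape does not depend on how many devices each vendor's tool reports, at the cost of changing the ID type that the frontend consumes.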

---

### 🛠️ Tested Patch (`internal/perf/monitor_unix.go`)

This patch compiles cleanly, introduces no external dependencies, and passes all existing package unit tests:

```diff
diff --git a/internal/perf/monitor_unix.go b/internal/perf/monitor_unix.go
index a61163a..e744e72 100644
--- a/internal/perf/monitor_unix.go
+++ b/internal/perf/monitor_unix.go
@@ -14,6 +14,7 @@ import (
 	"path/filepath"
 	"strconv"
 	"strings"
+	"sync"
 	"time"
 
 	"github.com/mostlygeek/llama-swap/internal/logmon"
@@ -24,35 +25,66 @@ import (
 )
 
 func getGpuStats(ctx context.Context, every time.Duration, logger *logmon.Monitor) (chan []GpuStat, error) {
+	var channels []chan []GpuStat
+
 	if ch, err := tryLACT(ctx, every, logger); err == nil {
 		logger.Info("using LACT for GPU monitoring")
-		return ch, nil
+		channels = append(channels, ch)
 	} else {
 		logger.Debugf("LACT: %s", err.Error())
 	}
 
 	if ch, err := tryNvidiaSmi(ctx, every, logger); err == nil {
 		logger.Info("using nvidia-smi for GPU monitoring")
-		return ch, nil
+		channels = append(channels, ch)
 	} else {
 		logger.Debugf("nvidia-smi: %s", err.Error())
 	}
 
 	if ch, err := tryRocmSmi(ctx, every, logger); err == nil {
 		logger.Info("using rocm-smi for GPU monitoring")
-		return ch, nil
+		channels = append(channels, ch)
 	} else {
 		logger.Debugf("rocm-smi: %s", err.Error())
 	}
 
 	if ch, err := trySysfs(ctx, every, logger); err == nil {
 		logger.Info("using sysfs for GPU monitoring")
-		return ch, nil
+		channels = append(channels, ch)
 	} else {
 		logger.Debugf("sysfs: %s", err.Error())
 	}
 
-	return nil, ErrNoGpuTool
+	if len(channels) == 0 {
+		return nil, ErrNoGpuTool
+	}
+
+	if len(channels) == 1 {
+		return channels[0], nil
+	}
+
+	mergedCh := make(chan []GpuStat, len(channels))
+	var wg sync.WaitGroup
+	for _, ch := range channels {
+		wg.Add(1)
+		go func(c chan []GpuStat) {
+			defer wg.Done()
+			for g := range c {
+				select {
+				case <-ctx.Done():
+					return
+				case mergedCh <- g:
+				}
+			}
+		}(ch)
+	}
+
+	go func() {
+		wg.Wait()
+		close(mergedCh)
+	}()
+
+	return mergedCh, nil
 }
 
 func tryLACT(ctx context.Context, every time.Duration, logger *logmon.Monitor) (chan []GpuStat, error) {
@@ -280,7 +312,7 @@ func parseRocmSmiLine(header string, line string) *GpuStat {
 			if err != nil {
 				return nil
 			}
-			result.ID = id
+			result.ID = id + 100
 		case "Device Name":
 			deviceName = val
 		case "GUID":
```

### 📊 Performance Dashboard Result
With the patch active, both GPUs are successfully separated and graphed in parallel:

![llama-swap heterogeneous multi-gpu performance chart](https://github.com/user-attachments/assets/cc00948e-1e7f-4e57-8079-ba5c8749fe51)

