[cuebot/pycue/proto] Add render farm monitoring system with Kafka, Elasticsearch, and enhanced Prometheus metrics #2085

**Describe the enhancement**
This enhancement proposes a monitoring infrastructure for collecting, storing, and providing access to render farm statistics. The system would address several current limitations:

1. Limited Historical Data Access: The existing pycue API provides only 3 days of job history because of database recycling policies, which limits long-term analysis and memory prediction.
2. Insufficient Real-time Visibility: Production Support and Resources (PSR) teams lack a comprehensive real-time view of farm operations for resource forecasting and troubleshooting.
3. Fragmented Monitoring Solutions: Multiple disparate systems create maintenance overhead and data silos.

### Proposed Architecture

The solution would introduce an event-driven architecture with the following components:

- Kafka Event Publishing: Real-time capture and publishing of Job/Layer/Frame/Host/Proc lifecycle events to Kafka topics
- Elasticsearch Storage: Long-term historical data storage (1-2 year retention) for analytical queries
- Enhanced Prometheus Metrics: Extended metrics for frame completion rates, runtime histograms, and memory usage patterns
- Extended pycue API: New methods for querying historical data beyond the current 3-day limitation
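As a rough illustration of the Kafka publishing component, the sketch below shows one possible shape for a lifecycle event envelope and its topic routing. The topic names, field names, and event-type strings are all hypothetical; none of this is an existing OpenCue or Cuebot API. Keying each message by entity ID means Kafka's default partitioner sends every event for a given job, layer, frame, host, or proc to the same partition, so consumers see that entity's events in order.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass, field

# Hypothetical topic naming: one Kafka topic per entity type.
TOPICS = {
    "JOB": "opencue.job.events",
    "LAYER": "opencue.layer.events",
    "FRAME": "opencue.frame.events",
    "HOST": "opencue.host.events",
    "PROC": "opencue.proc.events",
}


@dataclass
class MonitoringEvent:
    """Hypothetical envelope for a lifecycle event emitted by Cuebot."""
    entity_type: str  # one of the TOPICS keys
    event_type: str   # e.g. "FRAME_COMPLETED" (illustrative name only)
    entity_id: str
    payload: dict
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))

    def topic(self) -> str:
        """Route the event to its entity's topic; reject unknown entity types."""
        try:
            return TOPICS[self.entity_type]
        except KeyError:
            raise ValueError(f"unknown entity type: {self.entity_type}") from None

    def key(self) -> bytes:
        """Partition key: equal keys land on the same partition, preserving order."""
        return self.entity_id.encode("utf-8")

    def value(self) -> bytes:
        """JSON-encoded message body containing every envelope field."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")
```

`key()` and `value()` are the byte strings that would be handed to whichever Kafka producer Cuebot ends up using; the producer itself is out of scope for this sketch.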

### Proposed Features

| Feature            | Description                                                 |
|--------------------|-------------------------------------------------------------|
| Event Types        | Job, Layer, Frame, Host, and Proc lifecycle events          |
| Historical Queries | getJobHistory(), getFrameHistory(), getLayerMemoryHistory() |
| Prometheus Metrics | Frame/job completion counters, runtime/memory histograms    |
| Configuration      | Fully opt-in via properties (disabled by default)           |
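For the runtime/memory histograms, Prometheus histograms expose cumulative buckets labelled `le` (less than or equal), a final `+Inf` bucket, and `_sum` and `_count` series. The sketch below reproduces those semantics with the standard library only, to show what the exported data would look like; Cuebot would presumably register such metrics through its existing Prometheus integration rather than hand-rolling them. The metric name `cue_frame_runtime_seconds` and the bucket bounds are hypothetical.

```python
import bisect
import math


class RuntimeHistogram:
    """Minimal sketch of Prometheus histogram semantics (cumulative buckets)."""

    def __init__(self, name, bounds):
        self.name = name
        self.bounds = sorted(bounds)                 # "le" upper bounds, in seconds
        self.counts = [0] * (len(self.bounds) + 1)   # last slot is the +Inf bucket
        self.total = 0.0
        self.n = 0

    def observe(self, value):
        # bisect_left returns the first bound >= value, i.e. the smallest "le" bucket
        # that contains it; values above every bound fall into the +Inf slot.
        i = bisect.bisect_left(self.bounds, value)
        self.counts[i] += 1
        self.total += value
        self.n += 1

    def cumulative(self):
        """Return (le, count) pairs where each count includes all smaller buckets."""
        out, running = [], 0
        for le, c in zip(self.bounds + [math.inf], self.counts):
            running += c
            out.append((le, running))
        return out

    def exposition(self):
        """Render the histogram in the Prometheus text exposition format."""
        lines = [f"# TYPE {self.name} histogram"]
        for le, c in self.cumulative():
            label = "+Inf" if le == math.inf else str(float(le))
            lines.append(f'{self.name}_bucket{{le="{label}"}} {c}')
        lines.append(f"{self.name}_sum {self.total}")
        lines.append(f"{self.name}_count {self.n}")
        return "\n".join(lines)
```

Because buckets are cumulative, the `+Inf` bucket always equals `_count`, and dashboards can derive quantiles (for example with PromQL's `histogram_quantile`) without storing individual frame runtimes.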

### Use Cases

1. Enhanced Memory Prediction: Access up to 1 year of historical job data to improve memory prediction accuracy for DCC applications (e.g., Nuke)
2. PSR Operational Dashboard: Real-time farm status, resource forecasting, and capacity planning for Production Support and Resources (PSR) teams
3. Analytics: Long-term trend analysis for render farm optimization
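To make the memory-prediction use case concrete, here is a minimal sketch of one possible heuristic: take the nearest-rank 95th percentile of a layer's historical peak RSS within a lookback window and add 10% headroom. The function name, the `(finished_at, max_rss_kb)` record shape, and the percentile and headroom defaults are assumptions; the proposal does not define what getLayerMemoryHistory() returns, and this heuristic is not taken from OpenCue.

```python
import math
from datetime import datetime, timedelta


def predict_memory_kb(history, now: datetime, lookback_days=365,
                      percentile=95.0, headroom=1.1):
    """Hypothetical heuristic: p-th percentile of past peak RSS, plus headroom.

    history: iterable of (finished_at: datetime, max_rss_kb: int) tuples, e.g. rows
    from the proposed getLayerMemoryHistory() (its real shape is not yet defined).
    Returns None when no samples fall inside the lookback window.
    """
    cutoff = now - timedelta(days=lookback_days)
    samples = sorted(kb for ts, kb in history if ts >= cutoff)
    if not samples:
        return None
    # Nearest-rank percentile: smallest sample with at least p% of samples <= it.
    rank = max(1, math.ceil(percentile / 100.0 * len(samples)))
    return int(samples[rank - 1] * headroom)
```

Calling it with `lookback_days=3` mirrors today's three-day history limit and typically leaves far fewer samples than the one-year window the proposal targets, which is the gap the historical storage is meant to close.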

### Proposed Configuration Properties

```
# Kafka Event Publishing
monitoring.kafka.enabled=false
monitoring.kafka.bootstrap.servers=localhost:9092

# Elasticsearch Historical Storage
monitoring.elasticsearch.enabled=false
monitoring.elasticsearch.host=localhost
monitoring.elasticsearch.port=9200
```
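The opt-in behavior of these properties can be sketched as a small loader that falls back to the disabled defaults whenever a key is absent. The loader, the `MonitoringConfig` type, and the `http://` scheme in `es_url` are hypothetical, and the parser handles only a simplified subset of the Java `.properties` format (`key=value` lines and `#`/`!` comments; no `:` separators or line continuations). `bootstrap.servers` is treated as a comma-separated list, following Kafka's convention for that setting.

```python
from dataclasses import dataclass

# Defaults mirror the proposal: both integrations stay off unless enabled explicitly.
DEFAULTS = {
    "monitoring.kafka.enabled": "false",
    "monitoring.kafka.bootstrap.servers": "localhost:9092",
    "monitoring.elasticsearch.enabled": "false",
    "monitoring.elasticsearch.host": "localhost",
    "monitoring.elasticsearch.port": "9200",
}


@dataclass(frozen=True)
class MonitoringConfig:
    """Hypothetical typed view of the monitoring.* properties."""
    kafka_enabled: bool
    kafka_bootstrap_servers: tuple
    es_enabled: bool
    es_url: str


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines; lines starting with '#' or '!' are comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:  # this simplified parser skips lines without '='
            props[key.strip()] = value.strip()
    return props


def load_monitoring_config(text: str) -> MonitoringConfig:
    # User-supplied values override the disabled-by-default settings.
    props = {**DEFAULTS, **parse_properties(text)}

    def flag(key: str) -> bool:
        return props[key].lower() == "true"

    port = int(props["monitoring.elasticsearch.port"])
    if not 0 < port < 65536:
        raise ValueError(f"invalid Elasticsearch port: {port}")
    servers = tuple(
        s.strip() for s in props["monitoring.kafka.bootstrap.servers"].split(",") if s.strip()
    )
    return MonitoringConfig(
        kafka_enabled=flag("monitoring.kafka.enabled"),
        kafka_bootstrap_servers=servers,
        es_enabled=flag("monitoring.elasticsearch.enabled"),
        es_url=f"http://{props['monitoring.elasticsearch.host']}:{port}",
    )
```

With an empty file, both integrations load as disabled, which matches the "fully opt-in" requirement in the features table.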

**Version Number**
- https://github.com/AcademySoftwareFoundation/OpenCue/releases/tag/v1.13.8

**Additional context**

### Implementation Phases

1. Phase 1 (Foundation): Kafka infrastructure, Cuebot event generation, basic Elasticsearch setup
2. Phase 2 (Storage Integration): Elasticsearch schema, data ingestion pipelines, Prometheus integration
3. Phase 3 (API Enhancement): Extended pycue API, performance optimization
4. Phase 4 (Visualization): Grafana dashboard development, end-to-end testing

