Skip to content

Comments

feat: add MCP node management and firmware update jobs#1465

Open
DTTerastar wants to merge 1 commit intomainfrom
feat/mcp-node-management-firmware
Open

feat: add MCP node management and firmware update jobs#1465
DTTerastar wants to merge 1 commit intomainfrom
feat/mcp-node-management-firmware

Conversation

@DTTerastar
Copy link
Collaborator

@DTTerastar DTTerastar commented Feb 16, 2026

Summary by CodeRabbit

Release Notes

New Features

  • Introduced firmware update job tracking system with real-time progress monitoring, detailed logs, and status history
  • Enhanced security: centralized firmware URL validation restricts downloads to trusted sources
  • Added job management capabilities: start, monitor, view logs, and cancel firmware updates
  • Extended device management: retrieve/update settings, trigger restarts, request self-updates, and manage node lifecycle

@DTTerastar DTTerastar temporarily deployed to CI - release environment February 16, 2026 15:59 — with GitHub Actions Inactive
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 16, 2026

📝 Walkthrough

Walkthrough

A new FirmwareUpdateJobService is introduced to manage firmware update operations with URL validation, job tracking, and async OTA execution. The service is registered in the DI container, integrated into FirmwareController and McpResources, and tested with unit tests for URL trust validation.

Changes

Cohort / File(s) Summary
Firmware Update Service Core
src/Services/FirmwareUpdateJobService.cs
New service implementing firmware update job lifecycle: validates trusted URLs (ESPresense GitHub only), manages concurrent update constraints per node, executes OTA updates asynchronously with progress tracking, handles cancellation, and maintains thread-safe job catalogs with logs and status transitions.
Service Integration
src/Program.cs, src/Controllers/FirmwareController.cs
DI registration of FirmwareUpdateJobService; FirmwareController constructor extended to accept and store the service; URL trust checks delegated to service via IsTrustedFirmwareUrl().
MCP Tools
src/Services/McpResources.cs
Constructor updated to accept NodeTelemetryStore and FirmwareUpdateJobService; nine new MCP server tools added: node settings management, restart, self-update request, deletion, and firmware job operations (start, get, list, cancel).
Tests
tests/ESPresense.Companion.Tests/Services/FirmwareUpdateJobServiceTests.cs
Unit tests for URL trust validation and invalid URL error handling; test helper creates mocked service instances.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant FirmwareController
    participant FirmwareUpdateJobService
    participant Node
    participant HTTP as HTTP (Firmware Source)

    Client->>FirmwareController: Start firmware update
    FirmwareController->>FirmwareUpdateJobService: Start(nodeId, url)
    
    rect rgba(100, 150, 200, 0.5)
        FirmwareUpdateJobService->>FirmwareUpdateJobService: Validate URL (trusted host?)
        FirmwareUpdateJobService->>FirmwareUpdateJobService: Check no active update for node
    end
    
    FirmwareUpdateJobService->>FirmwareUpdateJobService: Create FirmwareUpdateJob (Queued)
    FirmwareUpdateJobService-->>FirmwareController: Job created
    FirmwareController-->>Client: JobId
    
    par Async Execution
        FirmwareUpdateJobService->>FirmwareUpdateJobService: UpdateJob status to Running
        FirmwareUpdateJobService->>HTTP: GetFirmware(url)
        HTTP-->>FirmwareUpdateJobService: Binary/ZIP payload
        FirmwareUpdateJobService->>FirmwareUpdateJobService: Extract firmware from ZIP if needed
        
        FirmwareUpdateJobService->>Node: Toggle Arduino mode ON
        FirmwareUpdateJobService->>Node: OTA update (TCP to node IP)
        Node-->>FirmwareUpdateJobService: Update progress/complete
        FirmwareUpdateJobService->>FirmwareUpdateJobService: Log progress with timestamp
        FirmwareUpdateJobService->>Node: Toggle Arduino mode OFF
        
        rect rgba(150, 200, 150, 0.5)
            FirmwareUpdateJobService->>FirmwareUpdateJobService: UpdateJob status (Succeeded/Failed/Cancelled)
            FirmwareUpdateJobService->>FirmwareUpdateJobService: Cleanup resources and active mapping
        end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • Enhancements to firmware updating #1038: Modifies firmware download/update logic in FirmwareController by adding centralized URL trust validation through FirmwareUpdateJobService and updates constructor signature.

Suggested labels

codex

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: adding MCP node management tools and firmware update job capabilities across multiple services and controllers.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into main

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/mcp-node-management-firmware

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🤖 Fix all issues with AI agents
In `@src/Services/FirmwareUpdateJobService.cs`:
- Around line 46-52: The IsTrustedFirmwareUrl method currently does a
case-sensitive prefix check which can incorrectly reject or accept URLs due to
RFC 3986 case-insensitive scheme/host; update the check to perform a
case-insensitive comparison by using url.StartsWith(prefix,
StringComparison.OrdinalIgnoreCase) for each entry in TrustedFirmwarePrefixes
(or normalize the url to a canonical case before checking) so scheme and host
mismatches are handled correctly; ensure this change is applied where
TrustedFirmwarePrefixes.Any(url.StartsWith) is used to locate the logic in
IsTrustedFirmwareUrl.
- Around line 139-145: The job.CompletedAt timestamp is being set twice: once
inside UpdateJob when transitioning to terminal statuses and again
unconditionally in the finally block; remove the redundant assignment in the
finally block (delete the line "job.CompletedAt = DateTime.UtcNow" in the
finally) so UpdateJob remains the single authoritative setter for terminal
states, while keeping the remaining cleanup calls (_tokens.TryRemove(...),
cts?.Dispose(), _activeJobByNode.TryRemove(...)) intact; ensure no other code
expects the finally block to set CompletedAt.
- Around line 196-207: The method currently constructs ZipArchive with default
leaveOpen causing ms1 to be disposed when zipArchive is disposed, which can
return a closed stream for empty zips; change the ZipArchive creation to
preserve ms1 by calling new ZipArchive(ms1, ZipArchiveMode.Read, leaveOpen:
true) (i.e., replace using var zipArchive = new ZipArchive(ms1) with using var
zipArchive = new ZipArchive(ms1, ZipArchiveMode.Read, leaveOpen: true)) so ms1
remains usable when returned; keep existing logic that creates and returns ms2
for the first entry.
- Around line 54-87: The Start method has a TOCTOU race where ContainsKey +
assignment on _activeJobByNode can let two concurrent starts for the same node
succeed; replace that pattern with an atomic TryAdd on _activeJobByNode using
the generated jobId (call _activeJobByNode.TryAdd(nodeId, jobId)) to fail fast
if another job exists, and if subsequent steps (_jobs.TryAdd or _tokens[jobId] =
cts) fail, roll back by removing the entry from _activeJobByNode (and any
partially added entries in _jobs/_tokens) so no leaks remain; update Start to
use TryAdd for the check-and-set and perform cleanup on any intermediate
failure, leaving ExecuteAsync unchanged.

In `@src/Services/McpResources.cs`:
- Around line 162-170: RequestNodeUpdateTool currently forwards a non-null url
directly to _nsd.Update allowing arbitrary URLs; before calling _nsd.Update in
RequestNodeUpdateTool validate the url by calling
_firmwareUpdateJobs.IsTrustedFirmwareUrl(url) (same logic used by
StartFirmwareUpdateTool) and return an error/throw if the url is not trusted,
otherwise proceed to call _nsd.Update(nodeId, url) and return the success JSON;
ensure you reference RequestNodeUpdateTool, _nsd.Update, and
_firmwareUpdateJobs.IsTrustedFirmwareUrl when making the change.
🧹 Nitpick comments (8)
tests/ESPresense.Companion.Tests/Services/FirmwareUpdateJobServiceTests.cs (3)

7-7: Consider adding the [TestFixture] attribute.

NUnit 3.x will discover the tests without it, but the attribute is conventional and ensures NUnit analyzers don't flag the class.

+[TestFixture]
 public class FirmwareUpdateJobServiceTests

9-23: HttpClient is not disposed after each test.

The CreateService() helper creates a new HttpClient per call without disposal. While this won't cause issues in these specific tests (none reach HTTP), it's good hygiene to either share a single instance or dispose via IDisposable / setup-teardown.


43-52: Consider expanding test coverage for additional Start paths.

Currently only the untrusted-URL error path is tested. Key scenarios to cover:

  • Start with empty/null nodeId"nodeId is required"
  • Start with empty/null url"url is required"
  • Start with a valid URL for a node that already has an active job → duplicate prevention
  • Successful job creation (verifying returned job is non-null and fields are set)
src/Controllers/FirmwareController.cs (1)

117-152: GetFirmware is duplicated between the controller and FirmwareUpdateJobService.

FirmwareController.GetFirmware (lines 117–152) and FirmwareUpdateJobService.GetFirmware (lines 177–208 in the service) contain nearly identical download-and-unzip logic. Consider extracting this into a shared helper to avoid divergence over time.

src/Services/FirmwareUpdateJobService.cs (2)

85-85: Fire-and-forget Task.Run — no graceful shutdown support.

The spawned task is never tracked, so there's no way to await in-flight firmware updates during application shutdown. If the host shuts down mid-OTA, the node could be left in Arduino/bootloader mode (line 111) without the finally block reliably executing.

Consider tracking running tasks (e.g., via a ConcurrentBag<Task>) and draining them in IHostApplicationLifetime.ApplicationStopping, or implementing IHostedService for coordinated shutdown.


211-228: Data classes: consider separating into their own files.

FirmwareUpdateJob, FirmwareUpdateLogEntry, and FirmwareUpdateJobStatus are public types colocated in the service file. For discoverability and consistency with the project's file-per-type convention, consider moving them to a Models/ directory.

src/Services/McpResources.cs (2)

142-151: SetNodeSettingsTool accepts arbitrary settings without validation.

Any MCP client can overwrite node settings with any values. While this may be acceptable given the trust model, consider whether certain fields (e.g., sensitive network configs) should be restricted or at least logged at warning level for auditability.


133-140: Consider null handling for GetNodeSettingsTool.

If _nsd.Get(nodeId) returns null for an unknown node, the tool returns "null" as JSON. Consider returning a structured error response (consistent with other tools' { ok: false, error: ... } pattern) for better MCP client ergonomics.

Comment on lines +46 to +52
public bool IsTrustedFirmwareUrl(string url)
{
if (string.IsNullOrWhiteSpace(url))
return false;

return TrustedFirmwarePrefixes.Any(url.StartsWith);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

URL prefix check is case-sensitive — scheme/host bypass possible.

string.StartsWith with current culture is case-sensitive for ASCII. A URL like HTTPS://GitHub.com/ESPresense/... (valid per RFC 3986 for scheme and host) would be rejected. Consider using StringComparison.OrdinalIgnoreCase or normalizing the URL first.

Proposed fix
-        return TrustedFirmwarePrefixes.Any(url.StartsWith);
+        return TrustedFirmwarePrefixes.Any(prefix => url.StartsWith(prefix, StringComparison.OrdinalIgnoreCase));
🤖 Prompt for AI Agents
In `@src/Services/FirmwareUpdateJobService.cs` around lines 46 - 52, The
IsTrustedFirmwareUrl method currently does a case-sensitive prefix check which
can incorrectly reject or accept URLs due to RFC 3986 case-insensitive
scheme/host; update the check to perform a case-insensitive comparison by using
url.StartsWith(prefix, StringComparison.OrdinalIgnoreCase) for each entry in
TrustedFirmwarePrefixes (or normalize the url to a canonical case before
checking) so scheme and host mismatches are handled correctly; ensure this
change is applied where TrustedFirmwarePrefixes.Any(url.StartsWith) is used to
locate the logic in IsTrustedFirmwareUrl.

Comment on lines +54 to +87
public (FirmwareUpdateJob? Job, string? Error) Start(string nodeId, string url)
{
if (string.IsNullOrWhiteSpace(nodeId))
return (null, "nodeId is required");

if (string.IsNullOrWhiteSpace(url))
return (null, "url is required");

if (!IsTrustedFirmwareUrl(url))
return (null, "Only ESPresense GitHub URLs are allowed");

if (_activeJobByNode.ContainsKey(nodeId))
return (null, $"A firmware update is already running for node '{nodeId}'");

var jobId = Guid.NewGuid().ToString("N");
var job = new FirmwareUpdateJob
{
JobId = jobId,
NodeId = nodeId,
Url = url,
Status = FirmwareUpdateJobStatus.Queued,
CreatedAt = DateTime.UtcNow
};

if (!_jobs.TryAdd(jobId, job))
return (null, "Unable to create update job");

_activeJobByNode[nodeId] = jobId;
var cts = new CancellationTokenSource();
_tokens[jobId] = cts;

_ = Task.Run(() => ExecuteAsync(job, cts.Token), CancellationToken.None);
return (job, null);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Race condition (TOCTOU): concurrent Start calls for the same node can both pass the guard.

ContainsKey on line 65 and _activeJobByNode[nodeId] = jobId on line 81 are not atomic. Two concurrent calls with the same nodeId can both pass the check and start duplicate jobs.

Use TryAdd for an atomic check-and-set:

🐛 Proposed fix
-        if (_activeJobByNode.ContainsKey(nodeId))
-            return (null, $"A firmware update is already running for node '{nodeId}'");
-
         var jobId = Guid.NewGuid().ToString("N");
         var job = new FirmwareUpdateJob
         {
@@ 
         if (!_jobs.TryAdd(jobId, job))
             return (null, "Unable to create update job");
 
-        _activeJobByNode[nodeId] = jobId;
+        if (!_activeJobByNode.TryAdd(nodeId, jobId))
+        {
+            _jobs.TryRemove(jobId, out _);
+            return (null, $"A firmware update is already running for node '{nodeId}'");
+        }
+
         var cts = new CancellationTokenSource();
🤖 Prompt for AI Agents
In `@src/Services/FirmwareUpdateJobService.cs` around lines 54 - 87, The Start
method has a TOCTOU race where ContainsKey + assignment on _activeJobByNode can
let two concurrent starts for the same node succeed; replace that pattern with
an atomic TryAdd on _activeJobByNode using the generated jobId (call
_activeJobByNode.TryAdd(nodeId, jobId)) to fail fast if another job exists, and
if subsequent steps (_jobs.TryAdd or _tokens[jobId] = cts) fail, roll back by
removing the entry from _activeJobByNode (and any partially added entries in
_jobs/_tokens) so no leaks remain; update Start to use TryAdd for the
check-and-set and perform cleanup on any intermediate failure, leaving
ExecuteAsync unchanged.

Comment on lines +139 to +145
finally
{
job.CompletedAt = DateTime.UtcNow;
_tokens.TryRemove(job.JobId, out var cts);
cts?.Dispose();
_activeJobByNode.TryRemove(job.NodeId, out _);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

CompletedAt is written twice for terminal states.

UpdateJob (line 156) sets CompletedAt for terminal statuses, and then the finally block (line 141) unconditionally overwrites it. Remove one of the two assignments to avoid the subtle inconsistency.

Proposed fix — remove the redundant assignment in `finally`
         finally
         {
-            job.CompletedAt = DateTime.UtcNow;
             _tokens.TryRemove(job.JobId, out var cts);
             cts?.Dispose();
             _activeJobByNode.TryRemove(job.NodeId, out _);
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
finally
{
job.CompletedAt = DateTime.UtcNow;
_tokens.TryRemove(job.JobId, out var cts);
cts?.Dispose();
_activeJobByNode.TryRemove(job.NodeId, out _);
}
finally
{
_tokens.TryRemove(job.JobId, out var cts);
cts?.Dispose();
_activeJobByNode.TryRemove(job.NodeId, out _);
}
🤖 Prompt for AI Agents
In `@src/Services/FirmwareUpdateJobService.cs` around lines 139 - 145, The
job.CompletedAt timestamp is being set twice: once inside UpdateJob when
transitioning to terminal statuses and again unconditionally in the finally
block; remove the redundant assignment in the finally block (delete the line
"job.CompletedAt = DateTime.UtcNow" in the finally) so UpdateJob remains the
single authoritative setter for terminal states, while keeping the remaining
cleanup calls (_tokens.TryRemove(...), cts?.Dispose(),
_activeJobByNode.TryRemove(...)) intact; ensure no other code expects the
finally block to set CompletedAt.

Comment on lines +196 to +207
using var zipArchive = new ZipArchive(ms1);
foreach (var entry in zipArchive.Entries)
{
var ms2 = new MemoryStream();
await using var entryStream = entry.Open();
await entryStream.CopyToAsync(ms2, ct);
ms2.Position = 0;
return ms2;
}

ms1.Position = 0;
return ms1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Empty-zip edge case: caller receives a disposed MemoryStream.

If the downloaded content is detected as a zip but contains zero entries, zipArchive disposes ms1 (since leaveOpen defaults to false) when exiting method scope, and the caller receives a closed stream. This is an unlikely scenario but will cause ObjectDisposedException at read time.

Pass leaveOpen: true to preserve ms1:

Proposed fix
-        using var zipArchive = new ZipArchive(ms1);
+        using var zipArchive = new ZipArchive(ms1, ZipArchiveMode.Read, leaveOpen: true);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
using var zipArchive = new ZipArchive(ms1);
foreach (var entry in zipArchive.Entries)
{
var ms2 = new MemoryStream();
await using var entryStream = entry.Open();
await entryStream.CopyToAsync(ms2, ct);
ms2.Position = 0;
return ms2;
}
ms1.Position = 0;
return ms1;
using var zipArchive = new ZipArchive(ms1, ZipArchiveMode.Read, leaveOpen: true);
foreach (var entry in zipArchive.Entries)
{
var ms2 = new MemoryStream();
await using var entryStream = entry.Open();
await entryStream.CopyToAsync(ms2, ct);
ms2.Position = 0;
return ms2;
}
ms1.Position = 0;
return ms1;
🤖 Prompt for AI Agents
In `@src/Services/FirmwareUpdateJobService.cs` around lines 196 - 207, The method
currently constructs ZipArchive with default leaveOpen causing ms1 to be
disposed when zipArchive is disposed, which can return a closed stream for empty
zips; change the ZipArchive creation to preserve ms1 by calling new
ZipArchive(ms1, ZipArchiveMode.Read, leaveOpen: true) (i.e., replace using var
zipArchive = new ZipArchive(ms1) with using var zipArchive = new ZipArchive(ms1,
ZipArchiveMode.Read, leaveOpen: true)) so ms1 remains usable when returned; keep
existing logic that creates and returns ms2 for the first entry.

Comment on lines +162 to +170
[McpServerTool(Name = "request_node_update")]
[Description("Request node self-update with optional firmware URL")]
public async Task<string> RequestNodeUpdateTool(
[Description("Node identifier")] string nodeId,
[Description("Optional firmware URL")] string? url = null)
{
await _nsd.Update(nodeId, url);
return JsonSerializer.Serialize(new { ok = true, nodeId, url }, _jsonOptions);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Missing URL trust validation in RequestNodeUpdateTool.

When a non-null url is provided, it is passed directly to _nsd.Update(nodeId, url) without calling _firmwareUpdateJobs.IsTrustedFirmwareUrl(url). This bypasses the URL allowlist enforced everywhere else (controller, StartFirmwareUpdateTool), allowing an MCP client to instruct a node to self-update from an arbitrary URL.

🛡️ Proposed fix
     public async Task<string> RequestNodeUpdateTool(
         [Description("Node identifier")] string nodeId,
         [Description("Optional firmware URL")] string? url = null)
     {
+        if (url != null && !_firmwareUpdateJobs.IsTrustedFirmwareUrl(url))
+            return JsonSerializer.Serialize(new { ok = false, error = "Only ESPresense GitHub URLs are allowed" }, _jsonOptions);
+
         await _nsd.Update(nodeId, url);
         return JsonSerializer.Serialize(new { ok = true, nodeId, url }, _jsonOptions);
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
[McpServerTool(Name = "request_node_update")]
[Description("Request node self-update with optional firmware URL")]
public async Task<string> RequestNodeUpdateTool(
[Description("Node identifier")] string nodeId,
[Description("Optional firmware URL")] string? url = null)
{
await _nsd.Update(nodeId, url);
return JsonSerializer.Serialize(new { ok = true, nodeId, url }, _jsonOptions);
}
[McpServerTool(Name = "request_node_update")]
[Description("Request node self-update with optional firmware URL")]
public async Task<string> RequestNodeUpdateTool(
[Description("Node identifier")] string nodeId,
[Description("Optional firmware URL")] string? url = null)
{
if (url != null && !_firmwareUpdateJobs.IsTrustedFirmwareUrl(url))
return JsonSerializer.Serialize(new { ok = false, error = "Only ESPresense GitHub URLs are allowed" }, _jsonOptions);
await _nsd.Update(nodeId, url);
return JsonSerializer.Serialize(new { ok = true, nodeId, url }, _jsonOptions);
}
🤖 Prompt for AI Agents
In `@src/Services/McpResources.cs` around lines 162 - 170, RequestNodeUpdateTool
currently forwards a non-null url directly to _nsd.Update allowing arbitrary
URLs; before calling _nsd.Update in RequestNodeUpdateTool validate the url by
calling _firmwareUpdateJobs.IsTrustedFirmwareUrl(url) (same logic used by
StartFirmwareUpdateTool) and return an error/throw if the url is not trusted,
otherwise proceed to call _nsd.Update(nodeId, url) and return the success JSON;
ensure you reference RequestNodeUpdateTool, _nsd.Update, and
_firmwareUpdateJobs.IsTrustedFirmwareUrl when making the change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant