Skip to content

Conversation

@dmitsh
Copy link
Collaborator

@dmitsh dmitsh commented Jan 27, 2026

No description provided.

greptile-apps[bot]

This comment was marked as outdated.

@codecov
Copy link

codecov bot commented Jan 27, 2026

Codecov Report

❌ Patch coverage is 93.33333% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.83%. Comparing base (2d0e14e) to head (d3424c9).

Files with missing lines Patch % Lines
pkg/translate/block.go 70.00% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #202      +/-   ##
==========================================
+ Coverage   65.55%   65.83%   +0.27%     
==========================================
  Files          81       81              
  Lines        4448     4484      +36     
==========================================
+ Hits         2916     2952      +36     
  Misses       1419     1419              
  Partials      113      113              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 27, 2026

Additional Comments (1)

pkg/translate/block.go
Counts total nodes including dynamic nodes for minimum domain size calculation. When blocks contain dynamic nodes, this may incorrectly determine the minimum static block size since block sizes should apply to static nodes only.

Example: If a block has 0 static and 5 dynamic nodes, minDomainSize would be 5 but should be 0 for the static allocation.

@dmitsh dmitsh self-assigned this Jan 27, 2026
@dmitsh dmitsh requested a review from ravisoundar January 27, 2026 22:04
@NVIDIA NVIDIA deleted a comment from greptile-apps bot Jan 27, 2026
Signed-off-by: Dmitry Shmulevich <dshmulevich@nvidia.com>
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 27, 2026

Greptile Overview

Greptile Summary

This PR adds support for dynamic nodes in Slurm topology configuration, allowing nodes to be marked as dynamically provisioned and excluded from static topology output.

Key Changes:

  • Added DynamicNodes and MinBlocks fields to configuration structs across the slurm engine and translate layer
  • Implemented splitNodes() helper to separate static and dynamic nodes in topology output
  • Dynamic nodes are shown as comments (# dynamic=<nodes>) rather than in the main Nodes= field
  • Added support for generating empty placeholder blocks when MinBlocks exceeds actual block count

Critical Issue:
The fake node calculation in pkg/translate/block.go:108 has a bug that was identified in previous review threads but remains unfixed. When a block contains both static and dynamic nodes, the code incorrectly uses the total node count (static + dynamic) instead of just the static node count when calculating how many fake nodes to add. This will cause blocks with dynamic nodes to have incorrect fake node padding.

Missing Test Coverage:
The test suite lacks coverage for the combination of fake nodes + dynamic nodes, which would have caught the critical bug.

Confidence Score: 2/5

  • Not safe to merge - contains a critical logic bug in fake node calculation that will cause incorrect topology output
  • The implementation is mostly sound, but the bug in pkg/translate/block.go:108 is a critical logic error that will produce incorrect results when fake nodes and dynamic nodes are both used. This issue was identified in previous review comments but has not been addressed. The bug will cause blocks to have incorrect padding, potentially breaking Slurm's block topology requirements.
  • pkg/translate/block.go requires immediate attention to fix the fake node calculation bug on line 108

Important Files Changed

Filename Overview
pkg/translate/block.go Critical bug in fake node calculation when dynamic nodes are present - uses total node count instead of static node count
pkg/translate/block_test.go Adds test coverage for dynamic nodes and MinBlocks, but missing test case for the critical bug scenario (fake nodes + dynamic nodes)
pkg/translate/topology.go Adds DynamicNodes and MinBlocks fields, implements splitNodes helper correctly

Sequence Diagram

sequenceDiagram
    participant User
    participant SlurmEngine
    participant GetTranslateConfig
    participant NetworkTopology
    participant toBlockTopology
    participant splitNodes

    User->>SlurmEngine: GenerateOutput(tree, params)
    Note over SlurmEngine: params include DynamicNodes & MinBlocks
    SlurmEngine->>GetTranslateConfig: GetTranslateConfig(BaseParams)
    GetTranslateConfig->>GetTranslateConfig: Create translate.Config
    Note over GetTranslateConfig: Copy DynamicNodes, MinBlocks to Config
    GetTranslateConfig-->>SlurmEngine: Config with DynamicNodes & MinBlocks
    
    SlurmEngine->>NetworkTopology: NewNetworkTopology(root, cfg)
    NetworkTopology-->>SlurmEngine: NetworkTopology instance
    
    SlurmEngine->>NetworkTopology: Generate(writer)
    NetworkTopology->>toBlockTopology: toBlockTopology(writer)
    
    Note over toBlockTopology: Convert DynamicNodes list to map
    toBlockTopology->>toBlockTopology: nodeList2map(config.DynamicNodes)
    
    loop For each block
        toBlockTopology->>splitNodes: splitNodes(block.nodes, dynamicNodeMap)
        Note over splitNodes: Separate nodes into static & dynamic
        splitNodes-->>toBlockTopology: static string, dynamic string
        
        alt Fake nodes enabled & block too small
            Note over toBlockTopology: BUG: Uses len(bInfo.nodes) - total count!
            toBlockTopology->>toBlockTopology: Calculate fake nodes needed
            Note over toBlockTopology: Should use static node count only
        end
        
        toBlockTopology->>toBlockTopology: Write block line
        Note over toBlockTopology: Nodes=<static> # dynamic=<dynamic>
    end
    
    alt MinBlocks > block count
        loop Generate extra empty blocks
            toBlockTopology->>toBlockTopology: Write extraBlock with Nodes=
        end
    end
    
    toBlockTopology-->>NetworkTopology: Block topology output
    NetworkTopology-->>SlurmEngine: Generated config
    SlurmEngine-->>User: Topology configuration
Loading

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 file reviewed, 1 comment

Edit Code Review Agent Settings | Greptile


outputNodeNames := strings.Join(cluset.Compact(bInfo.nodes), ",")
static, dynamic := splitNodes(bInfo.nodes, dynamicNodeMap)
if fnc != nil && len(bInfo.nodes) < fnc.baseBlockSize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uses total node count len(bInfo.nodes) which includes both static and dynamic nodes. When dynamic nodes are present, this incorrectly calculates the number of fake nodes needed.

The bug: After splitting on line 107, bInfo.nodes still contains all nodes (static + dynamic), but only static nodes are output to the Nodes= field. This causes the fake node calculation to be wrong.

Example scenario:

  • Block has 1 static node + 2 dynamic nodes (total=3)
  • baseBlockSize = 4
  • Current code: adds 4-3=1 fake node → static section will have 2 nodes (wrong!)
  • Expected: should add 4-1=3 fake nodes → static section needs 4 nodes

The fake nodes should pad only the static nodes, not the total. Need to count static nodes after splitting, before calculating fake nodes needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant