Conversation
Greptile OverviewGreptile SummaryMigrates DTK (Driver Toolkit) build from bash to Go entrypoint, consolidating Key Changes:
Issues Found:
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant Main as Sources Container
participant DTK as DTK Container
participant Shared as Shared Volume
Main->>Main: Check if doneFlagPath exists
alt Build not done
Main->>Shared: Create shared directory (dtkSharedDir)
Main->>Shared: Copy driver sources to MLNX_OFED_SRC-{version}
Main->>Shared: Copy /root/entrypoint binary
Main->>Shared: Write dtk.env with env vars
Main->>Shared: Copy dtk_nic_driver_build.sh
Main->>Shared: Create startFlagPath
DTK->>DTK: Install perl and build dependencies
DTK->>Shared: Wait for startFlagPath to exist
Shared-->>DTK: Start flag found
DTK->>DTK: Run install.pl with build flags
DTK->>Shared: Create doneFlagPath
DTK->>Shared: Remove startFlagPath
DTK->>DTK: Sleep indefinitely
Main->>Shared: Poll for doneFlagPath (exponential backoff)
Shared-->>Main: Done flag found
else Build already done
Main->>Main: Log "DTK build already done"
end
Main->>Shared: Find redhat-release-* directory
Main->>Shared: Glob and copy RPMs to inventory
Main->>Main: Complete
|
| func (d *driverMgr) copyDir(ctx context.Context, src, dest string) error { | ||
| // Using cp -r for simplicity and preserving permissions/links | ||
| _, _, err := d.cmd.RunCommand(ctx, "cp", "-r", src, dest) | ||
| return err |
There was a problem hiding this comment.
cp -r src dest creates a directory at dest/basename(src). The actual destination becomes dest/src instead of overwriting dest. Use cp -r src/. dest/ or validate the behavior matches expectations.
entrypoint/internal/dtk/build.go
Outdated
| } | ||
|
|
||
| cmdHelper := cmd.New() | ||
| ctx := context.Background() |
There was a problem hiding this comment.
context.Background() without cancellation means the function can't be interrupted gracefully. If the DTK pod is terminated, this will hang until force-killed.
| if cfg.AppendDriverBuildFlags != "" { | ||
| // Split flags by space | ||
| flags := strings.Fields(cfg.AppendDriverBuildFlags) | ||
| installArgs = append(installArgs, flags...) |
There was a problem hiding this comment.
If APPEND_DRIVER_BUILD_FLAGS contains special shell characters, strings.Fields won't parse them correctly. Bash would handle quoted strings, but this splits only on whitespace.
| } | ||
|
|
||
| // Copy entrypoint binary to shared directory | ||
| entrypointPath := "/root/entrypoint" // Assumed location based on Dockerfile |
There was a problem hiding this comment.
Verify /root/entrypoint exists in the container image before deployment. If missing, setup will fail.
| // Copy entrypoint binary to shared directory | ||
| entrypointPath := "/root/entrypoint" // Assumed location based on Dockerfile | ||
| destEntrypointPath := filepath.Join(sharedDir, "entrypoint") | ||
| log.Info("Copying entrypoint binary", "from", entrypointPath, "to", destEntrypointPath) | ||
| // We use copyFile instead of copyDir for a single file | ||
| if _, _, err := d.cmd.RunCommand(ctx, "cp", entrypointPath, destEntrypointPath); err != nil { | ||
| return fmt.Errorf("failed to copy entrypoint binary: %w", err) | ||
| } |
There was a problem hiding this comment.
Entrypoint binary path mismatch
dtkSetupDriverBuild copies the entrypoint from hardcoded "/root/entrypoint" (driver_dtk.go:110-116), but the loader script executes "$(dirname "$0")/entrypoint" (dtk_nic_driver_build.sh:12). If the actual binary path in the sources container differs from /root/entrypoint (e.g., installed elsewhere in the image), DTK builds will fail at runtime when the DTK container tries to exec the copied binary. This should be derived from the current executable path or passed in explicitly rather than hardcoding.
| // copyDir copies a directory recursively | ||
| func (d *driverMgr) copyDir(ctx context.Context, src, dest string) error { | ||
| // Using cp -rT to treat dest as a normal file (directory) | ||
| // This ensures contents of src are copied into dest, not src into dest/src | ||
| _, _, err := d.cmd.RunCommand(ctx, "cp", "-rT", src, dest) | ||
| return err |
There was a problem hiding this comment.
Non-portable cp -rT usage
copyDir relies on cp -rT (driver_dtk.go:248-252). The -T flag is a GNU coreutils extension and is not available in some base images (e.g., BusyBox cp). If the sources container image doesn’t guarantee GNU cp, this will fail at runtime during DTK setup. Consider switching to a more portable invocation (e.g., cp -a src/. dest/ after ensuring dest exists) or documenting/enforcing the GNU coreutils dependency in the image.
entrypoint/internal/dtk/build.go
Outdated
| if cfg.DtkOcpStartCompileFlag == "" || cfg.DtkOcpDoneCompileFlag == "" { | ||
| err := fmt.Errorf("compilation start/completion flags not set") | ||
| log.Error(err, "aborting") | ||
| return err | ||
| } |
There was a problem hiding this comment.
Missing required DTK env validation
RunBuild only checks DTK_OCP_START_COMPILE_FLAG and DTK_OCP_DONE_COMPILE_FLAG (build.go:37-41) but later unconditionally uses DTK_OCP_NIC_SHARED_DIR and DTK_OCP_COMPILED_DRIVER_VER to build the install.pl path (build.go:81-87). If either is empty, the constructed path becomes incorrect (e.g., MLNX_OFED_SRC-/install.pl) and the build fails with a confusing error. Validate these envs up front and return a clear error.
Additional Comments (1)
|
| srcDirName := fmt.Sprintf("MLNX_OFED_SRC-%s", cfg.DtkOcpCompiledDriverVer) | ||
| installScript := filepath.Join(cfg.DtkOcpNicSharedDir, srcDirName, "install.pl") |
There was a problem hiding this comment.
Missing validation for DTK_OCP_NIC_SHARED_DIR and DTK_OCP_COMPILED_DRIVER_VER before use. If either is empty, the path becomes malformed (e.g., MLNX_OFED_SRC-/install.pl). Add validation in the check at lines 37-41:
| srcDirName := fmt.Sprintf("MLNX_OFED_SRC-%s", cfg.DtkOcpCompiledDriverVer) | |
| installScript := filepath.Join(cfg.DtkOcpNicSharedDir, srcDirName, "install.pl") | |
| if cfg.DtkOcpStartCompileFlag == "" || cfg.DtkOcpDoneCompileFlag == "" || | |
| cfg.DtkOcpNicSharedDir == "" || cfg.DtkOcpCompiledDriverVer == "" { | |
| err := fmt.Errorf("required DTK environment variables not set") |
| cmd := exec.CommandContext(ctx, command, args...) | ||
| // Ensure child process is killed when context is canceled | ||
| cmd.Cancel = func() error { | ||
| return cmd.Process.Signal(syscall.SIGTERM) | ||
| } |
There was a problem hiding this comment.
Nil Process on cancel
cmd.Cancel unconditionally signals cmd.Process, but Process can be nil if the context is canceled before the process starts (e.g., immediate cancellation / failure to start). In that case cmd.Process.Signal(...) will panic. Consider guarding for cmd.Process == nil or relying on CommandContext's default cancellation behavior.
entrypoint/internal/dtk/build.go
Outdated
| if cfg.DtkOcpStartCompileFlag == "" || cfg.DtkOcpDoneCompileFlag == "" { | ||
| err := fmt.Errorf("compilation start/completion flags not set") | ||
| log.Error(err, "aborting") | ||
| return err | ||
| } |
There was a problem hiding this comment.
DTK envs used empty
RunBuild validates only DTK_OCP_START_COMPILE_FLAG and DTK_OCP_DONE_COMPILE_FLAG, but it later uses DTK_OCP_NIC_SHARED_DIR and DTK_OCP_COMPILED_DRIVER_VER to build install.pl path (installScript := filepath.Join(cfg.DtkOcpNicSharedDir, srcDirName, "install.pl")). If either is empty, the path becomes malformed and the build fails with a misleading error. This should validate those envs up front too.
Additional Comments (1)
In |
|
DTK with Go log debug level example: |
This works fine: |
Additional Comments (3)
|
| func (d *driverMgr) buildDriverDTK(ctx context.Context, kernelVersion, inventoryPath string) error { | ||
| log := logr.FromContextOrDiscard(ctx) | ||
| log.Info("Starting DTK driver build") | ||
|
|
||
| // Sanitize kernel version for DTK shared directory | ||
| // Matches bash: DTK_KVER=$(echo "${FULL_KVER}" | sed 's/[^-A-Za-z0-9_.]/_/g' | sed 's/^[-_.]*//;s/[-_.]*$//') | ||
| dtkKver := sanitizeKernelVersion(kernelVersion) | ||
| dtkSharedDir := filepath.Join(d.cfg.DtkOcpNicSharedDir, dtkKver) | ||
|
|
||
| // Construct done flag path | ||
| // Matches bash: DTK_OCP_DONE_COMPILE_FLAG="${DTK_OCP_DONE_COMPILE_FLAG_PREFIX}$(echo ${NVIDIA_NIC_DRIVER_VER} | sed 's/[.-]/_/g')" |
There was a problem hiding this comment.
Stat error treated as success
if _, err := d.os.Stat(doneFlagPath); os.IsNotExist(err) { ... } else { log.Info("DTK build already done") } treats any non-IsNotExist error (e.g. permission denied / I/O error) as “build already done” and proceeds to finalize/copy artifacts. This can skip the build while the flag was unreadable. Handle err != nil separately and return the error unless it’s a clean IsNotExist case.
| cmdMock.EXPECT().RunCommand(ctx, "uname", "-m").Return("x86_64", "", nil) | ||
| osMock.EXPECT().Readlink(mock.Anything).Return("/usr/src/ofa_kernel/x86_64/5.4.0-42-generic", nil) | ||
|
|
||
| // Mock storeBuildChecksum - return valid checksum |
There was a problem hiding this comment.
Stray comments break build
This block introduces non-Go text in the test file:
// We expect it to be called, so if we remove expectation, it should fail with "Unexpected call"
The indentation uses spaces at column 0 (not a tab inside a block) and will produce a compile error (unexpected // / bad indentation) in Go formatting-sensitive contexts. Please remove these debug comments or re-indent them properly within the It(...) body.
Additional Comments (1)
The |
| if _, err := d.os.Stat(doneFlagPath); os.IsNotExist(err) { | ||
| log.Info("DTK build not done, setting up build") | ||
|
|
||
| if err := d.dtkSetupDriverBuild(ctx, dtkSharedDir, startFlagPath, doneFlagPath); err != nil { | ||
| return fmt.Errorf("failed to setup DTK build: %w", err) | ||
| } | ||
|
|
||
| if err := d.dtkWaitForBuild(ctx, doneFlagPath); err != nil { | ||
| return fmt.Errorf("failed waiting for DTK build: %w", err) | ||
| } | ||
| } else { | ||
| log.Info("DTK build already done", "flag", doneFlagPath) |
There was a problem hiding this comment.
non-IsNotExist Stat errors treated as "build done"
Stat errors like permission denied or I/O errors fall through to "DTK build already done" log and skip setup. Should return errors that aren't IsNotExist:
| if _, err := d.os.Stat(doneFlagPath); os.IsNotExist(err) { | |
| log.Info("DTK build not done, setting up build") | |
| if err := d.dtkSetupDriverBuild(ctx, dtkSharedDir, startFlagPath, doneFlagPath); err != nil { | |
| return fmt.Errorf("failed to setup DTK build: %w", err) | |
| } | |
| if err := d.dtkWaitForBuild(ctx, doneFlagPath); err != nil { | |
| return fmt.Errorf("failed waiting for DTK build: %w", err) | |
| } | |
| } else { | |
| log.Info("DTK build already done", "flag", doneFlagPath) | |
| if _, err := d.os.Stat(doneFlagPath); os.IsNotExist(err) { | |
| log.Info("DTK build not done, setting up build") | |
| if err := d.dtkSetupDriverBuild(ctx, dtkSharedDir, startFlagPath, doneFlagPath); err != nil { | |
| return fmt.Errorf("failed to setup DTK build: %w", err) | |
| } | |
| if err := d.dtkWaitForBuild(ctx, doneFlagPath); err != nil { | |
| return fmt.Errorf("failed waiting for DTK build: %w", err) | |
| } | |
| } else if err != nil { | |
| return fmt.Errorf("failed to check done flag: %w", err) | |
| } else { | |
| log.Info("DTK build already done", "flag", doneFlagPath) | |
| } |
| // Mock WriteFile failure - removed to debug if it's called | ||
| // osMock.EXPECT().WriteFile(mock.Anything, mock.Anything, os.FileMode(0o644)).Return(errors.New("write failed")) | ||
| // We expect it to be called, so if we remove expectation, it should fail with "Unexpected call" |
There was a problem hiding this comment.
malformed indentation breaks Go formatting
Lines start at column 0 with spaces instead of being indented within the test block. Remove or properly indent these debug comments:
| // Mock WriteFile failure - removed to debug if it's called | |
| // osMock.EXPECT().WriteFile(mock.Anything, mock.Anything, os.FileMode(0o644)).Return(errors.New("write failed")) | |
| // We expect it to be called, so if we remove expectation, it should fail with "Unexpected call" | |
| // osMock.EXPECT().WriteFile(mock.Anything, mock.Anything, os.FileMode(0o644)).Return(errors.New("write failed")) |
| cmdMock.EXPECT().RunCommand(ctx, "sh", "-c", mock.Anything).Return("", "", nil).Times(4) | ||
|
|
||
| // Mock fixSourceLink | ||
| cmdMock.EXPECT().RunCommand(ctx, "uname", "-m").Return("x86_64", "", nil) |
There was a problem hiding this comment.
Trailing whitespace breaks gofmt
There’s a line that contains only whitespace between the md5sum matcher expectation and the WriteFile expectation. This will fail gofmt/linters in many CI setups. Please delete the whitespace-only line.
Additional Comments (1)
|
| if _, err := d.os.Stat(doneFlagPath); os.IsNotExist(err) { | ||
| log.Info("DTK build not done, setting up build") | ||
|
|
||
| if err := d.dtkSetupDriverBuild(ctx, dtkSharedDir, startFlagPath, doneFlagPath); err != nil { | ||
| return fmt.Errorf("failed to setup DTK build: %w", err) | ||
| } | ||
|
|
||
| if err := d.dtkWaitForBuild(ctx, doneFlagPath); err != nil { | ||
| return fmt.Errorf("failed waiting for DTK build: %w", err) | ||
| } | ||
| } else { | ||
| log.Info("DTK build already done", "flag", doneFlagPath) | ||
| } |
There was a problem hiding this comment.
Stat errors skip build
buildDriverDTK treats any Stat error other than IsNotExist as “build already done” (else { ... }), so permission/I/O errors on the done-flag file will incorrectly skip the build and proceed to finalize/copy artifacts. Handle err != nil separately and return it unless os.IsNotExist(err).
| // Create a context that we can cancel to simulate end of execution | ||
| ctx, cancel := context.WithCancel(context.Background()) | ||
|
|
||
| // Run in a goroutine so we can cancel it | ||
| errCh := make(chan error) | ||
| go func() { | ||
| errCh <- RunBuild(ctx, log, cfg, cmdMock) | ||
| }() |
There was a problem hiding this comment.
Gofmt-breaking whitespace
There are tabs/spaces on otherwise blank lines (e.g. right after context.WithCancel(...) and after cancel()), which will fail gofmt/formatting checks in many Go CI setups. Please remove the trailing whitespace on those blank lines.
| # Load environment variables if file exists | ||
| if [ -f "$(dirname "$0")/dtk.env" ]; then | ||
| source "$(dirname "$0")/dtk.env" | ||
| fi | ||
|
|
||
| : ${USE_NEW_ENTRYPOINT:=false} | ||
|
|
||
| if [ "$USE_NEW_ENTRYPOINT" = "true" ]; then | ||
| echo "Using Go entrypoint for DTK build" | ||
| exec "$(dirname "$0")/entrypoint" dtk-build | ||
| fi |
There was a problem hiding this comment.
Sourced env overwritten
After sourcing dtk.env (lines 4-7), the script later resets DTK_OCP_START_COMPILE_FLAG, DTK_OCP_DONE_COMPILE_FLAG, and DTK_OCP_COMPILED_DRIVER_VER to empty strings (lines 23-25). This overwrites the values written by dtkSetupDriverBuild, so the bash build path will always hit the “Compilation start/completion flags not set, aborting” check.
| # Load environment variables if file exists | |
| if [ -f "$(dirname "$0")/dtk.env" ]; then | |
| source "$(dirname "$0")/dtk.env" | |
| fi | |
| : ${USE_NEW_ENTRYPOINT:=false} | |
| if [ "$USE_NEW_ENTRYPOINT" = "true" ]; then | |
| echo "Using Go entrypoint for DTK build" | |
| exec "$(dirname "$0")/entrypoint" dtk-build | |
| fi | |
| # Only default these when not already set by the environment/dtk.env | |
| : "${DTK_OCP_START_COMPILE_FLAG:=}" | |
| : "${DTK_OCP_DONE_COMPILE_FLAG:=}" | |
| : "${DTK_OCP_COMPILED_DRIVER_VER:=}" |
| // Create start flag after a short delay | ||
| go func() { | ||
| time.Sleep(100 * time.Millisecond) | ||
| f, err := os.Create(startFlag) | ||
| assert.NoError(t, err) | ||
| f.Close() | ||
| }() |
There was a problem hiding this comment.
Assertion in goroutine
This goroutine calls assert.NoError(t, err) using the parent *testing.T. Using t concurrently from another goroutine can lead to races/flaky tests. A safer pattern is to avoid assertions inside the goroutine (e.g., send err on a channel and assert in the main test goroutine) and ensure the file is closed only when err == nil.
| FROM golang:1.24 AS go_builder | ||
|
|
||
| # Set GOPROXY if provided | ||
| ARG GOPROXY | ||
| ENV GOPROXY=$GOPROXY | ||
|
|
||
| WORKDIR /workspace | ||
|
|
||
| COPY entrypoint/go.mod go.mod | ||
| COPY entrypoint/go.sum go.sum | ||
|
|
||
| RUN go mod download | ||
|
|
||
| COPY entrypoint/ . | ||
|
|
||
| RUN TARGETARCH=${D_ARCH} TARGETOS=linux make build |
There was a problem hiding this comment.
Missing ARG in stage
The go_builder stage uses RUN TARGETARCH=${D_ARCH} TARGETOS=linux make build, but D_ARCH is not declared in that stage (only before the first FROM). In Docker, ARGs must be re-declared in each stage to be available to RUN, so ${D_ARCH} can expand empty and build the entrypoint for the wrong arch.
| FROM golang:1.24 AS go_builder | |
| # Set GOPROXY if provided | |
| ARG GOPROXY | |
| ENV GOPROXY=$GOPROXY | |
| WORKDIR /workspace | |
| COPY entrypoint/go.mod go.mod | |
| COPY entrypoint/go.sum go.sum | |
| RUN go mod download | |
| COPY entrypoint/ . | |
| RUN TARGETARCH=${D_ARCH} TARGETOS=linux make build | |
| FROM golang:1.24 AS go_builder | |
| ARG D_ARCH |
Signed-off-by: Fred Rolland <frolland@nvidia.com>
|
|
||
| COPY entrypoint/ . | ||
|
|
||
| RUN TARGETARCH=${D_ARCH} TARGETOS=linux make build |
There was a problem hiding this comment.
D_ARCH not declared in this stage - expands to empty string, building for wrong architecture
| RUN TARGETARCH=${D_ARCH} TARGETOS=linux make build | |
| ARG D_ARCH | |
| RUN TARGETARCH=${D_ARCH} TARGETOS=linux make build |
No description provided.