@tokk-nv tokk-nv commented Aug 24, 2025

Problem

The Ollama test script failed whenever Ollama was natively installed on the host system, for two reasons:

  • Port conflicts between the containerized Ollama and the host-installed Ollama
  • Hardcoded port assignments that didn't handle occupied ports

Side Issues

  • Ollama PID detection was also matching the grep process itself
  • Missing dynamic CUDA version detection

Solution

This PR improves the Ollama test script with:

  • Dynamic port detection: Automatically finds available ports starting from 11435
  • Port conflict resolution: Handles cases where multiple ports are occupied
  • Dynamic CUDA version detection: Automatically detects CUDA version from nvidia-smi/nvcc
  • Improved error handling: Better process verification and cleanup
  • Robust process management: Fixed PID extraction and verification
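
As an illustration of the CUDA version detection described above, here is a minimal sketch, not the script's actual code. The function names (`parse_smi_cuda_version`, `parse_nvcc_cuda_version`, `detect_cuda_version`) are hypothetical, and the sketch assumes the usual `CUDA Version: X.Y` banner printed by `nvidia-smi` and the `release X.Y` line printed by `nvcc --version`.

```shell
#!/usr/bin/env bash
# Sketch only: all function names here are hypothetical, not taken from the PR.

# Read nvidia-smi output on stdin and print the first "CUDA Version: X.Y" value.
parse_smi_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Read `nvcc --version` output on stdin and print the "release X.Y" value.
parse_nvcc_cuda_version() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Try nvidia-smi first, then fall back to nvcc.
# Prints the version and returns 0, or returns 1 if neither tool yields one.
detect_cuda_version() {
    local version=""
    if command -v nvidia-smi >/dev/null 2>&1; then
        version=$(nvidia-smi 2>/dev/null | parse_smi_cuda_version)
    fi
    if [ -z "$version" ] && command -v nvcc >/dev/null 2>&1; then
        version=$(nvcc --version 2>/dev/null | parse_nvcc_cuda_version)
    fi
    [ -n "$version" ] || return 1
    echo "$version"
}
```

A caller might use it as `CUDA_VERSION=$(detect_cuda_version) || echo "CUDA not detected" >&2`. Note that `nvidia-smi` reports the highest CUDA version the driver supports, which can differ from the installed toolkit version that `nvcc` reports, so the preferred order may depend on which of the two the test actually needs.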

Testing

  • ✅ Tested on systems with native Ollama installation
    • On both Thor (where the native installer failed but left the service behind) and Orin
  • ✅ Verified port conflict resolution works correctly
  • ✅ Confirmed CUDA version detection functions properly
  • ✅ Validated container-safe operation

Impact

  • Fixes test failures on systems with native Ollama
  • Makes the test script more robust and portable
  • Improves CI/CD reliability across different environments

…ersion detection

- Fix process management: use proper PID extraction (column 2 instead of 1)
- Add dynamic port detection: automatically find available ports starting from 11435
- Add dynamic CUDA version detection: automatically detect CUDA version from nvidia-smi/nvcc
- Remove systemctl dependencies: make script container-safe
- Improve error handling: better process verification and cleanup
- Fix CUDA library selection: use detected version instead of undefined variable
- Add robust port conflict resolution: handles cases where multiple ports are occupied
@tokk-nv tokk-nv marked this pull request as ready for review August 24, 2025 07:46
@OriNachum OriNachum requested a review from Copilot August 24, 2025 15:51

Copilot AI left a comment

Pull Request Overview

This PR fixes critical issues in the Ollama test script that were causing failures when Ollama was already installed on the host system. The changes implement dynamic port detection to avoid conflicts and add automatic CUDA version detection.

  • Implements dynamic port allocation starting from 11435 to avoid conflicts with existing Ollama instances
  • Adds automatic CUDA version detection from nvidia-smi/nvcc instead of relying on hardcoded values
  • Improves process management with better PID detection and cleanup procedures

Comment on lines +13 to +19
if timeout 1 bash -c "echo >/dev/tcp/127.0.0.1/$port" 2>/dev/null; then
    # Port is in use, try next
    port=$((port + 1))
else
    # Port is available
    echo $port
    return 0

Copilot AI Aug 24, 2025

The port availability check is incorrect. This bash redirection will succeed when the port is in use (connection established), but the logic treats success as 'port in use'. The condition should be inverted - if the connection succeeds, the port is occupied.

Suggested change
-if timeout 1 bash -c "echo >/dev/tcp/127.0.0.1/$port" 2>/dev/null; then
-    # Port is in use, try next
-    port=$((port + 1))
-else
-    # Port is available
-    echo $port
-    return 0
+if ! timeout 1 bash -c "echo >/dev/tcp/127.0.0.1/$port" 2>/dev/null; then
+    # Port is available
+    echo $port
+    return 0
+else
+    # Port is in use, try next
+    port=$((port + 1))
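
For reference, the port scan discussed in this thread can be sketched as two small functions. This is a minimal sketch under stated assumptions, not the script's actual code: `port_in_use` and `find_free_port` are hypothetical names, and the default limit of 100 ports is an assumption. A successful connection is treated as "occupied"; a refused or timed-out connection is treated as "available".

```shell
#!/usr/bin/env bash
# Sketch only: port_in_use and find_free_port are hypothetical names.

# Returns 0 when something accepts a TCP connection on 127.0.0.1:$1.
# The /dev/tcp path is a bash feature, so the probe runs inside `bash -c`.
port_in_use() {
    timeout 1 bash -c "echo >/dev/tcp/127.0.0.1/$1" 2>/dev/null
}

# Prints the first port >= $1 (default 11435) that is not accepting
# connections, scanning at most $2 ports (default 100, an assumed limit).
find_free_port() {
    local port=${1:-11435}
    local attempts=${2:-100}
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        if ! port_in_use "$port"; then
            echo "$port"
            return 0
        fi
        port=$((port + 1))
        i=$((i + 1))
    done
    echo "no free port found after $attempts attempts" >&2
    return 1
}
```

A caller could write `OLLAMA_PORT=$(find_free_port 11435) || exit 1`. The probe only sees listeners reachable on 127.0.0.1, and another process could still claim the port between the probe and the moment the server binds, so the server's own bind error remains worth checking.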

Comment on lines +46 to +50
OLLAMA_PID=$(ps -ef | grep 'ollama serve' | grep -v grep | awk '{ print $2 }')

if [ -n "$OLLAMA_PID" ]; then
    echo "Stopping existing ollama process: $OLLAMA_PID"
    kill $OLLAMA_PID

Copilot AI Aug 24, 2025

This will only capture the first PID if multiple ollama processes are running. Consider using pgrep -f 'ollama serve' for more reliable process detection, or handle multiple PIDs appropriately.
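
A `pgrep`-based variant along the lines suggested here might look like the following. It is a sketch rather than a drop-in replacement: `find_pids` and `stop_matching` are hypothetical names, and it assumes `pgrep` (from procps) is available in the test container.

```shell
#!/usr/bin/env bash
# Sketch only: find_pids and stop_matching are hypothetical names.

# Print the PID of every process whose full command line matches the
# pattern in $1, one per line. pgrep never reports itself, so the
# `grep -v grep` step is not needed.
find_pids() {
    pgrep -f "$1"
}

# Send SIGTERM to every matching process, one PID at a time.
stop_matching() {
    local pattern=$1
    local pid
    for pid in $(find_pids "$pattern"); do
        # Skip this script's own shell in case its command line matches.
        [ "$pid" = "$$" ] && continue
        echo "Stopping process: $pid"
        kill "$pid" 2>/dev/null || true
    done
}
```

In the test script this would be called as `stop_matching 'ollama serve'`. Because `pgrep -f` matches against the whole command line, any unrelated process that happens to contain the pattern in its arguments would match as well, so a narrower pattern may be preferable.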

Collaborator

@tokk-nv what do you think? Sounds more efficient and clean, no?
