Skip to content

Commit c89fb32

Browse files
authored
GitHub action workflow (#9)
* Add GitHub Actions workflow for automated notebook execution - Add preprocess_notebook.py script to create notebook copies and replace input() calls with None for papermill parameter injection - Add notebook-execution.yml workflow that: - Executes ambient-patient.ipynb and ambient-provider.ipynb in parallel using matrix strategy - Preprocesses notebooks to replace interactive inputs with parameter placeholders - Executes notebooks using papermill with API key injection - Converts executed notebooks to HTML format - Uploads HTML and executed notebooks as artifacts - Generates summary report with execution status and HTML paths - All source notebooks remain unchanged, only copies are modified and executed - Workflow triggers on push, pull_request, and manual workflow_dispatch * Update workflow to trigger on all branches instead of only main/master * Modify notebooks for automated execution and update workflow - Remove preprocess_notebook.py script as notebooks are now directly modified - Update workflow to execute notebooks directly without preprocessing - Modify ambient-patient.ipynb: - Change NGC_API_KEY to read from environment variable instead of input() - Add service readiness check cell before API test to ensure app-server is ready - Modify ambient-provider.ipynb: - Change NGC_API_KEY to read from environment variable instead of hardcoded value - All modifications are minimal and do not affect business logic - Workflow now passes API keys via environment variables * Fix workflow: remove --allow-errors and ensure failures terminate execution - Remove --allow-errors flag as it's not supported by papermill - Remove continue-on-error and || true to allow failures to propagate - Add checks to ensure executed notebook and HTML files exist - Fail workflow if notebook execution or HTML conversion fails - Remove if-no-files-found: ignore from artifact uploads to ensure failures are caught * Add conditional execution for Option 1 and Option 2 in ambient-provider notebook - Add DEPLOYMENT_OPTION environment variable support to control which deployment option to execute - Option 1 (default): RIVA Integrated with Docker Compose - Option 2: Standalone NIM Deployment - Fix Cell 42 syntax error by converting shell commands to use os.system() - All shell commands now use os.system() to work properly in conditional statements - Add workflow_dispatch input parameter to allow manual selection of deployment option - Default behavior executes Option 1 if DEPLOYMENT_OPTION is not set * temp * Comment out Option 2 deployment commands and ngrok setup in ambient-provider notebook for automated execution * Fix Cell 48 syntax error: use ! prefix for shell command execution * Simplify .env file update using shell commands and add log filtering for duplicate output * Remove log filtering and grouping, simplify git clone command with shell, fix directory change using magic command * Fix NGC_API_KEY environment variable access in docker login command * Fix docker login username by using single quotes for literal * Add service health check using shell script after services start * Fix shell compatibility and simplify service health check logic * Increase service health check wait time and retry count for service startup * Improve service health check with Docker status and better error diagnostics * Add .gitignore file to exclude notebook_runner.py and output directory * Update workflow to use notebook_runner.py script - Replace papermill and nbconvert steps with notebook_runner.py - Download notebook_runner.py from Blueprint-Utils repository - Use --output-dir parameter instead of -o - Simplify execution and HTML conversion into single step * Add Jupyter kernel installation step to workflow - Install ipykernel package - Register python3 kernel for papermill execution - Fixes 'No such kernel named python3' error * Remove duplicate health check and unify service health verification * Improve health check with Docker status monitoring and extended timeout * Extend health check timeout to 15 minutes and add docker ps -a in each check iteration * Add proactive container log monitoring and exit detection with detailed logs * Add comprehensive GPU resource monitoring and model download status checks * Refactor health check code with helper functions and optimize to reduce code by 63% * Update workflow to use notebook_runner_nbclient.py and remove irrelevant logs from health check - Switch to notebook_runner_nbclient.py for more reliable notebook execution - Remove periodic docker logs output that shows irrelevant 404 errors - Optimize docker ps -a output frequency in health check - Ensure all code output is in English * Fix notebook cell format: ensure proper newlines between code elements - Add newline after each source element in health check cell (Cell 49) - Fix SyntaxError caused by code concatenation (import osexit_code issue) - Ensure nbclient can properly execute the notebook by maintaining correct line breaks * Add notebook format validation to CI/CD pipeline - Create validate_notebook_format.py script to check notebook source format - Add validation step in GitHub Actions workflow before notebook execution - Prevent format issues that cause SyntaxError (e.g., 'import osexit_code') - Ensure all source array elements have proper newlines according to Jupyter spec * Fix shell command chain: remove trailing backslash after check_gpu - Remove trailing backslash (\\) after 'check_gpu &&' command - Fix 'sh: 38: : not found' error caused by empty command after backslash - Command chain now properly terminates at check_gpu without continuation * Fix shell script backslash issue and enhance notebook validation - Remove unnecessary backslash from line 42 in health check cell - Fix 'sh: 38: : not found' error caused by backslash followed by empty line - Enhance validate_notebook_format.py to detect shell script logic errors - Add detection for backslash continuation followed by empty line - Use regex to properly detect backslash characters in validation * Fix GPU status check frequency and add model download progress display - Convert command chain to if statement for progress check (fixes excessive GPU status output) - Add check_model_download_progress() function to show download progress - Display downloaded size, file list, and progress percentage for both Parakeet and Llama models - Show progress updates every 12 iterations during health check - Calculate progress based on expected model sizes (~3GB for Parakeet, ~25GB for Llama) * fix: correct shell function definition scope in health check - Move check_model_download_progress() out of check_exited_containers() - Functions are now properly defined as independent top-level functions - Fixes 'sh: check_model_download_progress: not found' error Root cause: check_model_download_progress() was incorrectly nested inside check_exited_containers(), making it unavailable for calling in the script. Verification: - Notebook format validation: PASS - Shell syntax check (bash -n): PASS - Function structure validation: PASS - Function call test: PASS * feat: add unhealthy container detection with comprehensive diagnostics Problem: - Health check only monitored 'Exited' and 'starting' states - 'unhealthy' containers were silently ignored - When Parakeet became unhealthy, code mistakenly passed health check - Wasted 20+ minutes before discovering the issue Solution: - Add immediate unhealthy container detection in health check loop - Enhanced break condition to check both starting==0 AND unhealthy==0 - Fast-fail on unhealthy status with detailed diagnostics Diagnostic Information Added: 1. Container status overview 2. Last 100 lines of container logs 3. Docker health check config and failure logs 4. GPU memory status (total/used/free) 5. Complete nvidia-smi output 6. GPU process monitoring (nvidia-smi pmon) Benefits: - Detect unhealthy containers immediately (~1 min vs ~21 min) - Save ~20 minutes of CI/CD time on failures - Provide comprehensive debug information for root cause analysis - Better visibility into container startup failures Verification: - Notebook format validation: PASS - Shell syntax check: PASS - Logic validation: PASS * trigger action runner * Update workflow to use notebook_runner from private NVIDIA-AI-Blueprints repo - Switch download URL to NVIDIA-AI-Blueprints/blueprint-github-test repo - Add Authorization header with BLUEPRINT_GITHUB_TEST_TOKEN_ON_GH secret - Add --skip-deps-check flag to skip redundant dependency checks - Use environment variable references for safer secret handling * update runner names * Add nbclient nbformat jupyter nbconvert dependencies to workflow - Install required packages for notebook_runner_nbclient.py execution - Rename step to 'Install dependencies and register Jupyter kernel' * Add compose override cell to fix NIM health check endpoints - Add new cell before 'make dev-nim' to create compose.override.yml - Fix parakeet-nim healthcheck: /v1/health -> /v1/health/ready - Fix llama-nim healthcheck: /v1/health -> /v1/health/ready - Docker compose automatically merges override files * Move service verification from notebook to workflow - Add --skip-cells 50-59 to skip verification cell and subsequent cells - Add 'Fix NIM health check endpoints' step with compose.override.yml - Add 'Verify services health' step with HTTP health checks for: - Parakeet NIM (localhost:9000/v1/health/ready) - Llama NIM (localhost:8001/v1/health/ready) - API (localhost:8000/api/health) - API Docs (localhost:8000/api/docs) - UI (localhost:5173) * try to restore notebook * Reset ambient-provider.ipynb to match main branch - Revert notebook to original state from main branch - All CI/CD customizations are now in workflow file only * Skip cell 7 (NGC_API_KEY placeholder) in notebook execution - Add cell 7 to skip-cells list to prevent overwriting NGC_API_KEY - NGC_API_KEY is properly set via workflow env and -e flag - Cell 13 (docker login) will use the correct API key * Skip Option 2 Standalone NIM Deployment cells (41-46) - Skip cells 41-46: Option 2 standalone NIM deployment section - These cells are for manual deployment on separate machine - Not needed for CI/CD workflow using docker-compose Current skip-cells: 7, 41-46, 50-59 * Fix API env configuration and notebook syntax Workflow: - Add 'Pre-configure API environment file' step - Create apps/api/.env with NVIDIA_API_KEY before make bootstrap runs - Prevents 'NVIDIA_API_KEY is still a placeholder' error Notebook: - Fix cell 48: add missing '!' prefix for shell command * Fix file rename error when source and target are the same - Check if source and target paths are different before mv - Prevents 'are the same file' error when names match * Improve service health verification with better error handling - Add continue-on-error: true to prevent workflow failure - Check container status before URL health check - Capture container logs (last 50 lines) when container exits/crashes - Separate NIM failures (non-critical) from API/UI (critical) - Show container logs for debugging crashed containers
1 parent f9bc363 commit c89fb32

4 files changed

Lines changed: 751 additions & 1 deletion

File tree

Lines changed: 283 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,283 @@
1+
#!/usr/bin/env python3
2+
"""
3+
Notebook Format Validator
4+
5+
This script validates that Jupyter Notebook files have correct format,
6+
specifically ensuring that source array elements have proper newlines.
7+
8+
According to Jupyter Notebook specification:
9+
- When source is an array of strings, elements are joined WITHOUT separators
10+
- Each element MUST end with '\n' if a newline is needed between elements
11+
- This prevents code concatenation issues (e.g., 'import osexit_code')
12+
13+
Usage:
14+
python3 validate_notebook_format.py notebook.ipynb
15+
python3 validate_notebook_format.py *.ipynb
16+
"""
17+
18+
import argparse
19+
import json
20+
import sys
21+
from pathlib import Path
22+
from typing import List, Tuple
23+
24+
25+
def validate_notebook_source_format(notebook_path: Path) -> Tuple[bool, List[str]]:
26+
"""
27+
Validate notebook source format.
28+
29+
Returns:
30+
(is_valid, error_messages)
31+
"""
32+
errors = []
33+
34+
try:
35+
with open(notebook_path, 'r', encoding='utf-8') as f:
36+
nb = json.load(f)
37+
except json.JSONDecodeError as e:
38+
return False, [f"Invalid JSON: {e}"]
39+
except Exception as e:
40+
return False, [f"Failed to read notebook: {e}"]
41+
42+
cells = nb.get('cells', [])
43+
if not cells:
44+
return True, [] # Empty notebook is valid
45+
46+
for cell_idx, cell in enumerate(cells):
47+
if cell.get('cell_type') != 'code':
48+
continue # Only validate code cells
49+
50+
source = cell.get('source', [])
51+
if not source:
52+
continue # Empty source is valid
53+
54+
# Check if source is an array
55+
if isinstance(source, list):
56+
# Validate each element (except the last one)
57+
for elem_idx in range(len(source) - 1):
58+
elem = source[elem_idx]
59+
if isinstance(elem, str):
60+
# Check if element should end with newline
61+
# Rule: If next element exists and is not empty, current should end with \n
62+
next_elem = source[elem_idx + 1]
63+
if isinstance(next_elem, str) and next_elem.strip():
64+
# Check if current element ends with newline
65+
if not elem.endswith('\n'):
66+
# Check if this would cause concatenation issues
67+
# Common problematic patterns:
68+
problematic_patterns = [
69+
('import os', 'exit_code'),
70+
('import os', 'exit_code'),
71+
('}', 'check_'),
72+
('}', 'echo'),
73+
('}', 'for'),
74+
('}', 'done'),
75+
('done', 'echo'),
76+
('done', 'for'),
77+
('done', '#'),
78+
]
79+
80+
# Check if concatenation would create problematic patterns
81+
concatenated = elem + next_elem
82+
has_problem = False
83+
for pattern_start, pattern_end in problematic_patterns:
84+
if pattern_start in elem.rstrip() and pattern_end in next_elem.lstrip():
85+
# Check if they would be concatenated
86+
if elem.rstrip().endswith(pattern_start) and next_elem.lstrip().startswith(pattern_end):
87+
has_problem = True
88+
break
89+
90+
if has_problem or (elem.strip() and next_elem.strip()):
91+
errors.append(
92+
f"Cell {cell_idx}: Element {elem_idx} missing newline. "
93+
f"Content: {repr(elem[:50])}... → {repr(next_elem[:30])}..."
94+
)
95+
96+
# Additional check: Shell script logic errors
97+
# Check for backslash continuation followed by empty line
98+
if isinstance(source, list):
99+
for elem_idx in range(len(source) - 1):
100+
elem = source[elem_idx]
101+
next_elem = source[elem_idx + 1]
102+
103+
if isinstance(elem, str) and isinstance(next_elem, str):
104+
# Check if line ends with backslash (continuation character)
105+
# In Python strings, '\\' represents a single backslash character
106+
elem_stripped = elem.rstrip()
107+
# Check if ends with backslash (in Python string literal: '\\' = single \)
108+
# Use regex to properly detect backslash at end
109+
import re
110+
ends_with_backslash = bool(re.search(r'\\$', elem_stripped))
111+
ends_with_and_backslash = bool(re.search(r' && \\$', elem_stripped))
112+
113+
if ends_with_backslash or ends_with_and_backslash:
114+
# Check if next line is empty or only whitespace
115+
if next_elem.strip() == '':
116+
errors.append(
117+
f"Cell {cell_idx}: Element {elem_idx} has backslash continuation "
118+
f"followed by empty line. This will cause 'sh: X: : not found' error. "
119+
f"Content: {repr(elem[:60])}..."
120+
)
121+
# Check if next line doesn't continue the command (doesn't start with space/tab)
122+
elif not next_elem.startswith(' ') and not next_elem.startswith('\t') and next_elem.strip():
123+
# Next line is not a continuation, but current line has backslash
124+
# This might be intentional, but could be an error
125+
# Only warn if it's clearly a problem (next line starts a new command)
126+
if next_elem.strip().startswith(('echo', 'if', 'for', 'while', 'done', 'fi', 'then', 'else')):
127+
errors.append(
128+
f"Cell {cell_idx}: Element {elem_idx} has backslash continuation "
129+
f"but next line starts a new command. Remove the backslash or add continuation. "
130+
f"Content: {repr(elem[:60])}... → {repr(next_elem[:30])}..."
131+
)
132+
133+
elif isinstance(source, str):
134+
# Single string format: check for backslash + empty line pattern
135+
import re
136+
lines = source.split('\n')
137+
for i in range(len(lines) - 1):
138+
line = lines[i]
139+
next_line = lines[i + 1]
140+
line_stripped = line.rstrip()
141+
ends_with_backslash = bool(re.search(r'\\$', line_stripped))
142+
ends_with_and_backslash = bool(re.search(r' && \\$', line_stripped))
143+
if ends_with_backslash or ends_with_and_backslash:
144+
if next_line.strip() == '':
145+
errors.append(
146+
f"Cell {cell_idx}: Line {i+1} has backslash continuation "
147+
f"followed by empty line. This will cause shell script errors."
148+
)
149+
150+
return len(errors) == 0, errors
151+
152+
153+
def validate_notebook_structure(notebook_path: Path) -> Tuple[bool, List[str]]:
154+
"""
155+
Validate basic notebook structure.
156+
157+
Returns:
158+
(is_valid, error_messages)
159+
"""
160+
errors = []
161+
162+
try:
163+
with open(notebook_path, 'r', encoding='utf-8') as f:
164+
nb = json.load(f)
165+
except json.JSONDecodeError as e:
166+
return False, [f"Invalid JSON: {e}"]
167+
except Exception as e:
168+
return False, [f"Failed to read notebook: {e}"]
169+
170+
# Check required fields
171+
if 'cells' not in nb:
172+
errors.append("Missing 'cells' field")
173+
174+
if 'cells' in nb and not isinstance(nb['cells'], list):
175+
errors.append("'cells' must be a list")
176+
177+
return len(errors) == 0, errors
178+
179+
180+
def main():
181+
parser = argparse.ArgumentParser(
182+
description='Validate Jupyter Notebook format',
183+
formatter_class=argparse.RawDescriptionHelpFormatter,
184+
epilog="""
185+
Examples:
186+
# Validate a single notebook
187+
%(prog)s notebook.ipynb
188+
189+
# Validate multiple notebooks
190+
%(prog)s *.ipynb
191+
192+
# Validate with verbose output
193+
%(prog)s -v notebook.ipynb
194+
"""
195+
)
196+
197+
parser.add_argument(
198+
'notebooks',
199+
nargs='+',
200+
help='Notebook file(s) to validate'
201+
)
202+
203+
parser.add_argument(
204+
'-v', '--verbose',
205+
action='store_true',
206+
help='Show detailed validation information'
207+
)
208+
209+
parser.add_argument(
210+
'--skip-structure-check',
211+
action='store_true',
212+
help='Skip basic structure validation (only check source format)'
213+
)
214+
215+
args = parser.parse_args()
216+
217+
# Collect all notebook files
218+
notebook_files = []
219+
for pattern in args.notebooks:
220+
path = Path(pattern)
221+
if path.exists() and path.is_file():
222+
notebook_files.append(path)
223+
elif '*' in pattern or '?' in pattern:
224+
# Handle glob patterns
225+
import glob
226+
notebook_files.extend([Path(f) for f in glob.glob(pattern) if Path(f).suffix == '.ipynb'])
227+
else:
228+
print(f"Warning: File not found: {pattern}", file=sys.stderr)
229+
230+
if not notebook_files:
231+
print("Error: No notebook files found", file=sys.stderr)
232+
sys.exit(1)
233+
234+
# Validate each notebook
235+
all_valid = True
236+
total_errors = 0
237+
238+
for notebook_path in notebook_files:
239+
if args.verbose:
240+
print(f"\nValidating: {notebook_path}")
241+
print("=" * 60)
242+
243+
# Structure validation
244+
if not args.skip_structure_check:
245+
struct_valid, struct_errors = validate_notebook_structure(notebook_path)
246+
if not struct_valid:
247+
all_valid = False
248+
total_errors += len(struct_errors)
249+
print(f"\n{notebook_path}: Structure validation failed")
250+
for error in struct_errors:
251+
print(f" - {error}")
252+
continue
253+
254+
# Source format validation
255+
format_valid, format_errors = validate_notebook_source_format(notebook_path)
256+
257+
if format_valid:
258+
if args.verbose:
259+
print(f"✓ {notebook_path}: Format is valid")
260+
else:
261+
all_valid = False
262+
total_errors += len(format_errors)
263+
print(f"\n{notebook_path}: Format validation failed ({len(format_errors)} error(s))")
264+
for error in format_errors:
265+
print(f" - {error}")
266+
267+
# Summary
268+
print("\n" + "=" * 60)
269+
if all_valid:
270+
print(f"✓ All {len(notebook_files)} notebook(s) passed validation")
271+
sys.exit(0)
272+
else:
273+
print(f"✗ Validation failed: {total_errors} error(s) found in {len(notebook_files)} notebook(s)")
274+
print("\nFix suggestions:")
275+
print(" 1. Ensure each source array element ends with '\\n' if a newline is needed")
276+
print(" 2. Use a notebook editor that properly formats source arrays")
277+
print(" 3. Run: python3 -c \"import json; nb=json.load(open('notebook.ipynb')); ...\" to fix format")
278+
sys.exit(1)
279+
280+
281+
if __name__ == '__main__':
282+
main()
283+

0 commit comments

Comments
 (0)