Add kernrl: GPU kernel optimization environment #308
base: main
Conversation
kernrl is an RL environment for training LLM agents to write optimized CUDA/Triton GPU kernels. Agents receive a PyTorch reference implementation and must write a kernel that produces the same output faster.

Features:
- 89 problems across 10 difficulty levels
- Comprehensive profiling (NSight Systems/Compute, torch.profiler)
- Correctness verification with configurable tolerances
- Benchmark timing for speedup measurement
- Rich feedback for iterative optimization

Problem levels:
- L1: Simple operators (matmul, softmax, conv, norms)
- L2: Fused operations (matmul+activation chains)
- L3: Single blocks (attention, transformer block)
- L4: Novel layers (MLA, MoE, GQA, FP8, INT4)
- L5-L10: Scientific computing, graphics, signal processing, video processing, parallel primitives, cryptography

Requires: NVIDIA GPU with CUDA toolkit, PyTorch, Triton

Co-Authored-By: Claude Opus 4.5 <[email protected]>
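For orientation, here is a rough sketch of how an agent loop might drive this environment, based on the reset/step protocol in the sequence diagram further down. The client class name, constructor, and problem id are assumptions, not the PR's exact API:

```python
# Hypothetical usage sketch: KernelEnv, its constructor arguments, and the
# problem id are assumed names. Only KernelAction, KernelObservation, and the
# reset/step flow come from this PR's sequence diagram.
from kernrl import KernelEnv, KernelAction  # assumed import path


def write_kernel(description: str, reference_code: str) -> str:
    """Placeholder for the LLM agent; returns candidate kernel source."""
    return reference_code  # trivially correct, but no speedup


env = KernelEnv(base_url="http://localhost:8000")  # assumed constructor
obs = env.reset(problem_id="level1_matmul")        # hypothetical problem id

for _ in range(8):  # iterate until solved or the turn budget is spent
    code = write_kernel(obs.description, obs.reference_code)
    result = env.step(KernelAction(code=code))     # StepResult[KernelObservation]
    obs = result.observation                       # assumed StepResult fields
    if result.done:
        break
```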
---
Hi @Infatoshi! Thank you for your pull request and welcome to our community.

**Action Required**

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

**Process**

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged accordingly. If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
---
Hi @Infatoshi, thanks for the PR. From a high level, here are some points we should work on first:
Once again, cool project!
---
Thanks @burtenshaw! All addressed:
Ready for review!
---
Thanks @Infatoshi. This looks great. One thing concerns me though: the environment does not conform to the openenv structure created by the CLI, and within this repo we want all the envs to conform to this structure because they should serve as 'examples'. Two options: could you restructure the env following this guide and using `openenv init`? This will add package management, build and push support, and make it easier to build on top of.
- Add try/except import pattern for in-repo vs standalone compatibility
- Rename kernel_env.py to kernrl_environment.py (follow naming convention)
- Update openenv.yaml to match spec_version 1 format
- Update pyproject.toml with openenv-core git dependency
- Update Dockerfile to use openenv-base multi-stage build pattern
- Update __init__.py exports for both package and server

This makes kernrl compatible with `openenv init` structure and other environments like echo_env, enabling CLI support for build/push/serve.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
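The first bullet refers to a common dual-import idiom; a minimal sketch is below, with the module paths assumed rather than taken from the PR:

```python
# Sketch of the in-repo vs standalone import pattern; the exact module paths
# used by kernrl are assumptions here.
try:
    # Import path when the package lives inside the openenv monorepo
    from envs.kernrl.server.kernrl_environment import KernelOptEnvironment
except ImportError:
    # Import path when kernrl is installed as a standalone package
    from kernrl.server.kernrl_environment import KernelOptEnvironment
```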
---
Restructured kernrl to match the openenv CLI structure! Changes in commit 5f24f37: Structure changes:
The environment now follows the same structure as `echo_env`. Let me know if anything else needs adjustment!
---
@greptile |
**Greptile Summary**

Adds `kernrl`, an RL environment for training LLM agents to write optimized CUDA/Triton GPU kernels.

**Key Features**
**Issues Found**

**Critical - Reward Inconsistency:** Three different reward calculations exist:
The actual reward comes from `evaluator._compute_reward()`.

**Moderate - Configuration Mismatch:** The profiling enable flags default differently in the evaluator and the `KernelOptEnvironment` constructor.

**Security Concerns:** User-submitted kernel code executes with full Python privileges via dynamic imports. While Docker provides isolation, there's no sandboxing within the container for:
This is acceptable for trusted research environments but should be documented as a security consideration.

**Architecture**

The environment follows OpenEnv patterns correctly:
**Confidence Score:** 3/5

**Important Files Changed**
**Sequence Diagram**

```mermaid
sequenceDiagram
participant Agent as LLM Agent
participant Client as kernrl_env (Client)
participant Server as FastAPI Server
participant Env as KernelOptEnvironment
participant Eval as LocalGPUEvaluator
participant GPU as GPU Hardware
Agent->>Client: reset(problem_id)
Client->>Server: WebSocket: reset
Server->>Env: reset(problem_id)
Env->>Env: Load problem from problems/
Env-->>Server: KernelObservation (description, reference_code)
Server-->>Client: JSON response
Client-->>Agent: KernelObservation
loop Until solved or max_turns
Agent->>Agent: Generate optimized kernel code
Agent->>Client: step(KernelAction(code))
Client->>Server: WebSocket: step
Server->>Env: step(action)
Env->>Eval: evaluate(solution_code, reference_code)
Eval->>Eval: Write code to tmpdir
Eval->>GPU: Compile solution (subprocess)
GPU-->>Eval: Compilation result
alt Compilation succeeds
Eval->>GPU: Run Compute Sanitizer
GPU-->>Eval: Memory/race errors
Eval->>GPU: Execute correctness check
GPU-->>Eval: Output diff statistics
alt Correctness passes
Eval->>GPU: Run benchmark (warmup + timing)
GPU-->>Eval: Speedup measurement
Eval->>GPU: NSight Systems profiling
GPU-->>Eval: System-level metrics
Eval->>GPU: NSight Compute profiling
GPU-->>Eval: Kernel-level metrics
end
end
Eval->>Eval: _compute_reward(result)
Eval-->>Env: EvalResult (compilation, correctness, benchmark, profiling)
Env->>Env: Update state (best_speedup, solved)
Env-->>Server: KernelObservation (feedback, reward, done)
Server-->>Client: JSON response
Client-->>Agent: StepResult[KernelObservation]
Agent->>Agent: Analyze feedback, adjust kernel
end
```
103 files reviewed, 5 comments
envs/kernrl/README.md (Outdated)
```markdown
## Reward Structure

| Component | Reward | Description |
|-----------|--------|-------------|
| Compilation | +0.1 | Code compiles successfully |
| Correctness | +0.3 | Output matches reference |
| Beats baseline | +0.3 | Speedup > 1.0x |
| Speedup bonus | +0.3 | Scales with log2(speedup) |
```
**logic:** Reward structure doesn't match implementation.
README says:
- Compilation: +0.1
- Correctness: +0.3
- Beats baseline: +0.3
- Speedup bonus: +0.3 (log2 scaled)
But `evaluator.py:690-715` implements:
- Compilation: +0.1
- Correctness: +0.3
- Speedup > 1.0: +0.3 + bonus (log2 scaled, capped at 0.3)
And `kernrl_environment.py:324-340` implements a THIRD version:
- Compilation fail: -0.5
- Incorrect: -0.25
- Speedup > 1.0: `min(speedup - 1.0, 2.0)`
- Speedup < 1.0: `(speedup - 1.0) * 0.5`
Need to unify the reward logic and update docs accordingly.
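For comparison, the additive scheme the README and `evaluator.py` describe would look roughly like the sketch below; the exact constants and the shape of the log2 bonus are paraphrased from this review comment, not copied from the implementation:

```python
import math
from typing import Optional

# Rough sketch of the README/evaluator-style additive reward described above.
# Constants and the bonus shape are assumptions based on this review comment.
def additive_reward(compiled: bool, correct: bool, speedup: Optional[float]) -> float:
    reward = 0.0
    if compiled:
        reward += 0.1                                  # compilation
    if compiled and correct:
        reward += 0.3                                  # correctness
    if compiled and correct and speedup and speedup > 1.0:
        reward += 0.3                                  # beats baseline
        reward += min(0.3, 0.3 * math.log2(speedup))   # log2-scaled bonus, capped
    return reward
```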
envs/kernrl/server/kernrl_environment.py

```python
def _calculate_reward(self, eval_result) -> float:
    """Calculate reward based on evaluation results."""
    if not eval_result.compilation.success:
        return -0.5  # Penalty for compilation failure

    if eval_result.correctness and not eval_result.correctness.correct:
        return -0.25  # Penalty for incorrect output

    if eval_result.benchmark and eval_result.benchmark.speedup:
        # Reward proportional to speedup
        speedup = eval_result.benchmark.speedup
        if speedup > 1.0:
            return min(speedup - 1.0, 2.0)  # Cap reward at 2.0
        else:
            return (speedup - 1.0) * 0.5  # Smaller penalty for being slower

    return 0.0
```
**logic:** Duplicate reward calculation. The `_calculate_reward` method here is overridden by `evaluator._compute_reward()` at line 313 of `evaluator.py`, making this implementation dead code. Remove this method or clarify which reward function should be used.
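One minimal way to resolve this, sketched here as a suggestion rather than the fix the PR ultimately took (the PR unified the two implementations instead), would be to delegate:

```python
# Hypothetical delegation sketch; assumes the environment holds its evaluator
# as self.evaluator. The actual fix in commit e41aee4 unified the logic instead.
def _calculate_reward(self, eval_result) -> float:
    """Delegate to the evaluator so there is a single source of truth."""
    return self.evaluator._compute_reward(eval_result)
```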
envs/kernrl/server/evaluator.py

```python
# Step info
step: int = 0
```
**logic:** Enable flags default to `True` but the `KernelOptEnvironment` constructor at `kernrl_environment.py:79-80` defaults `enable_nsys=True, enable_ncu=False`. The evaluator then enables NCU by default at line 212, creating an inconsistency.
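A small sketch of what aligned defaults could look like, assuming a config dataclass; the later fix in this PR sets `enable_ncu=False` on both sides because NCU profiling is slow and should be opt-in:

```python
from dataclasses import dataclass


@dataclass
class ProfilingConfig:
    # Assumed container class; only the field names come from this comment.
    enable_nsys: bool = True    # NSight Systems profiling on by default
    enable_ncu: bool = False    # NSight Compute profiling opt-in (slow)
```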
envs/kernrl/server/evaluator.py

```python
runner_path.write_text(f'''
import torch
import importlib.util

def load_module(path, name):
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

ref_mod = load_module("{reference_path}", "reference")
sol_mod = load_module("{solution_path}", "solution")

device = "{self.device}"

if hasattr(ref_mod, "get_init_inputs"):
    init_inputs = ref_mod.get_init_inputs()
else:
    init_inputs = []

model = sol_mod.Model(*init_inputs).to(device).eval()

if hasattr(ref_mod, "get_inputs"):
    inputs = [x.to(device) if isinstance(x, torch.Tensor) else x for x in ref_mod.get_inputs()]
else:
    inputs = [torch.randn(16, 1024, device=device)]

# Warmup
with torch.no_grad():
    for _ in range(5):
        model(*inputs)

torch.cuda.synchronize()

# Profile this
with torch.no_grad():
    for _ in range(10):
        model(*inputs)

torch.cuda.synchronize()
''')
```
**logic:** Arbitrary code execution without sandboxing. The runner script dynamically imports and executes user-submitted kernel code via `spec.loader.exec_module(mod)`. While Docker provides some isolation, the code runs with full Python privileges and can:

- Access the filesystem
- Make network requests
- Execute system commands via `subprocess`
- Import any available module

Consider:

- Adding resource limits (CPU, memory, time), sketched below
- Restricting filesystem access to temp directories only
- Network isolation
- Disabling dangerous imports (`subprocess`, `os.system`, etc.)
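As one possible shape for the resource-limit suggestion above, the evaluator's subprocess call could apply POSIX rlimits before exec. This is a sketch under assumptions, not part of the PR; `runner_path` and `tmpdir` stand in for the paths the evaluator builds:

```python
import resource
import subprocess
import sys
from pathlib import Path

tmpdir = Path("/tmp/kernrl_eval")   # placeholder; the evaluator uses a real temp dir
runner_path = tmpdir / "runner.py"  # placeholder path


def _limit_resources() -> None:
    """Runs in the child before exec (POSIX only): cap CPU time.

    RLIMIT_AS is deliberately not set here; CUDA reserves very large virtual
    address mappings and tends to fail under address-space limits.
    """
    resource.setrlimit(resource.RLIMIT_CPU, (60, 60))  # 60 seconds of CPU time


proc = subprocess.run(
    [sys.executable, str(runner_path)],
    capture_output=True,
    text=True,
    timeout=120,                      # wall-clock limit
    preexec_fn=_limit_resources,
    cwd=str(tmpdir),                  # keep the working directory inside the temp dir
)
```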
envs/kernrl/server/evaluator.py

```python
'''
try:
    proc = subprocess.run(
        [sys.executable, "-c", check_script],
        capture_output=True,
        text=True,
        timeout=30,
    )

    output = proc.stdout + proc.stderr

    if "OK" in proc.stdout:
        warnings = [
            line.replace("WARNING: ", "")
            for line in proc.stdout.split("\n")
            if line.startswith("WARNING:")
        ]
        return CompilationResult(success=True, warnings=warnings)
    else:
        return CompilationResult(success=False, error=output[:2000])

except subprocess.TimeoutExpired:
    return CompilationResult(success=False, error="Compilation timeout (30s)")
except Exception as e:
    return CompilationResult(success=False, error=str(e))
```
**logic:** Code injection vulnerability. User code is executed with `subprocess.run([sys.executable, "-c", check_script])` where `check_script` contains f-string interpolation of `solution_path`. If a malicious user could control the path, they could inject code. While `solution_path` is a `Path` object created internally, this pattern is risky. Is there any way a user could influence the `solution_path` through the API to inject malicious code into the f-string?
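One way to take the interpolation off the table, regardless of whether the path is attacker-controlled, is to pass the path as a subprocess argument and read it from `sys.argv` inside a static script. This is a sketch, not the PR's code; `solution_path` is a placeholder for the path the evaluator builds:

```python
import subprocess
import sys
from pathlib import Path

solution_path = Path("/tmp/kernrl_eval/solution.py")  # placeholder path

# The check script is now a constant string: the path arrives via argv as data,
# so nothing from the filesystem or the API is ever interpolated into code.
CHECK_SCRIPT = """
import importlib.util
import sys

solution_path = sys.argv[1]
spec = importlib.util.spec_from_file_location("solution", solution_path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
print("OK")
"""

proc = subprocess.run(
    [sys.executable, "-c", CHECK_SCRIPT, str(solution_path)],
    capture_output=True,
    text=True,
    timeout=30,
)
```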
- Unify reward calculation: evaluator now uses same logic as environment
  - Compilation failure: -0.5 penalty
  - Correctness failure: -0.25 penalty
  - Slower than baseline: (speedup - 1.0) * 0.5 (negative)
  - Faster than baseline: min(speedup - 1.0, 2.0) (positive, capped)
  - Key: positive reward ONLY when speedup > 1.0x baseline
- Fix enable_ncu default mismatch: align to False (NCU is slow, opt-in)
- Update README with correct reward documentation and examples
- Add security considerations section to README

Addresses feedback from greptile automated code review.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
---
Thanks @greptile-apps for the thorough review! Fixed in commit e41aee4:

**Critical - Reward Inconsistency:** Fixed. Unified reward logic: the evaluator now uses the same calculation as the environment:
**Key principle:** Positive reward only when speedup > 1.0x baseline. Compilation + correctness alone does NOT give positive reward.

**Moderate - Config Mismatch:** Fixed. Aligned `enable_ncu` defaults to `False` in both the evaluator and the environment.

**Security Concerns:** Added a Security Considerations section to the README documenting that user-submitted code executes with full privileges within the container.

Updated the README with correct reward documentation, including concrete examples.
---
Excellent work! 🎯 I've verified that commit e41aee4 successfully addresses all the critical issues I identified:

✅ **Critical - Reward Inconsistency: FIXED**

The reward logic is now perfectly unified between the evaluator and the environment:

```python
def _compute_reward(self, result: EvalResult) -> float:
    # Compilation failure: penalty
    if not result.compilation.success:
        return -0.5
    # Correctness failure: penalty
    if result.correctness and not result.correctness.correct:
        return -0.25
    # Benchmark available: reward based on speedup
    if result.benchmark and result.benchmark.speedup:
        speedup = result.benchmark.speedup
        if speedup > 1.0:
            # Positive reward only when beating baseline, capped at 2.0
            return min(speedup - 1.0, 2.0)
        else:
            # Small penalty for being slower than baseline
            return (speedup - 1.0) * 0.5
    # Correct but no benchmark (edge case): no reward
    return 0.0
```

The key principle is now crystal clear: positive reward ONLY when speedup > 1.0x baseline. Perfect! 👌

✅ **Moderate - Config Mismatch: FIXED**

Both the evaluator and the environment now default `enable_ncu` to `False`.

✅ **Security Concerns: ADDRESSED**

The README now includes a comprehensive Security Considerations section that clearly documents the execution model and potential risks.

✅ **Documentation: ENHANCED**

The README reward structure table with concrete examples is excellent - it makes the reward logic completely transparent to users.

This is a thorough and well-executed fix that maintains the environment's integrity while addressing all identified concerns. The unified reward calculation ensures consistent agent training across both local and server-based evaluation modes.
---
Hey @Infatoshi, we're ready to go on this, right?
---
Yup!
Summary
`kernrl` - an RL environment for training LLM agents to write optimized GPU kernels

Environment Interface
Features
Test plan
`uvicorn kernrl.server.app:app`

🤖 Generated with Claude Code
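For reference, the same server can presumably also be launched from Python; the module path comes from the test plan above, while the host and port here are illustrative assumptions:

```python
import uvicorn

# Equivalent to running `uvicorn kernrl.server.app:app` from the shell.
# host/port are illustrative defaults, not values from the PR.
uvicorn.run("kernrl.server.app:app", host="0.0.0.0", port=8000)
```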