
Conversation

@Infatoshi

Summary

  • Adds kernrl - an RL environment for training LLM agents to write optimized GPU kernels
  • Agents receive PyTorch reference implementations and must write faster CUDA/Triton kernels
  • 89 problems across 10 difficulty levels (matmul, attention, MoE, cryptography, etc.)

Environment Interface

  • Action: CUDA/Triton kernel code
  • Observation: Compilation status, correctness check, speedup measurement, profiling data
  • Reward: +0.1 compile, +0.3 correct, +0.3 beats baseline, +0.3 scaled by log2(speedup)
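For illustration, the reward schedule as described above can be sketched roughly as follows. The exact scaling and cap of the log2 speedup bonus are assumptions, and the review discussion later in this thread notes that the implemented reward diverged from this and was eventually unified differently:

import math

def documented_reward(compiled: bool, correct: bool, speedup: float | None) -> float:
    # Sketch of the reward schedule described in this PR summary (not the final implementation).
    reward = 0.0
    if compiled:
        reward += 0.1                                 # +0.1 for successful compilation
    if correct:
        reward += 0.3                                 # +0.3 for matching the reference output
    if speedup is not None and speedup > 1.0:
        reward += 0.3                                 # +0.3 for beating the baseline
        reward += min(0.3, 0.3 * math.log2(speedup))  # speedup bonus, log2-scaled (cap assumed)
    return reward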

Features

  • Local GPU evaluation with NSight Systems/Compute profiling
  • Docker support for isolated execution
  • Compatible with TRL/GRPO training via rollout_func pattern

Test plan

  • Verify server starts: uvicorn kernrl.server.app:app
  • Test reset/step cycle with sample kernel
  • Verify correctness checking against reference
  • Benchmark timing measurement works
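A rough client-side sketch of the reset/step cycle in the test plan above. Class, method, and problem-id names are assumptions based on the sequence diagram later in this thread and are not verified against the actual package:

# Hypothetical client usage; KernRLEnv, KernelAction, and the problem id are illustrative only.
from kernrl import KernRLEnv, KernelAction

env = KernRLEnv(base_url="http://localhost:8000")   # server started via: uvicorn kernrl.server.app:app
obs = env.reset(problem_id="level1_matmul")
print(obs.reference_code)

sample_kernel = open("my_kernel.py").read()          # a sample CUDA/Triton kernel
result = env.step(KernelAction(code=sample_kernel))
print(result.reward, result.done)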

🤖 Generated with Claude Code

kernrl is an RL environment for training LLM agents to write optimized
CUDA/Triton GPU kernels. Agents receive a PyTorch reference implementation
and must write a kernel that produces the same output faster.

Features:
- 89 problems across 10 difficulty levels
- Comprehensive profiling (NSight Systems/Compute, torch.profiler)
- Correctness verification with configurable tolerances
- Benchmark timing for speedup measurement
- Rich feedback for iterative optimization
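Conceptually, the tolerance-based correctness check is a comparison along these lines (tensor shapes and tolerance values here are illustrative assumptions, not the environment's defaults):

import torch

# Conceptual correctness check with configurable tolerances (values assumed).
reference_output = torch.randn(16, 1024)
solution_output = reference_output + 1e-5 * torch.randn(16, 1024)
correct = torch.allclose(solution_output, reference_output, rtol=1e-3, atol=1e-3)
print(correct)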

Problem levels:
- L1: Simple operators (matmul, softmax, conv, norms)
- L2: Fused operations (matmul+activation chains)
- L3: Single blocks (attention, transformer block)
- L4: Novel layers (MLA, MoE, GQA, FP8, INT4)
- L5-L10: Scientific computing, graphics, signal processing,
  video processing, parallel primitives, cryptography

Requires: NVIDIA GPU with CUDA toolkit, PyTorch, Triton

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@meta-cla

meta-cla bot commented Jan 20, 2026

Hi @Infatoshi!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!

@burtenshaw
Collaborator

burtenshaw commented Jan 20, 2026

Hi @Infatoshi

Thanks for the PR. From a high level, here are some points we should work on first:

  • You will need to resolve the Meta CLA registration.
  • The Space is in an error state right now: https://huggingface.co/spaces/Infatoshi/kernrl
  • Could you add an inference example to examples/ that uses the OpenAI client to interact with the environment?
  • Could you update docs/environments.md to include this environment?

Once again, cool project!

@meta-cla meta-cla bot added the CLA Signed label on Jan 20, 2026
@Infatoshi
Author

Thanks @burtenshaw! All addressed.

Ready for review!

@burtenshaw
Collaborator

Thanks @Infatoshi. This looks great. One thing concerns me, though: the environment does not conform to the openenv structure created by the CLI, and within this repo we want all the envs to conform to this structure because they should serve as 'examples'. Two options:

Could you restructure the env following this guide, using `openenv init`, so that it is compatible with envs like echo_env etc.?

This will add package management, build and push support, and make it easier to build on top of.

- Add try/except import pattern for in-repo vs standalone compatibility
- Rename kernel_env.py to kernrl_environment.py (follow naming convention)
- Update openenv.yaml to match spec_version 1 format
- Update pyproject.toml with openenv-core git dependency
- Update Dockerfile to use openenv-base multi-stage build pattern
- Update __init__.py exports for both package and server

This makes kernrl compatible with `openenv init` structure and other
environments like echo_env, enabling CLI support for build/push/serve.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@Infatoshi
Author

Restructured kernrl to match the openenv CLI structure! Changes in commit 5f24f37:

Structure changes:

  • Added try/except import pattern for in-repo vs standalone compatibility (matching echo_env pattern)
  • Renamed kernel_env.py to kernrl_environment.py (following naming convention)
  • Updated openenv.yaml to use spec_version 1 format with proper type/runtime/app/port fields
  • Updated pyproject.toml with openenv-core @ git+... dependency pattern
  • Updated Dockerfile to use the openenv-base multi-stage build pattern
  • Updated __init__.py exports for both package and server modules
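
For reference, the in-repo vs. standalone import pattern mentioned in the first bullet typically looks something like this (module names are assumed by analogy with the echo_env convention, not copied from the PR):

try:
    # In-repo layout (running from the OpenEnv repository root)
    from envs.kernrl.models import KernelAction, KernelObservation
except ImportError:
    # Standalone package layout (installed as the kernrl package)
    from kernrl.models import KernelAction, KernelObservation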

The environment now follows the same structure as echo_env and should be compatible with openenv init, openenv build, openenv serve, etc.

Let me know if anything else needs adjustment!

@zkwentz
Contributor

zkwentz commented Jan 21, 2026

@greptile

@greptile-apps

greptile-apps bot commented Jan 21, 2026

Greptile Summary

Adds kernrl, a GPU kernel optimization environment that trains LLM agents to write fast CUDA/Triton kernels through real hardware feedback. Agents receive PyTorch reference implementations and iteratively optimize kernels based on compilation status, correctness checks, and performance benchmarks.

Key Features

  • 89 problems across 10 difficulty levels (basic ops, attention, MoE, cryptography, signal processing)
  • Local GPU evaluation with comprehensive profiling (NSight Systems/Compute, Compute Sanitizer)
  • Compatible with TRL/GRPO training patterns via rollout_func
  • Docker support for isolated execution

Issues Found

Critical - Reward Inconsistency: Three different reward calculations exist:

  1. README documents one structure (compile +0.1, correct +0.3, beats baseline +0.3, speedup bonus +0.3)
  2. evaluator._compute_reward() implements a different structure
  3. kernrl_environment._calculate_reward() implements a third version that's never used (dead code)

The actual reward comes from evaluator._compute_reward() (called at evaluator.py:313), making the environment's method dead code.

Moderate - Configuration Mismatch: enable_ncu defaults to False in KernelOptEnvironment.__init__ but True in LocalGPUEvaluator.__init__, creating inconsistent profiling behavior.

Security Concerns: User-submitted kernel code executes with full Python privileges via dynamic imports. While Docker provides isolation, there's no sandboxing within the container for:

  • Filesystem access
  • Network requests
  • Resource consumption
  • Module imports

This is acceptable for trusted research environments but should be documented as a security consideration.

Architecture

The environment follows OpenEnv patterns correctly:

  • Clean separation between client and server
  • WebSocket-based persistent sessions
  • Proper Pydantic models
  • Docker containerization
  • Comprehensive profiling feedback for agent learning

Confidence Score: 3/5

  • Safe for research use but has critical reward logic bugs and configuration inconsistencies that need fixing
  • Score reflects critical reward calculation bug (documented vs implemented), dead code in reward function, and configuration mismatches between components. Security concerns are acceptable for research but should be documented. Core architecture is solid.
  • Pay close attention to envs/kernrl/README.md, envs/kernrl/server/evaluator.py, and envs/kernrl/server/kernrl_environment.py for reward logic unification

Important Files Changed

| Filename | Overview |
|----------|----------|
| envs/kernrl/README.md | Documentation issue: reward structure in README doesn't match the implementation in evaluator.py |
| envs/kernrl/server/evaluator.py | Core evaluator with comprehensive profiling; implements a different reward than documented |
| envs/kernrl/server/kernrl_environment.py | Main environment implementation; contains duplicate reward calculations that don't match |

Sequence Diagram

sequenceDiagram
    participant Agent as LLM Agent
    participant Client as kernrl_env (Client)
    participant Server as FastAPI Server
    participant Env as KernelOptEnvironment
    participant Eval as LocalGPUEvaluator
    participant GPU as GPU Hardware
    
    Agent->>Client: reset(problem_id)
    Client->>Server: WebSocket: reset
    Server->>Env: reset(problem_id)
    Env->>Env: Load problem from problems/
    Env-->>Server: KernelObservation (description, reference_code)
    Server-->>Client: JSON response
    Client-->>Agent: KernelObservation
    
    loop Until solved or max_turns
        Agent->>Agent: Generate optimized kernel code
        Agent->>Client: step(KernelAction(code))
        Client->>Server: WebSocket: step
        Server->>Env: step(action)
        Env->>Eval: evaluate(solution_code, reference_code)
        
        Eval->>Eval: Write code to tmpdir
        Eval->>GPU: Compile solution (subprocess)
        GPU-->>Eval: Compilation result
        
        alt Compilation succeeds
            Eval->>GPU: Run Compute Sanitizer
            GPU-->>Eval: Memory/race errors
            
            Eval->>GPU: Execute correctness check
            GPU-->>Eval: Output diff statistics
            
            alt Correctness passes
                Eval->>GPU: Run benchmark (warmup + timing)
                GPU-->>Eval: Speedup measurement
                
                Eval->>GPU: NSight Systems profiling
                GPU-->>Eval: System-level metrics
                
                Eval->>GPU: NSight Compute profiling
                GPU-->>Eval: Kernel-level metrics
            end
        end
        
        Eval->>Eval: _compute_reward(result)
        Eval-->>Env: EvalResult (compilation, correctness, benchmark, profiling)
        Env->>Env: Update state (best_speedup, solved)
        Env-->>Server: KernelObservation (feedback, reward, done)
        Server-->>Client: JSON response
        Client-->>Agent: StepResult[KernelObservation]
        Agent->>Agent: Analyze feedback, adjust kernel
    end


@greptile-apps greptile-apps bot left a comment


103 files reviewed, 5 comments


Comment on lines 104 to 111
## Reward Structure

| Component | Reward | Description |
|-----------|--------|-------------|
| Compilation | +0.1 | Code compiles successfully |
| Correctness | +0.3 | Output matches reference |
| Beats baseline | +0.3 | Speedup > 1.0x |
| Speedup bonus | +0.3 | Scales with log2(speedup) |

logic: Reward structure doesn't match implementation.

README says:

  • Compilation: +0.1
  • Correctness: +0.3
  • Beats baseline: +0.3
  • Speedup bonus: +0.3 (log2 scaled)

But evaluator.py:690-715 implements:

  • Compilation: +0.1
  • Correctness: +0.3
  • Speedup > 1.0: +0.3 + bonus (log2 scaled, capped at 0.3)

And kernrl_environment.py:324-340 implements a THIRD version:

  • Compilation fail: -0.5
  • Incorrect: -0.25
  • Speedup > 1.0: min(speedup - 1.0, 2.0)
  • Speedup < 1.0: (speedup - 1.0) * 0.5

Need to unify the reward logic and update docs accordingly.


Comment on lines +324 to +340
def _calculate_reward(self, eval_result) -> float:
    """Calculate reward based on evaluation results."""
    if not eval_result.compilation.success:
        return -0.5  # Penalty for compilation failure

    if eval_result.correctness and not eval_result.correctness.correct:
        return -0.25  # Penalty for incorrect output

    if eval_result.benchmark and eval_result.benchmark.speedup:
        # Reward proportional to speedup
        speedup = eval_result.benchmark.speedup
        if speedup > 1.0:
            return min(speedup - 1.0, 2.0)  # Cap reward at 2.0
        else:
            return (speedup - 1.0) * 0.5  # Smaller penalty for being slower

    return 0.0

logic: Duplicate reward calculation. The _calculate_reward method here is overridden by evaluator._compute_reward() at line 313 of evaluator.py, making this implementation dead code. Remove this method or clarify which reward function should be used.


Comment on lines +79 to +80
# Step info
step: int = 0

logic: The evaluator's enable flags default to True, but the KernelOptEnvironment constructor at kernrl_environment.py:79-80 defaults to enable_nsys=True, enable_ncu=False. The evaluator then enables NCU by default at line 212, creating an inconsistency.


Comment on lines +325 to +365
runner_path.write_text(f'''
import torch
import importlib.util

def load_module(path, name):
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

ref_mod = load_module("{reference_path}", "reference")
sol_mod = load_module("{solution_path}", "solution")

device = "{self.device}"

if hasattr(ref_mod, "get_init_inputs"):
    init_inputs = ref_mod.get_init_inputs()
else:
    init_inputs = []

model = sol_mod.Model(*init_inputs).to(device).eval()

if hasattr(ref_mod, "get_inputs"):
    inputs = [x.to(device) if isinstance(x, torch.Tensor) else x for x in ref_mod.get_inputs()]
else:
    inputs = [torch.randn(16, 1024, device=device)]

# Warmup
with torch.no_grad():
    for _ in range(5):
        model(*inputs)

torch.cuda.synchronize()

# Profile this
with torch.no_grad():
    for _ in range(10):
        model(*inputs)

torch.cuda.synchronize()
''')

logic: Arbitrary code execution without sandboxing. The runner script dynamically imports and executes user-submitted kernel code via spec.loader.exec_module(mod). While Docker provides some isolation, the code runs with full Python privileges and can:

  • Access filesystem
  • Make network requests
  • Execute system commands via subprocess
  • Import any available module

Consider:

  • Adding resource limits (CPU, memory, time)
  • Restricting filesystem access to temp directories only
  • Network isolation
  • Disabling dangerous imports (subprocess, os.system, etc.)
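
One lightweight way to add the resource limits suggested above, sketched with the standard-library resource module (POSIX-only; the runner path is hypothetical and this is not part of the PR):

import resource
import subprocess
import sys

def _limit_child_resources():
    # Applied in the child process before exec: cap CPU time and address space.
    resource.setrlimit(resource.RLIMIT_CPU, (60, 60))           # 60 s of CPU time
    resource.setrlimit(resource.RLIMIT_AS, (8 * 1024**3,) * 2)  # 8 GiB of memory

proc = subprocess.run(
    [sys.executable, "runner.py"],        # hypothetical runner script path
    capture_output=True,
    text=True,
    timeout=120,                          # wall-clock limit
    preexec_fn=_limit_child_resources,    # not available on Windows
)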

Comment on lines +401 to +425
'''
try:
    proc = subprocess.run(
        [sys.executable, "-c", check_script],
        capture_output=True,
        text=True,
        timeout=30,
    )

    output = proc.stdout + proc.stderr

    if "OK" in proc.stdout:
        warnings = [
            line.replace("WARNING: ", "")
            for line in proc.stdout.split("\n")
            if line.startswith("WARNING:")
        ]
        return CompilationResult(success=True, warnings=warnings)
    else:
        return CompilationResult(success=False, error=output[:2000])

except subprocess.TimeoutExpired:
    return CompilationResult(success=False, error="Compilation timeout (30s)")
except Exception as e:
    return CompilationResult(success=False, error=str(e))

logic: Code injection vulnerability. User code is executed with subprocess.run([sys.executable, "-c", check_script]) where check_script contains f-string interpolation of solution_path. If a malicious user could control the path, they could inject code. While solution_path is a Path object created internally, this pattern is risky. Is there any way a user could influence the solution_path through the API to inject malicious code into the f-string?
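
One minimal way to avoid interpolating the path into the -c script is to pass it as an argument instead (a fragment-level sketch against the snippet above; it assumes check_script can be refactored to read its inputs from sys.argv):

# Pass the path as argv rather than formatting it into the script text.
proc = subprocess.run(
    [sys.executable, "-c", check_script, str(solution_path)],
    capture_output=True,
    text=True,
    timeout=30,
)
# ...and inside check_script:
#     import sys
#     solution_path = sys.argv[1]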


- Unify reward calculation: evaluator now uses same logic as environment
  - Compilation failure: -0.5 penalty
  - Correctness failure: -0.25 penalty
  - Slower than baseline: (speedup - 1.0) * 0.5 (negative)
  - Faster than baseline: min(speedup - 1.0, 2.0) (positive, capped)
  - Key: positive reward ONLY when speedup > 1.0x baseline
- Fix enable_ncu default mismatch: align to False (NCU is slow, opt-in)
- Update README with correct reward documentation and examples
- Add security considerations section to README

Addresses feedback from greptile automated code review.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@Infatoshi
Author

Thanks @greptile-apps for the thorough review! Fixed in commit e41aee4:

Critical - Reward Inconsistency: Fixed. Unified reward logic - evaluator now uses the same calculation as the environment:

  • Compilation failure: -0.5 penalty
  • Correctness failure: -0.25 penalty
  • Slower than baseline: (speedup - 1.0) * 0.5 (negative)
  • Faster than baseline: min(speedup - 1.0, 2.0) (positive, capped)

Key principle: Positive reward only when speedup > 1.0x baseline. Compilation + correctness alone does NOT give positive reward.

Moderate - Config Mismatch: Fixed. Aligned enable_ncu default to False in both LocalGPUEvaluator and KernelOptEnvironment. NCU is slow, so it's opt-in.

Security Concerns: Added Security Considerations section to README documenting that user-submitted code executes with full privileges within the container.

Updated README with correct reward documentation including concrete examples.

@greptile-apps

greptile-apps bot commented Jan 26, 2026

Excellent work! 🎯 I've verified that commit e41aee4 successfully addresses all the critical issues I identified:

✅ Critical - Reward Inconsistency: FIXED

The reward logic is now perfectly unified between evaluator and environment:

def _compute_reward(self, result: EvalResult) -> float:
    # Compilation failure: penalty
    if not result.compilation.success:
        return -0.5

    # Correctness failure: penalty  
    if result.correctness and not result.correctness.correct:
        return -0.25

    # Benchmark available: reward based on speedup
    if result.benchmark and result.benchmark.speedup:
        speedup = result.benchmark.speedup
        if speedup > 1.0:
            # Positive reward only when beating baseline, capped at 2.0
            return min(speedup - 1.0, 2.0)
        else:
            # Small penalty for being slower than baseline
            return (speedup - 1.0) * 0.5

    # Correct but no benchmark (edge case): no reward
    return 0.0

The key principle is now crystal clear: positive reward ONLY when speedup > 1.0x baseline. Perfect! 👌

✅ Moderate - Config Mismatch: FIXED

Both LocalGPUEvaluator and the client now default enable_ncu=False, properly reflecting that NCU profiling is slow and should be opt-in.

✅ Security Concerns: ADDRESSED

The README now includes a comprehensive Security Considerations section that clearly documents the execution model and potential risks.

✅ Documentation: ENHANCED

The README reward structure table with concrete examples is excellent - it makes the reward logic completely transparent to users.

This is a thorough and well-executed fix that maintains the environment's integrity while addressing all identified concerns. The unified reward calculation ensures consistent agent training across both local and server-based evaluation modes.

@burtenshaw
Collaborator

Hey @Infatoshi, we're ready to go on this, right?

@burtenshaw burtenshaw self-requested a review January 30, 2026 18:50
@Infatoshi
Author

Yup!
