Commit 03b4dc5

author
Musab
committed
chore: add nvidia skills bundle artifacts
1 parent c4bd014 commit 03b4dc5

3,052 files changed

Lines changed: 681847 additions & 3244 deletions

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
# Evaluation Report

Evaluation of the `accelerated-computing-cudf` skill before publication through NVSkills-Eval.

This report summarizes the NVSkills-Eval 3-tier evaluation results for the skill. The goal is to document whether the skill is safe, discoverable, effective, and useful for agents before it is published for broader workflow use.

## Evaluation Summary

- Skill: `accelerated-computing-cudf`
- Evaluation date: 2026-05-29
- NVSkills-Eval profile: `external`
- Environment: `local`
- Dataset: 13 evaluation tasks
- Attempts per task: 2
- Pass threshold: 50%
- Overall verdict: PASS

## Agents Used

- `claude-code`
- `codex`

## Metrics Used

Reported benchmark dimensions:

- Security: checks whether skill-assisted execution avoids unsafe behavior such as secret leakage, destructive commands, or unauthorized access.
- Correctness: checks whether the agent follows the expected workflow and produces the correct final output.
- Discoverability: checks whether the agent loads the skill when relevant and avoids using it when irrelevant.
- Effectiveness: checks whether the agent performs measurably better with the skill than without it.
- Efficiency: checks whether the agent uses fewer tokens and avoids redundant work.

Underlying evaluation signals used in this run:

- `security` (Security): checks for unsafe operations, secret leakage, and unauthorized access.
- `skill_execution` (Skill Execution): verifies that the agent loaded the expected skill and workflow.
- `skill_efficiency` (Efficiency): checks routing quality, decoy avoidance, and redundant tool usage.
- `accuracy` (Accuracy): grades final-answer correctness against the reference answer.
- `goal_accuracy` (Goal Accuracy): checks whether the overall user task completed successfully.
- `behavior_check` (Behavior Check): verifies expected behavior steps, including safety expectations.
- `token_efficiency` (Token Efficiency): compares token usage with and without the skill.

## Test Tasks

The benchmark dataset contained 13 evaluation tasks:

- Positive tasks: 12 tasks where the skill was expected to activate.
- Negative tasks: 1 task where no skill was expected.
- Unlabeled tasks: 0 tasks where positive/negative intent could not be inferred.

Task composition is derived from the evaluation dataset when possible. Entries with `expected_skill` set are treated as positive skill-activation cases, while entries with `expected_skill: null` are treated as negative activation cases.
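
The labeling rule above can be sketched in a few lines of Python. This is an illustration only, not NVSkills-Eval code: the entry layout, the `classify_task` name, and the choice to treat a missing `expected_skill` key as "unlabeled" are all assumptions made for the example.

```python
from collections import Counter


def classify_task(entry: dict) -> str:
    """Label a dataset entry by whether a skill is expected to activate."""
    if "expected_skill" not in entry:
        # Assumption: a missing key means intent cannot be inferred.
        return "unlabeled"
    if entry["expected_skill"] is None:
        # Corresponds to `expected_skill: null` in the dataset.
        return "negative"
    return "positive"


# Hypothetical entries, for illustration only.
tasks = [
    {"id": "t1", "expected_skill": "accelerated-computing-cudf"},
    {"id": "t2", "expected_skill": None},
    {"id": "t3"},
]

counts = Counter(classify_task(task) for task in tasks)
print(dict(counts))
```

Keeping the rule in one small function makes it easy to recompute the positive/negative split whenever the dataset changes.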

## Results

| Dimension | Num | `claude-code` | `codex` |
|---|---:|---:|---:|
| Security | 8 | 92% (+12%) | 100% (+0%) |
| Correctness | 8 | 96% (+10%) | 92% (+8%) |
| Discoverability | 8 | 84% (+26%) | 68% (+15%) |
| Effectiveness | 8 | 90% (+5%) | 86% (-0%) |
| Efficiency | 8 | 61% (+24%) | 50% (+10%) |

Score values show skill-assisted performance. Values in parentheses show uplift versus the no-skill baseline when baseline data is available.
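
To recover an implied baseline from a table cell, a reader could parse the score and the uplift and subtract. The sketch below assumes the uplift is an absolute difference in percentage points; the report does not state whether it is absolute or relative, so treat the `baseline` value as an assumption. `parse_cell` is a hypothetical helper name.

```python
import re

# Matches cells such as "92% (+12%)" or "86% (-0%)".
CELL = re.compile(r"^\s*(\d+)%\s*\(([+-]\d+)%\)\s*$")


def parse_cell(cell: str) -> dict:
    """Split a results cell into score, uplift, and implied baseline."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized cell format: {cell!r}")
    score = int(match.group(1))
    uplift = int(match.group(2))
    # Assumption: uplift is in percentage points, so baseline = score - uplift.
    return {"score": score, "uplift": uplift, "baseline": score - uplift}


print(parse_cell("92% (+12%)"))
```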

## Tier 1: Static Validation Summary

Tier 1 validation passed with observations. NVSkills-Eval ran 9 checks and found 7 total findings.

Top findings:

- MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Instructions' (`skills/accelerated-computing-cudf/SKILL.md`)
- MEDIUM SCHEMA/body_recommended_section: Missing recommended section: '## Examples' (`skills/accelerated-computing-cudf/SKILL.md`)
- LOW QUALITY/quality_discoverability: Broad description without negative triggers may cause over-triggering (`skills/accelerated-computing-cudf/SKILL.md`)
- LOW QUALITY/quality_discoverability: No '## Purpose' section (`skills/accelerated-computing-cudf/SKILL.md`)
- LOW QUALITY/quality_reliability: No prerequisites/requirements documented (`skills/accelerated-computing-cudf/SKILL.md`)

## Tier 2: Deduplication Summary

Tier 2 validation passed. NVSkills-Eval ran 2 checks and found 0 total findings.

Notable observations:

- Context Deduplication: Collected 4 file(s)
- Inter-Skill Deduplication: Parsed skill 'accelerated-computing-cudf': 190 char description

## Publication Recommendation

The skill is suitable to proceed toward NVSkills-Eval publication based on this benchmark. Skill owners should keep this file with the skill and refresh it when the evaluation dataset, skill behavior, or target agents materially change.
Lines changed: 203 additions & 0 deletions
@@ -0,0 +1,203 @@
---
name: accelerated-computing-cudf
description: Official NVIDIA-authored guidance for NVIDIA cuDF GPU DataFrames, pandas acceleration, dask-cuDF, ETL, joins, groupby, CSV/Parquet I/O, nullable semantics, and multi-GPU DataFrame workloads.
license: CC-BY-4.0 AND Apache-2.0
metadata:
  author: NVIDIA
  tags:
    - cudf
    - dataframes
    - pandas
    - dask-cudf
    - etl
---

# cuDF & dask-cuDF Implementer's Guide

## Compatibility

- Release tracked by this skill: 26.04.
- Requires NVIDIA Volta or newer on CUDA 12, or Turing or newer on CUDA 13. Release 26.04 supports CUDA 12.2-12.9 with driver 535+ or CUDA 13.0-13.1 with driver 580+, and Python 3.11-3.14. cuDF sweet spot: >100K rows.

## Naming

Use NVIDIA library-first wording in user-facing answers. Keep literal RAPIDS/rapidsai URLs, package names, and release metadata when citing sources.

## Role

You are a cuDF expert helping an implementer work with GPU DataFrames. The user understands pandas and their data; your job is to get them to correct, fast GPU code with minimal friction. Choose the path from the user's intent: `cudf.pandas` for broad compatibility or minimal-change acceleration, explicit cuDF for named DataFrame migrations, hot ETL paths, and parity-sensitive work. Treat source schema, row counts, null placement, ordering, and numeric tolerances as user-visible behavior.

## Critical Rules

1. **Choose the right cuDF path.** Use `cudf.pandas` for broad compatibility or minimal-change acceleration. Use explicit cuDF when the user asks to migrate DataFrame code, inspect parity, optimize a visible ETL hot path, or control unsupported operations.
2. **Size gate: 100K rows minimum.** Below that, GPU transfer overhead usually outweighs the speedup; use small data for correctness checks and benchmark larger working sets for performance.
3. **Keep conversions at boundaries.** Use `.to_pandas()`, `.values`, or `.to_numpy()` for display, plotting, CPU-only libraries, or final output boundaries. Keep intermediate ETL data on GPU.
4. **Float32 is your friend.** float64 operations are generally slower on GPU and use twice the memory; cast early when precision allows.
5. **Validate semantics on representative slices.** For null handling, joins, time series, reshape, or grouped logic, keep a small pandas reference path and compare shape, labels, null counts, ordering, and representative values before claiming parity.
6. **For data > GPU memory**, move to dask-cuDF with `enable_cudf_spill=True`. See `references/dask-cudf-patterns.md`.
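
One way to make "when precision allows" in rule 4 concrete is to measure how much a column's values move when rounded to float32. The sketch below uses only the Python standard library so it runs without a GPU; `to_float32` and `float32_is_safe` are hypothetical helper names, and the absolute tolerance is something the user has to choose for their data. It only checks storage rounding, not error accumulated by later arithmetic.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value.

    Raises OverflowError if x is outside the float32 range.
    """
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_is_safe(values, abs_tol: float) -> bool:
    """True if every value survives a float32 round trip within abs_tol."""
    return all(abs(to_float32(v) - v) <= abs_tol for v in values)


# Small prices keep cent-level accuracy in float32 ...
print(float32_is_safe([12.5, 99.25], abs_tol=0.01))
# ... but large amounts do not: float32 spacing near 1.2e8 is 8.
print(float32_is_safe([123456789.01], abs_tol=0.01))
```

Running a check like this on a representative sample before casting gives a quick signal about whether a column should stay in float64.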

## Three Paths to GPU DataFrames

### Path 1: cudf.pandas Accelerator (Compatibility / Minimal Change)

Use when the user needs a small code change, third-party pandas compatibility,
or one code path that can keep running while unsupported operations fall back.

**Jupyter/IPython:**
```python
%load_ext cudf.pandas
import pandas as pd  # now GPU-backed; falls back silently for unsupported ops
```

**Script:**
```bash
python -m cudf.pandas my_script.py
```

**With multiprocessing:**
```python
import cudf.pandas
cudf.pandas.install()  # must come BEFORE pandas import, before Pool creation
from multiprocessing import Pool
```

Confirm acceleration with the cudf.pandas profiler before claiming speedup.
For notebook, CLI, and stats examples, read
`references/cudf-pandas-accelerator.md`. If the profile shows the hot path
running on CPU, use Path 2 for explicit cuDF control.

### Path 2: Explicit cuDF API

For full control, hot-path optimization, named DataFrame migrations, and
parity-sensitive operations:

```python
import cudf

# Read data directly to GPU
df = cudf.read_parquet("data.parquet")
lookup = cudf.read_parquet("lookup.parquet")  # placeholder path for the join's lookup table

# Operations mirror pandas
result = df.groupby("key")["value"].sum()
merged = df.merge(lookup, on="id", how="left")
filtered = df[df["amount"] > 1000]

# String operations
df["clean"] = df["name"].str.strip().str.lower()

# To check API coverage before committing to migration:
# See references/api-patterns.md for known gaps and workarounds
```

**Keep data on GPU end-to-end.** Only call `.to_pandas()` at the very end, for display or handoff to CPU-only code.

Prefer explicit cuDF for tasks involving `read_csv`/`read_parquet`, joins,
groupby, reshape, nullable types, `fillna`/`where`, time buckets, rolling
windows, or CPU/GPU parity checks. Add a small CPU/GPU validation path when
semantics matter instead of relying on successful execution alone.

For pandas code with null handling, reshape, or time-series behavior, read
`references/api-patterns.md` for the relevant semantic checklist before
rewriting. A `cudf.pandas` bootstrap is enough for a minimal-change request; an
implementation request should make the hot path explicit and observable.

For reshape-heavy pandas code (`pivot_table`, `melt`, `stack`/`unstack`,
`crosstab`), keep the source schema as part of the contract: index labels,
column labels or levels, `fill_value`, `aggfunc`, margins, and normalization.
Use explicit cuDF where the equivalent is supported; use `cudf.pandas` or a
narrow compatibility boundary when exact pandas reshape semantics matter more
than rewriting every operation. Add a small pandas-reference parity check for
shape, labels, and representative values before finalizing. See
`references/api-patterns.md`.

### Path 3: dask-cuDF (Multi-GPU / Large Data)

Use when the dataset exceeds GPU memory. See `references/dask-cudf-patterns.md` for full patterns.

```python
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf

cluster = LocalCUDACluster(enable_cudf_spill=True)  # one worker per GPU
client = Client(cluster)

ddf = dask_cudf.read_parquet("s3://bucket/data/*.parquet")
result = ddf.groupby("key").agg({"value": "sum"}).compute()
```

## Memory Management

**Enable spill before OOM happens** (not after):
```python
import cudf
cudf.set_option("spill", True)  # spill to host RAM when GPU is full
```

**RMM pooled allocator** (here the stream-ordered `CudaAsyncMemoryResource`; reduces `cudaMalloc` overhead in pipelines with many allocations):
```python
import rmm
rmm.set_current_device_resource(rmm.mr.CudaAsyncMemoryResource())
# Must be called BEFORE any cuDF operations
```

| Free GPU memory vs. dataset size | Strategy |
|---|---|
| Free > 2× dataset | Single GPU cuDF |
| Free 1–2× dataset | cuDF + `cudf.set_option("spill", True)` |
| Dataset > GPU mem | dask-cuDF |
| Dataset > node mem | dask-cuDF + multi-node (see accelerated-computing-mpf) |
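
The table can be read as a small decision function. The sketch below is one possible encoding, not part of cuDF: `choose_strategy` is a hypothetical name, "GPU mem" is approximated by the currently free GPU memory, and "node mem" is taken to be whatever total the caller considers available on one node. Both readings are assumptions, since the table does not define them precisely.

```python
GIB = 1024 ** 3


def choose_strategy(dataset_bytes: int, gpu_free_bytes: int, node_mem_bytes: int) -> str:
    """Map dataset size and available memory to a row of the strategy table."""
    if dataset_bytes > node_mem_bytes:
        return "dask-cuDF + multi-node"
    if dataset_bytes > gpu_free_bytes:
        # Assumption: free GPU memory stands in for "GPU mem".
        return "dask-cuDF"
    if gpu_free_bytes > 2 * dataset_bytes:
        return "single-GPU cuDF"
    # Free memory is between 1x and 2x the dataset size.
    return "cuDF + spill"


for size_gib in (10, 30, 100, 500):
    strategy = choose_strategy(size_gib * GIB, 40 * GIB, 320 * GIB)
    print(f"{size_gib:>4} GiB -> {strategy}")
```

Dataset size here means the in-memory working set, which can be several times larger than a compressed Parquet file on disk.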

## Troubleshooting

**No speedup vs pandas:**
- Data < 100K rows? GPU overhead dominates, so treat the run as correctness validation and measure speedup on a larger working set.
- Run `%%cudf.pandas.profile`; a high CPU percentage means many fallbacks. Identify and fix those ops.
- Check `references/api-patterns.md` for known gaps.

**OOM (CUDA out of memory):**
1. Enable spill: `cudf.set_option("spill", True)`
2. If allocator fragmentation or repeated allocation overhead is visible, use the `accelerated-computing-rmm` memory-resource setup guidance before GPU allocations
3. Still failing: move to dask-cuDF

**AttributeError / NotImplementedError:**
- Check `references/api-patterns.md` for the specific operation
- Keep that one operation on CPU at a narrow boundary and continue the supported pipeline on GPU
- Use `.to_pandas()` only for the unsupported op, then `cudf.from_pandas()` to move the result back

**Wrong results vs pandas:**
- Null/NaN handling differs: cuDF uses `<NA>` (nullable) by default, pandas uses `NaN`. See `references/api-patterns.md`.
- Sort stability: cuDF sort is not guaranteed stable unless `stable=True` is passed
- If the difference looks like floating-point noise, try casting to higher-precision floats (e.g. `float64` instead of `float32`). If small differences remain, stop chasing bit-for-bit equality: GPU and CPU algorithms accumulate values in different orders, and floating-point arithmetic is not associative, so results can legitimately differ in the last digits. Compare with a tolerance instead.
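
The floating-point point can be reproduced without a GPU. The sketch below uses plain Python floats: it shows that regrouping an addition changes the result, then compares a sequential sum with a tree-shaped (pairwise) sum, which is used here only as a stand-in for a parallel reduction order. It is not a model of any specific cuDF kernel.

```python
import math
import random

# Regrouping the same three numbers changes the last digit.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right, left, right)


def pairwise_sum(xs):
    """Sum by recursively splitting the list in half (tree-shaped order)."""
    if len(xs) == 1:
        return xs[0]
    if len(xs) == 2:
        return xs[0] + xs[1]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])


random.seed(0)
values = [random.uniform(-1e6, 1e6) for _ in range(10_000)]

sequential = 0.0
for v in values:
    sequential += v  # left-to-right accumulation, as a simple CPU loop would do

tree = pairwise_sum(values)

# The two orders may disagree in the last bits, so compare with a tolerance.
print(math.isclose(sequential, tree, rel_tol=1e-9, abs_tol=1e-3))
```

The tolerances are illustrative; pick values that match the numeric tolerance the user actually cares about.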

## Nullable and Fill Semantics

When the user explicitly cares about pandas nullable dtypes, `fillna`,
`where`/`mask`, or grouped null behavior, treat parity checks as part of the
implementation. See `references/api-patterns.md` for nullable dtype examples.

- Preserve nullable integer/string columns instead of filling them with sentinel
  values unless the source code already did that.
- Keep `where`/`mask` semantics when they encode a condition. Use broad
  `fillna` only when the condition is exactly null-only.
- Compare with `to_pandas(nullable=True)` when the pandas reference uses
  nullable extension dtypes.
- Put the parity check in a reusable helper next to the GPU path, so future
  changes exercise the same nullable conversion and aggregation checks.
- Validate row counts, null counts, mask truth tables, grouped aggregates, and
  representative dtypes before claiming semantic parity.
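
A reusable parity helper along the lines of the checklist above might look like the following. To stay runnable without pandas or a GPU, this sketch works on plain dictionaries that map column names to Python lists, with `None` marking a null; converting real CPU and GPU results into that shape is left out and would be a separate step. `check_parity` and its tolerances are hypothetical, not part of cuDF.

```python
import math


def null_count(column) -> int:
    """Number of null (None) entries in a column."""
    return sum(value is None for value in column)


def check_parity(cpu: dict, gpu: dict, rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> list[str]:
    """Return a list of human-readable differences; empty means parity."""
    problems = []
    if list(cpu) != list(gpu):
        return [f"column labels differ: {list(cpu)} vs {list(gpu)}"]
    for name in cpu:
        a, b = cpu[name], gpu[name]
        if len(a) != len(b):
            problems.append(f"{name}: row count {len(a)} vs {len(b)}")
            continue
        if null_count(a) != null_count(b):
            problems.append(f"{name}: null count {null_count(a)} vs {null_count(b)}")
        for i, (x, y) in enumerate(zip(a, b)):
            if (x is None) != (y is None):
                problems.append(f"{name}[{i}]: null placement differs")
            elif x is None:
                continue
            elif isinstance(x, float) or isinstance(y, float):
                # Floats are compared with a tolerance, not bit-for-bit.
                if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
                    problems.append(f"{name}[{i}]: {x} vs {y}")
            elif x != y:
                problems.append(f"{name}[{i}]: {x!r} vs {y!r}")
    return problems


cpu_result = {"key": ["a", "b"], "value": [1.0, None]}
gpu_result = {"key": ["a", "b"], "value": [1.0000000001, None]}
print(check_parity(cpu_result, gpu_result))
```

Returning a list of differences rather than a boolean keeps the failure output useful when the helper is called from a test next to the GPU path.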

## Reference Files

- `references/cudf-pandas-accelerator.md`: Profiling, fallback detection, cudf.pandas deep dive
- `references/api-patterns.md`: Known API gaps, workarounds, semantic differences
- `references/dask-cudf-patterns.md`: Multi-GPU patterns, best practices, partition tuning

## External Documentation

Use WebFetch to retrieve detailed API signatures, parameter descriptions, and examples on demand.

- **cuDF Documentation:** https://docs.rapids.ai/api/cudf/stable/
- **dask-cuDF API Reference:** https://docs.rapids.ai/api/dask-cudf/stable/api/
- **GitHub:** https://github.com/rapidsai/cudf
- **CHANGELOG:** https://github.com/rapidsai/cudf/blob/main/CHANGELOG.md
