
Commit 33b632f

Merge branch 'ershi/release-1.10.1-rc1' into 'release-1.10'

Changes for v1.10.1rc1. See merge request omniverse/warp!1790

2 parents: c19d0de + 87c870d


69 files changed: +3337 −712 lines

.gitlab-ci.yml

Lines changed: 2 additions & 2 deletions
```diff
@@ -355,7 +355,7 @@ linux-aarch64 test:
   image: ubuntu:22.04
   needs: [linux-aarch64 build]
   extends:
-    - .runner-test-linux-aarch64-gpu
+    - .runner-test-linux-aarch64  # TODO: Change to .runner-test-linux-aarch64-gpu when runners are available
     - .test_common_with_coverage
   before_script:
     - echo -e "\\e[0Ksection_start:`date +%s`:install_dependencies[collapsed=true]\\r\\e[0KInstalling dependencies"
@@ -457,7 +457,7 @@ linux-x86_64-blackwell test:
     - uv venv
     - source .venv/bin/activate
    - uv sync --extra dev
-    - uv pip install -U --pre torch --index-url https://download.pytorch.org/whl/nightly/cu130
+    - uv pip install -U torch --index-url https://download.pytorch.org/whl/cu130
     - uv pip install -U "jax[cuda13]"
     - uv pip install -e .
     - echo -e "\\e[0Ksection_end:`date +%s`:install_dependencies\\r\\e[0K"
```

.gitlab/ci/clang-build-and-test.yml

Lines changed: 8 additions & 0 deletions
```diff
@@ -61,6 +61,14 @@ linux-x86_64 build:
     - mv warp/bin/warp.so warp/bin/linux-x86_64
     - mv warp/bin/warp-clang.so warp/bin/linux-x86_64
 
+linux-x86_64 build cuda 13:
+  image: gitlab-master.nvidia.com:5005/omniverse/warp/cuda:13.0.1-devel-ubuntu24.04
+  extends:
+    - .build_linux_base
+    - .ipp_lnx_x86_64_cpu_medium
+  script:
+    - uv run build_lib.py --clang_build_toolchain
+
 # ==============================================================================
 # Unit Testing Jobs
 #
```

CHANGELOG.md

Lines changed: 32 additions & 0 deletions
```diff
@@ -1,5 +1,36 @@
 # Changelog
 
+## [1.10.1] - 2025-12-01
+
+### Fixed
+
+- Fix type inference errors when passing reference arguments (such as array elements) to built-in functions
+  ([GH-1071](https://github.com/NVIDIA/warp/issues/1071)).
+- Fix `module="unique"` kernels to properly reuse existing module objects when defined multiple times,
+  avoiding unnecessary module creation overhead ([GH-995](https://github.com/NVIDIA/warp/issues/995)).
+- Add validation in `wp.compile_aot_module()` to detect generic kernels without overloads and generic kernels with
+  multiple overloads when `strip_hash=True` ([GH-919](https://github.com/NVIDIA/warp/issues/919)).
+- Fix compilation error in `wp.tile_load_indexed()` when indices tile has been reshaped or transformed
+  ([GH-1008](https://github.com/NVIDIA/warp/issues/1008)).
+- Fix multiple issues with kernel-local arrays (arrays created with `wp.zeros()` in kernels):
+  - Fix `.ptr` access ([GH-999](https://github.com/NVIDIA/warp/issues/999)).
+  - Fix indexing when requesting a subarray ([GH-1081](https://github.com/NVIDIA/warp/issues/1081)).
+  - Fix shape parameter to accept a single integer (e.g., `wp.zeros(shape=123, dtype=float)`)
+    ([GH-1081](https://github.com/NVIDIA/warp/issues/1081)).
+- Fix code-generation ordering for custom gradient functions (`@wp.func_grad`) when used with nested function calls
+  ([GH-967](https://github.com/NVIDIA/warp/issues/967)).
+- Fix invalid reads when using `wp.fem.TemporaryStore` during tape capture for automatic differentiation
+  ([GH-1021](https://github.com/NVIDIA/warp/issues/1021)).
+- Fix reference cycles introduced by `wp.fem.Temporary` and `wp.fem.ShapeBasisSpace`
+  ([GH-1076](https://github.com/NVIDIA/warp/issues/1076)).
+- Improve documentation and error messages about requiring a BVH for `wp.fem.lookup()` and related functionality
+  ([GH-1072](https://github.com/NVIDIA/warp/issues/1072)).
+
+### Documentation
+
+- Add more examples to the Tiles and SIMT code documentation, demonstrating caveats when switching between
+  the CPU and GPU and using `wp.tile()` ([GH-1042](https://github.com/NVIDIA/warp/issues/1042)).
+
 ## [1.10.0] - 2025-11-02
 
 ### Added
@@ -1939,6 +1970,7 @@
 
 - Initial publish for alpha testing
 
+[1.10.1]: https://github.com/NVIDIA/warp/releases/tag/v1.10.1
 [1.10.0]: https://github.com/NVIDIA/warp/releases/tag/v1.10.0
 [1.9.1]: https://github.com/NVIDIA/warp/releases/tag/v1.9.1
 [1.9.0]: https://github.com/NVIDIA/warp/releases/tag/v1.9.0
```
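One of the fixes above lets kernel-local `wp.zeros()` accept a single integer `shape` (e.g. `wp.zeros(shape=123, dtype=float)`). The essence of such a fix is normalizing a scalar shape to a tuple before allocation. A minimal pure-Python sketch of that normalization (the `normalize_shape` helper here is a hypothetical illustration, not Warp's actual implementation):

```python
def normalize_shape(shape):
    """Accept either a single int or a sequence of ints, mimicking the
    behavior described in the 1.10.1 fix for wp.zeros(shape=123, ...).

    Hypothetical illustration only, not Warp's real code.
    """
    if isinstance(shape, int):
        return (shape,)  # scalar becomes a 1-D shape
    return tuple(shape)  # sequences pass through as tuples


print(normalize_shape(123))     # (123,)
print(normalize_shape((4, 4)))  # (4, 4)
```

With this normalization in place, downstream allocation code only ever sees a tuple, so both `shape=123` and `shape=(123,)` describe the same 1-D array.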

PUBLICATIONS.md

Lines changed: 6 additions & 0 deletions
```diff
@@ -7,6 +7,11 @@ pull request on GitHub or email a link to your arXiv preprint (preferred) or DOI
 
 ## 2025
 
+- **NeuSpring: Neural Spring Fields for Reconstruction and Simulation of Deformable Objects from Videos**. *Q. Xu, J. Liu, S. Yu, Y. Wang, Y. Zhou, J. Zhou, J. Cui, Y. Ong, H. Zhang*. November 2025. [arXiv:2511.08310](https://arxiv.org/abs/2511.08310)
+- **Improving Long-Range Interactions in Graph Neural Simulators via Hamiltonian Dynamics**. *T. Hoang, A. Trenta, A. Gravina, N. Freymuth, P. Becker, D. Bacciu, G. Neumann*. November 2025. [arXiv:2511.08185](https://arxiv.org/abs/2511.08185)
+- **Real-to-Sim Robot Policy Evaluation with Gaussian Splatting Simulation of Soft-Body Interactions**. *K. Zhang, S. Sha, H. Jiang, M. Loper, H. Song, G. Cai, Z. Xu, X. Hu, C. Zheng, Y. Li*. November 2025. [arXiv:2511.04665](https://arxiv.org/abs/2511.04665)
+- **Human Mesh Modeling for Anny Body**. *R. Brégier, G. Fiche, L. Bravo-Sánchez, T. Lucas, M. Armando, P. Weinzaepfel, G. Rogez, F. Baradel*. November 2025. [arXiv:2511.03589](https://arxiv.org/abs/2511.03589)
+- **VoMP: Predicting Volumetric Mechanical Property Fields**. *R. Dagli, D. Xiang, V. Modi, C. Loop, C. F. Tsang, A. H. Chen, A. Hu, G. State, D. I. W. Levin, M. Shugrina*. October 2025. [arXiv:2510.22975](https://arxiv.org/abs/2510.22975)
 - **Learning to Design Soft Hands using Reward Models**. *X. Bai, N. Hansen, A. Singh, M. T. Tolley, Y. Duan, P. Abbeel, X. Wang, S. Yi*. October 2025. [arXiv:2510.17086](https://arxiv.org/abs/2510.17086)
 - **Feedback Matters: Augmenting Autonomous Dissection with Visual and Topological Feedback**. *C. Wang, C. Chen, X. Liang, S. Atar, F. Richter, M. Yip*. October 2025. [arXiv:2510.04074](https://arxiv.org/abs/2510.04074)
 - **MPMAvatar: Learning 3D Gaussian Avatars with Accurate and Robust Physics-Based Dynamics**. *C. Lee, J. Lee, T. Kim*. October 2025. [arXiv:2510.01619](https://arxiv.org/abs/2510.01619)
@@ -15,6 +20,7 @@ pull request on GitHub or email a link to your arXiv preprint (preferred) or DOI
 - **MechStyle: Augmenting Generative AI with Mechanical Simulation to Create Stylized and Structurally Viable 3D Models**. *F. Faruqi, A. Abdel-Rahman, L. Tejedor, M. Nisser, J. Li, V. Phadnis, V. Jampani, N. Gershenfeld, M. Hofmann, S. Mueller*. September 2025. [arXiv:2509.20571](https://arxiv.org/abs/2509.20571)
 - **AERO-MPPI: Anchor-Guided Ensemble Trajectory Optimization for Agile Mapless Drone Navigation**. *X. Chen, R. Huang, L. Tang, L. Zhao*. September 2025. [arXiv:2509.17340](https://arxiv.org/abs/2509.17340)
 - **Discovering neural elastoplasticity from kinematic observations**. *G. B. Gavris, W. Sun*. September 2025. [DOI:10.1073/pnas.2508732122](https://doi.org/10.1073/pnas.2508732122)
+
 - **Learning Simulatable Models of Cloth with Spatially-varying Constitutive Properties**. *G. Chen, S. Suri, Y. Wu, E. Voulga, D. I. W. Levin, D. K. Pai*. July 2025. [arXiv:2507.21288](https://arxiv.org/abs/2507.21288)
 - **GeoWarp: An automatically differentiable and GPU-accelerated implicit MPM framework for geomechanics based on NVIDIA Warp**. *Y. Zhao, X. Li, C. Jiang, J. Choo*. July 2025. [arXiv:2507.09435](https://arxiv.org/abs/2507.09435)
 - **Transforming Unstructured Hair Strands into Procedural Hair Grooms**. *W. Chang, A. L. Russell, S. Grabli, M. J. Chiang, C. Hery, D. Roble, R. Ramamoorthi, T. Li, O. Maury*. July 2025. [DOI:10.1145/3731168](https://doi.org/10.1145/3731168)
```

README.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -43,9 +43,9 @@ the `pip install` command, e.g.
 
 | Platform        | Install Command |
 | --------------- | ----------------------------------------------------------------------------------------------------------------------------- |
-| Linux aarch64   | `pip install https://github.com/NVIDIA/warp/releases/download/v1.10.0/warp_lang-1.10.0+cu13-py3-none-manylinux_2_34_aarch64.whl` |
-| Linux x86-64    | `pip install https://github.com/NVIDIA/warp/releases/download/v1.10.0/warp_lang-1.10.0+cu13-py3-none-manylinux_2_28_x86_64.whl` |
-| Windows x86-64  | `pip install https://github.com/NVIDIA/warp/releases/download/v1.10.0/warp_lang-1.10.0+cu13-py3-none-win_amd64.whl` |
+| Linux aarch64   | `pip install https://github.com/NVIDIA/warp/releases/download/v1.10.1/warp_lang-1.10.1+cu13-py3-none-manylinux_2_34_aarch64.whl` |
+| Linux x86-64    | `pip install https://github.com/NVIDIA/warp/releases/download/v1.10.1/warp_lang-1.10.1+cu13-py3-none-manylinux_2_28_x86_64.whl` |
+| Windows x86-64  | `pip install https://github.com/NVIDIA/warp/releases/download/v1.10.1/warp_lang-1.10.1+cu13-py3-none-win_amd64.whl` |
 
 The `--force-reinstall` option may need to be used to overwrite a previous installation.
```

VERSION.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -1 +1 @@
-1.10.0
+1.10.1rc1
```

asv/benchmarks/atomics.py

Lines changed: 168 additions & 0 deletions
New file:

```python
# SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Benchmarks for atomic operations under high thread contention.

All threads write to a single output location (index 0) to maximize contention
and measure worst-case atomic operation performance.
"""

from typing import Any

import numpy as np

import warp as wp

# Map string parameter names to warp dtypes
DTYPE_MAP = {
    "float32": wp.float32,
    "int32": wp.int32,
}

NUM_ELEMENTS = 32 * 1024 * 1024


@wp.kernel
def max_kernel(
    vals: wp.array(dtype=Any),
    out: wp.array(dtype=Any),
):
    tid = wp.tid()
    val = vals[tid]
    wp.atomic_max(out, 0, val)  # All threads contend on out[0]


@wp.kernel
def min_kernel(
    vals: wp.array(dtype=Any),
    out: wp.array(dtype=Any),
):
    tid = wp.tid()
    val = vals[tid]
    wp.atomic_min(out, 0, val)  # All threads contend on out[0]


class AtomicMax:
    """Benchmark wp.atomic_max() with high thread contention.

    Uses 4x larger arrays (128M elements) to reduce measurement variation,
    as atomic_max showed ~10% variation with the default 32M elements.
    """

    params = ["float32", "int32"]
    param_names = ["dtype"]

    repeat = 50
    number = 15

    # Use 4x more elements to reduce measurement variation
    num_elements = 4 * NUM_ELEMENTS

    def setup_cache(self):
        rng = np.random.default_rng(42)
        # Generate vals_np for each dtype in DTYPE_MAP
        vals_np_dict = {}
        for dtype_str_key, dtype in DTYPE_MAP.items():
            if dtype == wp.float32:
                vals_np = rng.random(self.num_elements).astype(np.float32)
            elif dtype == wp.int32:
                vals_np = rng.integers(0, 2**31 - 1, size=self.num_elements, dtype=np.int32)
            else:
                vals_np = None
            vals_np_dict[dtype_str_key] = vals_np

        return vals_np_dict

    def setup(self, vals_np_dict, dtype_str):
        wp.init()
        self.device = wp.get_device("cuda:0")

        dtype = DTYPE_MAP[dtype_str]

        self.vals = wp.array(vals_np_dict[dtype_str], dtype=dtype, device=self.device)
        self.out = wp.zeros(shape=(1,), dtype=dtype, device=self.device)

        self.cmd = wp.launch(
            max_kernel,
            (self.num_elements,),
            inputs=[self.vals],
            outputs=[self.out],
            device=self.device,
            record_cmd=True,
        )

        # Launch once to compile
        self.cmd.launch()
        wp.synchronize_device(self.device)

    def time_cuda(self, vals_np_dict, dtype_str):
        self.out.zero_()
        self.cmd.launch()
        wp.synchronize_device(self.device)


class AtomicMin:
    """Benchmark wp.atomic_min() with high thread contention.

    Uses standard array size (32M elements) as measurements are already stable.
    """

    params = ["float32", "int32"]
    param_names = ["dtype"]

    repeat = 100
    number = 25

    def setup_cache(self):
        rng = np.random.default_rng(42)
        # Generate vals_np for each dtype in DTYPE_MAP
        vals_np_dict = {}
        for dtype_str_key, dtype in DTYPE_MAP.items():
            if dtype == wp.float32:
                vals_np = rng.random(NUM_ELEMENTS).astype(np.float32)
            elif dtype == wp.int32:
                vals_np = rng.integers(0, 2**31 - 1, size=NUM_ELEMENTS, dtype=np.int32)
            else:
                vals_np = None
            vals_np_dict[dtype_str_key] = vals_np

        return vals_np_dict

    def setup(self, vals_np_dict, dtype_str):
        wp.init()
        self.device = wp.get_device("cuda:0")

        dtype = DTYPE_MAP[dtype_str]

        self.vals = wp.array(vals_np_dict[dtype_str], dtype=dtype, device=self.device)
        self.out = wp.zeros(shape=(1,), dtype=dtype, device=self.device)

        self.cmd = wp.launch(
            min_kernel,
            (NUM_ELEMENTS,),
            inputs=[self.vals],
            outputs=[self.out],
            device=self.device,
            record_cmd=True,
        )

        # Launch once to compile
        self.cmd.launch()
        wp.synchronize_device(self.device)

    def time_cuda(self, vals_np_dict, dtype_str):
        self.out.zero_()
        self.cmd.launch()
        wp.synchronize_device(self.device)
```
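The benchmarks above depend on a property of `atomic_max`/`atomic_min`: because max and min are commutative and associative, the final value in the contended `out[0]` cell is the same no matter how the GPU interleaves the threads' updates. A small CPU-side sketch of that invariant in plain Python (no GPU or Warp required; the `simulated_atomic_max` helper is illustrative, not part of the benchmark):

```python
import random


def simulated_atomic_max(vals):
    """Sequentially fold max into a single cell, the way the contended
    out[0] accumulates under wp.atomic_max in the benchmark above."""
    out = float("-inf")
    for v in vals:
        out = max(out, v)  # one "atomic" update per thread
    return out


rng = random.Random(42)
vals = [rng.random() for _ in range(1000)]

shuffled = vals[:]
rng.shuffle(shuffled)

# Any interleaving of the atomic updates yields the same final value,
# equal to a plain reduction over the inputs.
assert simulated_atomic_max(vals) == simulated_atomic_max(shuffled) == max(vals)
```

This is also why the kernels need no synchronization beyond the atomics themselves: only the contended updates to `out[0]` race, and their result is order-independent.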

build_lib.py

Lines changed: 14 additions & 14 deletions
```diff
@@ -31,11 +31,11 @@
 import glob
 import os
 import platform
-import re
 import shutil
 import subprocess
 import sys
 
+import build_llvm
 import warp._src.build_dll as build_dll
 import warp._src.config as config
 from warp._src.context import export_builtins
@@ -355,6 +355,14 @@ def main(argv: list[str] | None = None) -> int:
     # propagate verbosity to build subsystem
     build_dll.verbose_cmd = args.verbose
 
+    # check LLVM build dependencies early if --build_llvm is set
+    if args.build_llvm:
+        try:
+            build_llvm.check_build_dependencies(verbose=args.verbose)
+        except RuntimeError as e:
+            print(f"Warp build error: {e}")
+            return 1
+
     # setup CUDA Toolkit path
     if platform.system() == "Darwin":
         args.cuda_path = None
@@ -382,6 +390,11 @@ def main(argv: list[str] | None = None) -> int:
         if not args.host_compiler:
             print("Warp build error: Could not find MSVC compiler")
             return 1
+    else:
+        args.host_compiler = build_dll.find_host_compiler()
+        if not args.host_compiler:
+            print("Warp build error: Could not find C++ compiler")
+            return 1
 
     try:
         # Handle CI nightly builds (returns updated version string if triggered, else None)
@@ -392,17 +405,6 @@ def main(argv: list[str] | None = None) -> int:
         else:
             build_version = config.version
 
-        # Reset git hash to None for non-scheduled builds (keeps config clean for local dev)
-        if nightly_version is None:
-            config_file = os.path.join(base_path, "warp", "_src", "config.py")
-            with open(config_file) as f:
-                content = f.read()
-            # Reset _git_commit_hash to None
-            pattern = r'^(_git_commit_hash\s*:\s*Optional\[str\]\s*=\s*)(None|"[^"]*")(.*)$'
-            updated_content = re.sub(pattern, r"\g<1>None\g<3>", content, flags=re.MULTILINE)
-            with open(config_file, "w") as f:
-                f.write(updated_content)
-
         if args.verbose:
             print(f"Building Warp version {build_version}")
@@ -457,8 +459,6 @@ def main(argv: list[str] | None = None) -> int:
 
     # build warp-clang.dll
     if args.standalone:
-        import build_llvm
-
         if args.build_llvm:
             build_llvm.build_llvm_clang_from_source(args)
```
