DISCLAIMER: Though Barney has by now reached a stage where it can be expected to be reasonably stable and complete, it is still under active development. If you run into bugs, missing features, or simply broken/outdated documentation, please report them at https://github.com/NVIDIA/barney
Barney is a renderer that implements the ANARI Cross-Platform Rendering API (https://www.khronos.org/anari/), primarily for NVIDIA OptiX and CUDA capable GPUs.
Barney is highly scalable, and can be used both for local, single-GPU rendering (as I do on my laptop on a daily basis) and for parallel rendering on multi-GPU nodes, and even for MPI-based data-replicated and/or data-parallel rendering:
- Single-GPU usage: For single-GPU usage Barney works like any other ANARI device; multiple GPUs or MPI are not required to run Barney.
- Multi-GPU: Barney can also make use of more than one GPU, in one of two ways: in explicit multi-GPU mode the ANARI app explicitly creates different devices for different GPUs, and then "tethers" those using a specific ANARI extension we have introduced for this purpose; in automatic multi-GPU mode Barney simply grabs all available GPUs and splits the work across them.
- Multi-Node: For cluster, cloud, or HPC environments Barney also supports MPI-parallel rendering (if built with MPI support), in which case an MPI-parallel application can use Barney across multiple GPUs and/or nodes.
Barney also supports both data-parallel and data-replicated rendering. In fully data-replicated rendering, each GPU (and/or each node) gets the exact same copy of all the scene content, and different GPUs render different portions of the final image (i.e., rendering the same content gets faster as GPUs are added). In fully data-parallel (sometimes also called "distributed") rendering, the scene to be rendered is distributed across the different GPUs/nodes, so different GPUs get different parts of what is logically a single model; barney then makes sure that each GPU "sees" all content during rendering (in this mode, Barney will not get faster by adding more GPUs, but it can render models much larger than what a single GPU could have handled). Barney also supports intermediate modes where, for example, different nodes work data-parallel, but all GPUs on a given node work data-replicated, etc.
Barney is primarily intended for the type and size of data one encounters in a scientific visualization (sci-vis) context, when used from tools such as, for example, ParaView. Barney supports all the typical geometric types required by such applications (triangle meshes, spheres, cylinders, curves, etc), and also supports the typical scalar field/volume data types such as structured volumes as well as block-structured AMR and unstructured data (as far as these are currently supported by ANARI).
Though clearly focused on sci-vis, Barney is still a pretty capable ray/path tracer in its own right and, if scene and material data are properly set up, will also do HDRI environment-map lighting; indirect illumination; glossy and specular reflection/refraction; depth-of-field cameras; point, directional, and (to some degree) area lights; volumetric scattering; etc. Barney will clearly not achieve the kind of correctness or realism that pure global-illumination renderers like Mitsuba or PBRT can achieve, but it is still expected to behave creditably on typical non-sci-vis rendering content.
Barney is not a stand-alone "renderer" or "vis-tool"; it is a library with an API, and needs other applications to build on top of it. As such, it is never "run" on its own; it always needs to be driven by another application (e.g., hayStack, at http://github.com/ingowald/hayStack), or by any application that supports the ANARI API (see https://www.khronos.org/anari/).
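For illustration, here is a minimal sketch of what driving barney through ANARI looks like from the application side, using only standard ANARI entry points (error handling and the actual scene setup are omitted):

#include <anari/anari.h>

int main()
{
  /* load the dynamically loadable "barney" ANARI library ... */
  ANARILibrary library = anariLoadLibrary("barney", nullptr, nullptr);
  /* ... create its default device from it, and commit it */
  ANARIDevice device = anariNewDevice(library, "default");
  anariCommitParameters(device, device);

  /* ... set up world, camera, renderer, and frame via the ANARI API ... */

  anariRelease(device, device);
  anariUnloadLibrary(library);
  return 0;
}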
Barney is primarily intended for interactive (multi-)GPU rendering, but it can also be built in a non-GPU configuration. Similarly, one of barney's most important features is MPI-based data-parallel rendering, but it can also be built and used without MPI. As such, the dependencies depend on what exactly needs to get built:
One way or another, barney requires:
- cmake, for building
- a C++20 compliant C++ compiler (gcc on Linux, Visual Studio on Windows, clang on Mac)
For CUDA/OptiX acceleration, it also requires:
- CUDA, version 12 and up.
- OWL (https://github.com/NVIDIA/owl). Note that OWL gets pulled in as a git submodule; there is no need to get and install it externally.
- OptiX, as used through OWL. See the documentation in OWL (https://github.com/NVIDIA/owl) for where to get it, and how best to install it so that OWL can easily find it.
For MPI-based data-parallel rendering:
- Building requires a working MPI install. Running barney requires a CUDA-aware MPI; for building alone this should not matter. We typically develop under, and test with, OpenMPI 4.1.6 or 5.0, but users have reported success with other MPI flavors such as MPICH.
The ANARI build of barney works in pretty much the same way (and with
the same options), but requires a pre-built and installed ANARI-SDK from
https://github.com/KhronosGroup/ANARI-SDK. As of the time of this writing,
you need ANARI SDK version 0.15 (or the next_release branch).
First, build and install the ANARI SDK (https://github.com/KhronosGroup/ANARI-SDK):
cd ANARI-SDK
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=<install-dir>
cmake --build . [ --config Release ]
cmake --install . [ --config Release ]

Then, build barney, using the same install dir:
cd barney
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=<same-install-dir-as-anari> [options]
cmake --build . [ --config Release ]
cmake --install . [ --config Release ]

By default Barney builds without MPI support; to enable it, add
-DBARNEY_MPI=ON to the cmake config command.
Example: PBRT landscape in miniScene (http://github.com/ingowald/miniScene) format, the 'embree headlight', and the TSDViewer 'TestORB'; all rendered with HDRI env-map lighting.
Supporting most (all?) of the ANARI non-triangular geometry types: here various examples with capsules, spheres, cylinders, cones, and curves.
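As a hedged sketch of what specifying one of these types looks like through standard ANARI calls (this assumes an already-created ANARI device 'dev'; the parameter names are those of the ANARI "sphere" geometry subtype):

/* three spheres, given as per-vertex centers and radii */
float centers[] = { 0.f,0.f,0.f,  1.f,0.f,0.f,  0.f,1.f,0.f };
float radii[]   = { .1f, .2f, .3f };

ANARIGeometry spheres = anariNewGeometry(dev, "sphere");
ANARIArray1D position = anariNewArray1D(dev, centers, nullptr, nullptr,
                                        ANARI_FLOAT32_VEC3, 3);
ANARIArray1D radius   = anariNewArray1D(dev, radii, nullptr, nullptr,
                                        ANARI_FLOAT32, 3);
anariSetParameter(dev, spheres, "vertex.position", ANARI_ARRAY1D, &position);
anariSetParameter(dev, spheres, "vertex.radius",   ANARI_ARRAY1D, &radius);
anariCommitParameters(dev, spheres);
/* arrays are ref-counted by the geometry, so local handles can be released */
anariRelease(dev, position);
anariRelease(dev, radius);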

Structured volume data: float, uint8, and uint16 scalars are supported, and any volume can be distributed across different ranks, with each rank holding a different portion of that volume. Barney being intended for sci-vis, every volume can have its own transfer function.
Here the examples 'chest' and 'kingsnake' (both regular structured data, with different input scalar types), and on the right 'scivis2011', an unstructured-mesh volume inside a semi-transparent triangular surface.
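A hedged sketch of how such a volume with its own transfer function would be specified through standard ANARI calls (this assumes a device 'dev', and 'voxels' pointing at dimX*dimY*dimZ float scalars; names follow the ANARI "structuredRegular" field and "transferFunction1D" volume subtypes):

/* the scalar field itself: a structured-regular 3D array of floats */
ANARIArray3D voxelArray = anariNewArray3D(dev, voxels, nullptr, nullptr,
                                          ANARI_FLOAT32, dimX, dimY, dimZ);
ANARISpatialField field = anariNewSpatialField(dev, "structuredRegular");
anariSetParameter(dev, field, "data", ANARI_ARRAY3D, &voxelArray);
anariCommitParameters(dev, field);

/* this volume's own transfer function: colors/opacities over valueRange */
float colors[]    = { 0.f,0.f,1.f,  0.f,1.f,0.f,  1.f,0.f,0.f };
float opacities[] = { 0.f, .5f, 1.f };
ANARIArray1D color   = anariNewArray1D(dev, colors, nullptr, nullptr,
                                       ANARI_FLOAT32_VEC3, 3);
ANARIArray1D opacity = anariNewArray1D(dev, opacities, nullptr, nullptr,
                                       ANARI_FLOAT32, 3);
float valueRange[2] = { 0.f, 1.f };

ANARIVolume volume = anariNewVolume(dev, "transferFunction1D");
anariSetParameter(dev, volume, "value",      ANARI_SPATIAL_FIELD, &field);
anariSetParameter(dev, volume, "color",      ANARI_ARRAY1D, &color);
anariSetParameter(dev, volume, "opacity",    ANARI_ARRAY1D, &opacity);
anariSetParameter(dev, volume, "valueRange", ANARI_FLOAT32_BOX1, valueRange);
anariCommitParameters(dev, volume);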
Though barney is not limited to ANARI (it is its own library, with its own API), it will also, by default, build a (by now reasonably complete!) ANARI "device" that exposes most of barney's functionality to applications using the ANARI API. If enabled in the cmake build (it is on by default), and properly installed via make install or cmake --install, this builds an ANARI device that any ANARI app can load under the name "barney". If barney is built with MPI support for MPI-based data-parallel ray tracing, it will also build an ANARI "barney_mpi" device.
Note: To distinguish between the (general) ANARI API and the specific barney-based implementation of that API, we typically refer to this implementation as the (B)ANARI device, or simply as banari.
- dependencies: libgtk-3-dev
- need to get, build, and install the ANARI-SDK: https://github.com/KhronosGroup/ANARI-SDK. Note that the SDK must be installed for barney to properly find it.
- build barney as described above. BUILD_ANARI should be on by default, so unless explicitly disabled this will also build the banari device.
The barney devices should be easily usable by any existing ANARI application by simply loading the "barney" device. For those apps that respect the ANARI_LIBRARY environment variable convention, you should also be able to just set ANARI_LIBRARY=barney and have the app load the "default" device.
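For example, a hedged sketch of that convention on the application side (the helper name here is ours, not part of any API):

#include <anari/anari.h>
#include <cstdlib>

/* load whatever library ANARI_LIBRARY names, falling back to "barney" */
ANARILibrary loadPreferredLibrary()
{
  const char *libName = std::getenv("ANARI_LIBRARY");
  return anariLoadLibrary(libName ? libName : "barney", nullptr, nullptr);
}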
For data-parallel rendering across multiple collaborating ranks, use "barney_mpi" instead. Also see https://arxiv.org/abs/2407.00179 for the conventions on how to properly use data-parallel ANARI (which barney implements).
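As a rough sketch (see the paper above for the authoritative conventions), an MPI-parallel app would do something like the following; this sketch assumes the device attaches to MPI_COMM_WORLD, and how exactly the communicator is established may differ:

#include <anari/anari.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  MPI_Init(&argc, &argv);

  /* all ranks load and create the device collectively */
  ANARILibrary library = anariLoadLibrary("barney_mpi", nullptr, nullptr);
  ANARIDevice device = anariNewDevice(library, "default");
  anariCommitParameters(device, device);

  /* each rank specifies a world containing only its local part of the
     data; frames are then rendered collectively across all ranks */

  anariRelease(device, device);
  anariUnloadLibrary(library);
  MPI_Finalize();
  return 0;
}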
- major updates to MPI performance
- support for multi-GPU data parallel ANARI using device tethering
- more cuda-like kernel launches across both embree and CPU backends
- updates as required for ANARI 0.15
- support for CUDA 13
- performance fixes for threading on embree backend
- various fixes throughout
- various stability and bug fixes, in particular relating to materials, path tracing, and lighting, as well as to multi-device rendering
- closed various gaps w.r.t. the ANARI spec (missing formats, unsupported parameters, etc)
- major rework that allows 'rtcore' abstraction and multiple backends
- support both optix and embree (cpu only) backends
- completely reworked cmake build system (and in particular how stuff gets linked)
- reworked install/export/import system, exporting both barney and barney_mpi; these should always be used through find_package(barney), then linked as the import target barney::barney and (if found) barney::barney_mpi
- anari device split into anari_library_barney and anari_library_barney_mpi
- version used for pynari backend that works on all of linux/windows/mac and both cpu/gpu
Known limitations/issues:
- AMR support currently disabled
- umesh support currently supports only the cubql sampler
- updated to latest anari 0.12 SDK
- changed cuda arch detection to use 'native' by default, while allowing an override on the command line
- updated cuda arch detection in owl and cubql so as to allow barney to tell them which one to use (so it matches across all projects)
- reworked path tracing (and in particular the MIS code) to (mostly) pass a furnace test

