KHR_gaussian_splatting #2490
base: main
Conversation
Co-authored-by: Adam Morris <[email protected]>
Co-authored-by: Adam Morris <[email protected]>
…e extension rather than building on KHR_gaussian_splatting
…DME.md Co-authored-by: Sean Lilley <[email protected]>
…DME.md Co-authored-by: Sean Lilley <[email protected]>
…DME.md Co-authored-by: Sean Lilley <[email protected]>
…DME.md Co-authored-by: Sean Lilley <[email protected]>
Co-authored-by: Sean Lilley <[email protected]>
Co-authored-by: Sean Lilley <[email protected]>
It appears that this extension is tied to the Niantic Spatial library. Is it necessary to specify a version or other unique identifier to ensure that the desired algorithm and calling sequence are used? Also note that the license for the Niantic Spatial library is MIT.
Good question. The SPZ library packs a version number along within the binary data that we store in the buffer, so it's unnecessary to store a version number in the glTF metadata.
That algorithm compresses the [-3, 3] range to [-1, 1]. Do renderers need to scale those values back?
@weegeekps There is no scaling back to the [-3, 3] or [-5, 5] range for this extension (SPZ makes some assumptions, and this is the reason for the scaling in that extension). In this extension, the values are as-is, e.g. as found in an uncompressed PLY file.
From this website: for the chair.ply asset, the … For the drums, the ranges are [-2.654668, 8.410955] and [-1.314519, 0.995610] correspondingly. Just two random samples; please cross check, maybe I did something wrong with my quick test. Also, by keeping the extension as-is (maybe with some explanation on the normalized data), this better allows for other compression methods like SPZ and other upcoming compression methods.
@weegeekps @lexaknyazev |
@weegeekps Same problem in regard to quantization and normalized data with …
@NorbertNopper-Huawei |
We can keep it if folks manage to train such data, however no magic conversion please. |
### Improving Fallback with COLOR_0
|
|
To support better fallback functionality, the `COLOR_0` attribute semantic from the base glTF specification may be used to provide the diffuse color of the 3D Gaussian splat. This allows renderers to color the points in the sparse point cloud when a renderer does not support 3D Gaussian splatting. The value of `COLOR_0` is derived by multiplying the 3 diffuse color components of the 3D Gaussian splat with the constant zeroth-order spherical harmonic (ℓ = 0) for the RGB channels. The alpha channel should contain the opacity of the splat.
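As a rough sketch of this derivation, assuming the `0.5` bias and clamping debated later in this thread (the function name is my own, not from the extension):

```python
# Hypothetical sketch of deriving the COLOR_0 fallback from the degree-0
# spherical-harmonic coefficients, following the formula discussed in this
# thread: color = SH_DEGREE_0_COEF_0 * 0.282095 + 0.5, clamped to [0, 1].
C0 = 0.282095  # constant zeroth-order SH basis value, 1 / (2 * sqrt(pi))

def sh0_to_color0(sh0_rgb, opacity):
    """Map a splat's degree-0 SH coefficients and opacity to an RGBA COLOR_0."""
    clamp = lambda v: min(max(v, 0.0), 1.0)
    rgb = [clamp(c * C0 + 0.5) for c in sh0_rgb]
    return rgb + [clamp(opacity)]
```

Whether the `+ 0.5` and the clamp belong in the normative text is exactly what the comments below discuss.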
@weegeekps Plus adding offset=0.5
Do you have an example of where you expect the 0.5f bias to be used in the renderer? I understand its importance from the aspect of the training algorithm, but typically the renderer does not need to know this information, just that the color space is sRGB.
EDIT: I did some more research and had an internal discussion on this, so I'm keeping what is important from the original post.
In general, the color is precomputed at load time, so one does not need to convert at runtime (though it is possible), or it is already stored in a specific file format, e.g.:
https://github.com/kishimisu/Gaussian-Splatting-WebGL/blob/main/src/loader.js#L96
https://github.com/playcanvas/engine/blob/main/src/scene/gsplat/gsplat-data.js#L73
https://github.com/antimatter15/splat/blob/main/convert.py#L32
In the implementation from the original paper this happens here:
https://github.com/graphdeco-inria/gaussian-splatting/blob/main/gaussian_renderer/__init__.py#L80
https://github.com/graphdeco-inria/gaussian-splatting/blob/main/utils/sh_utils.py#L117
During training, this goes back and forth, e.g. using:
https://github.com/graphdeco-inria/gaussian-splatting/blob/main/utils/sh_utils.py#L114
The resulting colors are in general in the range [0, 1]; this is what the paper expects and what is represented in this extension. So, this 0.5 is fixed by the given math and is added after the evaluation of the spherical harmonics. However, if it is not mandatory for 3DGS, a bias of 0.5 as default can be added.
For display-referred images, the range of [0, 1] is given by default, independent of whether sRGB encoding is currently used or, for future use cases, PQ or HLG encoding.
For scene-referred images, if the linear color is in the [0, 1] range (e.g. content trained from SDR images in a linear color space), this also works pretty well.
This fulfills the requirement for linear color space e.g. like here:
https://ubc-vision.github.io/stochasticsplats/
For the future, having other color spaces like Rec.2020, this also works well.
However, if the scene-referred data is trained on HDR data, which contains values below 0 and/or larger than 1, then it will not work.
However, the common approach is that the color data is normalized to [0, 1] before training.
E.g. this paper uses RAW image data:
https://github.com/shreyesss/HDRSplat
To normalize the data, e.g. an offset with scale can be used. However, different approaches are available to bring the data into the [0, 1] range, e.g. using PQ for non-linear training.
This normalization and de-normalization and the required parameters belong in the KHR_gaussian_splatting_wide_gamut_color extension and need to be further discussed there:
#2539
In summary:

- No `offset` and `scale` or alternatives are needed in this extension, as long as the trained image data are in the [0, 1] range.
- As seen in the links above, the fixed `0.5` is given by the implementations of the paper and also depends on the required math. If required, a default `bias` of 0.5 can be added; however, the text needs to be removed and/or rewritten in the README:
Wrong: "Non-normative Note: If the spherical harmonics are in the BT.709 gamut, the diffuse color can be computed from the KHR_gaussian_splatting:SH_DEGREE_0_COEF_0 attribute by multiplying each of the RGB components by the constant spherical harmonic value of 0.282095."
Can we guarantee that SH_DEGREE_0_COEF_0 * 0.282095 + 0.5 will always be in the [0, 1] range? If not, maybe the note should also require clamping so that COLOR_0 values are valid as per glTF 2.0 spec.
And the initial color is loaded here:
https://github.com/playcanvas/supersplat/blob/0c0edf4d/src/shaders/splat-shader.ts#L105
So, for the base glTF extension, we have this 0.5, like in the implementation of the paper.
Any additional parameterized offsetting and scaling we do in the other extension.
- `computeColorFromSH`: https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L20
- Larger than degree 0: https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L32
- Offset of 0.5: https://github.com/graphdeco-inria/diff-gaussian-rasterization/blob/main/cuda_rasterizer/forward.cu#L63
I took a deeper look at what CesiumJS, BabylonJS, PlayCanvas, and Inria are doing, because Inria is clearly adding a 0.5f bias to the forward rendering pass and I wanted to ensure I fully understand. Ultimately, I think you are right @NorbertNopper-Huawei, but read on, as I did a really deep dive into how all of this works. When I went back through everything this evening, as well as some of my notes from well over a year ago, I realized that this is actually a fairly important detail that I had forgotten.
The purpose of this bias within Inria's renderer is to ensure that the resulting color remains within [0, 1] before clamping. It's a numerical stability trick, and as far as I can tell the authors chose it mostly arbitrarily because it works well enough.
As a refresher (because apparently I needed one) the way training works is the forward pass renders an image, compares to the ground truth, and then uses the backwards pass to adjust the SH coefficients. When you apply a bias, such as 0.5f as the authors have done, then the learned coefficients become intrinsically linked to the bias. Since a 3DGS renderer is simply the forward pass logic being run in real time, it needs to know the correct value of the bias.
None of the 3 real-time renderers appear to be doing this on the surface, but I did discover:
- PlayCanvas has its own `clrOffset` and `clrScale` that they're setting as you pointed out, but these are related to brightness and black point values set within the renderer. Otherwise, I couldn't find anywhere that the `0.5f` value was being set. I suspect it is getting handled elsewhere.
- CesiumJS and BabylonJS implicitly get this `0.5f` value through SPZ, which I clearly forgot about. 🤦♂️
- After looking deeper into the math, the `0.5f` bias is appropriate for all SDR and HDR content. The key difference between sRGB and any HDR content would be that after the bias, any data outside of the [0, 1] range will be clamped and truncated for sRGB, while for HDR it would not.
Given all of this, we probably do need to add this bias value as a property, but while I'm leaning towards adding it, I admit I'm still not fully committed yet. I want to talk to some others on the training side of things and get their thoughts as well; in the off chance this is a value that nobody will ever change, then maybe it doesn't belong in the base spec. It is a non-negotiable constant that both the trainer and renderer have to agree on, so maybe it does. Not sure, I'll figure it out in the morning.
Also, as an aside, we should not refer to this value as `offset`. It's a fine detail, but in the context of Gaussian splatting, something like `colorBias` or `shBias` is more correct given its purpose.
Please re-read, as I learned as well and updated the document
#2490 (comment)
- For the base extension and probably other future extensions, we keep this `0.5`, and it is not the offset I mentioned initially.
- For SDR(!) based content in scene-referred space, or display-referred content in general, the values are between [0, 1]. So, this base extension works out of the box.
- For other color gamuts, this works as well with the base extension. However, these are handled in a separate extension as discussed.
- For HDR content below 0 and/or above 1, the content is normalized before training, bringing it into [0, 1] space (this is done in the HDRSplat paper as far as I understand it). So, nothing needs to be changed in the trainer and/or the renderer for the general case. However, afterwards one could use an offset and scale to do the de-normalizing, but maybe others want to use PQ to do this. Or something different. This needs to be defined somehow in a separate extension, and e.g. what PlayCanvas is doing with the `clrOffset` and `clrScale` is what I suggest as a general approach. And I assume as well that the `0.5` is visible somewhere in their code.
- What is also important: SH_DEGREE_0_COEF_0 * 0.282095 can be precomputed; however, it is not "yet" the color.
- SH_DEGREE_0_COEF_0 * 0.282095 + 0.5 can also be precomputed for degree 0 only. This is the color.
- However, both optimizations are up to other extensions etc. and/or the fallback mentioned in the README.
- In regard to the `0.5` bias, the spherical harmonic coefficients are trained around 0 for numeric stability. In the degree-0-only case, the values are roughly in [-1.7, 1.7], as mentioned before in this discussion. If you multiply by 0.282095, you come to around [-0.5, 0.5]. Add 0.5, and one is in the color space.
- For higher degrees, the intervals are different. However, if summed up properly and 0.5 is added at the end(!), the range is again in [0.0, 1.0].
- So no `bias` parameter is required; it is always `0.5`. And if this really needs to be altered in the future, we can create another extension for this.
From my perspective, if folks are catching up with our insights, the base extension is ready to go. Everything else can be handled in the other extensions.
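The degree-0 range arithmetic from the summary above can be checked in a few lines (the [-1.7, 1.7] coefficient bounds are an assumption taken from this thread, not a guarantee):

```python
# Quick numeric check: degree-0 SH coefficients roughly in [-1.7, 1.7]
# (assumed bounds from this discussion), multiplied by the constant SH
# basis value 0.282095 and offset by the fixed 0.5 bias, land in [0, 1].
C0 = 0.282095
lo = -1.7 * C0 + 0.5   # ~0.0204
hi = 1.7 * C0 + 0.5    # ~0.9796
assert 0.0 <= lo <= hi <= 1.0
```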
I just added some log output for the SH0/0 "bounds" in my viewer, and dragged-and-dropped the given files into it. The values for those match the values that you posted. I also dropped a few other random, unspecified PLY files into it. What may be important: These PLY files are from different sources! (Random web-search results...) It might very well be that one particular splat creator/trainer does some form of "normalization" or targets a certain value range. But this wouldn't necessarily mean that this has to be the case. Some outputs for different files: It looks like there is a cluster at that ~1.7 range, but the values may be way larger and, apparently, still make sense. So I assume that the trainer could be configured to not generate this [-1.7, 1.7] (or larger) range, but in fact the [-1, 1] range instead, and still represent "the same data". In this case, the "signed intN/normalized" accessors could be used as a compact (quantized) representation of these values. So it looks like it would make sense to allow this in the base specification - maybe with a hint that ~"some trainers could be configured to generate that kind of data".
@javagl Thanks for cross checking. So, there is no problem with keeping the quantized definition. If the data can be generated, all good. However, no magic numbers for conversions, please. My biggest concern right now is the 0.5 in the color conversion, as this will not work for linear data and/or other encodings. So, to be future-proof, this 0.5 should be a parameter inside the glTF.
@NorbertNopper-Huawei Where does the 0.5 offset come from?
I'm not entirely sure about the conversion from/to … In my … My "conversion" would just map these values to [0, 1] (clamping them). Out of curiosity, I applied this back-and-forth conversion (with clamping) to the … A subtle difference is visible (although not as strong as I suspected). I think that the question is still the same: What is the value range? If it was [-1.7, 1.7], then the default conversion involving that 0.5 would make sense. If it was [0, 3.4], then a different conversion (without some 0.5) would have to be applied. And an "offset=0.5" is not sufficient, because the range might also be [-10, 10] or [0, 20], for what it's worth...
I assume the original paper used mainly display-encoded sRGB images. So, to map the range to RGB, this 0.5 was added. However, if linear values are used, which go beyond 1, and/or different encodings are used, e.g. HLG or PQ, I assume a different offset makes sense. You can also cross-check code where PLY data is precomputed to color data.
As elaborated above: this assumes a range of [-1.7, 1.7], which may not be the truth. Actually, I'm not sure how important that is. One could just claim that the generator/trainer is responsible for storing values (in [0, 1]) in the …
Please forget about this.
This contradicts your comment
But... I'll abstain from further comments here for now. |
@javagl Please do not get me wrong. It is not possible to request this from the data. So, sorry, I did not want to be impolite.
Yes, and my apologies for glossing over that. I was trying to make a point that there are algorithms that absolutely work with … as it is today. During the normalization step, the denominator is the absolute maximum value of your range of possible numbers. So, if you know your range is going to be …

In practice you shouldn't use a pure SNORM8 quantization algorithm like this, but you can, and it will work. Excluding the DC components, it's relatively safe to clamp the outliers when dealing with the spherical harmonics. Doing so will reduce specular effects, but if you choose a slightly higher range such as …

All of that said, you should generally not use this algorithm. Which leads me to the point that I think I failed catastrophically hard at actually making yesterday: We want to allow compression and quantization extensions, but we do not want to handle compression in the base extension. We've stated multiple times over the past few weeks that compression is not a concern of the base extension, so perhaps this is another bit of scope that has crept in and should be removed. If we include support for quantizing the spherical harmonics, we need to outline exactly how we expect it to be performed, and I'm not completely convinced that doing so is smart given the future-proofing goal we're trying to achieve with the base extension. As an example of how this is not a straightforward decision: storing a …

This exercise has made me strongly in favor of removing the quantization aspect entirely unless someone has a very good argument as to why we need to keep it.
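As a sketch of the "pure SNORM8" approach described above (the function names are my own; the decode formula is the glTF 2.0 signed-normalized one):

```python
# Sketch of a pure SNORM8 quantization as described above: normalize by a
# chosen absolute-maximum range, clamp outliers, and round to a signed
# 8-bit normalized value. Not recommended in practice; for illustration only.
def quantize_snorm8(value, abs_max):
    normalized = max(-1.0, min(1.0, value / abs_max))  # clamp outliers
    return round(normalized * 127.0)

def dequantize_snorm8(q):
    # glTF 2.0 decode for signed normalized bytes: max(q / 127, -1)
    return max(q / 127.0, -1.0)
```

With `abs_max = 1.7`, an SH coefficient of 5.0 clamps to 127 (i.e. 1.0 after decode), which is exactly the outlier-clamping trade-off mentioned above.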
As a general rule: the base glTF spec supports "basic" quantization, that is, values in [0, 1] or [-1, 1] ranges with 8- or 16-bit precision. If the expected values for a particular attribute fit within those ranges, then we can support quantization for that attribute essentially for free. If they don't fit, it would be better to require floating-point values for that attribute in the base extension. This exercise needs to be done separately for every attribute.
As has been mentioned before, it's also possible to use non-normalized integers (that would be implicitly converted to the corresponding floats at runtime) when that makes sense. |
If I am understanding you correctly, you'd still need a fixed scale in that case, so I don't think it makes sense for the spherical harmonics.
Rotation, scale, and opacity fit within those ranges. Technically, the degree-0 spherical harmonics should too, but I think I will go for consistency here and just make all of the spherical harmonics floats. |
Highly preliminary test data: SplatGltfTests-SNAPSHOT-2025-12-12.zip. The archive contains three glTF files that use the …
The … To some extent, this was intentional. Renderers could make a ton of assumptions about the structure of the glTF. It is by no means self-evident that a renderer could cope with a splat primitive/mesh that is attached to two nodes, with different transforms. (I'd say that it is self-evident, but others have different priorities.)

Given that all splats from a glTF will have to be sorted, globally (!), at some point, the renderer will have to shove these splats into a single collection and sort this. Yes, in theory, other solutions could be possible. This would boil down to storing (only) the sorting order of the splats, once for each instantiation of this splat list, and then using this "list of sorting orders" to compute the final, blended result of all splat primitive instantiations. This computation of the sorting orders would have to take into account the whole global matrices of the nodes that the meshes are attached to. Depending on how common this case is assumed to be, the resulting complexity and implementation effort may not be warranted.

And... let's be honest: The case where a glTF contains more than one splat primitive will be very rare. The case that a glTF contains the same splat primitive, attached to different nodes, will probably never occur in practice. It will only occur in artificial test cases. Like this one: _gltf_with_grid_instanced 2025-12-13.zip. It is an "optimized" version of the …
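The "single collection, global sort" idea above can be sketched as follows (all names and structures here are hypothetical, not from any renderer):

```python
# Hypothetical sketch of globally sorting splats gathered from every node
# instantiation: transform each splat center into world space using that
# node's global transform, then sort the combined list back-to-front by
# view-space depth for correct alpha blending.
def gather_and_sort_splats(instances, view_depth):
    """instances: iterable of (transform, centers) pairs, where transform
    maps a local-space center to world space; view_depth maps a world-space
    center to its depth along the view direction."""
    world_splats = []
    for transform, centers in instances:
        for center in centers:
            world_splats.append(transform(center))
    # back-to-front: farthest splats first
    return sorted(world_splats, key=view_depth, reverse=True)
```

The alternative sketched in the comment above would replace the single sort with one precomputed sorting order per instantiation, at the cost of the extra bookkeeping described there.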
We need to revisit this, as we need to clarify what the impact of a scene-referred and a display-referred workflow means for glTF.





Update: I've split out KHR_gaussian_splatting_spz_2 into its own PR. There have been many changes since I wrote the summary, and while I'm leaving it for the moment, it will be updated soon.
This extension proposal, `KHR_spz_gaussian_splats_compression`, allows for efficient storage of 3D Gaussian splat data within glTF using the SPZ compression library from Niantic Spatial. The extension is applied to a primitive. The SPZ binary blob is stored as a buffer within the glTF file, and implementations can use the SPZ library to either decompress and then map the compressed Gaussians into placeholder attributes on the primitive, or decompress directly into their rendering pipeline if preferred. Content creators have the flexibility to use no spherical harmonics, or up to all 3 degrees of spherical harmonics, depending on their use case.

We are currently working on an implementation in the CesiumJS engine based on this draft that we hope to release soon.