
Commit f37b0d6

clean up snorm to float definition to use divide (like in its example) not multiply by reciprocal
1 parent b4510d4 commit f37b0d6

2 files changed

Lines changed: 35 additions & 41 deletions


d3d/archive/D3D11_3_FunctionalSpec.htm

Lines changed: 1 addition & 1 deletion
@@ -699,7 +699,7 @@ <H4>3.2.3.3 SNORM -&gt; FLOAT</H4>
699699
conversion to floating-point is as follows:</p>
700700
<ul>
701701
<li>The most-negative value maps to -1.0f. e.g. the 5-bit value 10000 maps to -1.0f.
702-
<li>Every other value is converted to a float (call it c), and then result = c * (1.0f / (2<sup>(n-1)</sup>-1)).
702+
<li>Every other value is converted to a float (call it c), and then result = c / (2<sup>(n-1)</sup>-1).
703703
e.g. the 5-bit value 10001 is converted to -15.0f, and then divided by 15.0f, yielding
704704
-1.0f.</li>
705705
</ul>
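The two conversion rules above can be sketched as follows. This is a minimal illustration, not part of the spec: the function name `snorm_to_float` and its parameters are hypothetical, and Python floats are double precision, whereas the spec describes float32 results.

```python
def snorm_to_float(bits: int, n: int) -> float:
    """Convert an n-bit two's-complement SNORM bit pattern to a float in [-1.0, 1.0].

    Hypothetical helper for illustration only; name and signature are assumptions.
    """
    if n < 2:
        raise ValueError("SNORM needs at least 2 bits")
    bits &= (1 << n) - 1  # keep only the low n bits
    # Reinterpret the n-bit pattern as a signed two's-complement integer (c).
    c = bits - (1 << n) if bits & (1 << (n - 1)) else bits
    # Rule 1: the most-negative value maps to -1.0.
    if c == -(1 << (n - 1)):
        return -1.0
    # Rule 2: every other value is divided by 2^(n-1) - 1.
    return c / ((1 << (n - 1)) - 1)
```

With n = 5, both `0b10000` and `0b10001` map to -1.0, matching the examples in the list above, so the representable range is symmetric around zero.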

d3d/archive/images/d3d11/D3D11_3_FunctionalSpec.htm

Lines changed: 34 additions & 40 deletions
@@ -352,6 +352,7 @@ <h4 id="754Rules">Partial Listing of Honored IEEE-754 Rules</h4>
352352
<li>Comparisons ignore the sign of 0 (so +0 equals -0).</li>
353353
<LI>The comparison NE, when either or both operands is NaN returns TRUE. </li>
354354
<LI>Comparisons of any non-NaN value against +/- INF return the correct result. </li>
355+
<li>min(x,NaN) == min(NaN,x) == x (same for max). This used to be a deviation from IEEE 754 but now aligns with IEEE-754-2019's minimumNumber and maximumNumber operations.</li>
355356
</ul>
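The min/max NaN rule in the list above can be sketched as follows. This is an illustrative model only: `d3d_min` and `d3d_max` are hypothetical names, and the sketch does not try to model the sign of zero, which the spec treats separately.

```python
import math

def d3d_min(a: float, b: float) -> float:
    """min() where a NaN operand yields the other operand (hypothetical helper)."""
    if math.isnan(a):
        return b  # if b is also NaN, a NaN is returned
    if math.isnan(b):
        return a
    return a if a < b else b

def d3d_max(a: float, b: float) -> float:
    """max() with the same NaN handling as d3d_min (hypothetical helper)."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a > b else b
```

Note that Python's built-in `min` and `max` are order-dependent when a NaN is present, which is why the NaN checks are written out explicitly here.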
356357
<h4 id="FP32SpecialCases">Complete Listing of Deviations or Additional Requirements vs. IEEE-754</h4>
357358
<ul>
@@ -370,20 +371,9 @@ <h4 id="FP32SpecialCases">Complete Listing of Deviations or Additional Requireme
370371
<li>NaN input to an operation obviously always produces NaN on output, however the exact bit pattern
371372
of the NaN is not required to stay the same (unless the operation is a raw "mov" instruction which
372373
does not alter data at all.)</li>
373-
<p>The IEEE-754R specification for floating point min and max operations states that if one of the inputs to min or max is a
374-
"quiet" NaN, then the result of the operation is the other parameter. For example:</p>
375-
<p>min(x,QNaN) == min(QNaN,x) == x (same for max) </p>
376-
<p>A recent revision of the IEEE-754R specification seems to have adopted a different behavior
377-
for min and max when one input is a "signaling" SNaN value vs if it was QNaN: </p>
378-
<p>min(x,SNaN) == min(SNaN,x) == QNaN (same for max)</p>
379-
<p>This latter change was not in place until after D3D10 had shipped, and even after the D3D11 specifications had become fairly mature and locked down.
380-
So, even though the intent in general for D3D is to follow the standards for arithmetic: IEEE-754 and IEEE-754R, in this case there is a deviation.
381-
Future D3D versions may consider relaxing the rules allow either behavior, although compatibility will be a concern in addition having to
382-
justify the value of distinguishing QNaN vs SNaN in general. As for D3D11, it cannot change behavior here at this point, so it matches D3D10 as follows:</p>
383-
<p>The arithmetic rules in D3D10+ do not make any distinctions between "quiet" and "signaling" NaN values (QNaN vs SNaN). All "NaN" values are handled the same way.
384-
In the case of min() and max(), the D3D behavior for any NaN value is like how QNaN is handled in IEEE-754R definition above.
385-
(For completeness - if both inputs are NaN, any NaN value is returned.)</p>
386-
<li>Another new IEEE 754R rule is that min(-0,+0) == min(+0,-0) == -0,
374+
<li>The arithmetic rules in D3D10+ do not make any distinctions between "quiet" and "signaling" NaN values (QNaN vs SNaN). All "NaN" values are handled the same way.</li>
375+
<li>If both inputs to min() or max() are NaN, any NaN is returned.</li>
376+
<li>An IEEE 754R rule is that min(-0,+0) == min(+0,-0) == -0,
387377
and max(-0,+0) == max(+0,-0) == +0, which honor the sign, in contrast
388378
to the comparison rules for signed zero (stated above). D3D11 recommends the
389379
IEEE 754R behavior here, but it will not be enforced; it is permissible
@@ -585,7 +575,7 @@ <h4 id="SNORMtoFLOAT">SNORM -&gt; FLOAT</h4>
585575
conversion to floating-point is as follows:</p>
586576
<ul>
587577
<li>The most-negative value maps to -1.0f. e.g. the 5-bit value 10000 maps to -1.0f.
588-
<li>Every other value is converted to a float (call it c), and then result = c * (1.0f / (2<sup>(n-1)</sup>-1)).
578+
<li>Every other value is converted to a float (call it c), and then result = c / (2<sup>(n-1)</sup>-1).
589579
e.g. the 5-bit value 10001 is converted to -15.0f, and then divided by 15.0f, yielding
590580
-1.0f.</li>
591581
</ul>
@@ -4571,6 +4561,7 @@ <h5>Buffer Tiling</h5>
45714561
<p>A Buffer Resource is trivially divided into 64KB tiles, with some empty space in the last tile if the size is not a multiple of 64KB.</p>
45724562
<p>Structured Buffers must have no constraint on the Stride to be Tiled, however possible performance optimizations in hardware
45734563
for using Structured Buffers may be sacrificed by making them Tiled in the first place.</p>
4564+
<p>Typed buffer views on a tiled resource don't support 96bpp formats, video formats, R1_UNORM, R8G8_B8G8_UNORM, G8R8_G8B8_UNORM.</p>
45744565

45754566
<hr><!-- ********************************************************************** -->
45764567
<h5 id="MipmapPacking">Mipmap Packing</h5>
@@ -13003,46 +12994,44 @@ <h3 id="ConservativeoDepth">Conservative Output Depth (Conservative oDepth)</h3>
1300312994
This enables early depth culling and depth modification to be used together.</p>
1300412995
<!--REM-->
1300512996
<p>Enabling oDepth in a pixel shader disables early z culling. Early depth culling dramatically improves performance when there is medium to significant overdraw.
13006-
Rather than having the pixel shader arbitrarily change the depth value, the shader could provide information on whether the output depth value is always less than
13007-
or greater than the rasterizer depth value. In addition to providing the information of that oDepth is always "greater or equal to" or "less or equal to" the
13008-
rasterizer depth, the shader compiler adds instructions to the shader to guarantee the direction indicated. This allows the depth value to be affected by the
12997+
Rather than having the pixel shader arbitrarily change the depth value, the shader can make a promise that the output depth value is always less than
12998+
or greater than the rasterizer depth value. This allows the depth value to be affected by the
1300912999
shader and allows early depth culling when the declared conservative depth mode and depth comparison mode are compatible.</p>
1301013000
<!--/REM-->
1301113001

1301213002
<p>If a Shader intends to use conservative depth writes, it must be <a href="#inst_ConservativeoDepthDCL">declared</a> statically in the Shader with parameters
1301313003
<a href="#interpretedvalue_DEPTH_GREATER_EQUAL">SV_DepthGreaterEqual</a> or <a href="#interpretedvalue_DEPTH_LESS_EQUAL">SV_DepthLessEqual</a>.
13014-
If the shader chooses SV_DepthGreaterEqual or SV_DepthLessEqual, then a guarantee is made that the shader never
13015-
writes smaller or larger values (respectively) than the rasterizer depth value by inserting instructions that either max or min the desired output depth
13016-
value with the rasterizer depth. If the desired output value would be in violation of the defined conservative depth type, then the rasterizer depth is used.</p>
13004+
If the shader chooses SV_DepthGreaterEqual or SV_DepthLessEqual, then the shader promises that it never writes smaller or larger values (respectively) than the
13005+
rasterizer depth value. Breaking this promise results in undefined behavior.</p>
1301713006

1301813007
<p>The valid range is identical to that for standard oDepth.</p>
1301913008

1302013009
<h4>Implementation:</h4>
1302113010
<p><a href="#interpretedvalue_DEPTH_GREATER_EQUAL">SV_DepthGreaterEqual:</a></p>
13022-
<p>If the shader declares the depth output as SV_DepthGreaterEqual, then an extra max instruction is added to the end of the shader program.</p>
13023-
<pre>oDepth = max(DepthGreaterEqualValue, RasterizerDepthValue);</pre>
13024-
<p>This instruction enforces the guarantee that the output depth value of the pixel shader is greater than or equal to the rasterizer depth value.
13025-
Now that the value is known to be equal to or behind the depth values defined by the primitive, then early depth cull can be enabled when the depth
13011+
<p>If the shader declares the depth output as SV_DepthGreaterEqual, the system assumes it can enable early depth cull when the depth
1302613012
comparison mode is "less" or "less or equal".</p>
1302713013

1302813014
<p><a href="#interpretedvalue_DEPTH_LESS_EQUAL">SV_DepthLessEqual:</a></p>
13029-
<p>If the shader declares the depth output as SV_DepthLessEqual, then an extra min instruction is added to the end of the shader program.</p>
13030-
<pre>oDepth = min(DepthLessEqualValue, RasterizerDepthValue);</pre>
13031-
<p>This instruction enforces the guarantee that the output depth value of the pixel shader is less than or equal to the rasterizer depth value.
13032-
Now that the value is known to be equal to or in front of the depth values defined by the primitive, then early depth cull can be enabled when the
13033-
depth comparison mode is "greater" or "greater or equal".</p>
13015+
<p>If the shader declares the depth output as SV_DepthLessEqual, the system assumes it can enable early depth cull when the depth
13016+
comparison mode is "greater" or "greater or equal".</p>
1303413017

1303513018
<p>Using SV_DepthGreaterEqual and SV_DepthLessEqual is valid with any depth mode, but the early depth cull will be disabled if the knowledge of
13036-
GreaterEqual/LessEqual is not compatible with the early depth cull optimization. The min/max test against the rasterizer depth always occurs, but the benefits
13037-
of the guarantee are only useful with the correct depth test mode.
13019+
GreaterEqual/LessEqual is not compatible with the early depth cull optimization.
1303813020
</p>
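The compatibility rule described above can be sketched as a lookup. This is only an illustration: the function name and the string spellings of the depth comparison modes are assumptions, not API identifiers.

```python
# Depth comparison modes under which each declared conservative depth mode
# still permits early depth cull, per the text above. String names are
# illustrative assumptions.
COMPATIBLE_DEPTH_FUNCS = {
    "SV_DepthGreaterEqual": {"less", "less_equal"},
    "SV_DepthLessEqual": {"greater", "greater_equal"},
}

def early_depth_cull_allowed(declared_mode: str, depth_func: str) -> bool:
    """Return True if early depth cull can stay enabled (hypothetical helper)."""
    return depth_func in COMPATIBLE_DEPTH_FUNCS.get(declared_mode, set())
```

Any other combination is still valid to use, but in this model it simply returns False, meaning early depth cull is disabled.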
1303913021

13040-
<h4>Rasterizer Depth Value Used in Clamp</h4>
13041-
<p>For either clamp described above, RasterizerDepthValue is the centroid depth value if the shader is executing at pixel-frequency.
13042-
It is enforced by the HLSL compiler that if the shader inputs depth and outputs one of the above clamped depth values,
13043-
the input depth must be interpolated as linear_noperspective_centroid in pixel-frequency execution (if position is input at all).
13044-
If the shader does not input position, for pixel-frequency execution the centroid depth is used for conservative depth clamping,
13045-
and for sample-frequency execution the per-sample depth is used for per-sample conservative depth clamping.</p>
13022+
<h4>Rasterizer Depth Value Implementations May Use to Clamp Conservative Depth</h4>
13023+
<p>Implementations may choose to pick a particular behavior when the app breaks the promises described above by clamping to the
13024+
rasterizer depth. The rasterizer depth value that would be used to clamp against is the centroid depth value if the shader is executing at pixel-frequency
13025+
or sample depth at sample-frequency.
13026+
To help enable this, it is enforced by the HLSL compiler that if the shader inputs depth and outputs one of the above depth values,
13027+
the input depth must be interpolated as linear_noperspective_centroid or linear_noperspective_sample (if position is input at all).</p>
13028+
13029+
<!--REM-->
13030+
<p>This clamping was originally intended to be performed by either the HLSL compiler or by the driver (spec was ambiguous), avoiding undefined behavior.
13031+
But it appears tests weren't authored to verify clamping, so it turns out the compiler and many implementations haven't been clamping for years (and it isn't worth starting to clamp now that this was noticed in 2025).
13032+
13033+
The compiler enforcement of interpolation mode described here was always present on the assumption that clamping is happening, and it isn't being removed.</p>
13034+
<!--/REM-->
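For an implementation that does choose to clamp, the behavior might look like the sketch below. It is an assumption-laden illustration: the function name is hypothetical, `rasterizer_depth` stands for the centroid depth (pixel-frequency) or the sample depth (sample-frequency), and nothing here is required, since breaking the promise is otherwise undefined behavior.

```python
def clamp_conservative_depth(declared_mode: str,
                             shader_depth: float,
                             rasterizer_depth: float) -> float:
    """Optionally clamp shader-written depth to honor the declared promise.

    Hypothetical helper; an implementation is free not to clamp at all.
    """
    if declared_mode == "SV_DepthGreaterEqual":
        # Never let the result fall below the rasterizer depth.
        return max(shader_depth, rasterizer_depth)
    if declared_mode == "SV_DepthLessEqual":
        # Never let the result rise above the rasterizer depth.
        return min(shader_depth, rasterizer_depth)
    # Plain oDepth: no promise was made, so nothing to clamp against.
    return shader_depth
```

A shader that keeps its promise is unaffected: the clamp only changes values that violate the declared direction.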
1304613035

1304713036
<p>The purpose for requiring centroid in pixel-frequency execution is that it guarantees the clamp is done against a safe depth value
1304813037
within the gamut of the covered samples, thus not violating any traditional depth optimizations. More ideal would have been to
@@ -13339,7 +13328,7 @@ <h2 id="BlendingPrecision">Blending Precision</h2>
1333913328
<p>Note that this clamping must be done on a per-rendertarget basis,
1334013329
so if one render target is a float type and another is UNORM type, the shader values and blend factor must be
1334113330
float range for the float render target Blend, and clamped to 0..1 for the UNORM render target Blend.</p>
13342-
<p>An exception is float16, float11 or float10 RenderTargets, where it is permitted
13331+
<p>An exception is float16, float11, float10 or R9G9B9E5 RenderTargets, where it is permitted
1334313332
for implementations to not clamp data going into the blend. So it is required that blend operations on these formats
1334413333
be done with at least equal precision/range as the output format but an implementation can choose to perform blending with
1334513334
greater precision/range (up to float32).</p>
@@ -27613,7 +27602,6 @@ <h1> Change History </h1>
2761327602
"Driver Instrumentation" section and replaced it with
2761427603
"High Performance Timing Data". Just merged this Windows 8.1 feature
2761527604
into this spec (IHVs already know about it so nothing new here)</a>.
27616-
-------------------------------v1.15 (above this line) posted-----------------------
2761727605
2014/12/11 rschmitt: - Updated RenderTargetView behavior for reading null tiles: it
2761827606
must match UAV behavior for reading null tiles, and that wasn't
2761927607
clear before. Specifically, reads from RenderTargetViews return
@@ -27630,6 +27618,12 @@ <h1> Change History </h1>
2763027618
to a Texture resource or vice versa, all original bindings
2763127619
must first be set to NULL by the application before defining
2763227620
mappings for the new resource.
27621+
2025/6/20 amarp: - Conservative Depth (SV_DepthGreaterEqual or SV_DepthLessEqual)
27622+
had defined clamping if the shader broke its promise. But
27623+
this wasn't tested, so the compiler didn't do the clamping.
27624+
Changed the behavior to undefined if the app breaks the promise.
27625+
2026/1/28 amarp: - Typed buffer views of tiled buffers don't support 96bpp formats, video formats,
27626+
R1_UNORM, R8G8_B8G8_UNORM, G8R8_G8B8_UNORM.
2763327627

2763427628
</pre>
2763527629
<!--/INT-->
