clean up snorm to float definition to use divide (like in its example) not multiply by reciprocal
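
For reference, a minimal C sketch of the SNORM -> FLOAT rule as reworded in this change. The helper names (sign_extend, snorm_to_float) and the 2..16 bit-width limit are assumptions made for illustration; they do not come from the spec. The sketch uses a true divide, which is correctly rounded under IEEE 754, whereas multiplying by a precomputed reciprocal can add a second rounding step.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: interpret the low n bits of raw as a two's
   complement integer. Assumes 2 <= n <= 16. */
static int32_t sign_extend(uint32_t raw, unsigned n)
{
    assert(n >= 2 && n <= 16);
    const uint32_t sign_bit = 1u << (n - 1);
    raw &= (sign_bit << 1) - 1u;            /* keep only the low n bits */
    return (int32_t)(raw ^ sign_bit) - (int32_t)sign_bit;
}

/* Hypothetical helper: convert an n-bit SNORM value (already
   sign-extended) to float, following the two bullets in the spec text. */
static float snorm_to_float(int32_t value, unsigned n)
{
    assert(n >= 2 && n <= 16);
    const int32_t max_positive = (int32_t)((1u << (n - 1)) - 1u); /* 2^(n-1) - 1 */
    const int32_t most_negative = -max_positive - 1;              /* -2^(n-1)    */

    if (value == most_negative) {
        return -1.0f;                       /* most-negative value maps to -1.0f */
    }
    return (float)value / (float)max_positive;  /* result = c / (2^(n-1) - 1) */
}
```

With this sketch, the spec's 5-bit examples 10000 and 10001 both convert to -1.0f, and 01111 converts to 1.0f.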

amarpMSFT · amarpMSFT · commit f37b0d6033a1 · 2026-04-13T09:40:24.000-07:00
diff --git a/d3d/archive/D3D11_3_FunctionalSpec.htm b/d3d/archive/D3D11_3_FunctionalSpec.htm
@@ -699,7 +699,7 @@ <H4>3.2.3.3 SNORM -&gt; FLOAT</H4>
 conversion to floating-point is as follows:</p>
 <ul>
 <li>The most-negative value maps to -1.0f.  e.g. the 5-bit value 10000 maps to -1.0f.
-<li>Every other value is converted to a float (call it c), and then result = c * (1.0f / (2<sup>(n-1)</sup>-1)).
+<li>Every other value is converted to a float (call it c), and then result = c / (2<sup>(n-1)</sup>-1).
 e.g. the 5-bit value 10001 is converted to -15.0f, and then divided by 15.0f, yielding
 -1.0f.</li>
 </ul>
diff --git a/d3d/archive/images/d3d11/D3D11_3_FunctionalSpec.htm b/d3d/archive/images/d3d11/D3D11_3_FunctionalSpec.htm
@@ -352,6 +352,7 @@ <h4 id="754Rules">Partial Listing of Honored IEEE-754 Rules</h4>
 <li>Comparisons ignore the sign of 0 (so +0 equals -0).</li>
 <LI>The comparison NE, when either or both operands is NaN returns TRUE. </li>
 <LI>Comparisons of any non-NaN value against +/- INF return the correct result. </li>
+<li>min(x,NaN) == min(NaN,x) == x (same for max).  This used to be a deviation from IEEE 754 but now aligns with IEEE-754-2019's minimumNumber and maximumNumber operations.</li>
 </ul>
 <h4 id="FP32SpecialCases">Complete Listing of Deviations or Additional Requirements vs. IEEE-754</h4>
 <ul>
@@ -370,20 +371,9 @@ <h4 id="FP32SpecialCases">Complete Listing of Deviations or Additional Requireme
 <li>NaN input to an operation obviously always produces NaN on output, however the exact bit pattern
 of the NaN is not required to stay the same (unless the operation is a raw "mov" instruction which
 does not alter data at all.)</li>
-<p>The IEEE-754R specification for floating point min and max operations states that if one of the inputs to min or max is a
-"quiet" NaN, then the result of the operation is the other parameter. For example:</p>
-<p>min(x,QNaN) == min(QNaN,x) == x  (same for max)  </p>
-<p>A recent revision of the IEEE-754R specification seems to have adopted a different behavior
-for min and max when one input is a "signaling" SNaN value vs if it was QNaN: </p>
-<p>min(x,SNaN) == min(SNaN,x) == QNaN (same for max)</p>
-<p>This latter change was not in place until after D3D10 had shipped, and even after the D3D11 specifications had become fairly mature and locked down.
-So, even though the intent in general for D3D is to follow the standards for arithmetic: IEEE-754 and IEEE-754R, in this case there is a deviation.
-Future D3D versions may consider relaxing the rules allow either behavior, although compatibility will be a concern in addition having to
-justify the value of distinguishing QNaN vs SNaN in general.  As for D3D11, it cannot change behavior here at this point, so it matches D3D10 as follows:</p>
-<p>The arithmetic rules in D3D10+ do not make any distinctions between "quiet" and "signaling" NaN values (QNaN vs SNaN).  All "NaN" values are handled the same way.
-In the case of min() and max(), the D3D behavior for any NaN value is like how QNaN is handled in IEEE-754R definition above.
-(For completeness - if both inputs are NaN, any NaN value is returned.)</p>
-<li>Another new IEEE 754R rule is that min(-0,+0) == min(+0,-0) == -0,
+<li>The arithmetic rules in D3D10+ do not make any distinctions between "quiet" and "signaling" NaN values (QNaN vs SNaN).  All "NaN" values are handled the same way.</li>
+<li>If both inputs to min() or max() are NaN, any NaN is returned.</li>
+<li>An IEEE 754R rule is that min(-0,+0) == min(+0,-0) == -0,
 and max(-0,+0) == max(+0,-0) == +0, which honor the sign, in contrast
 to the comparison rules for signed zero (stated above).  D3D11 recommends the
 IEEE 754R behavior here, but it will not be enforced; it is permissible
@@ -585,7 +575,7 @@ <h4 id="SNORMtoFLOAT">SNORM -&gt; FLOAT</h4>
 conversion to floating-point is as follows:</p>
 <ul>
 <li>The most-negative value maps to -1.0f.  e.g. the 5-bit value 10000 maps to -1.0f.
-<li>Every other value is converted to a float (call it c), and then result = c * (1.0f / (2<sup>(n-1)</sup>-1)).
+<li>Every other value is converted to a float (call it c), and then result = c / (2<sup>(n-1)</sup>-1).
 e.g. the 5-bit value 10001 is converted to -15.0f, and then divided by 15.0f, yielding
 -1.0f.</li>
 </ul>
@@ -4571,6 +4561,7 @@ <h5>Buffer Tiling</h5>
 <p>A Buffer Resource is trivially divided into 64KB tiles, with some empty space in the last tile if the size is not a multiple of 64KB.</p>
 <p>Structured Buffers must have no constraint on the Stride to be Tiled, however possible performance optimizations in hardware
 for using Structured Buffers may be sacrificed by making them Tiled in the first place.</p>
+<p>Typed buffer views on a tiled resource don't support 96bpp formats, video formats, R1_UNORM, R8G8_B8G8_UNORM, G8R8_G8B8_UNORM.</p>
 
 <hr><!-- ********************************************************************** -->
 <h5 id="MipmapPacking">Mipmap Packing</h5>
@@ -13003,46 +12994,44 @@ <h3 id="ConservativeoDepth">Conservative Output Depth (Conservative oDepth)</h3>
 This enables early depth culling and depth modification to be used together.</p>
 <!--REM-->
 <p>Enabling oDepth in a pixel shader disables early z culling.  Early depth culling dramatically improves performance when there is medium to significant overdraw.
-Rather than having the pixel shader arbitrarily change the depth value, the shader could provide information on whether the output depth value is always less than
-or greater than the rasterizer depth value.  In addition to providing the information of that oDepth is always "greater  or equal to" or "less or equal to" the
-rasterizer depth, the shader compiler adds instructions to the shader to guarantee the direction indicated.  This allows the depth value to be affected by the
+Rather than having the pixel shader arbitrarily change the depth value, the shader can make a promise that the output depth value is always less than
+or greater than the rasterizer depth value.  This allows the depth value to be affected by the
 shader and allows early depth culling when the declared conservative depth mode and depth comparison mode are compatible.</p>
 <!--/REM-->
 
 <p>If a Shader intends to use conservative depth writes, it must be <a href="#inst_ConservativeoDepthDCL">declared</a> statically in the Shader with parameters
 <a href="#interpretedvalue_DEPTH_GREATER_EQUAL">SV_DepthGreaterEqual</a> or <a href="#interpretedvalue_DEPTH_LESS_EQUAL">SV_DepthLessEqual</a>.
-If the shader chooses SV_DepthGreaterEqual or SV_DepthLessEqual, then a guarantee is made that the shader never
-writes smaller or larger values (respectively) than the rasterizer depth value by inserting instructions that either max or min the desired output depth
-value with the rasterizer depth.  If the desired output value would be in violation of the defined conservative depth type, then the rasterizer depth is used.</p>
+If the shader chooses SV_DepthGreaterEqual or SV_DepthLessEqual, then the shader promises that it never writes smaller or larger values (respectively) than the
+rasterizer depth value.  Breaking this promise results in undefined behavior.</p>
 
 <p>The valid range is indentical to that for standard oDepth.</p>
 
 <h4>Implementation:</h4>
 <p><a href="#interpretedvalue_DEPTH_GREATER_EQUAL">SV_DepthGreaterEqual:</a></p>
-<p>If the shader declares the depth output as SV_DepthGreaterEqual, then an extra max instruction is added to the end of the shader program.</p>
-<pre>oDepth = max(DepthGreaterEqualValue, RasterizerDepthValue);</pre>
-<p>This instruction enforces the guarantee that the output depth value of the pixel shader is greater than or equal to the rasterizer depth value.
-Now that the value is known to be equal to or behind the depth values defined by the primitive, then early depth cull can be enabled when the depth
+<p>If the shader declares the depth output as SV_DepthGreaterEqual, the system assumes it can enable early depth cull when the depth
 comparison mode is "less" or "less or equal".</p>
 
 <p><a href="#interpretedvalue_DEPTH_LESS_EQUAL">SV_DepthLessEqual:</a></p>
-<p>If the shader declares the depth output as SV_DepthLessEqual, then an extra min instruction is added to the end of the shader program.</p>
-<pre>oDepth = min(DepthLessEqualValue, RasterizerDepthValue);</pre>
-<p>This instruction enforces the guarantee that the output depth value of the pixel shader is less than or equal to the rasterizer depth value.
-Now that the value is known to be equal to or in front of the depth values defined by the primitive, then early depth cull can be enabled when the
-depth comparison mode is "greater" or "greater or equal".</p>
+<p>If the shader declares the depth output as SV_DepthLessEqual, the system assumes it can enable early depth cull when the depth
+comparison mode is "greater" or "greater or equal".</p>
 
 <p>Using SV_DepthGreaterEqual and SV_DepthLessEqual is valid with any depth mode, but the early depth cull will be disabled if the knowledge of is
-GreaterEqual/LessEqual  is not compatible with the early depth cull optimization. The min/max test against the rasterizer depth always occurs, but the benefits
-of the guarantee are only useful with the correct depth test mode.
+GreaterEqual/LessEqual  is not compatible with the early depth cull optimization. 
 </p>
 
-<h4>Rasterizer Depth Value Used in Clamp</h4>
-<p>For either clamp described above, RasterizerDepthValue is the centroid depth value if the shader is executing at pixel-frequency.
-It is enforced by the HLSL compiler that if the shader inputs depth and outputs one of the above clamped depth values,
-the input depth must be interpolated as linear_noperspective_centroid in pixel-frequency execution (if position is input at all).
-If the shader does not input position, for pixel-frequency execution the centroid depth is used for conservative depth clamping,
-and for sample-frequency execution the per-sample depth is used for per-sample conservative depth clamping.</p>
+<h4>Rasterizer Depth Value Implementations May Use to Clamp Conservative Depth</h4>
+<p>When the app breaks the promises described above, implementations may choose a particular behavior: clamping to the
+rasterizer depth.  The rasterizer depth value that would be used to clamp against is the centroid depth value if the shader is executing at pixel-frequency
+or the per-sample depth if it is executing at sample-frequency.
+To help enable this, it is enforced by the HLSL compiler that if the shader inputs depth and outputs one of the above depth values,
+the input depth must be interpolated as linear_noperspective_centroid or linear_noperspective_sample (if position is input at all).</p>
+
+<!--REM-->
+<p>This clamping was originally intended to be performed by either the HLSL compiler or by the driver (spec was ambiguous), avoiding undefined behavior.
+But it appears tests weren't authored to verify clamping, so it turns out the compiler and many implementations haven't been clamping for years (and it isn't worth starting to clamp now that this was noticed in 2025).
+
+The compiler enforcement of interpolation mode described here was always present on the assumption that clamping is happening, and it isn't being removed.</p>
+<!--/REM-->
 
 <p>The purpose for requiring centroid in pixel-frequency execution is that it guarantees the clamp is done against a safe depth value
 within the gamut of the covered samples, thus not violating any traditional depth optimizations.  More ideal would have been to
@@ -13339,7 +13328,7 @@ <h2 id="BlendingPrecision">Blending Precision</h2>
 <p>Note that this clamping must be done on a per-rendertarget basis,
 so if one render target is a float type and another is UNORM type, the shader values and blend factor must be
 float range for the float render target Blend, and clamped to 0..1 for the UNORM render target Blend.</p>
-<p>An exception is float16, float11 or float10 RenderTargets, where it is permitted
+<p>An exception is float16, float11, float10 or R9G9B9E5 RenderTargets, where it is permitted
 for implementations to not clamp data going into the blend.  So it is required that blend operations on these formats to be
 be done with at least equal precision/range as the output format but an implementation can choose to perform blending with
 precision/range (up to float32).</p>
@@ -27613,7 +27602,6 @@ <h1> Change History </h1>
                     "Driver Instrumentation" section and replaced it with
                     "High Performance Timing Data".  Just merged this Windows 8.1 feature
                     into this spec (IHVs already know about it so nothing new here)</a>.
--------------------------------v1.15 (above this line) posted-----------------------
 2014/12/11 rschmitt: - Updated RenderTargetView behavior for reading null tiles: it
                        must match UAV behavior for reading null tiles, and that wasn't
                        clear before. Specifically, reads from RenderTargetViews return
@@ -27630,6 +27618,12 @@ <h1> Change History </h1>
                     to a Texture resource or vice versa, all original bindings
                     must first be set to NULL by the application before defining
                     mappings for the new resource.
+2025/6/20 amarp: - Conservative Depth (SV_DepthGreaterEqual or SV_DepthLessEqual)
+                   had defined clamping if the shader broke its promise.  But
+                   this wasn't tested, so the compiler didn't do the clamping. 
+                   Changed the behavior to undefined if the app breaks the promise.
+2026/1/28 amarp: - Typed buffer views of tiled buffers don't support 96bpp formats, video formats, 
+                   R1_UNORM, R8G8_B8G8_UNORM, G8R8_G8B8_UNORM.
 
 </pre>
 <!--/INT-->