Finish removing the BigInts from * for FD{Int128}! #94
Open
NHDaly
wants to merge
33
commits into
master
from
nhd-Int128-fastmul-noallocs
+303
−17
Changes from 14 commits
Commits
a1c1711 Use Int256 to avoid BigInt in FD operations. (NHDaly)
7756238 Further reduce BigInts by skipping a `rem()` in iseven (NHDaly)
78e45dc Fix ambiguity in _widemul(Int256, UInt256) (NHDaly)
879c602 Bump patch version number (NHDaly)
a245651 Add compat for BitIntegers (NHDaly)
4bd8c45 Finish removing the BigInts from * for FD{Int128}! (NHDaly)
4e53f3d Support older versions of julia (NHDaly)
dfd41b1 Comments (NHDaly)
efee91b Disable fldmod-by-const tests on older julia (NHDaly)
a03d754 Fix one other case of iseven allocating a BigInt (NHDaly)
4ed8ebf Apply this optimization to FD{Int64} as well. (NHDaly)
3f39b8a Adjust to run for all integer types! (NHDaly)
20c66f2 Clarify the `_unsigned(x)` methods with comments (NHDaly)
f2958ba Apply suggestions from code review (NHDaly)
0019bb0 Update src/FixedPointDecimals.jl (NHDaly)
4a53703 Add extensive tests for multiplication correctness, to cover the new … (NHDaly)
27a3e0f Named testsets to make failures easier to identify (NHDaly)
e4cb73b Fix off-by-one error in rounding truncation in calculate_inverse_coeff() (NHDaly)
73f6547 Add some comments and requires (NHDaly)
4e9fdd6 Copy/pasted definition for unsigned numbers straight from the book (NHDaly)
188933d Have `magicgu` support arbitrary integer sizes (NHDaly)
f6d375c Use the formulas from Hacker's delight for both signed and unsigned Ints (Drvi)
b79c873 . (Drvi)
4f4d17a . (Drvi)
eaeaddf Restrict back to just Int128 & Int256 for custom div_by_const (NHDaly)
a2dcf56 It turns out that in newer versions of julia, you should call fldmod,… (NHDaly)
ef578e9 Reorganize the functions to be top-down (NHDaly)
8df68b5 More thorough tests for flmdod_by_const (NHDaly)
66e1ecb Merge branch 'master' into nhd-Int128-fastmul-noallocs (NHDaly)
fd096ac Bump patch version number (NHDaly)
e147f5c Add _widemul unit test (NHDaly)
f830c3c Change "unreachable" comment to an `@assert false` (NHDaly)
71ba82e add test comment (NHDaly)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,13 +1,15 @@ | ||
name = "FixedPointDecimals" | ||
uuid = "fb4d412d-6eee-574d-9565-ede6634db7b0" | ||
authors = ["Fengyang Wang <[email protected]>", "Curtis Vogt <[email protected]>"] | ||
-version = "0.5.2" | ||
+version = "0.5.3" | ||
|
||
[deps] | ||
+BitIntegers = "c3b6d118-76ef-56ca-8cc7-ebb389d030a1" | ||
Parsers = "69de0a69-1ddd-5017-9359-2bf0b02dc9f0" | ||
|
||
[compat] | ||
Parsers = "2.7" | ||
+BitIntegers = "0.3.1" | ||
julia = "1.6" | ||
|
||
[extras] | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,141 @@ | ||
# NOTE: Surprisingly, even though LLVM implements a version of this optimization on its own | ||
# for smaller integer sizes (<=64-bits), using the code in this file produces faster | ||
# multiplications for *all* types of integers. So we use our custom fldmod_by_const for all | ||
# bit integer types. | ||
# Before: | ||
# julia> @btime for _ in 1:10000 fd = fd * fd end setup = (fd = FixedDecimal{Int32,3}(1.234)) | ||
# 84.959 μs (0 allocations: 0 bytes) | ||
# FixedDecimal{Int32,3}(1700943.280) | ||
# | ||
# julia> @btime for _ in 1:10000 fd = fd * fd end setup = (fd = FixedDecimal{Int64,3}(1.234)) | ||
# 247.709 μs (0 allocations: 0 bytes) | ||
# FixedDecimal{Int64,3}(4230510070790917.029) | ||
# | ||
# julia> @btime for _ in 1:10000 fd = fd * fd end setup = (fd = FixedDecimal{Int128,3}(1.234)) | ||
# 4.077 ms (160798 allocations: 3.22 MiB) | ||
# FixedDecimal{Int128,3}(-66726338547984585007169386718143307.324) | ||
# | ||
# After: | ||
# julia> @btime for _ in 1:10000 fd = fd * fd end setup = (fd = FixedDecimal{Int32,3}(1.234)) | ||
# 68.416 μs (0 allocations: 0 bytes) | ||
# FixedDecimal{Int32,3}(1700943.280) | ||
# | ||
# julia> @btime for _ in 1:10000 fd = fd * fd end setup = (fd = FixedDecimal{Int64,3}(1.234)) | ||
# 106.125 μs (0 allocations: 0 bytes) | ||
# FixedDecimal{Int64,3}(4230510070790917.029) | ||
# | ||
# julia> @btime for _ in 1:10000 fd = fd * fd end setup = (fd = FixedDecimal{Int128,3}(1.234)) | ||
# 204.125 μs (0 allocations: 0 bytes) | ||
# FixedDecimal{Int128,3}(-66726338547984585007169386718143307.324) | ||
|
||
""" | ||
ShouldUseCustomFldmodByConst(::Type{<:MyCustomIntType}) = true | ||
A trait to control opt-in for the custom `fldmod_by_const` implementation. To use this for a | ||
given integer type, you can define this overload for your integer type. | ||
You will also need to implement some parts of the interface below, including _widen(). | ||
""" | ||
ShouldUseCustomFldmodByConst(::Type{<:Base.BitInteger}) = true | ||
ShouldUseCustomFldmodByConst(::Type{<:Union{Int256,UInt256}}) = true | ||
ShouldUseCustomFldmodByConst(::Type) = false | ||
|
||
@inline function fldmod_by_const(x, y) | ||
    if ShouldUseCustomFldmodByConst(typeof(x)) | ||
        # For large Int types, LLVM doesn't optimize well, so we use a custom implementation | ||
        # of fldmod, which extends that optimization to those larger integer types. | ||
        d = fld_by_const(x, Val(y)) | ||
        return d, manual_mod(promote(x, y, d)...) | ||
    else | ||
        # For other integers, LLVM might be able to correctly optimize away the division, if | ||
        # it knows it's dividing by a const. We cannot call `Base.fldmod` since it's not | ||
        # inlined, so here we have explicitly inlined it instead. | ||
        return (fld(x,y), mod(x,y)) | ||
    end | ||
end | ||
|
||
# Calculate fld(x,y) when y is a Val constant. | ||
# The implementation for fld_by_const was lifted directly from Base.fld(x,y), except that | ||
# it uses `div_by_const` instead of `div`. | ||
fld_by_const(x::T, y::Val{C}) where {T<:Unsigned, C} = div_by_const(x, y) | ||
function fld_by_const(x::T, y::Val{C}) where {T<:Signed, C} | ||
    d = div_by_const(x, y) | ||
    return d - (signbit(x ⊻ C) & (d * C != x)) | ||
end | ||
|
||
# Calculate `mod(x,y)` after you've already acquired quotient, the result of `fld(x,y)`. | ||
# REQUIRES: | ||
# - `y != -1` | ||
@inline function manual_mod(x::T, y::T, quotient::T) where T<:Integer | ||
    return x - quotient * y | ||
end | ||
|
||
# This function is based on the native code produced by the following: | ||
# @code_native ((x)->div(x, 100))(Int64(2)) | ||
function div_by_const(x::T, ::Val{C}) where {T, C} | ||
    # These checks will be compiled away during specialization. | ||
    # While for `*(FixedDecimal, FixedDecimal)`, C will always be a power of 10, these | ||
    # checks allow this function to work for any `C > 0`, in case that's useful in the | ||
    # future. | ||
    if C == 1 | ||
        return x | ||
    elseif ispow2(C) | ||
        return div(x, C) # Will already do the right thing | ||
    elseif C <= 0 | ||
        throw(DomainError(C, "C must be > 0")) | ||
    end | ||
    # Calculate the magic number 2^N/C. Note that this is computed statically, not at | ||
    # runtime. | ||
    inverse_coeff, toshift = calculate_inverse_coeff(T, C) | ||
    # Compute the upper-half of widemul(x, 2^nbits(T)/C). | ||
    # By keeping only the upper half, we're essentially dividing by 2^nbits(T), undoing the | ||
    # numerator of the multiplication, so that the result is equal to x/C. | ||
    out = mul_hi(x, inverse_coeff) | ||
    # This condition will be compiled away during specialization. | ||
    if T <: Signed | ||
        # Because our magic number has a leading one (since we shift all-the-way left), the | ||
        # result is negative if it's Signed. We add x to give us the positive equivalent. | ||
        out += x | ||
        signshift = (nbits(x) - 1) | ||
        isnegative = T(out >>> signshift) # 1 if < 0 else 0 (Unsigned bitshift to read top bit) | ||
    end | ||
    # Undo the bitshifts used to calculate the invcoeff magic number with maximum precision. | ||
    out = out >> toshift | ||
    if T <: Signed | ||
        out = out + isnegative | ||
    end | ||
    return T(out) | ||
end | ||
|
||
Base.@assume_effects :foldable function calculate_inverse_coeff(::Type{T}, C) where {T} | ||
    # First, calculate 2^nbits(T)/C | ||
    # We shift away leading zeros to preserve the most precision when we use it to multiply | ||
    # in the next step. At the end, we will shift the final answer back to undo this | ||
    # operation (which is why we need to return `toshift`). | ||
    # Note, also, that we calculate invcoeff at double-precision so that the left-shift | ||
    # doesn't leave trailing zeros. We truncate to only the upper-half before returning. | ||
    UT = _unsigned(T) | ||
    invcoeff = typemax(_widen(UT)) ÷ C | ||
    toshift = leading_zeros(invcoeff) | ||
    invcoeff = invcoeff << toshift | ||
    # Now, truncate to only the upper half of invcoeff, after we've shifted. Instead of | ||
    # bitshifting, we round to maintain precision. (This is needed to prevent off-by-ones.) | ||
    # -- This is equivalent to `invcoeff = T(invcoeff >> nbits(T))`, except rounded. -- | ||
    invcoeff = _round_to_nearest(fldmod(invcoeff, typemax(UT))..., typemax(UT)) % T | ||
    return invcoeff, toshift | ||
end | ||
|
||
function mul_hi(x::T, y::T) where T | ||
    xy = _widemul(x, y) # support Int256 -> Int512 (!!) | ||
    (xy >> nbits(T)) % T | ||
end | ||
|
||
# Annoyingly, Unsigned(T) isn't defined for BitIntegers types: | ||
# https://github.com/rfourquet/BitIntegers.jl/pull/2 | ||
# Note: We do not need this for Int512, since we only widen to 512 _after_ calling | ||
# _unsigned, above. This code is only for supporting the built-in integer types, which only | ||
# go up to 128-bits (widened twice to 512). If a user wants to extend FixedDecimals for | ||
# other integer types, they will need to add methods to either _unsigned or unsigned. | ||
_unsigned(x) = unsigned(x) | ||
_unsigned(::Type{Int256}) = UInt256 | ||
|
||
_unsigned(::Type{UInt256}) = UInt256 | ||
|
||
nbits(x) = sizeof(x) * 8 |
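The core idea in `div_by_const` above, replacing a division by a known constant with a multiply and a shift, can be sketched in isolation. What follows is a simplified variant and not the PR's implementation: it handles only `UInt32` inputs, rounds the magic number up with `cld` instead of normalizing and rounding to nearest as `calculate_inverse_coeff` does, and does the multiply in `UInt128` so that no custom `_widemul` is needed. The names `ceil_log2`, `magic_div_sketch`, and `fld_from_div_sketch` are hypothetical and exist only for this illustration.

```julia
# Hypothetical helpers for illustration only; none of these names are part of the PR.

# ceil(log2(c)) for an integer c >= 1, computed from the bit length of c - 1.
ceil_log2(c::Integer) = 64 - leading_zeros(UInt64(c - 1))

# Unsigned division by a constant C via multiply-and-shift.
# With l = ceil(log2(C)) and magic = ceil(2^(32 + l) / C), the identity
#     floor(x / C) == (x * magic) >> (32 + l)
# holds for every 32-bit unsigned x. The product is formed in UInt128, which
# is wide enough here (magic < 2^33, x < 2^32).
function magic_div_sketch(x::UInt32, ::Val{C}) where {C}
    (C isa Integer && 0 < C <= typemax(UInt32)) ||
        throw(DomainError(C, "C must be an integer in 1:typemax(UInt32)"))
    shift = 32 + ceil_log2(C)
    magic = cld(UInt128(1) << shift, UInt128(C))
    return UInt32((UInt128(x) * magic) >> shift)
end

# Floored division recovered from truncated division, using the same
# correction term as `fld_by_const`: subtract one when the signs differ and
# the truncated quotient is not exact.
function fld_from_div_sketch(x::T, c::T) where {T<:Signed}
    d = div(x, c)
    return d - (signbit(x ⊻ c) & (d * c != x))
end
```

Under these assumptions, `magic_div_sketch(x, Val(C))` should agree with `x ÷ UInt32(C)` for every `x::UInt32`, and `fld_from_div_sketch(x, c)` should agree with `fld(x, c)` for positive `c`. Rounding the magic number up keeps the sketch short at the cost of needing one extra bit for the multiplier, which is why it relies on a much wider intermediate type than the PR's `mul_hi` does.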
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
using Test | ||
using FixedPointDecimals | ||
|
||
@testset "calculate_inverse_coeff signed" begin | ||
    using FixedPointDecimals: calculate_inverse_coeff | ||
|
||
    # The correct magic number here comes from investigating the following native code | ||
    # produced on an m2 aarch64 macbook pro: | ||
    # @code_native ((x)->fld(x,100))(2) | ||
    # ... | ||
    # mov x8, #55051 | ||
    # movk x8, #28835, lsl #16 | ||
    # movk x8, #2621, lsl #32 | ||
    # movk x8, #41943, lsl #48 | ||
    # Where: | ||
    # julia> 55051 | 28835 << 16 | 2621 << 32 | 41943 << 48 | ||
    # -6640827866535438581 | ||
    @test calculate_inverse_coeff(Int64, 100) == (-6640827866535438581, 6) | ||
|
||
    # Same for the tests below: | ||
|
||
    # (LLVM's magic number is shifted one bit less, then they shift by 2, instead of 3, | ||
    # but the result is the same.) | ||
    @test calculate_inverse_coeff(Int64, 10) == (7378697629483820647 << 1, 3) | ||
|
||
    @test calculate_inverse_coeff(Int64, 1) == (1, 0) | ||
end | ||
|
||
@testset "calculate_inverse_coeff unsigned" begin | ||
    using FixedPointDecimals: calculate_inverse_coeff | ||
|
||
    # Same here, our magic number is shifted 2 bits more than LLVM's | ||
    @test calculate_inverse_coeff(UInt64, 100) == (0xa3d70a3d70a3d70b, 6) | ||
|
||
    @test calculate_inverse_coeff(UInt64, 1) == (UInt64(0x1), 0) | ||
end | ||
|
||
@testset "div_by_const" begin | ||
    vals = [2432, 100, 0x1, Int32(10000), typemax(Int64), typemax(Int16), 8, Int64(2)^32] | ||
    for a_base in vals | ||
        # Only test negative numbers on `a`, since div_by_const requires b > 0. | ||
        @testset for (a, b, f) in Iterators.product((a_base, -a_base), vals, (unsigned, signed)) | ||
            a, b = promote(f(a), f(b)) | ||
            @test FixedPointDecimals.div_by_const(a, Val(b)) == a ÷ b | ||
        end | ||
    end | ||
end | ||
|
||
@testset "fldmod_by_const" begin | ||
    vals = [2432, 100, 0x1, Int32(10000), typemax(Int64), typemax(Int16), 8, Int64(2)^32] | ||
    for a_base in vals | ||
        # Only test negative numbers on `a`, since fldmod_by_const requires b > 0. | ||
        @testset for (a, b, f) in Iterators.product((a_base, -a_base), vals, (unsigned, signed)) | ||
            a, b = promote(f(a), f(b)) | ||
            @test FixedPointDecimals.fldmod_by_const(a, b) == fldmod(a, b) | ||
        end | ||
    end | ||
end |
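The magic-number comment in the first testset can be checked directly. A small sketch, assuming only Base Julia: it reassembles the four 16-bit immediates from the `mov`/`movk` listing and compares the result with the constants that the signed and unsigned tests expect. The variable names are illustrative.

```julia
# The four 16-bit immediates from the aarch64 listing, lowest chunk first.
chunks = (55051, 28835, 2621, 41943)

# Each chunk lands at bit offset 0, 16, 32, 48. The top chunk sets bit 63,
# so the assembled Int64 is negative.
magic = foldl(|, (Int64(c) << (16 * (i - 1)) for (i, c) in enumerate(chunks)))

# The signed test expects this value for `calculate_inverse_coeff(Int64, 100)`.
signed_ok = magic == -6640827866535438581

# Reinterpreted as unsigned, it is the same bit pattern the UInt64 test expects.
unsigned_ok = reinterpret(UInt64, magic) == 0xa3d70a3d70a3d70b
```

Both comparisons should evaluate to `true`, which is consistent with the signed and unsigned tests for `C = 100` sharing one bit pattern and the same shift of 6.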