-
Notifications
You must be signed in to change notification settings - Fork 42
New legendre primecount #226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cabal generally installs packages with default -O1 optmization unless forced by a manual install - the only way to force it always to be installed with optimization is to add this to the cabal file. Performance testing shows that the extra level of optimization gives a boost in execution speed of about two times, and for the prime counting function about two and a half times.
This version is more concise, simplified, and thereby faster at about twice the speed of the old version for the same counting range and optimization level; it would be about 100 lines of code without the comments, but it is much better documented than the old version, also including a "HowPrimeCountingWorks.md" file to give even more details. This version does not call the Sieve of Eratosthenes (SoE) module, which was part of the cruft in using that module when there wasn't much point; the algorithm doesn't much depend for execution speed on SoE sieving and this new version just implements a simple odds-only SoE within itself. As specified, the module has been tested with GHC compiler versions 9.0.2, 9.2.8, 9.4.8, 9.6.7, 9.8.4, 9.10.2, and 9.12.2; version 14.1.0 was not tested because it is in the Release Candidate stage amd thus not able to be installed through "ghcup". I could not get it through the tests without hanging at toPrimeIntegral Integer -> Int and don't know why it would change as it woudn't seem counting primes would have anything to do with that, but the new version has been tested extensively as to counting accuracy outside the testing framework.
|
I've been investigating the Tasty test running problem and think I've found the problem - the Tasty Bench problem went away by just limiting the maximum time allowed per test (just as you suggested) and that package is solid and did just as commanded, but then it is yours. I might suggest that the benchmarks in "arithmoi" are adjusted further so as to fit within a reasonable time, but then my further findings here may reduce those times anyway, as follows:
|
|
Thinking more on the above problem number 1, I guess one could continue to use the I guess I'm old school and didn't want to make this assumption as one couldn't used to be so sure. So the specialization does an integer square root other than by using Heron, but that is implemented as just getting a Double float approximation and then tuning that by couning up or down as necessary. For Word(64) that works alright as it should never have to count many more than 32 times and sometimes much less, where as the "group by two digits" classical method always takes the same time for a given input size, so an input of 2^54 will take 27 loops where the "fix the approximation will likely only have to count once. So I guess another option would be to just be sure the |
|
Sorry, got a busy weekend. I'll try to review next week, your work here is much appreciated! |
|
I just pushed https://hackage.haskell.org/package/integer-roots-1.0.4.0 with added specialization for I'll look at the issue with |
@Bodigrim, while we think about what to do with the testing, I've been moving ahead to Phase to with a full Meissel prime counting solution; it's looking good, with the main advantage, of course, the greatly reduced memory requirements, although it will also be a little faster, especially for larger counting ranges.
No problem, I've been thinking about this ever since I had some online correspondence with Daniel Fischer on a StackOverflow question so many years ago, which is what prompted me to learn about prime counting functions and their implementations. So, although I am only a mid-level mathematician (if that), I may be the ideal person for improving this part of the "arithmoi" package as I am pretty well up to speed on all matters concerning Legendre/Meissel prime counting and also am a reasonably competent programmer in quite a few languages including Haskell; I also greatly prefer the functional programming paradigm... |
Thanks for pushing the change to "integer-roots", but I haven't decided for sure what to do about the integer square root function as just "tweaking" the answer given by the floating point square root when it lacks sufficient precision seems to inelegant to me when it is so easy to implement the version using binary binomial expansion as per this Wikipedia article; even implemented for I wrote the prime counting function outside of the "arithmoi" package and tested it there as to being "monotonic" across its inflection points and accurate for various counting ranges from tiny to pretty big, and after including the function in my local version of "arithmoi" can use that same testing to get the same answers and performance, so it's hard to believe that the function has a problem. But maybe "tasty" knows better...
Anyway, you are more knowledgeable about the testing framework and can maybe better spot the problem and meantime I am working ahead on the next phase... |
I have looked into implementations of integer square root and cube root and done some research on the Internet, and don't know why mathematicians and programmers make this to be so hard, pulling up all sorts of converging approximations and series such as the Newton method, Heron's method, and probably no end of others, each with their limitations as well as limited execution speed as most of them require division operations as well as a method of providing a good initial estimate. Especially for the integer cube root, I can't find anyone who has done this by the exact method of binomial cubing (the square root exact method is by binomial squaring); I had implemented the square root version over 45 years ago because I needed it for a micro controller type of application (the CPU didn't even have built-in integer multiply and division operations, nor floating point, of course); I don't know if my work then was a first or not but the technique has definitely been published many times since. However, I have never seen an implementation of the exact integer cube root, as my implementation in Haskell below (this algorithm works really well in binary as noted in the program code comments), as follows: -- This file is integer square root and cube root functions in Haskell language
-- by W. Gordon Goodsman (GordonBGood), 2025
{-
This is based on binomial square explansion where
(old + inc)^2 -> old^2 + 2*inc*old + inc^2 = new^2, thus
new^2 - old^2 or deltasqr = 2*old*inc + inc^2.
Now this is really easy in binary where multiplying by two is just a left shift by one and
when inc is an even power of two as in 2^j, where the above expression becomes
deltasqr = old<<(j + 1) + 1<<(2j). Also, this works relative to inc or 2^j or 1<<j in that
if the whole thing is offset by 1<<j, the above expression becomes
deltasqr = (old<<1 + 1<<j) << j. That allows us to write the following routine to handle the offsetting
-}
{-# LANGUAGE BangPatterns, MagicHash #-}
import Data.Bits (shiftL, shiftR)
import GHC.Exts ( Word(..), uncheckedShiftL#, uncheckedShiftRL#
, plusWord#, minusWord#, gtWord#, leWord#, ltWord#)
import Data.Time.Clock.POSIX (getPOSIXTime) -- for timeing
isqrt :: Integer -> Integer
isqrt n =
if n < 2 then max 0 n else
if n < 2^(64 :: Int) then pisqrt n else
let shft o = if o > n then o `shiftR` 2 else shft (o `shiftL` 2)
go x one rslt =
if one <= 0 then rslt else
let nrslt = rslt `shiftR` 1 -- `rslt` is couble the current `nrslt`!
dltasqr = rslt + one -- is the same as 2*`rslt` + `one` where `one` = `inc` above
in
if x < dltasqr then go x (one `shiftR` 2) nrslt
else go (x - dltasqr) (one `shiftR` 2) (nrslt + one)
in go n (shft 1) 0
pisqrt :: Integer -> Integer
pisqrt n =
let !(W# n##) = fromIntegral n
shft# o## = case o## `leWord#` n## of
0# -> o## `uncheckedShiftRL#` 2#
_ -> case o## `ltWord#` 0x4000000000000000## of
0# -> o##
_ -> shft# (o## `uncheckedShiftL#` 2#)
go# x## one## rslt## =
case one## `gtWord#` 0## of
0# -> rslt##
_ ->
let none## = one## `uncheckedShiftRL#` 2#
nrslt## = rslt## `uncheckedShiftRL#` 1# -- `rslt` is couble the current `nrslt`!
dltasqr## = rslt## `plusWord#` one## -- is the same as 2*`rslt` + `one` where `one` = `inc` above
in
case x## `ltWord#` dltasqr## of
0# -> go# (x## `minusWord#` dltasqr##) none## (nrslt## `plusWord#` one##)
_ -> go# x## none## nrslt##
in fromIntegral $ W# $ go# n## (shft# 1##) 0##
{-
The cube root works the same as the square root except that the expansion is
(old + inc)^3 = old^3 + 3*old^2*inc + 3*old*inc^2 + inc^3 meaning that
deltacube = 3*old^2*inc + 3*old*inc^2 + inc^3
that can be simplified by two powers in the same way, with the only complication
that the current square of the current `rslt` value must be kept and updated,
which updating is done progressively as per the relationship in the sqrt algorithm.
-}
icbrt :: Integer -> Integer
icbrt n =
let neg = if n < 0 then -1 else 1
absn = abs n in
if absn < 2 then n else
if absn < 2^(64 :: Int) then neg * picbrt absn else
let shft o = if o > absn then o `shiftR` 3 else shft (o `shiftL` 3)
go x one sqr rslt =
if one <= 0 then rslt else
let none = one `shiftR` 3
nsqr = sqr `shiftR` 1
twicerslt = rslt `shiftR` 1 -- `twicerslt` is double the current `nrslt`!
nrslt = rslt `shiftR` 2
-- is the same as 3*`nsqr` + 3*`nrslt``+``one`` where`one` =``inc` above
dltacube = sqr + nsqr + twicerslt + nrslt + one
in
if x < dltacube then go x none nsqr nrslt
else go (x - dltacube) none (nsqr + twicerslt + one) (nrslt + one)
in (go absn (shft 1) 0 0) * neg
picbrt :: Integer -> Integer
picbrt n =
let !(W# n##) = fromIntegral n
shft# o## = case o## `leWord#` n## of
0# -> o## `uncheckedShiftRL#` 3#
_ -> case o## `ltWord#` 0x8000000000000000## of
0# -> o##
_ -> shft# (o## `uncheckedShiftL#` 3#)
go# x## one## sqr## rslt## =
case one## `gtWord#` 0## of
0# -> rslt##
_ ->
let none## = one## `uncheckedShiftRL#` 3#
nsqr## = sqr## `uncheckedShiftRL#` 1#
twicerslt## = rslt## `uncheckedShiftRL#` 1#
nrslt## = rslt## `uncheckedShiftRL#` 2#
-- is the same as 3*`rslt##^2 + 3*`rst##` + `one##` where `one##` is ``inc`` above
dltacube## = sqr## `plusWord#` nsqr## `plusWord#` twicerslt## `plusWord#` nrslt## `plusWord#` one##
in
case sqr## `leWord#` dltacube## of
0# -> go# x## none## nsqr## nrslt## -- overflow!
_ ->
case x## `ltWord#` dltacube## of
0# -> go# (x## `minusWord#` dltacube##) none##
(nsqr## `plusWord#` twicerslt## `plusWord#` one##) (nrslt## `plusWord#` one##)
_ -> go# x## none## nsqr## nrslt##
in fromIntegral $ W# $ go# n## (shft# 1##) 0## 0##
xisqrt :: Integer -> Integer
xisqrt n =
if n < 2 then max 0 n else
let absn = fromIntegral n :: Word
est = floor $ sqrt (fromIntegral n :: Double) :: Word
chktoohi e = let chk = e * e in
if chk < e || chk > absn then chktoohi (e - 1) else e + 1
chktoolo ep1 = let chk = ep1 * ep1 in
if chk < ep1 || chk > absn then ep1 - 1 else chktoolo (ep1 + 1)
in fromIntegral (chktoolo $ chktoohi est)
xicbrt :: Integer -> Integer
xicbrt n =
let neg = if n < 0 then -1 else 1
absn = fromIntegral $ abs n :: Word in
if absn < 2 then n else
let est = floor $ (fromIntegral absn :: Double)**(1.0/3.0 :: Double) :: Word
chktoohi e = let chk0 = e * e in let chk = chk0 * e in
if chk < chk0 || chk > absn then chktoohi (e - 1) else e + 1
chktoolo ep1 = let chk0 = ep1 * ep1 in let chk = chk0 * ep1 in
if chk < chk0 || chk > absn then ep1 - 1 else chktoolo (ep1 + 1)
in fromIntegral (chktoolo $ chktoohi est) * neg
main :: IO ()
main = do
let arg = 2^(64 :: Int) - 1
putStrLn $ "The square root of " ++ show arg ++ " is " ++ show (xisqrt arg) ++ "."
putStrLn $ "The cube root of " ++ show arg ++ " is " ++ show (xicbrt arg) ++ "."
-- let rng = [ 2^(64 :: Int) - 1 - 10^(8 :: Int) .. 2^(64 :: Int) - 1 ] -- [-1000 .. 100000] ++ [10^(20 :: Int) .. 10^(20 :: Int) + 10^(4 :: Int)]
let rng = [ -1000 .. 10000000] -- ++ [10^(20 :: Int) .. 10^(20 :: Int) + 10^(4 :: Int)]
strt <- getPOSIXTime
if all id [ isqrt v == xisqrt v | v <- rng ] then putStrLn "SQRT CORRECT" else putStrLn "SQRT FAILURE!!!!!"
if all id [ icbrt v == xicbrt v | v <- rng ] then putStrLn "CBRT CORRECT" else putStrLn "CBRT FAILURE!!!!!"
stop <- getPOSIXTime
let elpsd = round (1000.0 * (realToFrac $ stop - strt) :: Double) :: Int
putStrLn $ "This took " ++ show elpsd ++ " milliseconds."If you want to play with it, I have also run it on Wandbox. It seems that these might be better candidates than the current versions of integer square root and cube root on "integer-roots". These have no divisions or even multiplications and only use additions, subtractions, and left and right bit shifts, as well as a few comparisons with the majority of them taking almost no time because of being part of recurring loops that are predicable because of almost always following the same path; only one comparison is less predicable as it might go either way. As you see, I have tested these by comparing their results to versions using the Double floating point estimates and tuning as does "integer-roots" over a considerable range. they are quite fast at millions of roots per second, with the cube root even faster because it is O(n^(1/3)) instead of O(n^(1/2)). Please tell me what you think... EDIT_ADD: If one were to include the above algorithm code in the "integer-roots" package, one would likely specialize on all the major Int, Word, Int32, Int64, Word32, and Word64, to interface to non-Integer version(s), maybe even using Wrd64 unboxed primitives, for probably a few time constant factor in improved execution times, although because the algorithm doesn't use multiplication or division, the gains wouldn't be that large... EDIT_ADD_ADD: So I translated the algorithms to use unboxed I don't see anything performance-wise that would prevent "integer-roots" from using the exact algorithm... FINAL_EDIT_ADD: I finished tuning the unboxed primitive code (had to check for overflow, especially for the cbrt one) and now the primitive versions are always at least a little faster than using the For arguments above this bound, I didn't test and compare with the Heron method and the Heron method may well be better as it converges faster... |
|
How embarrassing: while I checked that input arguments of zero and one worked, I neglected to check for input arguments of two to eight, and they cause an infinite loop. Accordingly, I am closing this PR and issuing a new one with the one-line change as well as the requirement for the version of "integer-roots" that specializes on However, the points made here are still valid as follows:
|
This version is more concise, simplified, and thereby faster at about
twice the speed of the old version for the same counting range and
optimization level; it would be about 100 lines of code without the
comments, but it is much better documented than the old version, also
including a "HowPrimeCountingWorks.md" file to give even more details.
This version does not call the Sieve of Eratosthenes (SoE) module, which
was part of the cruft in using that module when there wasn't much point;
the algorithm doesn't much depend for execution speed on SoE sieving and
this new version just implements a simple odds-only SoE within itself.
As specified, the module has been tested with GHC compiler versions
9.0.2, 9.2.8, 9.4.8, 9.6.7, 9.8.4, 9.10.2, and 9.12.2; version 14.1.0
was not tested because it is in the Release Candidate stage amd thus not
able to be installed through "ghcup".
I could not get it through the tests without hanging at toPrimeIntegral
Integer -> Int and don't know why it would change as it woudn't seem
counting primes would have anything to do with that, but the new version
has been tested extensively as to counting accuracy outside the testing
framework.