Skip to content

Conversation

@GordonBGood
Copy link
Contributor

This version is more concise, simplified, and thereby faster at about
twice the speed of the old version for the same counting range and
optimization level; it would be about 100 lines of code without the
comments, but it is much better documented than the old version, also
including a "HowPrimeCountingWorks.md" file to give even more details.

This version does not call the Sieve of Eratosthenes (SoE) module, which
was part of the cruft in using that module when there wasn't much point;
the algorithm doesn't much depend for execution speed on SoE sieving and
this new version just implements a simple odds-only SoE within itself.

As specified, the module has been tested with GHC compiler versions
9.0.2, 9.2.8, 9.4.8, 9.6.7, 9.8.4, 9.10.2, and 9.12.2; version 14.1.0
was not tested because it is in the Release Candidate stage amd thus not
able to be installed through "ghcup".

I could not get it through the tests without hanging at toPrimeIntegral
Integer -> Int and don't know why it would change as it woudn't seem
counting primes would have anything to do with that, but the new version
has been tested extensively as to counting accuracy outside the testing
framework.

GordonBGood and others added 4 commits November 27, 2025 13:55
cabal generally installs packages with default -O1 optmization unless forced
by a manual install - the only way to force it always to be installed with
optimization is to add this to the cabal file.

Performance testing shows that the extra level of optimization gives a boost
in execution speed of about two times, and for the prime counting function
about two and a half times.
This version is more concise, simplified, and thereby faster at about
twice the speed of the old version for the same counting range and
optimization level; it would be about 100 lines of code without the
comments, but it is much better documented than the old version, also
including a "HowPrimeCountingWorks.md" file to give even more details.

This version does not call the Sieve of Eratosthenes (SoE) module, which
was part of the cruft in using that module when there wasn't much point;
the algorithm doesn't much depend for execution speed on SoE sieving and
this new version just implements a simple odds-only SoE within itself.

As specified, the module has been tested with GHC compiler versions
9.0.2, 9.2.8, 9.4.8, 9.6.7, 9.8.4, 9.10.2, and 9.12.2; version 14.1.0
was not tested because it is in the Release Candidate stage amd thus not
able to be installed through "ghcup".

I could not get it through the tests without hanging at toPrimeIntegral
Integer -> Int and don't know why it would change as it woudn't seem
counting primes would have anything to do with that, but the new version
has been tested extensively as to counting accuracy outside the testing
framework.
@GordonBGood
Copy link
Contributor Author

@Bodigrim,

I've been investigating the Tasty test running problem and think I've found the problem - the Tasty Bench problem went away by just limiting the maximum time allowed per test (just as you suggested) and that package is solid and did just as commanded, but then it is yours. I might suggest that the benchmarks in "arithmoi" are adjusted further so as to fit within a reasonable time, but then my further findings here may reduce those times anyway, as follows:

  1. The integerSquareRoot function sucks in its general use. I had written my own isqrt speciallized for Word64 as required by my version of the countPrime function as per this PR, but instead switched to this function "because it was already there" and I assumed that it was good and FAST. BAD ASSUMPTION. In the original version it worked out because there are specializations and rules that cover the cases of, for instance, Int and Word and the old version fell within those rules; however, Word64 did not specialize and the default is to use "Heron's method" which involves a whole bunch of nested divisions. THIS IS TAKING ABOUT 0.15 SECONDS TWICE SO 0.3 SECONDS on my quite fast computer where it only takes about 1.7 seconds to count the primes to 1e14 otherwise. So the answer is to stop using it for this use and go back to my version, which just uses the binomial guess and keep or retract that a human uses to calculate the square root the old school way, but in binary. Out of interest, I generalized my function so it wasn't just restricted to native integers and words (so (Bits a, Ord a, Num a) => a -> a) and its still fast, although probably not as fast as just restricting it to Word64 which is all this primeCount function needs. However, it seems my general implementation might be a better function for this than the current one for that repository, which has been split out from "arithmoi"? Definitely, my implementation would be a better for use within "arithmoi".

  2. Given we now understand why the testing got so exceedingly long for my new version of the primeCount function and know how to get the testing time back into line, the second question is why the f... the Tasty test suite just hung when faced with this long amount of testing time and doesn't seem to respond to a time limit as does your Tasty Bench framework? I see you are a contributor to that repository and perhaps you could pursue this? I didn't set out to get so involved with other packages than "arithmoi" and then limited only to prime counting and maybe the Sieve of Eratosthenes, so am reluctant to get involved myself.

  3. So we need to make a decision about this PR: should I just close it and open a new one with the integer square root fixed, and should I fix it for this specialized use or for more general use as an example, which isn't enough slower to make a difference to its use here - it seems to me that other places where this function is used in "arithmoi" could also benefit from a better implementation. Also, how about version number for the "arithmoi" package; should this version be bumped a little given that it is compatible with current 0.13.2.0 but has a significant performance gain, especially including the use of "-O2" in the PR just previous to this one. Your wish is my command.

@GordonBGood
Copy link
Contributor Author

@Bodigrim,

Thinking more on the above problem number 1, I guess one could continue to use the integerSquareRoot function with the Word specialization as that will now pretty much always 64 bits as there aren't any 32-bit platforms or versions supported anymore (GHCup doesn't support 32-bit and there haven't been binary distributions for i386 other than for Debian Linus and therefore Word is equivalent to Word64 ALMOST always.

I guess I'm old school and didn't want to make this assumption as one couldn't used to be so sure.

So the specialization does an integer square root other than by using Heron, but that is implemented as just getting a Double float approximation and then tuning that by couning up or down as necessary. For Word(64) that works alright as it should never have to count many more than 32 times and sometimes much less, where as the "group by two digits" classical method always takes the same time for a given input size, so an input of 2^54 will take 27 loops where the "fix the approximation will likely only have to count once.

So I guess another option would be to just be sure the integerSquareRoot function gets a Word input and the Word that is returned can be converted to whatever is requried.

@Bodigrim
Copy link
Owner

Bodigrim commented Dec 1, 2025

Sorry, got a busy weekend. I'll try to review next week, your work here is much appreciated!

@Bodigrim
Copy link
Owner

Bodigrim commented Dec 2, 2025

I just pushed https://hackage.haskell.org/package/integer-roots-1.0.4.0 with added specialization for Word64. So if you do cabal update and restrict integer-roots >= 1.0.4.0 in arithmoi.cabal, you should not have to compromise on using Word instead of Word64.

I'll look at the issue with tasty next once I get time.

@GordonBGood
Copy link
Contributor Author

Sorry, got a busy weekend. I'll try to review next week...

@Bodigrim, while we think about what to do with the testing, I've been moving ahead to Phase to with a full Meissel prime counting solution; it's looking good, with the main advantage, of course, the greatly reduced memory requirements, although it will also be a little faster, especially for larger counting ranges.

your work here is much appreciated!

No problem, I've been thinking about this ever since I had some online correspondence with Daniel Fischer on a StackOverflow question so many years ago, which is what prompted me to learn about prime counting functions and their implementations. So, although I am only a mid-level mathematician (if that), I may be the ideal person for improving this part of the "arithmoi" package as I am pretty well up to speed on all matters concerning Legendre/Meissel prime counting and also am a reasonably competent programmer in quite a few languages including Haskell; I also greatly prefer the functional programming paradigm...

@GordonBGood
Copy link
Contributor Author

added specialization for Word64

@Bodigrim,

Thanks for pushing the change to "integer-roots", but I haven't decided for sure what to do about the integer square root function as just "tweaking" the answer given by the floating point square root when it lacks sufficient precision seems to inelegant to me when it is so easy to implement the version using binary binomial expansion as per this Wikipedia article; even implemented for Integer, this is pretty fast as it uses only half the number of bits of the argument in number of loops, so the square root of a 2000 bit number only requires 1000 loops and I had already written it this way before I decided to try the "integer-roots" function without looking into how it was implemented. Anyway, I'm not sure if this is what is causing the problem with testing anyway - I just know that testing doesn't crash or report a problem but just goes into some sort of very long time calculation that doesn't seem to terminate, even when I tried specifying a timeout...

I wrote the prime counting function outside of the "arithmoi" package and tested it there as to being "monotonic" across its inflection points and accurate for various counting ranges from tiny to pretty big, and after including the function in my local version of "arithmoi" can use that same testing to get the same answers and performance, so it's hard to believe that the function has a problem. But maybe "tasty" knows better...

I'll look at the issue with tasty next once I get time.

Anyway, you are more knowledgeable about the testing framework and can maybe better spot the problem and meantime I am working ahead on the next phase...

@GordonBGood
Copy link
Contributor Author

GordonBGood commented Dec 6, 2025

it is so easy to implement the version using binary binomial expansion...

@Bodigrim,

I have looked into implementations of integer square root and cube root and done some research on the Internet, and don't know why mathematicians and programmers make this to be so hard, pulling up all sorts of converging approximations and series such as the Newton method, Heron's method, and probably no end of others, each with their limitations as well as limited execution speed as most of them require division operations as well as a method of providing a good initial estimate. Especially for the integer cube root, I can't find anyone who has done this by the exact method of binomial cubing (the square root exact method is by binomial squaring); I had implemented the square root version over 45 years ago because I needed it for a micro controller type of application (the CPU didn't even have built-in integer multiply and division operations, nor floating point, of course); I don't know if my work then was a first or not but the technique has definitely been published many times since. However, I have never seen an implementation of the exact integer cube root, as my implementation in Haskell below (this algorithm works really well in binary as noted in the program code comments), as follows:

-- This file is integer square root and cube root functions in Haskell language
--   by W. Gordon Goodsman (GordonBGood), 2025

{-
This is based on binomial square explansion where
(old + inc)^2 -> old^2 + 2*inc*old + inc^2 = new^2, thus
new^2 - old^2 or deltasqr = 2*old*inc + inc^2.
Now this is really easy in binary where multiplying by two is just a left shift by one and
when inc is an even power of two as in 2^j, where the above expression becomes
deltasqr = old<<(j + 1) + 1<<(2j).  Also, this works relative to inc or 2^j or 1<<j in that
if the whole thing is offset by 1<<j, the above expression becomes
deltasqr = (old<<1 + 1<<j) << j.  That allows us to write the following routine to handle the offsetting
-}

{-# LANGUAGE BangPatterns, MagicHash #-}

import Data.Bits (shiftL, shiftR)
import GHC.Exts ( Word(..), uncheckedShiftL#, uncheckedShiftRL#
                , plusWord#, minusWord#, gtWord#, leWord#, ltWord#)

import Data.Time.Clock.POSIX (getPOSIXTime) -- for timeing

isqrt :: Integer -> Integer
isqrt n =
  if n < 2 then max 0 n else
  if n < 2^(64 :: Int) then pisqrt n else
  let shft o = if o > n then o `shiftR` 2 else shft (o `shiftL` 2)
      go x one rslt =
        if one <= 0 then rslt else
        let nrslt = rslt `shiftR` 1 -- `rslt` is couble the current `nrslt`!
            dltasqr = rslt + one -- is the same as 2*`rslt` + `one` where `one` = `inc` above
        in
        if x < dltasqr then go x (one `shiftR` 2) nrslt
        else go (x - dltasqr) (one `shiftR` 2) (nrslt + one)
  in go n (shft 1) 0

pisqrt :: Integer -> Integer
pisqrt n =
  let !(W# n##) = fromIntegral n
      shft# o## = case o## `leWord#` n## of
                    0# -> o## `uncheckedShiftRL#` 2#
                    _ -> case o## `ltWord#` 0x4000000000000000## of
                           0# -> o##
                           _ -> shft# (o## `uncheckedShiftL#` 2#)
      go# x## one## rslt## =
        case one## `gtWord#` 0## of
          0# -> rslt##
          _ ->
            let none## = one## `uncheckedShiftRL#` 2#
                nrslt## = rslt## `uncheckedShiftRL#` 1# -- `rslt` is couble the current `nrslt`!
                dltasqr## = rslt## `plusWord#` one## -- is the same as 2*`rslt` + `one` where `one` = `inc` above
            in
            case x## `ltWord#` dltasqr## of
              0# -> go# (x## `minusWord#` dltasqr##) none## (nrslt## `plusWord#` one##)
              _ -> go# x## none## nrslt##
  in fromIntegral $ W# $ go# n## (shft# 1##) 0##

{-
The cube root works the same as the square root except that the expansion is
(old + inc)^3 = old^3 + 3*old^2*inc + 3*old*inc^2 + inc^3 meaning that
deltacube = 3*old^2*inc + 3*old*inc^2 + inc^3
that can be simplified by two powers in the same way, with the only complication
that the current square of the current `rslt` value must be kept and updated,
which updating is done progressively as per the relationship in the sqrt algorithm.
-}

icbrt :: Integer -> Integer
icbrt n =
  let neg = if n < 0 then -1 else 1
      absn = abs n in
  if absn < 2 then n else
  if absn < 2^(64 :: Int) then neg * picbrt absn else
  let shft o = if o > absn then o `shiftR` 3 else shft (o `shiftL` 3)
      go x one sqr rslt =
        if one <= 0 then rslt else
        let none = one `shiftR` 3
            nsqr = sqr `shiftR` 1
            twicerslt = rslt `shiftR` 1 -- `twicerslt` is double the current `nrslt`!
            nrslt = rslt `shiftR` 2
            -- is the same as 3*`nsqr` + 3*`nrslt``+``one`` where`one` =``inc` above
            dltacube = sqr + nsqr + twicerslt + nrslt + one
        in
        if x < dltacube then go x none nsqr nrslt
        else go (x - dltacube) none (nsqr + twicerslt + one) (nrslt + one) 
  in (go absn (shft 1) 0 0) * neg

picbrt :: Integer -> Integer
picbrt n =
  let !(W# n##) = fromIntegral n
      shft# o## = case o## `leWord#` n## of
                    0# -> o## `uncheckedShiftRL#` 3#
                    _ -> case o## `ltWord#` 0x8000000000000000## of
                           0# -> o##
                           _ -> shft# (o## `uncheckedShiftL#` 3#)
      go# x## one## sqr## rslt## =
        case one## `gtWord#` 0## of
          0# -> rslt##
          _ ->
            let none## = one## `uncheckedShiftRL#` 3#
                nsqr## = sqr## `uncheckedShiftRL#` 1#
                twicerslt## = rslt## `uncheckedShiftRL#` 1#
                nrslt## = rslt## `uncheckedShiftRL#` 2#
                -- is the same as 3*`rslt##^2 + 3*`rst##` + `one##` where `one##` is ``inc`` above
                dltacube## = sqr## `plusWord#` nsqr## `plusWord#` twicerslt## `plusWord#` nrslt## `plusWord#` one##
            in
            case sqr## `leWord#` dltacube## of
              0# -> go# x## none## nsqr## nrslt## -- overflow!
              _ ->
                case x## `ltWord#` dltacube## of
                  0# -> go# (x## `minusWord#` dltacube##) none##
                            (nsqr## `plusWord#` twicerslt## `plusWord#` one##) (nrslt## `plusWord#` one##)
                  _ -> go# x## none## nsqr## nrslt##
  in fromIntegral $ W# $ go# n## (shft# 1##) 0## 0##

xisqrt :: Integer -> Integer
xisqrt n =
  if n < 2 then max 0 n else
  let absn = fromIntegral n :: Word
      est = floor $ sqrt (fromIntegral n :: Double) :: Word
      chktoohi e = let chk = e * e in
                   if chk < e || chk > absn then chktoohi (e - 1) else e + 1
      chktoolo ep1 = let chk = ep1 * ep1 in
                     if chk < ep1 || chk > absn then ep1 - 1 else chktoolo (ep1 + 1)
  in fromIntegral (chktoolo $ chktoohi est)

xicbrt :: Integer -> Integer
xicbrt n =
  let neg = if n < 0 then -1 else 1
      absn = fromIntegral $ abs n :: Word in
  if absn < 2 then n else
  let est = floor $ (fromIntegral absn :: Double)**(1.0/3.0 :: Double) :: Word
      chktoohi e = let chk0 = e * e in let chk = chk0 * e in
                   if chk < chk0 || chk > absn then chktoohi (e - 1) else e + 1
      chktoolo ep1 = let chk0 = ep1 * ep1 in let chk = chk0 * ep1 in
                     if chk < chk0 || chk > absn then ep1 - 1 else chktoolo (ep1 + 1)
  in fromIntegral (chktoolo $ chktoohi est) * neg

main :: IO ()
main = do
  let arg = 2^(64 :: Int) - 1
  putStrLn $ "The square root of " ++ show arg ++ " is " ++ show (xisqrt arg) ++ "."
  putStrLn $ "The cube root of " ++ show arg ++ " is " ++ show (xicbrt arg) ++ "."
--  let rng = [ 2^(64 :: Int) - 1 - 10^(8 :: Int) .. 2^(64 :: Int) - 1 ] -- [-1000 .. 100000] ++ [10^(20 :: Int) .. 10^(20 :: Int) + 10^(4 :: Int)]
  let rng = [ -1000 .. 10000000] -- ++ [10^(20 :: Int) .. 10^(20 :: Int) + 10^(4 :: Int)]
  strt <- getPOSIXTime
  if all id [ isqrt v == xisqrt v | v <- rng ] then putStrLn "SQRT CORRECT" else putStrLn "SQRT FAILURE!!!!!"
  if all id [ icbrt v == xicbrt v | v <- rng ] then putStrLn "CBRT CORRECT" else putStrLn "CBRT FAILURE!!!!!"
  stop <- getPOSIXTime
  let elpsd = round (1000.0 * (realToFrac $ stop - strt) :: Double) :: Int
  putStrLn $ "This took " ++ show elpsd ++ " milliseconds."

If you want to play with it, I have also run it on Wandbox. It seems that these might be better candidates than the current versions of integer square root and cube root on "integer-roots". These have no divisions or even multiplications and only use additions, subtractions, and left and right bit shifts, as well as a few comparisons with the majority of them taking almost no time because of being part of recurring loops that are predicable because of almost always following the same path; only one comparison is less predicable as it might go either way.

As you see, I have tested these by comparing their results to versions using the Double floating point estimates and tuning as does "integer-roots" over a considerable range. they are quite fast at millions of roots per second, with the cube root even faster because it is O(n^(1/3)) instead of O(n^(1/2)).

Please tell me what you think...

EDIT_ADD:

If one were to include the above algorithm code in the "integer-roots" package, one would likely specialize on all the major Int, Word, Int32, Int64, Word32, and Word64, to interface to non-Integer version(s), maybe even using Wrd64 unboxed primitives, for probably a few time constant factor in improved execution times, although because the algorithm doesn't use multiplication or division, the gains wouldn't be that large...

EDIT_ADD_ADD:

So I translated the algorithms to use unboxed Word# primitives and it made a bigger difference than I expected: about 27.5 times faster than using Integer's; with this, the algorithm is now about 25% faster than using the double operations, even for the largest of arguments where the algorithm may be slowest. However, the algorithm is about ten times slower than using the double operations for small ranges where the precision of the double square root is enough, which I guess makes sense in that there aren't any tuning adjustments to make so the time for the double operations is just the raw speed of the floating point processor whereas the algorithm has to still "run the loops". However, realistically, I don't think speed is really an issue with any of the implementations, as what algorithm need faster than a worst case of over five million square roots a second?

I don't see anything performance-wise that would prevent "integer-roots" from using the exact algorithm...

FINAL_EDIT_ADD:

I finished tuning the unboxed primitive code (had to check for overflow, especially for the cbrt one) and now the primitive versions are always at least a little faster than using the Double floating point approximations, especially at the to of the bound for Word/Word64 where my version of the integer square root is more than twice as fast as using the Double floating point approximation. I feel that these may be a better choice for integer square root and integer cube root for up to this bound...

For arguments above this bound, I didn't test and compare with the Heron method and the Heron method may well be better as it converges faster...

@GordonBGood
Copy link
Contributor Author

@Bodigrim:

How embarrassing: while I checked that input arguments of zero and one worked, I neglected to check for input arguments of two to eight, and they cause an infinite loop. Accordingly, I am closing this PR and issuing a new one with the one-line change as well as the requirement for the version of "integer-roots" that specializes on Word64.

However, the points made here are still valid as follows:

  1. The exact algorithms shown here that use binomial expansions for square root and cube root really are better than the approximations currently used by "integer-roots" for fixed size arguments (up to Word64); we should look into doing a PR on that repository to improve the current ones...

  2. Your "tasty" test framework really does hang if a test program goes into an infinite loop, unlike with some frameworks and should really be fixed: I suppose that might require that tests be run on separate threads so that when a time-out is reached on the monitoring thread, the problem test threads can be killed...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants