Skip to content

Latest commit

 

History

History
293 lines (229 loc) · 10.9 KB

HOWTO.md

File metadata and controls

293 lines (229 loc) · 10.9 KB

Using this library

This is a short guide to demonstrate how this library may be used. Familiarity with some form of arrays and mutable arrays in Haskell is expected. If your array type is not present here, it should still be possible to adapt some of the code below to your use case.

Sorting boxed elements

This library offers the function

sortArrayBy
  :: (a -> a -> Ordering)  -- ^ comparison
  -> MutableArray# s a
  -> Int                   -- ^ offset
  -> Int                   -- ^ length
  -> ST s ()

So how does one use this?

  • The first parameter is a comparison function that will be used to order the elements.
  • The second parameter is a MutableArray# with elements of type a. This is a primitive array type provided by GHC. This array will be sorted in place.
  • The third and fourth parameters are Ints which demarcate a slice of the array. Elements in this slice will be sorted, and other elements will not be touched.
  • The return type is an ST action. If you are not familiar with ST, please see the documentation for Control.Monad.ST.

To use sortArrayBy, an important step is to put the elements to be sorted into a MutableArray#. The best way to do this depends on how the elements are stored prior to sorting, as we will see in the examples below.

Example 1: MVector

Consider that we need to sort a mutable vector MVector from the vector library. This is quite easy, and in fact we do not need to put elements anywhere because the underlying representation of an MVector is a MutableArray#! We only need to access it.

import Control.Monad.Primitive (PrimMonad(..), stToPrim)  -- from the package "primitive"
import Data.Primitive.Array (MutableArray(..))  -- also from "primitive"
import Data.Vector.Mutable (MVector(..))

import qualified Data.SamSort as Sam

-- | Sort a mutable vector in place.
sortMV :: (PrimMonad m, Ord a) => MVector (PrimState m) a -> m ()
sortMV = sortMVBy compare

-- | Sort a mutable vector in place using a comparison function.
sortMVBy :: PrimMonad m => (a -> a -> Ordering) -> MVector (PrimState m) a -> m ()
sortMVBy cmp (MVector off len (MutableArray ma)) =
  stToPrim $ Sam.sortArrayBy cmp ma off len

Example 2: Vector

Now consider sorting an (immutable) Vector, again from the vector library. Since we cannot mutate it, we will return a sorted copy. The most convenient way here is to thaw the Vector to get an MVector, then sort it as we did above.

import Data.Vector (Vector)
import qualified Data.Vector as V

-- | Sort a vector.
sortV :: Ord a => Vector a -> Vector a
sortV = sortVBy compare

-- | Sort a vector using a comparison function.
sortVBy :: (a -> a -> Ordering) -> Vector a -> Vector a
sortVBy cmp v = V.create $ do
  mv <- V.thaw v
  sortMVBy cmp mv -- from Example 1 above
  pure mv

We can test it out in GHCI.

>>> sortV (V.fromList [5,2,6,3,4,1])
[1,2,3,4,5,6]
>>> import Data.Ord (comparing)
>>> sortVBy (comparing length) (V.fromList ["Lunar","11.3","Candle","Magic"])
["11.3","Lunar","Magic","Candle"]

Example 3: List

Let us now try to sort a list, like Data.List.sort does. Here we will need to move the elements from the list into a MutableArray#.

I recommend using the primitive library for this task. primitive provides boxed wrappers over GHC primitive types, and functions to work with them. While it is possible to do this without any library, it is easiest to use what is already available. If you are unable to use primitive, you can take a peek at the relevant definitions there and use them directly.

import qualified Data.Foldable as F
import Data.Primitive.Array (MutableArray(..))
import qualified Data.Primitive.Array as A

import qualified Data.SamSort as Sam

-- | Sort a list.
sortL :: Ord a => [a] -> [a]
sortL = sortLBy compare

-- | Sort a list using a comparison function.
sortLBy :: (a -> a -> Ordering) -> [a] -> [a]
sortLBy cmp xs = F.toList $ A.runArray $ do
  let a = A.arrayFromList xs
      n = A.sizeofArray a
  ma@(MutableArray ma') <- A.thawArray a 0 n
  Sam.sortArrayBy cmp ma' 0 n
  pure ma

In GHCI,

>>> sortL ["Fall","In","The","Dark"]
["Dark","Fall","In","The"]
>>> import Data.Ord (Down(..), comparing)
>>> sortLBy (comparing Down) [3.4,8.5,9.1,7.9,3.1,6.2]
[9.1,8.5,7.9,6.2,3.4,3.1]

Tip

Avoid Data.List's sort and sortBy when a large number of elements need to be fully sorted and performance is a concern. Sorting lists is quite inefficient compared to sorting a mutable array in place.

Sorting Ints

Converting to a MutableArray# and sorting, as shown in the above section, should cover the majority of use cases. If we are dealing with unboxed data however, we can do better. We may be storing Ints in an unboxed array for efficiency, but pulling them out and boxing them for sorting would ruin that efficiency.

This is where we can use the second function from this library.

sortIntArrayBy
  :: (Int -> Int -> Ordering)  -- ^ comparison
  -> MutableByteArray# s
  -> Int                       -- ^ offset in Int#s
  -> Int                       -- ^ length in Int#s
  -> ST s ()

As you might have guessed, this sorts an unboxed array of Ints. We can use this whenever we need to sort Ints, or even other types that may be cheaply converted to and from Ints (like Word).

Example 1: Unboxed MVector

Let us sort a mutable unboxed MVector Int. Like with the boxed MVector, we do not need to move the elements because the underlying representation is a MutableByteArray#.

import Control.Monad.Primitive (PrimMonad(..), stToPrim)
import Data.Primitive.ByteArray (MutableByteArray(..))
import qualified Data.Vector.Primitive as VP
import qualified Data.Vector.Unboxed.Mutable as VUM
import qualified Data.Vector.Unboxed.Base as VUB

import qualified Data.SamSort as Sam

sortVUMInt :: PrimMonad m => VUM.MVector (PrimState m) Int -> m ()
sortVUMInt = sortVUMIntBy compare

sortVUMIntBy
  :: PrimMonad m
  => (Int -> Int -> Ordering) -> VUM.MVector (PrimState m) Int -> m ()
sortVUMIntBy cmp mv = case mv of
  VUB.MV_Int (VP.MVector off len (MutableByteArray ma')) ->
    stToPrim $ Sam.sortIntArrayBy cmp ma' off len

Warning

Do not try changing the element type above to sort any other unboxed vector. MutableByteArray# is the underlying representation for many unboxed vectors, but it would be incorrect to use the above code if the element type is not Int.

Sorting by index

We have now covered sorting boxed values, and sorting Ints. What about other types in unboxed arrays?

Example 1: Unboxed Vector

Consider that we need to sort an unboxed vector of some unknown type a. We cannot assume anything about the representation of the unboxed Vector a, because it can be anything at all depending on the type a. Can we sort such a vector efficiently?

We know that we can index any vector. We also know that we can construct vectors, using the generate function for instance. That is all we will need.

First we will create an Int vector, the elements of which are indices into the a vector. Then we will sort this Int vector using a comparison function that indexes the a vector and compares as. As the final result, we will construct a vector with as in the order of the sorted indices.

This technique is general enough that we can sort any flavor of Vector (boxed, Unboxed, Prim, Storable), so we can use Vector.Generic to define the functions.

import Data.Primitive.ByteArray (MutableByteArray(..))
import qualified Data.Vector.Generic as VG
import qualified Data.Vector.Primitive as VP
import qualified Data.Vector.Primitive.Mutable as VPM

import qualified Data.SamSort as Sam

-- | Sort a vector.
sortByIdxVG :: (Ord a, VG.Vector v a) => v a -> v a
sortByIdxVG = sortByIdxVGBy compare

-- | Sort a vector using a comparison function.
sortByIdxVGBy :: VG.Vector v a => (a -> a -> Ordering) -> v a -> v a
sortByIdxVGBy cmp v = VG.generate n (VG.unsafeIndex v . VP.unsafeIndex ixa)
  where
    n = VG.length v
    cmp' i j = cmp (VG.unsafeIndex v i) (VG.unsafeIndex v j)
    ixa = VP.create $ do
      ixma <- VPM.generate n id
      case ixma of
        VPM.MVector off len (MutableByteArray ma') ->
          Sam.sortIntArrayBy cmp' ma' off len
      pure ixma

In fact, this technique is general enough to be used whenever indices can be sorted, using any method, and not just with this library!

Sorting by index is more beneficial the larger the elements are in memory, since moving around index Ints is cheaper than moving around the elements themselves. This is demonstrated in these benchmarks on elements of type (Int, Int, Int), for sort functions which support both direct sorting and sorting by index.

We can confirm that our sort works as expected in GHCI.

>>> import Data.Ord (comparing)
>>> import qualified Data.Vector.Unboxed as VU
>>> let v = VU.fromList [(6,4),(5,4),(1,2)] :: VU.Vector (Int,Int)
>>> sortByIdxVGBy (comparing snd) v
[(1,2),(6,4),(5,4)]

Sorting unboxed arrays of small elements

So sorting by index is more beneficial the larger the element is, but what about small elements? Perhaps we need to sort an unboxed array of Word8s, or Floats?

Our options as seen above are:

  • Convert to a boxed array and sort
  • Sort by index

While neither option is ideal, the second option is better. The most efficient way to sort small elements is to sort the array of such elements directly. Unfortunately, this library cannot be used to do this because there are only two functions, one to sort boxed values, and one to sort Ints. However, sorting by index will be close to as fast as sorting directly.

The primitive-sort library may also be a good fit for this task. It can sort such small elements directly and efficiently, though it has some drawbacks (not adaptive, cannot sort a slice, cannot sort using a comparison function, more dependencies).

vector-algorithms is also able to sort small elements directly, however it turns out to be slower in practice.