Commit 574e1c4

Change namespace from Data.DataFrame to DataFrame
1 parent 80ec483 commit 574e1c4

35 files changed: 261 additions, 174 deletions

app/Main.hs

Lines changed: 2 additions & 2 deletions
@@ -6,8 +6,8 @@

 module Main where

-import qualified Data.DataFrame as D
-import Data.DataFrame (dimensions, (|>))
+import qualified DataFrame as D
+import DataFrame (dimensions, (|>))
 import Data.List (delete)
 import Data.Maybe (fromMaybe, isJust, isNothing)
 import qualified Data.Text as T

benchmark/Main.hs

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {-# LANGUAGE NumericUnderscores #-}
 {-# LANGUAGE OverloadedStrings #-}

-import qualified Data.DataFrame as D
+import qualified DataFrame as D
 import qualified Data.Vector.Unboxed as VU

 import Control.Monad (replicateM)

dataframe.cabal

Lines changed: 38 additions & 38 deletions
@@ -22,25 +22,25 @@ source-repository head
     location: https://github.com/mchav/dataframe

 library
-    exposed-modules: Data.DataFrame
-    other-modules: Data.DataFrame.Internal.Types,
-                   Data.DataFrame.Internal.Function,
-                   Data.DataFrame.Internal.Parsing,
-                   Data.DataFrame.Internal.Column,
-                   Data.DataFrame.Display.Terminal.PrettyPrint,
-                   Data.DataFrame.Display.Terminal.Colours,
-                   Data.DataFrame.Internal.DataFrame,
-                   Data.DataFrame.Internal.Row,
-                   Data.DataFrame.Errors,
-                   Data.DataFrame.Operations.Core,
-                   Data.DataFrame.Operations.Subset,
-                   Data.DataFrame.Operations.Sorting,
-                   Data.DataFrame.Operations.Statistics,
-                   Data.DataFrame.Operations.Transformations,
-                   Data.DataFrame.Operations.Typing,
-                   Data.DataFrame.Operations.Aggregation,
-                   Data.DataFrame.Display.Terminal.Plot,
-                   Data.DataFrame.IO.CSV
+    exposed-modules: DataFrame
+    other-modules: DataFrame.Internal.Types,
+                   DataFrame.Internal.Function,
+                   DataFrame.Internal.Parsing,
+                   DataFrame.Internal.Column,
+                   DataFrame.Display.Terminal.PrettyPrint,
+                   DataFrame.Display.Terminal.Colours,
+                   DataFrame.Internal.DataFrame,
+                   DataFrame.Internal.Row,
+                   DataFrame.Errors,
+                   DataFrame.Operations.Core,
+                   DataFrame.Operations.Subset,
+                   DataFrame.Operations.Sorting,
+                   DataFrame.Operations.Statistics,
+                   DataFrame.Operations.Transformations,
+                   DataFrame.Operations.Typing,
+                   DataFrame.Operations.Aggregation,
+                   DataFrame.Display.Terminal.Plot,
+                   DataFrame.IO.CSV
     build-depends: base >= 4.17.2.0 && < 4.21,
                    array ^>= 0.5,
                    attoparsec >= 0.12 && <= 0.14.4,
@@ -58,25 +58,25 @@ library

 executable dataframe
     main-is: Main.hs
-    other-modules: Data.DataFrame,
-                   Data.DataFrame.Internal.Types,
-                   Data.DataFrame.Internal.Function,
-                   Data.DataFrame.Internal.Parsing,
-                   Data.DataFrame.Internal.Column,
-                   Data.DataFrame.Display.Terminal.PrettyPrint,
-                   Data.DataFrame.Display.Terminal.Colours,
-                   Data.DataFrame.Internal.DataFrame,
-                   Data.DataFrame.Internal.Row,
-                   Data.DataFrame.Errors,
-                   Data.DataFrame.Operations.Core,
-                   Data.DataFrame.Operations.Subset,
-                   Data.DataFrame.Operations.Sorting,
-                   Data.DataFrame.Operations.Statistics,
-                   Data.DataFrame.Operations.Transformations,
-                   Data.DataFrame.Operations.Typing,
-                   Data.DataFrame.Operations.Aggregation,
-                   Data.DataFrame.Display.Terminal.Plot,
-                   Data.DataFrame.IO.CSV
+    other-modules: DataFrame,
+                   DataFrame.Internal.Types,
+                   DataFrame.Internal.Function,
+                   DataFrame.Internal.Parsing,
+                   DataFrame.Internal.Column,
+                   DataFrame.Display.Terminal.PrettyPrint,
+                   DataFrame.Display.Terminal.Colours,
+                   DataFrame.Internal.DataFrame,
+                   DataFrame.Internal.Row,
+                   DataFrame.Errors,
+                   DataFrame.Operations.Core,
+                   DataFrame.Operations.Subset,
+                   DataFrame.Operations.Sorting,
+                   DataFrame.Operations.Statistics,
+                   DataFrame.Operations.Transformations,
+                   DataFrame.Operations.Typing,
+                   DataFrame.Operations.Aggregation,
+                   DataFrame.Display.Terminal.Plot,
+                   DataFrame.IO.CSV
     build-depends: base >= 4.17.2.0 && < 4.21,
                    array ^>= 0.5,
                    attoparsec >= 0.12 && <= 0.14.4,

docs/coming_from_pandas.md

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ python> df
 ```

 ```haskell
-ghci> import qualified Data.DataFrame as D
+ghci> import qualified DataFrame as D
 ghci> import qualified Data.Vector as V
 ghci> import System.Random (randomRIO)
 ghci> import Control.Monad (replicateM)

docs/coming_from_polars.md

Lines changed: 5 additions & 5 deletions
@@ -36,7 +36,7 @@ As a standalone dataframe script this would look like.


 ```haskell
-import qualified Data.DataFrame as D
+import qualified DataFrame as D
 import Data.Time.Calendar

 main :: IO
@@ -111,10 +111,10 @@ Would be written as:
 ```haskell
 {-# LANGUAGE ScopedTypeVariables #-}
 {-# LANGUAGE TypeApplications #-}
-import qualified Data.DataFrame as D
+import qualified DataFrame as D
 import qualified Data.Text as T

-import Data.DataFrame.Operations ( (|>) )
+import DataFrame.Operations ( (|>) )
 import Data.Time.Calendar

 main :: IO ()
@@ -133,10 +133,10 @@ Or, more clearly:
 ```haskell
 {-# LANGUAGE ScopedTypeVariables #-}
 {-# LANGUAGE TypeApplications #-}
-import qualified Data.DataFrame as D
+import qualified DataFrame as D
 import qualified Data.Text as T

-import Data.DataFrame.Operations ( (|>) )
+import DataFrame ( (|>) )
 import Data.Time.Calendar

 main :: IO ()

docs/exploratory_data_analysis_primer.md

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ Univariate non-graphical analysis should give us a sense of the distribution of
 For categorical data the best univariate non-graphical analysis is a tabulation of the frequency of each category.

 ```haskell
-ghci> import qualified Data.DataFrame as D
+ghci> import qualified DataFrame as D
 ghci> D.frequencies "ocean_proximity" df

 ------------------------------------------------------------------------------

docs/haskell_for_data_analysis.md

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+# Haskell for Data Analysis
+
+This section ports/mirrors Wes McKinney's book [Python for Data Analysis](https://wesmckinney.com/book/). Examples and organization are drawn from there. This tutorial assumes an understanding of Haskell.
+
+## Data preparation
+Data in the wild doesn't always come in a form that's easy to work with. A data analysis tool should make preparing and cleaning data easy. There are a number of common issues that a data analysis tool must handle. We'll go through a few common ones and show how to deal with them in Haskell.
+
+### Handling missing data
+In Haskell, potentially missing values are represented by a "wrapper" type called [`Maybe`](https://en.wikibooks.org/wiki/Haskell/Understanding_monads/Maybe).
+
+```
+ghci> import qualified DataFrame as D
+ghci> let df = D.fromColumnList [D.toColumn [Just 1, Just 1, Nothing, Nothing], D.toColumn [Just 6.5, Nothing, Nothing, Just 6.5], D.toColumn [Just 3.0, Nothing, Nothing, Just 3.0]]
+ghci> df
+---------------------------------------------------
+index | 0             | 1            | 2
+------|---------------|--------------|-------------
+Int   | Maybe Integer | Maybe Double | Maybe Double
+------|---------------|--------------|-------------
+0     | Just 1        | Just 6.5     | Just 3.0
+1     | Just 1        | Nothing      | Nothing
+2     | Nothing       | Nothing      | Nothing
+3     | Nothing       | Just 6.5     | Just 3.0
+
+```
+
+If we'd like to drop all rows with missing values we can use the `filterJust` function.
+
+```haskell
+ghci> D.filterJust "0" df
+---------------------------------------------
+index | 0       | 1            | 2
+------|---------|--------------|-------------
+Int   | Integer | Maybe Double | Maybe Double
+------|---------|--------------|-------------
+0     | 1       | Just 6.5     | Just 3.0
+1     | 1       | Nothing      | Nothing
+```
+
+The function keeps only the rows where the given column is not `Nothing` and "unwraps" the `Maybe` type. To drop every row that contains a `Nothing` in any column, we use the `filterAllJust` function.
+
+```haskell
+ghci> D.filterAllJust df
+---------------------------------
+index | 0       | 1      | 2
+------|---------|--------|-------
+Int   | Integer | Double | Double
+------|---------|--------|-------
+0     | 1       | 6.5    | 3.0
+```
+
+To fill in the missing values we use the `impute` function, which replaces all instances of `Nothing` in a column with a given value.
+
+```haskell
+ghci> D.impute "0" (0 :: Integer) df
+---------------------------------------------
+index | 0       | 1            | 2
+------|---------|--------------|-------------
+Int   | Integer | Maybe Double | Maybe Double
+------|---------|--------------|-------------
+0     | 1       | Just 6.5     | Just 3.0
+1     | 1       | Nothing      | Nothing
+2     | 0       | Nothing      | Nothing
+3     | 0       | Just 6.5     | Just 3.0
+```
+
+There is no general way to replace all `Nothing` values with a single default since the default depends on the column's type. In fact, calling a function with the wrong type throws an error:
+
+```haskell
+ghci> D.impute @Double "0" 0 df
+*** Exception:
+
+[Error]: Type Mismatch
+While running your code I tried to get a column of type: "Maybe Double" but column was of type: "Maybe Integer"
+This happened when calling function apply on the column 0
+
+
+
+Try adding a type at the end of the function e.g change
+apply arg1 arg2 to
+(apply arg1 arg2 :: <Type>)
+or add {-# LANGUAGE TypeApplications #-} to the top of your file then change the call to
+apply @<Type> arg1 arg2
+```
+
+In general, Haskell would catch this kind of error at compile time. But because dataframes are usually used in REPL-like environments, which offer immediate feedback to users, `dataframe` is fine turning these into runtime exceptions.
+
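
For readers new to `Maybe`, the two operations the new tutorial describes (dropping missing values and filling them with a default) can be sketched on a plain list using only `base`. This is an illustration, not the `dataframe` implementation: `dropMissing` and `fillMissing` are hypothetical names, and unlike `D.filterJust` they act on a single list rather than on whole rows.

```haskell
import Data.Maybe (catMaybes, fromMaybe)

-- A single column with missing entries, modelled as a plain list.
column :: [Maybe Integer]
column = [Just 1, Just 1, Nothing, Nothing]

-- Hypothetical helper: drop the missing entries and unwrap the rest.
dropMissing :: [Maybe a] -> [a]
dropMissing = catMaybes

-- Hypothetical helper: replace each missing entry with a default value.
fillMissing :: a -> [Maybe a] -> [a]
fillMissing def = map (fromMaybe def)

main :: IO ()
main = do
  print (dropMissing column)    -- [1,1]
  print (fillMissing 0 column)  -- [1,1,0,0]
```

Because the default passed to `fillMissing` must have the element type of the list, a mismatched default is rejected by the type checker here, which is the compile-time behaviour the tutorial contrasts with `dataframe`'s exceptions.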

src/Data/DataFrame.hs

Lines changed: 0 additions & 26 deletions
This file was deleted.

src/DataFrame.hs

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+module DataFrame
+  ( module D,
+    (|>)
+  )
+where
+
+import DataFrame.Internal.Types as D
+import DataFrame.Internal.Function as D
+import DataFrame.Internal.Parsing as D
+import DataFrame.Internal.Column as D
+import DataFrame.Internal.DataFrame as D hiding (columnIndices, columns)
+import DataFrame.Internal.Row as D hiding (mkRowRep)
+import DataFrame.Errors as D
+import DataFrame.Operations.Core as D
+import DataFrame.Operations.Subset as D
+import DataFrame.Operations.Sorting as D
+import DataFrame.Operations.Statistics as D
+import DataFrame.Operations.Transformations as D
+import DataFrame.Operations.Typing as D
+import DataFrame.Operations.Aggregation as D
+import DataFrame.Display.Terminal.Plot as D
+import DataFrame.IO.CSV as D
+
+import Data.Function
+
+(|>) = (&)
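
The last line of the new module defines `(|>)` as an alias for `(&)` from `Data.Function`, that is, reverse function application. A minimal sketch of how such an operator chains steps left to right, using only `base` and an ordinary list in place of a dataframe. The type signature and fixity declaration below are additions for the sketch and do not appear in the diff.

```haskell
import Data.Function ((&))

-- Reverse application, defined the same way as in src/DataFrame.hs.
(|>) :: a -> (a -> b) -> b
(|>) = (&)

-- Assumption for this sketch: reuse the fixity of (&) so chains read left to right.
infixl 1 |>

main :: IO ()
main = do
  -- Keep the even numbers, double them, then add them up.
  let result = [1 .. 10 :: Int] |> filter even |> map (* 2) |> sum
  print result  -- 60
```

Each step receives the result of everything to its left, so the pipeline reads in the order the operations are applied.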

src/Data/DataFrame/Display/Terminal/Colours.hs renamed to src/DataFrame/Display/Terminal/Colours.hs

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-module Data.DataFrame.Display.Terminal.Colours where
+module DataFrame.Display.Terminal.Colours where

 -- terminal color functions
 red :: String -> String
