This section mirrors Wes McKinney's book [Python for Data Analysis](https://wesmckinney.com/book/). Its examples and organization are drawn from there. This tutorial assumes an understanding of Haskell.

## Data preparation

Data in the wild doesn't always come in a form that's easy to work with. A data analysis tool should make preparing and cleaning data easy. There are a number of common issues that data analysis tools must handle. We'll go through a few common ones and show how to deal with them in Haskell.

### Handling missing data

In Haskell, potentially missing values are represented by a "wrapper" type called [`Maybe`](https://en.wikibooks.org/wiki/Haskell/Understanding_monads/Maybe): a value that is present is written `Just x`, and a missing one is `Nothing`.
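
As a quick refresher, here is a minimal sketch of working with `Maybe` using only `base`, before any dataframe is involved. The names `readings`, `missingCount`, `presentValues` and `describe` are made up for illustration and are not part of the `dataframe` library.

```haskell
import Data.Maybe (catMaybes, isNothing)

-- A plain list standing in for a column with two missing entries.
-- (Illustrative name, not part of any library.)
readings :: [Maybe Double]
readings = [Just 6.5, Nothing, Nothing, Just 6.5]

-- Count how many entries are missing.
missingCount :: Int
missingCount = length (filter isNothing readings)

-- Keep only the entries that are present, dropping the Just wrapper.
presentValues :: [Double]
presentValues = catMaybes readings

-- Pattern matching forces us to handle the missing case explicitly.
describe :: Maybe Double -> String
describe (Just x) = "value " ++ show x
describe Nothing = "missing"
```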

```haskell
ghci> import qualified DataFrame as D
ghci> let df = D.fromColumnList [D.toColumn [Just 1, Just 1, Nothing, Nothing], D.toColumn [Just 6.5, Nothing, Nothing, Just 6.5], D.toColumn [Just 3.0, Nothing, Nothing, Just 3.0]]
```

To drop every row that has a missing value in a given column, we can use the `filterJust` function.

```haskell
ghci> D.filterJust "0" df
---------------------------------------------
index |    0    |      1       |      2
------|---------|--------------|-------------
 Int  | Integer | Maybe Double | Maybe Double
------|---------|--------------|-------------
0     | 1       | Just 6.5     | Just 3.0
1     | 1       | Nothing      | Nothing
```

The function keeps only the rows where the named column is not `Nothing` and "unwraps" the `Maybe` type of that column, which is why column `0` is now an `Integer`. To drop the rows that have a `Nothing` in any column, we use the `filterAllJust` function.

```haskell
ghci> D.filterAllJust df
---------------------------------
index |    0    |   1    |   2
------|---------|--------|-------
 Int  | Integer | Double | Double
------|---------|--------|-------
0     | 1       | 6.5    | 3.0
```
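
The difference between the two filters can be sketched with plain tuples instead of a dataframe. This is only an analogy under the assumption that each row is a tuple of `Maybe` fields; `Row`, `rows`, `keepJustFirst` and `keepAllJust` are hypothetical names, not `dataframe` functions.

```haskell
-- One row of the example data: columns "0", "1" and "2".
type Row = (Maybe Integer, Maybe Double, Maybe Double)

-- The same four rows as the example dataframe.
rows :: [Row]
rows =
  [ (Just 1, Just 6.5, Just 3.0)
  , (Just 1, Nothing, Nothing)
  , (Nothing, Nothing, Nothing)
  , (Nothing, Just 6.5, Just 3.0)
  ]

-- Analogue of filtering on one column: keep rows whose first field is
-- present and unwrap only that field. Rows that fail the pattern are skipped.
keepJustFirst :: [Row] -> [(Integer, Maybe Double, Maybe Double)]
keepJustFirst rs = [(a, b, c) | (Just a, b, c) <- rs]

-- Analogue of filtering on every column: keep rows with no missing field
-- and unwrap all of them.
keepAllJust :: [Row] -> [(Integer, Double, Double)]
keepAllJust rs = [(a, b, c) | (Just a, Just b, Just c) <- rs]
```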

To fill in the missing values we use the `impute` function, which replaces all instances of `Nothing` in a column with a given value.

```haskell
ghci> D.impute "0" (0 :: Integer) df
---------------------------------------------
index |    0    |      1       |      2
------|---------|--------------|-------------
 Int  | Integer | Maybe Double | Maybe Double
------|---------|--------------|-------------
0     | 1       | Just 6.5     | Just 3.0
1     | 1       | Nothing      | Nothing
2     | 0       | Nothing      | Nothing
3     | 0       | Just 6.5     | Just 3.0
```
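
Conceptually, imputing a single column amounts to mapping `fromMaybe` over it. The sketch below works on a plain list and is not the library's implementation; `imputeList`, `column0` and `column1` are hypothetical names. Note that the default must have the same type as the column's elements, so an `Integer` column and a `Double` column each need their own default.

```haskell
import Data.Maybe (fromMaybe)

-- Hypothetical list-based stand-in for imputing one column: every
-- Nothing becomes the default, every Just is unwrapped.
imputeList :: a -> [Maybe a] -> [a]
imputeList def = map (fromMaybe def)

-- The first two columns of the example data, as plain lists.
column0 :: [Maybe Integer]
column0 = [Just 1, Just 1, Nothing, Nothing]

column1 :: [Maybe Double]
column1 = [Just 6.5, Nothing, Nothing, Just 6.5]

-- The default's type is fixed by the column's element type.
filled0 :: [Integer]
filled0 = imputeList 0 column0

filled1 :: [Double]
filled1 = imputeList 0.0 column1
```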

There is no general way to replace every `Nothing` in a dataframe with a single default, since the default depends on each column's type. In fact, calling a function with the wrong type throws an error:

```haskell
ghci> D.impute @Double "0" 0 df
*** Exception:

[Error]: Type Mismatch
    While running your code I tried to get a column of type: "Maybe Double" but column was of type: "Maybe Integer"
    This happened when calling function apply on the column 0

    Try adding a type at the end of the function e.g change
        apply arg1 arg2 to
        (apply arg1 arg2 :: <Type>)
    or add {-# LANGUAGE TypeApplications #-} to the top of your file then change the call to
        apply @<Type> arg1 arg2
```

In general, Haskell would report a type mismatch like this at compile time. But because dataframes are usually used in REPL-like environments, which offer immediate feedback to users, `dataframe` is fine with turning these into runtime exceptions.
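
To see why such a mismatch can only surface at runtime, here is a minimal sketch that hides a column's element type behind `Data.Dynamic` from `base`. It is an assumption-laden illustration, not how `dataframe` is implemented: `Column`, `toColumn`, `imputeColumn` and `column0` are hypothetical names. Once the element type is erased, the compiler can no longer reject a default of the wrong type, so the check has to happen when the function runs.

```haskell
import Data.Dynamic (Dynamic, dynTypeRep, fromDynamic, toDyn)
import Data.Maybe (fromMaybe)
import Data.Typeable (Typeable)

-- Hypothetical type-erased column: the element type is hidden from the
-- compiler and recorded only as runtime type information.
newtype Column = Column Dynamic

-- Wrap a typed list of optional values, erasing its element type.
toColumn :: Typeable a => [Maybe a] -> Column
toColumn = Column . toDyn

-- Fill missing values with a default. If the default's type does not
-- match the stored element type, report the mismatch as a runtime value.
imputeColumn :: Typeable a => a -> Column -> Either String [a]
imputeColumn def (Column dyn) =
  case fromDynamic dyn of
    Just values -> Right (map (fromMaybe def) values)
    Nothing -> Left ("type mismatch: column holds " ++ show (dynTypeRep dyn))

-- Column "0" of the example data, stored without its static type.
column0 :: Column
column0 = toColumn [Just (1 :: Integer), Just 1, Nothing, Nothing]

-- The Integer default matches the stored type; the Double default does not.
okResult :: Either String [Integer]
okResult = imputeColumn 0 column0

badResult :: Either String [Double]
badResult = imputeColumn 0 column0
```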