Package for creating figures with NA data distribution
16
+
Do you want to visualize missing values in your data? There are plenty of amazing methods (check [missingno](https://github.com/ResidentMario/missingno), for example), but they all look bulky when your data has too many columns. `nafig` will help you build a perfect NA figure!
17
17
18
18
</div>
19
19
20
-
## Very first steps
20
+
# Installation
21
21
22
-
### Initialize your code
22
+
```bash
23
+
$ pip install -U nafig
24
+
```
23
25
24
-
1. Initialize `git` inside your repo:
26
+
Or install with `Poetry`:
25
27
26
28
```bash
27
-
cd nafig && git init
29
+
$ poetry add nafig
28
30
```
29
31
30
-
2. If you don't have `Poetry` installed run:
32
+
# Usage
31
33
32
-
```bash
33
-
make poetry-download
34
+
Here are some usage examples for both simulated and real-world data. Check [this notebook](example.ipynb) to play with the code yourself!
35
+
36
+
First, let's import the core function and other useful things:
37
+
38
+
```python
39
+
>>> from nafig.plots import na_text_barplot  # The core function
40
+
>>> from nafig.utils import create_example_data  # To simulate data
41
+
>>> import pandas as pd  # To work with tables
34
42
```
35
43
36
-
3. Initialize poetry and install `pre-commit` hooks:
44
+
```python
45
+
>>> df, feature_types = create_example_data()
46
+
```
37
47
38
-
```bash
39
-
make install
40
-
make pre-commit-install
48
+
`df` is just a pandas dataframe with missing values. `feature_types` is an array containing a data type description for each column. This is just an example, so the labels don't correspond to actual data types.
This toy dataframe contains 300 columns. Visualizing its missing data with a heatmap would be too bulky. How can you explore the distribution of missing data in such a dataset? Try the NA text barplot!
Columns of the dataset are binned by the percentage of missing data in them. Colouring by feature type helps you understand which types of data are missing. On the Y-axis you can see the number of features in each group.
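As a rough illustration of the binning described above, here is a minimal sketch that counts columns per missing-percentage bin and dtype using only `pandas` and `numpy`. The helper `count_features_per_bin` and the bin labels are hypothetical and are not part of the `nafig` API; the library's actual implementation may differ.

```python
import numpy as np
import pandas as pd


def count_features_per_bin(df: pd.DataFrame, num_bins: int = 10) -> pd.DataFrame:
    """Count columns per missing-percentage bin, split by column dtype.

    Hypothetical helper for illustration only, not nafig's implementation.
    """
    # Percentage of missing values in every column, from 0 to 100
    na_percent = df.isna().mean() * 100
    # Equal-width bin edges over [0, 100]
    edges = np.linspace(0, 100, num_bins + 1)
    labels = [f"{lo:g}-{hi:g}%" for lo, hi in zip(edges[:-1], edges[1:])]
    dtype_names = df.dtypes.astype(str)
    # np.histogram bins are half-open [lo, hi), except the last one,
    # which also includes its right edge (100%)
    counts = {
        dtype: np.histogram(group, bins=edges)[0]
        for dtype, group in na_percent.groupby(dtype_names)
    }
    return pd.DataFrame(counts, index=labels)


example = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],     # 0% missing
    "b": [1.0, np.nan, 3.0, 4.0],  # 25% missing
    "c": ["x", None, None, "y"],   # 50% missing
    "d": [np.nan] * 4,             # 100% missing
})
table = count_features_per_bin(example, num_bins=4)
print(table)
```

Each row of `table` is one bin and each column is one dtype, which is roughly the information the barplot stacks on the Y-axis.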
66
+
67
+
You can vary the number of bins using the `num_bins` parameter:
- Set up [Dependabot](https://docs.github.com/en/github/administering-a-repository/enabling-and-disabling-version-updates#enabling-github-dependabot-version-updates) to ensure you have the latest dependencies.
62
-
- Set up [Stale bot](https://github.com/apps/stale) for automatic issue closing.
Building a new version of the application involves the following steps:
100
+
Note that if you don't pass the `hue` parameter, features will be colored by the data type of the column. If you don't want to colorize features at all, set `hue` to `False`.
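The `hue` behaviour described above could be sketched as follows. `resolve_hue` is a hypothetical helper written for illustration, and using `None` as the "not passed" default is an assumption; neither is taken from `nafig`'s source.

```python
import pandas as pd


def resolve_hue(df: pd.DataFrame, hue=None):
    """Return one label per column, or None when colouring is disabled.

    Hypothetical helper for illustration only, not nafig's implementation.
    """
    if hue is False:
        return None  # hue=False: do not colour features at all
    if hue is None:
        # No hue passed: fall back to each column's dtype name
        return df.dtypes.astype(str).tolist()
    return list(hue)  # Explicit labels, e.g. feature_types


example = pd.DataFrame({"age": [31, 45], "height": [1.8, None]})
default_labels = resolve_hue(example)
custom_labels = resolve_hue(example, hue=["demographic", "body"])
no_labels = resolve_hue(example, hue=False)
```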
101
+
102
+
By setting `remove_empty_bins` to `True`, you can remove the empty bins. It will require a reader to pay more attention to the X-axis but will save you some space.
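Conceptually, removing empty bins means dropping every bin that contains no features. The sketch below shows the idea on a small, made-up table of per-bin counts; `drop_empty_bins` and the table itself are hypothetical and only illustrate the effect, not how `nafig` implements `remove_empty_bins`.

```python
import pandas as pd


def drop_empty_bins(counts: pd.DataFrame) -> pd.DataFrame:
    """Keep only the bins (rows) that contain at least one feature."""
    return counts.loc[counts.sum(axis=1) > 0]


# Made-up counts: rows are missing-percentage bins, columns are dtypes
counts = pd.DataFrame(
    {"float64": [3, 0, 0, 1], "object": [1, 0, 2, 0]},
    index=["0-25%", "25-50%", "50-75%", "75-100%"],
)
compact = drop_empty_bins(counts)
print(compact)
```

The remaining bins are no longer evenly spaced, which is why the X-axis labels need a closer look after this step.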
- Bump the version of your package `poetry version <version>`. You can pass the new version explicitly, or a rule such as `major`, `minor`, or `patch`. For more details, refer to the [Semantic Versions](https://semver.org/) standard.
This dataset has a bit more missing data. On the plot we can see that all integer features are almost complete, while some `object` and floating-point columns contain missing values.
- Maybe you would like to add [gitmoji](https://gitmoji.carloscuesta.me/) to commit names. This is really funny. 😄
129
+
# Developers section
112
130
113
131
## 🚀 Features
114
132
@@ -131,25 +149,6 @@ Articles:
131
149
- Always up-to-date dependencies with [`@dependabot`](https://dependabot.com/). You only need to [enable it](https://docs.github.com/en/github/administering-a-repository/enabling-and-disabling-version-updates#enabling-github-dependabot-version-updates).
132
150
- Automatic drafts of new releases with [`Release Drafter`](https://github.com/marketplace/actions/release-drafter). You may see the list of labels in [`release-drafter.yml`](https://github.com/VladimirShitov/nafig/blob/master/.github/release-drafter.yml). Works perfectly with [Semantic Versions](https://semver.org/) specification.
133
151
134
-
### Open source community features
135
-
136
-
- Ready-to-use [Pull Requests templates](https://github.com/VladimirShitov/nafig/blob/master/.github/PULL_REQUEST_TEMPLATE.md) and several [Issue templates](https://github.com/VladimirShitov/nafig/tree/master/.github/ISSUE_TEMPLATE).
137
-
- Files such as: `LICENSE`, `CONTRIBUTING.md`, `CODE_OF_CONDUCT.md`, and `SECURITY.md` are generated automatically.
138
-
- [`Stale bot`](https://github.com/apps/stale) that closes abandoned issues after a period of inactivity. (You only [need to set up the free plan](https://github.com/marketplace/stale)). Configuration is [here](https://github.com/VladimirShitov/nafig/blob/master/.github/.stale.yml).
139
-
- [Semantic Versions](https://semver.org/) specification with [`Release Drafter`](https://github.com/marketplace/actions/release-drafter).