Ordinal Data Type by jinimukh · Pull Request #360 · lux-org/lux

jinimukh · 2021-04-15T05:00:02Z

Overview

This PR addresses #240 by adding support for the ordinal data type. Currently, the only way to mark a column as ordinal is to call df.set_data_type({"col_name": "ordinal"}). If the entries have no natural ordering (numeric or alphabetical), a custom ordering can optionally be specified with df.set_data_type({"col_name": "ordinal"}, order={"col_name": [ordered_lst]}). Ordinal data types are visualized with boxplots, but because boxplots are bivariate visualizations, they only show up as enhancements of a selected visualization.
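To illustrate why a custom ordering matters, here is a minimal sketch that uses only plain pandas (an ordered categorical), not the set_data_type API from this PR. The column values and level names are made up for illustration.

```python
import pandas as pd

# Hypothetical column; the level names are illustrative only.
sizes = pd.Series(["medium", "small", "large", "small"])

# Without an explicit order, sorting the labels falls back to alphabetical order.
alphabetical = sorted(sizes.unique())

# An ordered categorical keeps a custom ordering together with the data.
order = ["small", "medium", "large"]
ordinal = pd.Series(pd.Categorical(sizes, categories=order, ordered=True))

# Sorting now respects the custom order rather than the alphabet.
sorted_levels = list(ordinal.sort_values().unique())

print(alphabetical)
print(sorted_levels)
```

The alphabetical sort places "large" before "small", which is the kind of misleading axis order that an explicit order argument is meant to avoid.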

Changes

univariate.py: allow ordinal data types to be treated as nominal data types so that they produce bar charts in the Occurrences tab
frame.py: allow the set_data_type function to take an optional order argument that specifies the ordering of ordinal data
BoxPlot.py: currently only supports Altair boxplots
Compiler.py: allow the mark to be box when n_dim == 1 and n_msr == 1 and dimension_type == "ordinal"
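The Compiler.py rule above can be sketched as a small standalone function. This is not the actual Compiler code: choose_mark and the "other" fallback value are hypothetical names used only to show the condition.

```python
def choose_mark(n_dim: int, n_msr: int, dimension_type: str) -> str:
    """Hypothetical stand-in for the mark selection described in the PR."""
    # One dimension, one measure, and an ordinal dimension -> boxplot.
    if n_dim == 1 and n_msr == 1 and dimension_type == "ordinal":
        return "box"
    # Assumption: every other combination is handled by existing branches,
    # which are collapsed here into a single placeholder value.
    return "other"


print(choose_mark(1, 1, "ordinal"))
print(choose_mark(1, 1, "nominal"))
```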

Example Output

codecov · 2021-04-15T08:27:32Z

Codecov Report

Merging #360 (7820f1e) into master (1dbbcb9) will decrease coverage by 0.62%.
The diff coverage is 50.00%.

❗ Current head 7820f1e differs from pull request most recent head 19a14d8. Consider uploading reports for the commit 19a14d8 to get more accurate results

@@            Coverage Diff             @@
##           master     #360      +/-   ##
==========================================
- Coverage   84.46%   83.84%   -0.63%     
==========================================
  Files          51       52       +1     
  Lines        3902     3961      +59     
==========================================
+ Hits         3296     3321      +25     
- Misses        606      640      +34
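As a quick sanity check on the figures above, project coverage can be recomputed as hits divided by lines. The helper name below is an assumption made for this sketch, and Codecov's exact rounding rules are not stated in the report.

```python
def coverage(hits: int, lines: int) -> float:
    """Coverage as a percentage of covered lines."""
    return 100 * hits / lines


master = coverage(3296, 3902)  # base commit 1dbbcb9
head = coverage(3321, 3961)    # PR head 7820f1e
delta = head - master

print(f"master={master:.4f}% head={head:.4f}% delta={delta:.4f}%")
```

The recomputed change is roughly -0.627 percentage points, which sits between the 0.62% quoted in the summary sentence and the -0.63% shown in the diff table, so the two figures appear to differ only in how they were rounded.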

Impacted Files	Coverage Δ
lux/action/univariate.py	`90.38% <ø> (ø)`
lux/core/series.py	`53.84% <ø> (ø)`
lux/interestingness/interestingness.py	`87.95% <ø> (ø)`
lux/vislib/matplotlib/MatplotlibRenderer.py	`84.61% <0.00%> (-2.69%)`	⬇️
lux/vislib/altair/BoxPlot.py	`21.87% <21.87%> (ø)`
lux/vislib/altair/AltairRenderer.py	`94.59% <33.33%> (-2.59%)`	⬇️
lux/action/enhance.py	`96.87% <66.66%> (-3.13%)`	⬇️
lux/vislib/altair/BarChart.py	`82.66% <75.00%> (-2.19%)`	⬇️
lux/core/frame.py	`81.75% <81.81%> (+0.02%)`	⬆️
lux/executor/Executor.py	`79.48% <100.00%> (ø)`
... and 2 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 1dbbcb9...19a14d8.

dorisjlee · 2021-04-16T01:46:09Z

lux/vislib/altair/AltairRenderer.py

 from lux.vislib.altair.Histogram import Histogram
 from lux.vislib.altair.Heatmap import Heatmap
 from lux.vislib.altair.Choropleth import Choropleth
+from lux.vislib.altair.BoxPlot import BoxPlot


If the user uses matplotlib for boxplots, could we render the boxplot in Altair and show an info button message letting users know that the matplotlib boxplot is not currently implemented? This is similar to what we did for the geographical maps in matplotlib.

jinimukh · 2021-04-23

I've implemented the Altair fallback as well as the message. However, since I'm not able to set intent on the dataframe due to the matplotlib bug, I'm not sure whether the message works. Let me know if you'd like me to remove it, since there is no way to verify it!
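The fallback discussed in this thread could look roughly like the sketch below. It is not the renderer code from this PR: the SUPPORTED table, resolve_backend, and the message wording are all hypothetical.

```python
# Illustrative capability table; the real renderers support more marks.
SUPPORTED = {
    "altair": {"bar", "box"},
    "matplotlib": {"bar"},
}


def resolve_backend(requested: str, mark: str):
    """Return (backend, message); message is None when no fallback was needed."""
    if mark in SUPPORTED.get(requested, set()):
        return requested, None
    if mark in SUPPORTED["altair"]:
        # Fall back to Altair and tell the user why the backend changed.
        message = (
            f"The {mark} chart is not currently implemented for {requested}; "
            "it was rendered with altair instead."
        )
        return "altair", message
    raise ValueError(f"No backend can render mark {mark!r}")


backend, message = resolve_backend("matplotlib", "box")
print(backend, message)
```

Returning the message alongside the backend keeps the decision testable without rendering anything, which would sidestep the verification problem mentioned above.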

dorisjlee · 2021-04-16T01:47:17Z

lux/action/univariate.py

    elif data_type_constraint == "nominal":
        possible_attributes = [
-            c for c in ldf.columns if ldf.data_type[c] == "nominal" and c != "Number of Records"
+            c


This line split is pretty weird and hard to read. Can we fix this and add a comment on what this list of possible_attributes is used for?
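One possible way to address this readability concern is sketched below. The full new condition is cut off in the diff excerpt above, so the filter here is an assumption based on the PR description (ordinal columns treated like nominal ones), and nominal_like_attributes is a hypothetical helper name.

```python
def nominal_like_attributes(columns, data_type):
    """Columns eligible for an occurrence bar chart (hypothetical helper)."""
    # Assumption: ordinal columns are counted the same way as nominal ones.
    eligible_types = {"nominal", "ordinal"}
    return [
        c
        for c in columns
        if data_type[c] in eligible_types and c != "Number of Records"
    ]


# Illustrative metadata, not taken from a real Lux dataframe.
data_type = {
    "education_level": "ordinal",
    "city": "nominal",
    "training_hours": "quantitative",
    "Number of Records": "nominal",
}
print(nominal_like_attributes(data_type.keys(), data_type))
```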

dorisjlee · 2021-04-26T02:33:38Z

Thanks @jinimukh!! Can we file a follow-up issue to delegate boxplot calculations to the Pandas and SQL Executor? This would help performance by bringing the rendering cost down from that of a scatterplot to that of a boxplot (several summary statistics + outliers).
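A minimal sketch of what such precomputed boxplot statistics could look like in pandas follows. It assumes the common 1.5 × IQR whisker rule; box_stats and the sample data are hypothetical, and this is not Lux executor code.

```python
import pandas as pd


def box_stats(values: pd.Series) -> dict:
    """Summary statistics plus outliers for one box (1.5 * IQR rule assumed)."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": values[(values < low_fence) | (values > high_fence)].tolist(),
    }


# Illustrative data: one ordinal-like column and one measure.
df = pd.DataFrame(
    {
        "level": ["a"] * 5 + ["b"] * 5,
        "value": [1, 2, 3, 4, 100, 10, 11, 12, 13, 14],
    }
)

# One small record per level instead of one point per row.
summary = {level: box_stats(group) for level, group in df.groupby("level")["value"]}
print(summary)
```

Only the five statistics and the outlier list would need to reach the renderer, regardless of how many rows each level contains.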

dorisjlee · 2021-04-26T03:01:06Z

I'm wondering whether ordinal data types have to be a subset of nominal data. Apart from the documentation and the actions logic (enhance and univariate), is there anything in the code that treats ordinal as a subset of nominal? For example, can we capture scenarios where an ordinal data type could be a subset of the temporal data type, such as {Summer, Winter, Fall} or {Q1, Q2, …}? It would be helpful to add an example for this.

dorisjlee · 2021-04-26T03:32:06Z

Here are some examples that I was playing around with:

import pandas as pd
import lux

df = pd.read_csv("https://raw.githubusercontent.com/lux-org/lux-datasets/master/data/aug_test.csv")
# Drop rows that are missing either of the columns treated as ordinal below
df = df.dropna(subset=["education_level", "company_size"])
df.set_data_type({'education_level': "ordinal"},
                 order={'education_level': ['Primary School', 'High School', 'Masters','Graduate', 'Phd']})
df["education_level"]


df.set_data_type({'company_size': "ordinal"}, 
                 order={'company_size': [
                     '<10', '10/49', '50-99', '100-500',
                       '500-999', '1000-4999', '5000-9999','10000+'
                 ]})
df["company_size"]

I was initially a bit confused about why the boxplot was not shown for the number of records case in univariate (until we set the intent); then I realized that the boxplot didn't make sense for the ordinal data type. I wonder if it makes sense to have a bivariate ordinal data type tab, i.e., ordinal with respect to all measure values, so that the boxplot could be shown in the initial view.
Otherwise, it would appear that setting the intent doesn't change anything.
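The bivariate tab idea amounts to enumerating every (ordinal, measure) pair. A rough sketch is below; ordinal_measure_pairs is a hypothetical helper, the "quantitative" label for measures is an assumption, and the measure column names are illustrative.

```python
from itertools import product


def ordinal_measure_pairs(data_type: dict) -> list:
    """Pair every ordinal column with every measure column."""
    ordinals = [c for c, t in data_type.items() if t == "ordinal"]
    # Assumption: measure columns are the ones typed as "quantitative".
    measures = [c for c, t in data_type.items() if t == "quantitative"]
    return list(product(ordinals, measures))


# Illustrative metadata; each resulting pair would become one boxplot.
data_type = {
    "education_level": "ordinal",
    "company_size": "ordinal",
    "training_hours": "quantitative",
    "city": "nominal",
}
print(ordinal_measure_pairs(data_type))
```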

jinimukh added 22 commits December 22, 2020 20:55

coalesce data_types into data_type_lookup

2cef000

Merge branch 'master' of https://github.com/lux-org/lux

cad3e84

merge

f197884

merge fixed

1ed9655

merge conflicts

c56f79d

merged

c0388df

Merge branch 'master' of https://github.com/jinimukh/lux

cf045de

Merge branch 'master' into foo

6ae9767

merge upstream

0db6376

first commit

1e6f572

conflicts

7836abf

requirements.txt updated for pandas 1.2.2

9c17abb

Merge branch 'master' of https://github.com/jinimukh/lux

b3509f6

Merge branch 'master' of https://github.com/lux-org/lux

d428289

Merge branch 'master' of https://github.com/lux-org/lux

9716088

some progress

f7e2d6b

violin plot working; need changes in Compiler

bc689a9

Box Plot added, mapping and bivariate interface reqd if necessary

a8c25c9

need to work on mapping

2768ce9

mapping working with null entries

ecd76f5

mapping and truncating work

14e17c7

upstream merge

0a23b50

jinimukh marked this pull request as draft April 15, 2021 05:02

removed extra files, ran black

8e118d8

minor changes, ordinal vars now in occurences/univaiate

6df324d

jinimukh marked this pull request as ready for review April 15, 2021 08:41

jinimukh requested a review from dorisjlee April 15, 2021 08:42

dorisjlee requested a review from westernguy2 April 16, 2021 00:48

dorisjlee reviewed Apr 16, 2021

jinimukh added 5 commits April 22, 2021 17:22

bar chart sorted for nominal data type

5f5669e

docs done

50eea83

test added for set_data_type

d426e96

add Altair fallback for matplotlib

f010999

add note to docs

3b55df6

jinimukh requested a review from dorisjlee April 23, 2021 02:35

jinimukh added 4 commits April 22, 2021 19:37

upstream merge

7fca30f

black

077db46

fix merge conflict

9396603

changes

7820f1e

jinimukh changed the title ~~Features/ordinal 2~~ Ordinal Data Type Apr 23, 2021

jinimukh closed this Apr 26, 2021

jinimukh reopened this Apr 26, 2021

refactor and create new action

19a14d8

jinimukh wants to merge 34 commits into lux-org:master from jinimukh:features/ordinal_2
