
Information theory metrics calculation issue #16

@surbhir08

Description


Hi Team,
Thanks for addressing the issue of density estimation for multidimensional data.
I have a few questions as I am trying to implement information theory metrics:

  • Q1. Is this method suitable for high-dimensional tabular data?
  • Q2. I have been running RBIG's mutual_info() over tabular data, and the results are exactly the same for every target. I cross-checked using scikit-learn's MI score and got varying results (results not normalized in either case, scikit-learn or RBIG). I don't understand the error; can you help me with this in any way?

Below is the piece of code I used:

    import numpy as np
    import pandas as pd

    # X: list of feature columns (attributes not in Y)
    # Y: list of target columns (attributes not in X), e.g. y1, y2, y3, y4
    def calculate_miscore_xa(data, X, Y):
        mis_xy = []
        y_attributes = []
        for y in Y:
            rbig_model = MutualInfoRBIG(max_layers=10000)
            rbig_model.fit(data[X], data[[y]])
            mi_rbig = rbig_model.mutual_info() * np.log(2)
            mis_xy.append(mi_rbig)
            y_attributes.append(y)
        mis_xy = pd.DataFrame({'Y': y_attributes, 'I(Xi,Y)': mis_xy})
        return mis_xy

Basically, the result I am getting is:
I(X,y1) = I(X,y2) = I(X,y3) = I(X,y4) = exactly the same.
This is unusual, so I checked the results using https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mutual_info_score.html, and the values for I(X,y1), I(X,y2), I(X,y3), I(X,y4) all differ.
Can you help me understand if there is anything I am doing wrong?
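As a side note on the cross-check: scikit-learn's mutual_info_score expects two discrete 1-D label arrays, so for continuous tabular columns a sanity check with mutual_info_regression may be closer in spirit. A minimal sketch of such a per-target comparison, using made-up column names and synthetic data (not the actual dataset from this issue):

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n = 500
data = pd.DataFrame({
    'x1': rng.normal(size=n),
    'x2': rng.normal(size=n),
})
data['y1'] = data['x1'] + 0.1 * rng.normal(size=n)  # strongly dependent on x1
data['y2'] = rng.normal(size=n)                     # independent of X

mi = {}
for y in ['y1', 'y2']:
    # per-feature MI estimates (in nats); summed here as a rough aggregate
    mi[y] = mutual_info_regression(data[['x1', 'x2']], data[y],
                                   random_state=0).sum()
```

If the estimator is working, the dependent target should score clearly higher than the independent one; identical scores across all targets would point to a problem in the pipeline rather than in the data.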

Also, can the original entropy-based calculation implemented in the information theory notebook be used as a base for tabular data by substituting the respective X and Y in 2-D format?
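For reference, the entropy identity behind that kind of calculation, I(X;Y) = H(X) + H(Y) - H(X,Y), can be checked on a tiny discrete joint distribution. A minimal sketch (the 2x2 probability table is made up purely for illustration):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability array."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return -np.sum(p * np.log(p))

# Hypothetical joint distribution of two binary variables X and Y
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
px = pxy.sum(axis=1)  # marginal of X
py = pxy.sum(axis=0)  # marginal of Y

# I(X;Y) = H(X) + H(Y) - H(X,Y)
mi = entropy(px) + entropy(py) - entropy(pxy)
```

Here both marginals are uniform, so H(X) = H(Y) = ln 2, and the dependence in the joint table yields a strictly positive mutual information.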

Thanks and Regards
Surbhi
