Evaluation of equations to test data #642
gm89uk started this conversation in Show and tell
Replies: 0 comments
I have found it very time-consuming to find the sweet spot of complexity that gives the best fit on the test data: the higher-complexity equations overfit the training data and perform poorly on the test data.
I also needed to apply the equations to different subgroups of the data and see how each equation performed on them.
I have put together some Python code to make my life easier, and someone else might find it useful.
It assumes that the main database (xlsx) has multiple worksheets that are otherwise identical and differ only in the data they contain. For example, you can have "train", "test_group1", "test_group2", etc.
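As a rough illustration of the worksheet layout described above (this is not the code from this post), here is a minimal sketch that scores every equation on every worksheet. It assumes each equation is available as a plain Python callable keyed by its complexity, that the target column is called `y`, and that mean squared error is the metric; all of these names and choices are hypothetical. It uses only the standard library so it runs anywhere, and the comment shows how the same structure could be built from a workbook with pandas.

```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def score_equations(equations, groups, target="y"):
    """Return {(sheet, complexity): mse} for every equation on every sheet.

    equations: {complexity: callable taking one row dict, returning a prediction}
    groups:    {sheet name: list of row dicts}, all sharing the same columns
    """
    scores = {}
    for sheet, rows in groups.items():
        y_true = [row[target] for row in rows]
        for complexity, equation in equations.items():
            y_pred = [equation(row) for row in rows]
            scores[(sheet, complexity)] = mse(y_true, y_pred)
    return scores


# Toy stand-in for the workbook (made-up numbers following y = 2x).
# With pandas installed, something like
#   {name: df.to_dict("records")
#    for name, df in pd.read_excel("data.xlsx", sheet_name=None).items()}
# would build the same structure ("data.xlsx" is a placeholder path).
groups = {
    "train": [{"x": 0, "y": 0}, {"x": 1, "y": 2}, {"x": 2, "y": 4}],
    "test_group1": [{"x": 3, "y": 6}, {"x": 4, "y": 8}],
}

# Hypothetical candidate equations, keyed by complexity.
equations = {
    1: lambda row: 2.0,           # constant guess
    3: lambda row: 2 * row["x"],  # linear in x
}

scores = score_equations(equations, groups)
for (sheet, complexity), loss in sorted(scores.items()):
    print(f"{sheet:12s} complexity={complexity} mse={loss:.3f}")
```

Keeping the scores keyed by `(sheet, complexity)` makes it easy to slice them later, either per subgroup or per equation.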
The output would look something like this:

The table covers all training and test data combined.
The line graphs show the train and test data separately, as well as the combined data. Separate line graphs are produced for each subcategory.

Apologies if the code is highly unoptimised!
Example: