Capstone Project 1
- https://stackoverflow.com/questions/28651079/pandas-unstack-problems-valueerror-index-contains-duplicate-entries-cannot-re
- http://www.datasciencemadesimple.com/reshape-long-wide-pandas-python-pivot-function/
- https://www.datacamp.com/community/tutorials/pandas-multi-index
- https://stackoverflow.com/questions/13295735/how-can-i-replace-all-the-nan-values-with-zeros-in-a-column-of-a-pandas-datafra
- https://stackoverflow.com/questions/20110170/turn-pandas-multi-index-into-column
- https://stackoverflow.com/questions/36537945/reshape-wide-to-long-in-pandas
- https://stackoverflow.com/questions/45352909/pandas-indexingerror-unalignable-boolean-series-provided-as-indexer
- https://stackoverflow.com/questions/42477572/sort-values-method-in-pandas
- https://stackoverflow.com/questions/16958499/sort-pandas-dataframe-and-print-highest-n-values
- http://pandas.pydata.org/pandas-docs/version/0.17/generated/pandas.DataFrame.sort.html
- https://stackoverflow.com/questions/19523277/renaming-column-names-in-pandas-groupby-function
- https://stackoverflow.com/questions/47138271/how-to-create-a-stacked-bar-chart-for-my-dataframe-using-seaborn
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html
- How To Add a New Column to Using a Dictionary in Pandas Data Frame ?: Pandas Tutorial
- https://stackoverflow.com/questions/13445241/replacing-blank-values-white-space-with-nan-in-pandas
- https://stackoverflow.com/questions/37840812/pandas-subtracting-two-date-columns-and-the-result-being-an-integer
- https://www.dataquest.io/blog/regular-expressions-data-scientists/
- https://python-graph-gallery.com/all-charts/
- https://stats.stackexchange.com/questions/95797/how-to-split-the-dataset-for-cross-validation-learning-curve-and-final-evaluat
- https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas/46165056#46165056
- https://stackoverflow.com/questions/14745022/how-to-split-a-column-into-two-columns
- https://stackoverflow.com/questions/13996302/python-rolling-functions-for-groupby-object
- https://stackoverflow.com/questions/13872533/plot-different-dataframes-in-the-same-figure
- https://stackoverflow.com/questions/51711306/filter-group-by-and-count-in-pandas
- https://pandas.pydata.org/pandas-docs/stable/reshaping.html
- https://stackoverflow.com/questions/26646191/pandas-groupby-month-and-year
- https://stackoverflow.com/questions/23891575/how-to-merge-two-dataframes-side-by-side
- https://seaborn.pydata.org/examples/wide_data_lineplot.html
- https://stackoverflow.com/questions/9452775/converting-numpy-dtypes-to-native-python-types
- https://stackoverflow.com/questions/13730468/from-nd-to-1d-arrays
- https://stackoverflow.com/questions/43664994/numpy-random-choice-vs-random-choice
- https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
- https://stackoverflow.com/questions/48786906/cannot-make-seaborn-violin-plot-horizontal-python3-x
- https://datascience.stackexchange.com/questions/22062/is-this-a-bug-in-seaborn
- https://stackoverflow.com/questions/6871201/plot-two-histograms-at-the-same-time-with-matplotlib
- Really good tutorial on how to build a logistic regression model: https://towardsdatascience.com/building-a-logistic-regression-in-python-step-by-step-becd4d56c9c8
- https://towardsdatascience.com/random-forest-in-python-24d0893d51c0
- https://www.datacamp.com/community/tutorials/random-forests-classifier-python
- https://stackabuse.com/random-forest-algorithm-with-python-and-scikit-learn/
- https://machinelearningmastery.com/implement-random-forest-scratch-python/
- https://en.wikipedia.org/wiki/Random_forest
- https://nycdatascience.com/blog/meetup/featured-talk-1-kaggle-data-scientist-owen-zhang/
- https://datascienceplus.com/extreme-gradient-boosting-with-python/
- https://medium.com/mlreview/gradient-boosting-from-scratch-1e317ae4587d
- http://benalexkeen.com/gradient-boosting-in-python-using-scikit-learn/
- https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/
- https://robots.thoughtbot.com/analyzing-minards-visualization-of-napoleons-1812-march
- https://blog.ouseful.info/2017/11/28/quick-round-up-visualising-flows-using-network-and-sankey-diagrams-in-python-and-r/
- https://plotlyblog.tumblr.com/post/120532468127/how-to-analyze-data-seven-modern-remakes-of-the
- For figuring out how to optimize a logistic regression model: https://towardsdatascience.com/logistic-regression-model-tuning-with-scikit-learn-part-1-425142e01af5
- For explaining LogisticRegression vs LogisticRegressionCV: https://stackoverflow.com/questions/46507606/what-does-the-cv-stand-for-in-sklearn-linear-model-logisticregressioncv
- Hyperparameter tuning on random forests: https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74
- XGBoost hyperparameter tuning: https://www.kaggle.com/tilii7/hyperparameter-grid-search-with-xgboost
- https://stackoverflow.com/questions/23199796/detect-and-exclude-outliers-in-pandas-data-frame
- Population Model vs Randomization Approach:
- Compare means between two different groups:
- https://lagunita.stanford.edu/courses/Engineering/CS101/Summer2014/about
- Compare categorical distributions:
Chi Square Tests Population Model:
- https://stattrek.com/chi-square-test/independence.aspx
- https://stattrek.com/chi-square-test/homogeneity.aspx
- https://stattrek.com/chi-square-test/goodness-of-fit.aspx
Permutation Test for Heterogeneity of Categorical Variables:
- https://www.stat-d.si/mz/mz4.1/arboretti.pdf
- http://old.sis-statistica.org/files/pdf/atti/Spontanee%202006_165-168.pdf
Read later:
- https://jasonkerwin.com/nonparibus/2017/09/25/randomization-inference-vs-bootstrapping-p-values/
- ECDF vs CDF: https://stats.stackexchange.com/questions/239937/empirical-cdf-vs-cdf
- Intuitive Explanation of Random Variable: https://stats.stackexchange.com/questions/95993/can-anyone-clarify-the-concept-of-a-sum-of-random-variables
- When to use a CDF vs an ECDF: https://stats.stackexchange.com/questions/4810/how-to-use-cdf-and-pdf-statistics-for-analysis
- http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf
Bootstrap/Randomization Read Later:
- https://jasonkerwin.com/nonparibus/2017/09/25/randomization-inference-vs-bootstrapping-p-values/
- http://www.statisticsteacher.org/2018/03/15/model-t-or-a-newer-randomization/
- Definitely read this one: http://evolution.gs.washington.edu/gs560/2011/lecture8.pdf
- And this one: http://www.cs.cornell.edu/courses/cs1380/2018sp/textbook/chapters/16/3/causality.html
- http://www.cs.cornell.edu/courses/cs1380/2018sp/textbook/chapters/11/2/bootstrap.html
- Love this: https://stats.stackexchange.com/questions/13607/can-non-random-samples-be-analyzed-using-standard-statistical-tests
Categorical Read Later:
- https://in.sagepub.com/sites/default/files/upm-binaries/67534_Gau_Chapter_10.pdf
- https://web.stanford.edu/class/psych10/schedule/P10_W7L1
- http://www.open.ac.uk/socialsciences/spsstutorial/files/tutorials/chi-square.pdf
- Looks interesting: https://machinelearningmastery.com/chi-squared-test-for-machine-learning/
- This too: http://www.math.montana.edu/courses/s217/documents/_book/chapter5.html
Interpretability:
- https://www.kaggle.com/learn/machine-learning-explainability
- https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.partial_dependence.plot_partial_dependence.html#sklearn.ensemble.partial_dependence.plot_partial_dependence
- https://scikit-learn.org/stable/auto_examples/ensemble/plot_partial_dependence.html
- https://www.kaggle.com/dansbecker/xgboost
- https://www.kaggle.com/dansbecker/partial-dependence-plots
- Pairgrid: http://seaborn.pydata.org/tutorial/axis_grids.html