training.py: performs the actual training, using a standard linear regression model with feature selection.
data_prep.py: performs general data formatting.
drug_models.csv: contains the output, consisting of the drug name, the loss (MSE), and an array of coefficients for each model.
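
As an illustration of the drug_models.csv layout described above, here is a minimal sketch of writing and reading back one row per drug. The column names (drug, mse, coefficients), the JSON encoding of the coefficient array, and the sample values are all assumptions made for this example; the actual file produced by training.py may use a different header or array format.

```python
import csv
import io
import json

# Hypothetical column names; the real header of drug_models.csv may differ.
FIELDS = ["drug", "mse", "coefficients"]


def write_models(rows, handle):
    """Write (drug, mse, coefficients) tuples as CSV rows.

    The coefficient array is JSON-encoded so it fits in a single cell
    (an assumption, not necessarily what training.py does).
    """
    writer = csv.writer(handle)
    writer.writerow(FIELDS)
    for drug, mse, coefs in rows:
        writer.writerow([drug, mse, json.dumps(list(coefs))])


def read_models(handle):
    """Read the rows back into a dict keyed by drug name."""
    reader = csv.DictReader(handle)
    return {
        row["drug"]: (float(row["mse"]), json.loads(row["coefficients"]))
        for row in reader
    }


# Made-up values, purely for illustration.
buffer = io.StringIO()
write_models([("drug_a", 0.25, [0.5, -1.0, 0.0])], buffer)
buffer.seek(0)
models = read_models(buffer)
```

Encoding the array as JSON keeps each model on one CSV row while still letting the coefficients be parsed back into a list of floats.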