Rex Meeting Minutes

Week 1

Decide the general direction of Rex's research: predict the patient's EnjoyLife score.
Create private GitHub repo.
Learn R.
Binarize EnjoyLife Score with R. Plot CDF: histogram of modfam2 (integers) and regular density graph of fam2 (floats)

Week 2

Use Linear Regression, KNN, Naive Bayes, SVM, and RandomForest classifiers for prediction. Use 10-fold cross-validation; use roc_auc as the metric.
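A minimal sketch of the evaluation loop described above, written with the standard library only so the mechanics are visible. The function names `roc_auc` and `kfold_indices` are hypothetical helpers, not the project's code; the AUC here is computed through the pairwise-ranking definition rather than by tracing the ROC curve.

```python
import random

def roc_auc(y_true, scores):
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("roc_auc needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def kfold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold CV."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Fold k takes every n_folds-th shuffled index, so sizes differ by at most 1.
    folds = [idx[k::n_folds] for k in range(n_folds)]
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```

The folds here are not stratified, so on a small or unbalanced dataset a test fold can end up with a single class, in which case `roc_auc` raises instead of returning a misleading number.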

Week 3+4

Compare classifiers by mean roc_auc; shuffle the folds here so the CV result is not deterministic.
In plot_pr: average precision-recall across folds instead of appending every fold.
Use roc as the optimization metric (1. KNN: number of neighbors, e.g. parameter sweep over (1,2,3,...); 2. RandomForest parameter sweep). Plot the parameter-sweep curve for each classifier: x axis parameter, y axis metric. Maybe use GridSearch or RandomSearch.
Take some points on the PR curve and look for something statistically interesting.
Prepare to discuss Link Prediction metrics paper, connect w/ our project and possible applications.
Think about Naive_Bayes parameter optimization if there is time.
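A rough sketch of the parameter-sweep idea from this week's items, assuming a toy 1-D dataset and a hand-rolled KNN. Everything here (`knn_predict`, `loo_accuracy`, `sweep`, the data) is hypothetical, and leave-one-out accuracy stands in for the mean CV roc the notes call for; the point is only the shape of the sweep, which returns the (parameter, metric) pairs needed for the x/y curve.

```python
def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest 1-D training points."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = sum(train_y[i] for i in order[:k])
    return 1 if 2 * votes > k else 0

def loo_accuracy(xs, ys, k):
    """Leave-one-out accuracy of KNN with k neighbors (stand-in metric)."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(rest_x, rest_y, xs[i], k) == ys[i]
    return hits / len(xs)

def sweep(param_values, evaluate):
    """Evaluate each parameter value; return the curve and the best point."""
    curve = [(p, evaluate(p)) for p in param_values]
    best = max(curve, key=lambda point: point[1])
    return curve, best

# Toy data, made up for illustration only.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
curve, (best_k, best_score) = sweep([1, 3, 5], lambda k: loo_accuracy(xs, ys, k))
```

`curve` holds the x axis (parameter) and y axis (metric) values for the sweep plot; passing the metric in as a callable keeps the same `sweep` usable for RandomForest parameters.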

Week 5

10 10 Folds.
Read up on the details of KNN. Why does it behave this way?
Change the distribution used by Bayes.
Use built-in Param-Opt.

April 23

design doc
10 iter inside
scoring = callable
different file name for opt

April 30

fix bug
frame_work, like what I did before in a more efficient way

May 22

Plot names should say whether opt was used or not, especially when comparing clfs
Polish code
build h5 data structure for clinical results
try to predict some stuff other than EnjoyLife

May 27

Store Grid Perf
Change name: not CDF, but conditional PDF
Plot fitted models
Merge table3Ful + fam2 + modfam2

Jun 3

SD for Grid Perf
Visualization for Grid Perf, maybe heatmap + interpolate
Beta, Poisson and NB for fitting
EDSS rate: compute gradient
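One possible reading of the EDSS rate item, sketched as a backward finite difference between consecutive visits. The function name `edss_rate` and the choice to return `None` for the first visit (or for a non-positive time gap) are assumptions, not something decided in the meeting.

```python
def edss_rate(times, edss):
    """Per-visit EDSS rate: (EDSS(n) - EDSS(n-1)) / (t(n) - t(n-1)).

    The first visit has no previous visit, so its rate is None.
    """
    rates = [None]
    for i in range(1, len(edss)):
        dt = times[i] - times[i - 1]
        # Guard against duplicate or out-of-order visit times.
        rates.append((edss[i] - edss[i - 1]) / dt if dt > 0 else None)
    return rates
```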

Jun 5

Bayes source code with Poisson
One more col in modified EDSSR: ignore abs(dEDSS) <= 0.5; 2 classes, increase or others; do analysis
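A small sketch of the two-class label, assuming the intended rule is that a change of at most 0.5 in absolute value counts as no change, so only an increase larger than 0.5 is class 1 and everything else is class 0. The name `mod_edss_label` and the `tol` parameter are hypothetical.

```python
def mod_edss_label(prev_edss, curr_edss, tol=0.5):
    """Return 1 for an EDSS increase larger than tol, else 0 ("others")."""
    return 1 if (curr_edss - prev_edss) > tol else 0
```

With this rule, 2.0 to 2.5 is labelled 0 (within tolerance), 2.0 to 3.0 is labelled 1, and any decrease is labelled 0.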

Jun 11

Look at the parameter optimisation for RF + Tweak
Make MixNB, based on goodness of fit estimators for discrete distributions
histograms and conditional Density plots for merged_update

Jul 3

Use own formula of Normal Log Likelihood to write a new Gaussian and Mix NB
Mix NB with Chi-Square goodness of fit
During fit, output graph
QOL(n) + EDSSRate(n-1) + EDSS(n-1) => ModEDSS(n)
How to deal with missing data, e.g. NA? Ignore for now.
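A sketch of the log-likelihood pieces a hand-written Gaussian or Mix NB would need, assuming the standard normal and Poisson densities. The function names and the `feature_models` tuple format are hypothetical, and the choice of distribution per feature (e.g. by Chi-Square goodness of fit) is left to the caller.

```python
import math

def gaussian_log_likelihood(x, mean, var):
    """log N(x; mean, var) = -0.5*log(2*pi*var) - (x - mean)^2 / (2*var)."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)

def poisson_log_likelihood(k, lam):
    """log Poisson(k; lam) = k*log(lam) - lam - log(k!)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def class_log_score(x, log_prior, feature_models):
    """Naive Bayes joint log score: log prior plus per-feature log-likelihoods.

    feature_models holds one entry per feature, either
    ("gaussian", mean, var) or ("poisson", lam).
    """
    total = log_prior
    for value, model in zip(x, feature_models):
        if model[0] == "gaussian":
            total += gaussian_log_likelihood(value, model[1], model[2])
        elif model[0] == "poisson":
            total += poisson_log_likelihood(value, model[1])
        else:
            raise ValueError("unknown feature model: %r" % (model[0],))
    return total
```

The predicted class is the one with the largest `class_log_score`; the scores are unnormalised joint log-probabilities, not posteriors.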

Jul 11

Precision Recall for new Bayes Code
Rewrite Bayes, use NA
Remove first visits, and change PreEDSSRate to 0 (imputation), then prediction.
Maybe: treatment(Y/N) from time of treatment, type of treatment.

Jul 16

Use the rate of everything for prediction and see what happens.
Understand why GaussianNB2 is not as good and why linear models are good. Understand these models.
Plot Gaussian's fit on top of X.
feature_importance
probas = np.exp(self.predict_log_proba(X)); return probas #/ np.sum(probas, axis = 1)
fam2 instead of modfam2 in ModEDSS Prediction, see what happens.
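On the `probas` snippet in this list: a sketch of how the commented-out normalisation could be done safely, assuming the input is one row of per-class log scores per sample. `normalize_log_probas` is a hypothetical name; it subtracts the row maximum before exponentiating (the log-sum-exp trick) so very negative log scores do not underflow to zero.

```python
import math

def normalize_log_probas(log_rows):
    """Turn rows of unnormalised log scores into probabilities summing to 1."""
    out = []
    for row in log_rows:
        shift = max(row)  # subtracting the max keeps exp() in a safe range
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out
```

In NumPy the equivalent row-wise division needs the sum taken with `keepdims=True`; dividing an `(n, classes)` array by a plain `np.sum(probas, axis=1)` of shape `(n,)` does not broadcast per row.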

Aug 5

Change the code for MixNB between Poisson and Gaussian and output the fitted model.
create remote to push to the private UCSF repo
Look into simple parallelization
Look at the logistic regression coefficients
Add into the model:

Patient specific:
- AgeOfOnset
- Gender
- DRB1_1501
- OnsetToYr5RelapseCount
Previous year parameters:
- DiseaseDuration
- Siena_PBVC (remove the zeros) (+gradient)
- New_T2_Lesions
- meds: doesn't help

For above, (i) prepare the data from R (ii) check the CDFs

Aug 15

Shouldn't remove more than 10% of dataset. Maybe remove Siena_PBVC, or figure out how to deal with NA.
RandomForest, LogisticRegression, LinearRegress, Gaussian2, MixNB, BayesBernoulli: how to handle NA? Output feature-related stats: feature_importance for RandomForest, coefficients for the 2 regressions, fit plot on X for the 3 Bayes models. Couldn't handle NAs.
Read about AIC, BIC.
Logistic, elastic: ridge and lasso, C. Look at x = C, y = roc, two plots (depending on penalty L1, L2).
Try different sets of features, e.g. MSSS = EDSS/DD, where Disease Duration (DD) = AgeAtExam (AAE) - AgeOnSet (AOS). Start from a core set of features, add the rest one by one, and generate a table of the different algorithms' ROC.
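A sketch of the derived feature exactly as defined in the notes (MSSS = EDSS/DD with DD = AAE - AOS). The function names are hypothetical, and returning `None` when the disease duration is not positive is an assumption made here to avoid dividing by zero.

```python
def disease_duration(age_at_exam, age_on_set):
    """DD = AgeAtExam - AgeOnSet, in years."""
    return age_at_exam - age_on_set

def msss_ratio(edss, age_at_exam, age_on_set):
    """MSSS as used in these notes: EDSS / DD, or None if DD <= 0."""
    dd = disease_duration(age_at_exam, age_on_set)
    if dd <= 0:
        return None
    return edss / dd
```

For example, EDSS 3.0 at age 40 with onset at age 30 gives DD = 10 and a ratio of 0.3.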

Aug 22

Impute data before plotting: 0 for all PreXXRate; KNN (or maybe RandomForestRegressor) to impute NAs.
Try different sets of features. Question: although MSSS = EDSS/DD, we did some manipulation of EDSS, and we also removed the rows where preEDSS is NA.
IMPORTANT: high-level summary of what we did. DUE ON TUESDAY
From last time: read about AIC, BIC; look at x = C, y = roc, two plots (depending on penalty L1, L2).

Aug 26

Impute only X
diagnonoNA includes all features without NA, not complete cases
New Bayes model with +/- sample ratio if a column has fewer than 5 discrete values; otherwise follow MixNB
For regressions: plot_coeffs.

Sept 4

Change ModEDSS: remove imprecision to achieve balance
Rerun the whole thing after today's modification
Delete AgeOnSet, put AgeAtExam; replace DRB1 with DRB1 * PrevEDSS
n_iter 50
Maybe try different k for KNN imputation
Get my ID next Tuesday, see if it works
Train your imputation formula, then use it in testing.
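A sketch of the fit-on-train, apply-on-test pattern from the last item. Column-mean imputation is swapped in here for the KNN imputation discussed in the notes, purely to keep the example short; `fit_column_means` and `impute` are hypothetical names, and missing values are represented as `None`.

```python
def fit_column_means(train_rows):
    """Learn one fill value per column from the training rows only."""
    means = []
    for j in range(len(train_rows[0])):
        observed = [row[j] for row in train_rows if row[j] is not None]
        # Fall back to 0.0 if a column is entirely missing in training.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means

def impute(rows, means):
    """Fill missing values with the fill values learned on the training set."""
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

# Made-up example: the test row is filled with training statistics,
# so nothing from the test fold leaks into the imputation.
train = [[1.0, None], [3.0, 4.0]]
test = [[None, None]]
means = fit_column_means(train)
test_filled = impute(test, means)
```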

Sept 9

Use the old EDSS
Change the dataframe name in R
Store ytrue ypred, and create plotting func(datasetname, clf)
Read paper, give a short presentation

Sept 23

NA NA Go away please come back another day
Log

datasetName <- "datasetTest"
sink(file = paste0(datasetName, ".log"), append = FALSE, split = TRUE)
cat("### PART I \n")
### PART I
cat("##", "This is my log with", 1, "file to test \n")
## This is my log with 1 file to test
sink()

Store ytrue ypred, and create plotting func(datasetname, clf)

Sept 30

save_output
plot_roc_com(datasets = [], models = [])
plot_pr_com(datasets = [], models = [])

Oct 7

SD plots: use 10 repeats
Change the modEDSS for more relevant notification of increase. (I want to allow 0.5 differences for EDSS>4, but I have to talk to people to be sure of what I'm doing) and create new folders (probably with "new" at the front)
X X X on the heatmap for the 100 pts chosen

Oct 16

PredDate_Impr0-4, output should be able to coexist with old one. e.g. data/PredData, data/PredDate_Impr0-4
Thursday 2pm Presentation
X X X on the heatmap for the 100 pts chosen
different clfs in the same SD plot
SD for PR

Oct 22

Change R code to store h5 at a different location in the python folder
Use ./PredData/PredData.h5; ./PredData/data/; ./PredData/plots/ structure
Change code to comply with new format for gridscore

Oct 27

gridData use customized with tol
y_pred, y_true: use new format from Antoine's code (e.g. in compare_obj_sd)

Nov 4

New ROC PR with D1C1, D1C2, D2C1, D2C2.
GUI!!!!!

Nov 18

Create 7 more columns in R for treatment, delete the old ones.
Class Project
