Description
Hello. I am analyzing trial results with four arms: three treatments and a designated control group. The fitted model returns contrasts between each of the three treatments and the control group, but I would like to generate estimates of the outcome for each arm so that I can validate the predictions against the observed values in a holdout sample. A similar issue (#290) was raised earlier for the case of two treatment groups, and @Thomas9292 shared the following code for generating predictions:
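# Imports the snippet relies on (causalml and scikit-learn)
from causalml.inference.meta import XGBTRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
# Create holdout set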
X_train, X_test, t_train, t_test, y_train, y_test_actual = train_test_split(df_confounder, df_treatment, target, test_size=0.2)
# Fit learner on training set
learner = XGBTRegressor()
learner.fit(X=X_train, treatment=t_train, y=y_train)
# Predict the TE for test, and request the components (predictions for t=1 and t=0)
te_test_preds, yhat_c, yhat_t = learner.predict(X_test, t_test, return_components=True)
# Mask the yhats to correspond with the observed treatment (we can only test accuracy for those)
yhat_c = yhat_c[1] * (1 - t_test)
yhat_t = yhat_t[1] * t_test
yhat_test = yhat_t + yhat_c
# Model prediction error
MSE = mean_squared_error(y_test_actual, yhat_test)
print(f"{'Model MSE:':25}{MSE}")
# Also plotted actuals vs. predictions in here, will spare you the code
As I understand it, the code above generates predicted outcomes for the treatment group (yhat_t) and the control group (yhat_c). When I apply it to my data, yhat_t contains three vectors, which I assume correspond to predictions for each of the three non-control treatments. However, yhat_c also contains three vectors. Do these represent three different predictions for the control group? In either case, how can I generate predictions for every treatment arm, including the control group?
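To make the question concrete, here is a minimal sketch of the masking step extended to several arms. It rests on two assumptions that are not confirmed in this thread: that in the multi-treatment case both components come back as dictionaries keyed by the non-control treatment name, and that every entry of yhat_c is an estimate of the control outcome, so any one of them can stand in for the control arm. The helper names and the "control" label are hypothetical, and the toy values are made up.

```python
def per_arm_predictions(yhat_cs, yhat_ts, control_name="control"):
    """Collect one predicted-outcome vector per arm, control included.

    Assumes yhat_cs and yhat_ts are dicts keyed by non-control treatment name.
    """
    # Assumption: every yhat_cs entry estimates the control outcome,
    # so the first one is used for the control arm.
    first_group = next(iter(yhat_cs))
    arms = {control_name: list(yhat_cs[first_group])}
    for group, preds in yhat_ts.items():
        arms[group] = list(preds)
    return arms


def observed_arm_predictions(treatment, arms):
    """For each row, keep the prediction for the arm that row actually received."""
    return [arms[arm][i] for i, arm in enumerate(treatment)]


# Toy inputs standing in for the test-set outputs (values are made up)
t_test = ["control", "a", "b", "c"]
yhat_cs = {g: [1.0, 1.0, 1.0, 1.0] for g in ("a", "b", "c")}
yhat_ts = {"a": [2.0] * 4, "b": [3.0] * 4, "c": [4.0] * 4}

arms = per_arm_predictions(yhat_cs, yhat_ts)
yhat_test = observed_arm_predictions(t_test, arms)
print(yhat_test)  # [1.0, 2.0, 3.0, 4.0]
```

If those assumptions hold, yhat_test could be passed to mean_squared_error(y_test_actual, yhat_test) in the same way as in the two-group snippet, and arms would hold the predicted outcome for every arm for each row.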