Currently we take a trained learner and feed it a version of the task in which every feature has been marginally sampled, which is, in essence, a very inefficient way of estimating mean(target) (in the regression or binary classification case).
I tried to validate one of the Shapley properties for SAGE (efficiency: sum(sage_values) ≈ baseline_loss - full_model_loss), and just using mean(target) as the baseline led to a much closer match, which isn't surprising.
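A minimal sketch of the comparison, under assumed stand-ins (synthetic data, an sklearn regressor, squared-error loss, 50 marginal draws); the real code would use whatever learner and loss the task defines:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical setup: any fitted regressor on some dataset.
X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

full_model_loss = mean_squared_error(y, model.predict(X))

# Baseline A (current approach): marginally sample *every* feature,
# run the model, and average the predictions per row before scoring.
rng = np.random.default_rng(0)
n_draws = 50
preds = np.zeros(len(X))
for _ in range(n_draws):
    X_sampled = np.column_stack(
        [rng.choice(X[:, j], size=len(X)) for j in range(X.shape[1])]
    )
    preds += model.predict(X_sampled)
preds /= n_draws
baseline_loss_sampled = mean_squared_error(y, preds)

# Baseline B: just predict mean(target) for every row.
baseline_loss_mean = mean_squared_error(y, np.full_like(y, y.mean()))

print(baseline_loss_sampled, baseline_loss_mean)
# Efficiency check: sum(sage_values) should be ≈ baseline_loss - full_model_loss.
# Plugging baseline_loss_mean into that identity is what gave the closer match.
```

The two baselines estimate the same quantity (the loss with no features available), but Baseline B gets it exactly and for free, while Baseline A only approximates it through repeated model calls.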