CBify: Supervised to Contextual Bandit
cbify is a reduction that converts supervised learning examples (multiclass, cost-sensitive, or regression) into contextual bandit problems. This lets you apply CB exploration algorithms to existing labeled datasets, which is useful for evaluating exploration strategies, simulating online decision-making from offline data, or warm-starting a CB policy.
Given a supervised example with a known label:
- The CB exploration policy proposes a probability distribution over actions
- An action is sampled from this distribution
- A cost is computed by comparing the sampled action to the true label
- The CB learner updates using this (action, cost, probability) tuple
This simulates the partial-feedback setting of contextual bandits: the learner only observes the cost for the action it chose, not for all actions.
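The four steps above can be sketched in a few lines of Python. This is only an illustration of the simulation loop, not VW's implementation: the names `epsilon_greedy_pmf` and `cbify_step` are hypothetical, the policy's greedy action is passed in rather than learned, and epsilon-greedy is assumed as the exploration strategy.

```python
import random

def epsilon_greedy_pmf(greedy_action, num_actions, epsilon):
    """Probability of each action 1..K: epsilon spread uniformly, the rest on the greedy action."""
    pmf = [epsilon / num_actions] * num_actions
    pmf[greedy_action - 1] += 1.0 - epsilon
    return pmf

def cbify_step(true_label, greedy_action, num_actions, epsilon, rng):
    """Simulate one bandit interaction from a fully labeled example."""
    pmf = epsilon_greedy_pmf(greedy_action, num_actions, epsilon)
    # Sample an action (1-based, like VW multiclass labels) from the distribution.
    action = rng.choices(range(1, num_actions + 1), weights=pmf, k=1)[0]
    # Only the cost of the sampled action is revealed to the learner.
    cost = 0.0 if action == true_label else 1.0
    # The learner would update on this (action, cost, probability) tuple.
    return action, cost, pmf[action - 1]

rng = random.Random(0)
action, cost, prob = cbify_step(true_label=3, greedy_action=1, num_actions=3, epsilon=0.05, rng=rng)
print(action, cost, prob)
```

The returned probability is what lets a CB learner correct for the fact that it never sees the costs of the actions it did not sample.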
The basic mode converts multiclass examples into CB problems. The argument to --cbify specifies the number of actions K.
Suppose train.dat contains multiclass examples:
1 | feature_a feature_b
3 | feature_c
2 | feature_a feature_c
Then:
vw --cbify 3 --epsilon 0.05 -d train.dat
This converts each example into a K-armed bandit problem. If the sampled action matches the true label, cost is 0; otherwise cost is 1. The --loss0 and --loss1 options control these values (defaults 0.0 and 1.0).
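As a rough illustration of how --loss0 and --loss1 enter the cost, the sketch below scores a sampled action against the true label. The helper name `multiclass_cost` is hypothetical, the sampled actions are made up, and the code only mirrors the rule stated above.

```python
def multiclass_cost(sampled_action, true_label, loss0=0.0, loss1=1.0):
    """Return loss0 when the sampled action equals the label, loss1 otherwise."""
    return loss0 if sampled_action == true_label else loss1

# Labels of the three examples in train.dat, paired with made-up sampled actions.
labels = [1, 3, 2]
sampled = [1, 1, 2]
costs = [multiclass_cost(a, y) for a, y in zip(sampled, labels)]
print(costs)  # [0.0, 1.0, 0.0]
```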
All of the standard CB exploration strategies are available:
vw --cbify 10 --epsilon 0.1 -d train.dat # epsilon-greedy (default)
vw --cbify 10 --first 5 -d train.dat # explore-first
vw --cbify 10 --bag 7 -d train.dat # bagging
vw --cbify 10 --cover 3 -d train.dat # online cover
With --cb_explore_adf, cbify uses the action-dependent features (ADF) framework, which is more flexible and supports additional exploration algorithms such as RegCB and SquareCB:
vw --cbify 10 --cb_explore_adf --cb_type mtr --regcb --mellowness 0.01 -d train.dat
vw --cbify 10 --cb_explore_adf --cb_type mtr --squarecb --gamma_scale 500 -d train.dat
When examples have per-action costs rather than a single correct label, use --cbify_cs:
vw --cbify 3 --cbify_cs --epsilon 0.05 -d cs_data.dat
The input uses VW's cost-sensitive format:
1:0 2:1 3:1 | feature_a
1:1 2:0 3:0.5 | feature_b
Costs are interpolated between --loss0 and --loss1 based on the per-class cost values.
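One natural reading of "interpolated" is a linear map that sends a per-class cost of 0 to --loss0 and a per-class cost of 1 to --loss1. The sketch below assumes that linear form; the function name `cs_cost` and the dictionary representation of a label are hypothetical.

```python
def cs_cost(class_costs, sampled_action, loss0=0.0, loss1=1.0):
    """Linearly rescale the sampled action's cost from [0, 1] to [loss0, loss1]."""
    c = class_costs[sampled_action]
    return loss0 + (loss1 - loss0) * c

# Second example above: "1:1 2:0 3:0.5 | feature_b"
label = {1: 1.0, 2: 0.0, 3: 0.5}
print(cs_cost(label, 3))                          # 0.5
print(cs_cost(label, 3, loss0=-1.0, loss1=1.0))   # 0.0
```

With the defaults (0.0 and 1.0) the per-class costs pass through unchanged.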
For cost-sensitive examples with label-dependent features (multiline format), use --cbify_ldf:
vw --cbify_ldf --cb_type mtr --squarecb --gamma_scale 500 -d cs_ldf_data.dat
The input uses VW's csoaa_ldf multiline format with a shared line and one line per action.
The --cbify_reg option converts regression examples into CB problems. It requires --min_value and --max_value to define the continuous range:
vw --cbify 8 --cbify_reg --min_value 0 --max_value 100 --loss_option 1 -d regression.dat
The continuous range is discretized into K bins (the --cbify argument). Three loss functions are available:
| --loss_option | Loss function | Formula |
|---|---|---|
| 0 (default) | Squared | `(predicted - actual)^2 / range^2` |
| 1 | Absolute | `\|predicted - actual\| / range` |
| 2 | Zero-one | 0 if `\|predicted - actual\|` is within the threshold, else 1 |
The zero-one loss threshold is controlled by --loss_01_ratio (default 0.1).
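The loss options can be sketched as plain functions. The squared form follows the formula given for option 0; the absolute and zero-one forms are assumptions modeled on it (error normalized by the range, and a threshold of --loss_01_ratio times the range). All function names are hypothetical.

```python
def squared_loss(predicted, actual, min_value, max_value):
    """Squared error normalized by the squared range."""
    span = max_value - min_value
    return (predicted - actual) ** 2 / span ** 2

def absolute_loss(predicted, actual, min_value, max_value):
    # Assumed: absolute error normalized by the range.
    return abs(predicted - actual) / (max_value - min_value)

def zero_one_loss(predicted, actual, min_value, max_value, loss_01_ratio=0.1):
    # Assumed: errors within loss_01_ratio of the range count as correct.
    threshold = loss_01_ratio * (max_value - min_value)
    return 0.0 if abs(predicted - actual) <= threshold else 1.0

# Range [0, 100], prediction 40, true value 60.
print(squared_loss(40, 60, 0, 100))   # 0.04
print(absolute_loss(40, 60, 0, 100))  # 0.2
print(zero_one_loss(40, 60, 0, 100))  # 1.0
```

Under these assumptions all three losses lie in [0, 1] whenever the prediction and the label both fall inside the configured range.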
For truly continuous action spaces (instead of discretization), combine with --cats:
vw --cbify 4 --cbify_reg --min_value 185 --max_value 23959 --bandwidth 3000 -d regression.dat --loss_option 1
Use --cb_discrete to discretize the continuous space and route through cb_explore:
vw --cbify 2048 --cbify_reg --cb_discrete --min_value 185 --max_value 23959 -d regression.dat --loss_option 1
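To make the discretization concrete, the sketch below maps a discrete action index to a value between --min_value and --max_value. It assumes K equal-width bins, each represented by its midpoint; that choice of representative point and the name `action_to_value` are assumptions for illustration, not a statement of what VW does internally.

```python
def action_to_value(action, num_actions, min_value, max_value):
    """Map a 1-based action index to the midpoint of its equal-width bin."""
    width = (max_value - min_value) / num_actions
    return min_value + (action - 0.5) * width

# 8 bins over [0, 100]: each bin is 12.5 wide.
print([action_to_value(a, 8, 0, 100) for a in (1, 4, 8)])  # [6.25, 43.75, 93.75]
```

A larger K gives finer resolution over the range, at the price of more actions for the exploration algorithm to distinguish.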
| Option | Description |
|---|---|
| `--cbify <K>` | Convert to CB with K actions |
| `--cbify_cs` | Accept cost-sensitive input instead of multiclass |
| `--cbify_reg` | Accept regression input |
| `--cbify_ldf` | Accept cost-sensitive LDF (multiline) input |
| `--loss0 <v>` | Cost for correct prediction (default 0.0) |
| `--loss1 <v>` | Cost for incorrect prediction (default 1.0) |
| `--flip_loss_sign` | Use reward (negate costs) instead of loss |
| `--min_value <v>` | Minimum value for regression mode |
| `--max_value <v>` | Maximum value for regression mode |
| `--loss_option <n>` | Regression loss: 0=squared, 1=absolute, 2=zero-one |
| `--loss_report <n>` | 0=normalized loss, 1=denormalized |
| `--loss_01_ratio <v>` | Threshold ratio for zero-one loss (default 0.1) |
| `--cb_discrete` | Discretize continuous space for regression mode |
These combine with the standard CB exploration options (--epsilon, --first, --bag, --cover, --cb_explore_adf, --cb_type, --regcb, --squarecb, etc.) documented on the Contextual Bandit algorithms page.