A Julia package for simulating the behavior of new algorithms for solving multi-armed bandit problems, including the stochastic and contextual variants of the problem.
The package defines the following type hierarchies:

- `Context`
    - `MinimalContext`
    - `StochasticContext`
- `Bandit`
    - `AdversarialBandit`
    - `DuelingBandits`
    - `ProbabilisticBandit`
    - `ContextualBandit`
    - `NonStationaryBandit`
    - `ChangePointBandits`
    - `MarkovianBandit`
    - `RestlessBandits`
    - `SleepingBandits`
    - `StochasticBandit`
- `Learner`
    - `MLELearner`
    - `BetaLearner`
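To give a sense of what a learner tracks, here is a minimal, self-contained sketch of the Beta-Bernoulli updating that a learner like `BetaLearner` might perform for 0/1 rewards. The `BetaBelief` type and the `update!`/`posterior_mean` names below are illustrative, not the package's API:

```julia
# Illustrative only: a minimal Beta-Bernoulli belief for a single arm.
# (Hypothetical names; not the Bandits.jl API.)
mutable struct BetaBelief
    alpha::Float64  # pseudo-count of successes
    beta::Float64   # pseudo-count of failures
end

BetaBelief() = BetaBelief(1.0, 1.0)  # uniform Beta(1, 1) prior

# Conjugate update after observing a 0/1 reward r.
function update!(b::BetaBelief, r::Real)
    b.alpha += r
    b.beta += 1 - r
    return b
end

posterior_mean(b::BetaBelief) = b.alpha / (b.alpha + b.beta)

b = BetaBelief()
update!(b, 1)       # observed a success
update!(b, 0)       # observed a failure
posterior_mean(b)   # Beta(2, 2) posterior, mean 0.5
```

Because the Beta prior is conjugate to the Bernoulli likelihood, each update is a constant-time increment, which is what makes this family of learners attractive for algorithms such as Thompson sampling.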
- `Algorithm`
    - `RandomChoice`
    - `EpsilonGreedy`
    - `AnnealingEpsilonGreedy`
    - `DecreasingEpsilonGreedy`
    - `Softmax`
    - `AnnealingSoftmax`
    - `UCB1`
    - `UCB1Tuned`
    - `UCB2`
    - `UCBV`
    - `Exp3`
    - `ThompsonSampling`
    - `Hedge`
    - `MOSS`
    - `ReinforcementComparison`
    - `Pursuit`
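As a flavor of what these policies do, here is a minimal, package-independent sketch of ε-greedy arm selection over empirical mean rewards. The function name and signature are illustrative, not the Bandits.jl API:

```julia
# Illustrative only: epsilon-greedy selection over empirical means.
# counts[a] = number of pulls of arm a; sums[a] = total reward from arm a.
function epsilon_greedy(counts::Vector{Int}, sums::Vector{Float64}, eps::Float64)
    if rand() < eps
        return rand(1:length(counts))  # explore: uniform random arm
    end
    # Unpulled arms get mean Inf so each arm is tried at least once.
    means = [c == 0 ? Inf : s / c for (c, s) in zip(counts, sums)]
    return argmax(means)               # exploit: best empirical arm
end

epsilon_greedy([10, 10], [1.0, 9.0], 0.1)  # usually arm 2, sometimes random
```

The annealing and decreasing variants in the list above differ mainly in how `eps` is scheduled over trials rather than in the selection rule itself.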
- `Game`
    - `StochasticGame`
TO BE FILLED IN...
Here we simulate several algorithms for `T = 5` trials. To get accurate estimates of the default summary statistics generated by the `simulate` function, we use `S = 50_000` simulation runs per algorithm/bandit pair.
```julia
using Bandits, Distributions

T = 5
S = 50_000

l1 = MLELearner(0.5, 0.25)
l2 = BetaLearner()

algorithms = [
    RandomChoice(l1),
    EpsilonGreedy(l1, 0.1),
    Softmax(l1, 0.1),
    UCB1(l1),
    ThompsonSampling(l2),
    MOSS(l1),
]

bandits = [
    StochasticBandit([Bernoulli(0.1), Bernoulli(0.2), Bernoulli(0.3)]),
]

simulate(algorithms, bandits, T, S)
```
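Conceptually, each algorithm/bandit pair is played for `T` trials, repeated over `S` independent runs, and the per-trial statistics are averaged across runs. The following self-contained sketch (plain Julia, not the package's `simulate`) shows that loop for an ε-greedy policy on Bernoulli arms with success probabilities `ps`; all names here are illustrative:

```julia
# Illustrative only: the shape of one algorithm/bandit simulation loop.
function run_sims(ps::Vector{Float64}, eps::Float64, T::Int, S::Int)
    total = 0.0
    for _ in 1:S
        counts = zeros(Int, length(ps))   # pulls per arm
        sums = zeros(length(ps))          # total reward per arm
        for _ in 1:T
            # epsilon-greedy choice over empirical means
            a = if rand() < eps
                rand(1:length(ps))
            else
                argmax([c == 0 ? Inf : s / c for (c, s) in zip(counts, sums)])
            end
            r = rand() < ps[a] ? 1.0 : 0.0  # Bernoulli reward draw
            counts[a] += 1
            sums[a] += r
            total += r
        end
    end
    return total / (T * S)  # mean per-trial reward across all runs
end

run_sims([0.1, 0.2, 0.3], 0.1, 5, 50_000)
```

Averaging over many runs is what tames the Monte Carlo noise in these estimates, which is why the example above uses `S = 50_000` rather than a handful of runs.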
We use the following abbreviations throughout our codebase:

- `s`: Current simulation
- `S`: Total number of simulations
- `t`: Current trial
- `T`: Total number of trials
- `c`: Context
- `a`: Index of an arm
- `r`: Reward
- `g`: Regret