
Dev diary: single-atom Beta power law #64

@fasiha

Description


I was doodling and realized that, while everything in the single-atom Ebisu case (v2 and v3) takes pNow = p**(elapsed / t) for an atom parameterized by [a, b, t] (where the probability of recall at time t is assumed to be a Beta(a, b) random variable, that is, probability of recall at time t ~ Beta(a, b)), there's nothing stopping us from changing this.

The p**elapsed exponentiation is why our Beta random variable decays via an exponential, and we can very very easily get a single Beta to exhibit power-law forgetting by saying pNow = p**log2(1 + elapsed / t). Both these expressions share some nice properties:

  • both p**(elapsed / t) and p**log2(1 + elapsed / t) are 0.5 when elapsed == t and a == b, i.e., t remains a halflife under the new expression
  • both are 1.0 as elapsed → 0 and asymptotically approach 0 as elapsed grows very large.
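A quick sanity check of both properties, looking only at the exponents (nothing Ebisu-specific; the function names here are mine, not from the script):

```python
import math

def exponent_exponential(elapsed: float, t: float) -> float:
    """Exponent in the standard exponential-decay model: pNow = p**(elapsed / t)."""
    return elapsed / t

def exponent_power_law(elapsed: float, t: float) -> float:
    """Exponent in the proposed power-law model: pNow = p**log2(1 + elapsed / t)."""
    return math.log2(1 + elapsed / t)

t = 10.0

# At elapsed == t both exponents are exactly 1, so pNow = p**1 = p, whose
# mean is a / (a + b) = 0.5 when a == b: t remains a halflife.
assert exponent_exponential(t, t) == 1.0
assert abs(exponent_power_law(t, t) - 1.0) < 1e-12

# As elapsed → 0 both exponents → 0, so pNow = p**0 → 1.0.
assert exponent_exponential(1e-9, t) < 1e-8
assert exponent_power_law(1e-9, t) < 1e-8
```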

The difference, of course, is that the power-law p**log2(1 + elapsed / t) decays muuuch more slowly than the exponential. It turns out that it's very easy to reuse the existing Ebisu v2 Beta-plus-exponential library for this power-law scheme, since in both cases pNow = p**f(elapsed), i.e., the Beta random variable is raised to some power: elapsed / t for exponential decay, log2(1 + elapsed / t) for power-law decay.
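Concretely, the reuse can be sketched as a time transform: hand the v2 exponential-decay machinery a pseudo-elapsed time whose ratio to t equals log2(1 + elapsed / t). This is a sketch of the idea, not necessarily how betapowerlaw.py actually implements it, and the helper name is mine:

```python
import math

def power_law_pseudo_elapsed(elapsed: float, t: float) -> float:
    """Pseudo-elapsed time such that v2's exponential machinery, which
    computes E[p**(pseudo / t)], actually yields E[p**log2(1 + elapsed / t)]."""
    return t * math.log2(1 + elapsed / t)

# Hypothetical usage against ebisu v2 (signature as in the IPython session below):
#   ebisu2.predictRecall((a, b, t), power_law_pseudo_elapsed(elapsed, t), exact=True)

# At one halflife the transform is the identity: pseudo == elapsed == t.
assert abs(power_law_pseudo_elapsed(10.0, 10.0) - 10.0) < 1e-12
```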

I have a little script that demonstrates this: https://github.com/fasiha/ebisu/blob/v3-release-candidate/scripts/betapowerlaw.py

To run this,

  1. create a venv or Conda env,
  2. install dependencies: python -m pip install numpy scipy pandas matplotlib tqdm ipython "git+https://github.com/fasiha/ebisu@v3-release-candidate",
  3. then clone this repo and check out the v3-release-candidate branch: git clone https://github.com/fasiha/ebisu.git && cd ebisu && git fetch -a && git checkout v3-release-candidate,
  4. download my Anki reviews database: collection-no-fields.anki2.zip, unzip it, and place collection-no-fields.anki2 in the scripts folder so the script can find it,
  5. start ipython: ipython,
  6. run the script: %run scripts/betapowerlaw.py. This will produce some text and figures.

Now you can follow along:

In [3]: predictRecall((2, 2, 10), 100) # THIS IS THE NEWLY DEFINED FUNCTION IN betapowerlaw.py
Out[3]: 0.17014120906719243

In [4]: ebisu2.predictRecall((2,2,10), 100, exact=True)
Out[4]: 0.03846153846153846

In [5]: predictRecall((2, 2, 10), 1000)
Out[5]: 0.07175073430740214

In [6]: ebisu2.predictRecall((2,2,10), 1000, exact=True)
Out[6]: 0.0005711022272986858

Above we compare the predicted recall after 10 and 100 halflives (elapsed of 100 and 1000, with t = 10):

  • power law decay: 17% and 7% respectively
  • exponential decay: 4% and 0.06% respectively
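These numbers can be double-checked against the closed-form Beta moment, E[p**c] = B(a + c, b) / B(a, b) for p ~ Beta(a, b). A standard-library-only sketch (no need for the script above; the function name is mine):

```python
import math

def mean_recall(a: float, b: float, c: float) -> float:
    """E[p**c] for p ~ Beta(a, b), via log-Beta functions for numerical stability."""
    lbeta = lambda x, y: math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
    return math.exp(lbeta(a + c, b) - lbeta(a, b))

a, b, t = 2.0, 2.0, 10.0
for elapsed in (100.0, 1000.0):
    power = mean_recall(a, b, math.log2(1 + elapsed / t))  # power-law decay
    expo = mean_recall(a, b, elapsed / t)                  # exponential decay
    print(f"elapsed={elapsed:6.0f}: power-law {power:.4f}, exponential {expo:.6f}")
# elapsed=   100: power-law 0.1701, exponential 0.038462
# elapsed=  1000: power-law 0.0718, exponential 0.000571
```

These match the IPython outputs above to floating-point precision.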

Running the script above will generate this chart comparing a few models for a few hundred quizzes in terms of log-likelihood:

[Figure: four curves, single-atom Beta power-law algorithm]

I have a very similar script for benchmarking the v3 ensemble-of-Betas algorithm; %run scripts/analyzeHistory.py will run it and generate this:

[Figure: four curves, v3 ensemble algorithm]

In the two charts above, higher is better (higher likelihood). Each point is the sum of log-likelihoods (equivalently, the product of raw likelihoods) over all quizzes for one flashcard. Points are sorted from worst log-likelihood to best, and the 125 right-most points are flashcards for which I have no failures.

Looking at these side-by-side:

  • the single-Beta power law algorithm is pretty damn good
  • the Beta-ensemble is better though. Assuming the best model in both cases is close to optimal, the best v3-ensemble algorithm is 2-3 units of log-likelihood higher than the Beta-power-law algorithm's best scenario.

Both scripts also spit out a text file with per-flashcard, per-quiz details: the likelihood each model assigned to that quiz, and the model's current halflife. Looking through these is really interesting because you can see how different models yield very different halflives after each quiz. It also emphasizes why benchmarking algorithms via log-likelihood (see fasiha/ebisu.js#23) is tricky: an easy way to "cheat" is to just be overly optimistic. Failures are quite uncommon in general, so the penalty an algorithm incurs by being very wrong about occasional failures is more than made up by the boost it gets from over-confidently predicting every quiz to be a success. This is really important: an algorithm/model that performs well in terms of sum-of-log-likelihoods isn't necessarily the best. We also have to look at how it handles failures, how it grows halflives after quizzes, and whether those halflives are reasonable.
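A toy illustration of that "cheating" effect (the numbers here are made up for illustration, not from my review history): when failures are rare, a model that confidently predicts success on every quiz can out-score a more cautious one on sum-of-log-likelihoods, even though it assigns almost no probability to the quizzes that actually fail.

```python
import math

def sum_log_likelihood(p_success, outcomes):
    """Sum of Bernoulli log-likelihoods for a model that always predicts p_success."""
    return sum(math.log(p_success if ok else 1 - p_success) for ok in outcomes)

# Hypothetical review history: 98 passes and only 2 failures.
outcomes = [True] * 98 + [False] * 2

optimistic = sum_log_likelihood(0.98, outcomes)  # predicts 98% for every quiz
cautious = sum_log_likelihood(0.80, outcomes)    # hedges toward failure

# The optimistic model wins by a wide margin (~-9.8 vs ~-25.1), despite
# having assigned 98% recall probability to the two quizzes that failed.
assert optimistic > cautious
```

This is why the per-quiz text dumps matter: the aggregate score alone rewards optimism whenever failures are uncommon.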

So right now I'm not sure what to do 😂 hence this dev diary—maybe writing things out will give me some ideas. I could try to see if there are better initial parameters that improve on these. I'm also going to investigate whether the halflives produced by the two algorithms are reasonable (since some apps will no doubt want to do the Anki thing and schedule reviews for when recall probability drops below a threshold).

If it turns out the single-atom Beta power law algorithm is good enough, should I scrap the Beta-ensemble model…? 😝!
