[early WIP] Fix/rationalize loss-tallying by gojomo · Pull Request #2922 · piskvorky/gensim

gojomo · 2020-08-24T20:37:10Z

PR to eventually address loss-tallying issues: #2617, #2735, #2743. Early tinkering stage.

gojomo · 2020-09-03T23:28:17Z

Changes so far in Word2Vec:

using float64 for all loss tallying
resetting tally to 0.0 per epoch - but remembering history elsewhere for duration of current train() call
micro-tallying into a per-batch value rather than the global tally
then, adding to global tally rather than replacing it

Though the real goal is sensible loss-tallying across all classes, I think these small changes already remedy #2735 (float32 swallows large loss-values) & #2743 (worker losses clobber each other).

An oddity from looking at per-epoch loss across a full run: all my hs runs have shown increasing loss every epoch, which makes no sense to me. And yet, the models at the end have moved word-vectors to more useful places (thus passing our minimal sanity-tests). I don't think my small changes could have caused this oddity (but maybe); I suspect something pre-existing in HS-mode loss-tallying is the real reason. When I have a chance I'll compare to the loss patterns for similar-modes/similar-data in something like the Facebook FastText code, that also reports running loss.

gojomo · 2020-09-08T06:22:03Z

Training FB fasttext (HS, CBOW, no-ngrams ./fasttext cbow -verbose 5 -maxn 0 -bucket 0 -lr 0.025 -loss hs -thread 3 -input ~/Documents/Dev/gensim/enwik9 -output enwik9-cbow-nongrams-lr025-hs) shows decreasing loss reports through the course of training, as expected and unlike the strangely-increasing per-epoch loss our code (at least in this PR) reports. But, final results on a few quick most_similar ops seem very similar. So something remains odd about our loss reporting, especially in HS mode.

gojomo · 2020-09-08T20:01:45Z

As a point of comparison, Facebook's fasttext reports an "average loss", divided over some trial-count, like so:

(base) gojomo@Gobuntu-2020:~/Documents/Dev/fasttext/fastText-0.9.2$ time ./fasttext cbow -verbose 5 -maxn 0 -bucket 0 -lr 0.025 -loss hs -thread 3 -input ~/Documents/Dev/gensim/enwik9 -output enwik9-cbow-nongrams-lr025-hs
Read 142M words
Number of words:  847816
Number of labels: 0
Progress:  39.8% words/sec/thread:  431099 lr:  0.015052 avg.loss:  5.263475 ETA:   0h 5m31s
Progress:  45.4% words/sec/thread:  429306 lr:  0.013645 avg.loss:  4.725245 ETA:   0h 5m 1s
Progress:  58.6% words/sec/thread:  426932 lr:  0.010339 avg.loss:  3.865230 ETA:   0h 3m50s
Progress: 100.0% words/sec/thread:  422384 lr:  0.000000 avg.loss:  2.483185 ETA:   0h 0m 0s

Gensim should probably collect & report 2Vec-class training loss in a comparable way, so that numbers on algorithmically-analogous runs are broadly similar, for familiarity to users & as a cross-check of whatever it is we're doing.

piskvorky · 2020-09-08T21:25:10Z

+1 on matching FB's logic. What is "trial-count"? Is the average taken over words or something else?

gojomo · 2020-09-08T21:55:01Z

Unsure; their c++ (with a separate class for 'loss') is different enough from our code that I couldn't tell at-a-glance & will need to study it a bit more.

piskvorky · 2022-02-19T19:15:01Z

@gojomo cleaning up the loss-tallying logic still very much welcome. Did you figure out the "increasing loss" mystery?

We're planning to make a Gensim release soon – whether this PR gets in now or later, it will be a great addition.

gojomo · 2022-02-21T00:14:14Z

These changes would likely apply, & help a bit in Word2Vec, with just a little adaptation to current develop. I could take a look this week & wouldn't expect any complications.

But getting consistent loss-tallying working in Doc2Vec & FastText, & ensuring a similar calculation & roughly similar loss magnitudes with other libraries (mainly Facebook FastText), would require more, & hard-to-estimate, effort. We kind of need someone who both – (1) needs it; & (2) can get deep into understanding the code – to rationalize the whole thing.

Never figured out why our hs mode reports growing loss despite the model improving as expected on other checks.

gojomo force-pushed the loss-fixes branch 6 times, most recently from 8c61787 to 33ef202 Compare August 28, 2020 18:43

gojomo added 3 commits September 1, 2020 20:32

intensify cbow+hs tests; bulk testing method

7df38c3

loss: always tally; split to epoch_loss/minibatch_loss; use wider float

480e2f7

epoch_loss_history

a16824b

gojomo force-pushed the loss-fixes branch from 33ef202 to a16824b Compare September 2, 2020 03:39

gojomo mentioned this pull request Sep 8, 2020

[WIP] 2Vec SaveLoad improvements #2892

Closed

mpenkov marked this pull request as draft September 17, 2020 09:18

piskvorky mentioned this pull request Sep 24, 2020

[MRG] *2Vec SaveLoad improvements #2939

Merged

gojomo mentioned this pull request Oct 19, 2020

track training loss while using doc2vec issue. #2983

Open

gojomo mentioned this pull request Nov 29, 2022

Loss Word2Vec #3405

Closed

gojomo mentioned this pull request Sep 28, 2023

Word2vec: loss tally maxes at 134217728.0 due to float32 limited-precision #2735

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[early WIP] Fix/rationalize loss-tallying#2922

[early WIP] Fix/rationalize loss-tallying#2922
gojomo wants to merge 3 commits intopiskvorky:developfrom
gojomo:loss-fixes

gojomo commented Aug 24, 2020

Uh oh!

gojomo commented Sep 3, 2020

Uh oh!

gojomo commented Sep 8, 2020

Uh oh!

gojomo commented Sep 8, 2020

Uh oh!

piskvorky commented Sep 8, 2020 •

edited

Loading

Uh oh!

gojomo commented Sep 8, 2020

Uh oh!

piskvorky commented Feb 19, 2022 •

edited

Loading

Uh oh!

gojomo commented Feb 21, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

Uh oh!

Conversation

gojomo commented Aug 24, 2020

Uh oh!

gojomo commented Sep 3, 2020

Uh oh!

gojomo commented Sep 8, 2020

Uh oh!

gojomo commented Sep 8, 2020

Uh oh!

piskvorky commented Sep 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gojomo commented Sep 8, 2020

Uh oh!

piskvorky commented Feb 19, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gojomo commented Feb 21, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

piskvorky commented Sep 8, 2020 •

edited

Loading

piskvorky commented Feb 19, 2022 •

edited

Loading