time starts without stopping #2261

Open · wants to merge 2 commits into master

Conversation

Excelsior7
Contributor

If we call train_ch11 or train_concise_ch11 and the code never enters the condition (for example, when calling these functions with a batch size of 1500 and only one epoch), no time interval is stored in the times list of the Timer object. This causes a division by zero in Timer.avg(), because that method divides by the length of the times list.
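For context, the relevant part of d2l.Timer is sketched below (paraphrased, so treat the exact body as an assumption rather than the library's verbatim source); it shows why an empty times list makes avg() raise ZeroDivisionError, and one possible guard is noted as a comment:

import time

class Timer:
    """Sketch of d2l.Timer: it records elapsed intervals in self.times."""
    def __init__(self):
        self.times = []
        self.start()

    def start(self):
        self.tik = time.time()

    def stop(self):
        self.times.append(time.time() - self.tik)
        return self.times[-1]

    def avg(self):
        # Current behaviour: raises ZeroDivisionError when stop() was never called.
        # A defensive alternative would be:
        #     return sum(self.times) / len(self.times) if self.times else 0.0
        return sum(self.times) / len(self.times)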

Member

@AnirudhDagar left a comment


Thanks @Excelsior7, that makes sense for the timer, but I think it will still fail in that hyperparameter configuration because the animator will not record anything: since n % 200 will never be 0, the if condition is never run. Please update the PR to address that as well, so that we can merge this together.
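To make this corner case concrete, here is a small standalone check (an illustrative sketch only; it assumes the current loop advances an example counter n by the batch size and only plots/times when n % 200 == 0, which is the condition referred to above):

# 1,500 airfoil examples, batch_size=1500, a single epoch: n only ever equals 1500,
# and 1500 % 200 == 100, so the branch that calls timer.stop() and animator.add()
# never executes and Timer.times stays empty.
num_examples, batch_size, num_epochs = 1500, 1500, 1
hits, n = 0, 0
for _ in range(num_epochs):
    for _ in range(0, num_examples, batch_size):
        n += batch_size
        if n % 200 == 0:  # the original condition (assumed form)
            hits += 1
print(hits)  # prints 0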

Code proposal #1 to address the timer and animator issues. At the same time, I propose some layout changes in order to standardize the implementations of the training functions (train_ch11, train_concise_ch11) across the frameworks (PyTorch, MXNet, TensorFlow).
@Excelsior7
Contributor Author

Excelsior7 commented Aug 25, 2022

To begin with, I would like to state the principles that guided the development of this code:

  1. Minimize the number of modifications.
  2. Favour (simplicity, good solution) over (complexity, optimal solution). This follows logically from point 1.
  3. Follow the path already paved by the existing code as much as possible, unless a change is necessary or clearly more suitable.

Even though a shared coding culture brings us together, and the above points may seem like common sense, I think it is important to formalize them. As a new contributor without much experience to rely on, I want to make sure I am not constantly stepping outside a framework of good project-management practices I am unaware of, and to help the reader answer some of the "why's" they may have while skimming the code or my explanations of it. If you have any suggestions or comments, I would be happy to hear them.

Now the goal is to explain the choices behind the modifications I made. To do so, I will paste below the train_ch11() function of the PyTorch framework and comment on each modified line. The same comments apply to the other implementations.

Each block with the following shape indicates a change:
#----#
#COMMENTS ON THE MODIFICATION OF THE FOLLOWING LINE OF CODE.
#----#

#@save
def train_ch11(trainer_fn, states, hyperparams, data_iter, feature_dim, num_epochs=2):

    # Initialization
    w = torch.normal(mean=0.0, std=0.01, size=(feature_dim, 1),
                     requires_grad=True)
    b = torch.zeros((1), requires_grad=True)
    net, loss = lambda X: d2l.linreg(X, w, b), d2l.squared_loss

    #----#
    # xlim=[1, num_epochs+1]: Previously the code was written so that the loss values
    #                         recorded during epoch #epoch were displayed by the animator
    #                         between #epoch - 1 and #epoch on the x axis. It seems more
    #                         correct to shift this display by 1 to the right, to stay
    #                         consistent with the usual convention of numbering epochs by
    #                         the natural numbers starting from 1.
    #
    # ylim=[0.22, 0.45]: I modified the condition that adds points to the animator, and
    #                    the new condition is always satisfied when i = 0. I observed that
    #                    for some hyperparameters (e.g. train_sgd(0.005, 1)), when i = 0
    #                    during the first epoch the loss is a bit higher than 0.35, hence
    #                    the wider upper limit.
    #----#
    animator = d2l.Animator(xlabel='epoch', ylabel='loss',
                            xlim=[1, num_epochs + 1], ylim=[0.22, 0.45])

    #----#
    # len_data_iter: I found it convenient to cache this value because, firstly, the
    #                condition that adds points to the animator would otherwise evaluate
    #                the length of data_iter on every iteration (so if batch_size=1 it
    #                would compute the length of a 1500-element iterator 1500 times), and
    #                secondly it helps standardize the code for the tensorflow framework.
    #----#
    len_data_iter = len(data_iter)

    #----#
    # len_data_iter = | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16|...
    #             k = | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 4 |...
    #
    # k: The goal behind k, and the condition on it that adds points to the animator
    #    (which I designed together with k), is to display at most 5 points per epoch.
    #    So for each epoch, if index i goes from 0 to len_data_iter - 1, then i % k == 0
    #    is true at most 5 times.
    #----#
    k, timer = np.floor((len_data_iter - 1) / 5) + 1, d2l.Timer()
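    # A quick sanity check, assuming the chapter's airfoil dataset with 1500 examples
    # (so batch_size=1 gives len_data_iter = 1500):
    # k = floor((1500 - 1) / 5) + 1 = 299 + 1 = 300, and within an epoch i % k == 0
    # holds only for i in {0, 300, 600, 900, 1200}, i.e. at most 5 additions to the
    # animator.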

    #----#
    # Introducing the epoch loop variable helps me keep track of the current epoch when
    # adding points to the animator.
    #----#
    # Train
    for epoch in range(num_epochs):
        #----#
        # Keeping track of the index i via enumerate helps me enforce the goal of adding
        # at most 5 points per epoch.
        #----#
        for i, (X, y) in enumerate(data_iter):

            l = loss(net(X), y).mean()
            l.backward()
            trainer_fn([w, b], states, hyperparams)

            #----#
            # Why add the condition i+1 == len_data_iter?
            # It lets us enter the branch as late as possible within an epoch, either to
            # minimize the gap between the last point of the current epoch and the first
            # point of the next epoch or, if there is no next epoch, to complete the
            # animator's display. To illustrate the last point: if batch_size=1 then
            # k = 300, and if num_epochs=1 the last index i for which i % 300 == 0 is
            # i = 1200, so epoch+1 + i/1500 = 0+1 + 1200/1500 = 1.8.
            # Without the extra condition, the animator would therefore display the loss
            # from 1 to 1.8 on the x axis, never reaching 2. With it, the display goes
            # as far as epoch+1 + i/1500 = 0+1 + 1499/1500 = 1.999.
            #
            # A by-product of adding this condition is that the number of points per
            # epoch can now be at most 6; the case above is an example.
            #----#
            if i % k == 0 or i + 1 == len_data_iter:
                timer.stop()
                animator.add(epoch + 1 + i / len_data_iter,
                             (d2l.evaluate_loss(net, data_iter, loss),))
                timer.start()

    #----#
    # The new implementation no longer needs to close a time interval at the end of the
    # training phase. Doing so would also have forced me to return timer.cumsum()[:-1]
    # to avoid an unnecessary mismatch later, when plotting the course of the four
    # experiments in
    # d2l.plot(*list(map(list, zip(gd_res, sgd_res, mini1_res, mini2_res))), ...).
    #
    # An additional benefit of the condition i+1 == len_data_iter is that the cumulative
    # time stays as close as possible to the real execution time of the training phase,
    # because the condition is satisfied just before training ends.
    #----#
    # DELETED: timer.stop()

    print(f'loss: {animator.Y[0][-1]:.3f}, {timer.avg():.3f} sec/epoch')

    return timer.cumsum(), animator.Y[0]
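For completeness, here is a usage sketch showing how the patched train_ch11 would be exercised on the corner case from the issue. It mirrors the chapter's train_sgd helper and assumes d2l.get_data_ch11 and the chapter's sgd trainer, so the exact names and signatures should be double-checked against the book:

from d2l import torch as d2l

def sgd(params, states, hyperparams):
    # Plain minibatch SGD step, as in the chapter.
    for p in params:
        p.data.sub_(hyperparams['lr'] * p.grad)
        p.grad.data.zero_()

def train_sgd(lr, batch_size, num_epochs=2):
    # get_data_ch11 returns the airfoil data iterator and its feature dimension.
    data_iter, feature_dim = d2l.get_data_ch11(batch_size)
    return train_ch11(sgd, None, {'lr': lr}, data_iter, feature_dim, num_epochs)

# The corner case from the issue: batch size 1500 with a single epoch. With the
# patched loop this neither divides by zero nor leaves the animator empty.
train_sgd(1, 1500, 1)
# The call mentioned above when choosing ylim=[0.22, 0.45]:
train_sgd(0.005, 1)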

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

@d2l-bot
Member

d2l-bot commented Aug 25, 2022

Job d2l-en/PR-2261/2 is complete.
Check the results at http://preview.d2l.ai/d2l-en/PR-2261/

@d2l-bot
Member

d2l-bot commented Aug 29, 2022

Job d2l-en/PR-2261/3 is complete.
Check the results at http://preview.d2l.ai/d2l-en/PR-2261/

@d2l-bot
Member

d2l-bot commented Aug 29, 2022

Job d2l-en/PR-2261/4 is complete.
Check the results at http://preview.d2l.ai/d2l-en/PR-2261/

@AnirudhDagar
Member

Hi @Excelsior7! Thanks for the very detailed explanation of the code changes. But I personally feel it makes the code a tad too complex. The code looks correct to me and should fix the issue you raised, but it would definitely make it a bit harder to understand.

I'll leave this to @astonzhang; please share what you think. Do you think it is necessary to fix the corner case, or should we just keep it as is?
