time starts without stopping #2261

Open · wants to merge 2 commits into master

Conversation

Excelsior7
Contributor

If we call train_ch11 or train_concise_ch11 and the code never enters the condition (for example, when calling these functions with a batch size of 1500 and only one epoch), no time interval is stored in the times list of the Timer object. This causes a division by zero in Timer.avg(), because that method divides by the length of the times list.
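For context, the relevant part of d2l.Timer is sketched below (paraphrased, so treat the exact body as an assumption rather than the library's verbatim source); it shows why an empty times list makes avg() raise ZeroDivisionError, and one possible guard is noted as a comment:

import time

class Timer:
    """Sketch of d2l.Timer: it records elapsed intervals in self.times."""
    def __init__(self):
        self.times = []
        self.start()

    def start(self):
        self.tik = time.time()

    def stop(self):
        self.times.append(time.time() - self.tik)
        return self.times[-1]

    def avg(self):
        # Current behaviour: raises ZeroDivisionError when stop() was never called.
        # A defensive alternative would be:
        #     return sum(self.times) / len(self.times) if self.times else 0.0
        return sum(self.times) / len(self.times)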

Member

@AnirudhDagar left a comment


Thanks @Excelsior7, that makes sense for the timer, but I think it will still fail in that hyperparameter configuration because the animator will not record anything: since n % 200 will never be 0, the if condition is never run. Please update the PR to address that as well, so that we can merge this together.
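To make this corner case concrete, here is a small standalone check (an illustrative sketch only; it assumes the current loop advances an example counter n by the batch size and only plots/times when n % 200 == 0, which is the condition referred to above):

# 1,500 airfoil examples, batch_size=1500, a single epoch: n only ever equals 1500,
# and 1500 % 200 == 100, so the branch that calls timer.stop() and animator.add()
# never executes and Timer.times stays empty.
num_examples, batch_size, num_epochs = 1500, 1500, 1
hits, n = 0, 0
for _ in range(num_epochs):
    for _ in range(0, num_examples, batch_size):
        n += batch_size
        if n % 200 == 0:  # the original condition (assumed form)
            hits += 1
print(hits)  # prints 0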

Code proposal #1 to address the timer and animator issues. At the same time, I propose some layout changes in order to standardize the implementations of the training functions (train_ch11, train_concise_ch11) across the frameworks (PyTorch, MXNet, TensorFlow).
@Excelsior7
Contributor Author

Excelsior7 commented Aug 25, 2022

To begin with, I would like to state the principles that guided the development of this code:

  1. Minimize the number of modifications.
  2. Favour (simplicity, good solution) over (complexity, optimal solution). This follows logically from point 1.
  3. Follow the path already paved by the existing code as much as possible, unless a change is necessary or clearly more suitable.

Even though a shared coding culture brings us together, and the above points may seem like common sense, I think it is important to formalize them. As a new contributor without much experience to rely on, I want to make sure I am not constantly stepping outside a framework of good project-management practices I am unaware of, and to help the reader answer some of the "why's" they may have while skimming the code or my explanations of it. If you have any suggestions or comments, I would be happy to hear them.

Now the goal is to explain the choices behind the modifications I made. To do so, I will paste below the train_ch11() function of the PyTorch framework and comment on each modified line. The same comments apply to the other implementations.

Each block with the following shape indicates a change:
#----#
#COMMENTS ON THE MODIFICATION OF THE FOLLOWING LINE OF CODE.
#----#

#@save
def train_ch11(trainer_fn, states, hyperparams, data_iter, feature_dim, num_epochs=2):

    # Initialization
    w = torch.normal(mean=0.0, std=0.01, size=(feature_dim, 1),
                     requires_grad=True)
    b = torch.zeros((1), requires_grad=True)
    net, loss = lambda X: d2l.linreg(X, w, b), d2l.squared_loss

    #----#
    # xlim=[1, num_epochs+1]: Previously the code was written so that the loss values
    #                         recorded during epoch #epoch were displayed by the animator
    #                         between #epoch - 1 and #epoch on the x axis. It seems more
    #                         correct to shift this display by 1 to the right, to stay
    #                         consistent with the usual convention of numbering epochs by
    #                         the natural numbers starting from 1.
    #
    # ylim=[0.22, 0.45]: I modified the condition that adds points to the animator, and
    #                    the new condition is always satisfied when i = 0. I observed that
    #                    for some hyperparameters (e.g. train_sgd(0.005, 1)), when i = 0
    #                    during the first epoch the loss is a bit higher than 0.35, hence
    #                    the wider upper limit.
    #----#
    animator = d2l.Animator(xlabel='epoch', ylabel='loss',
                            xlim=[1, num_epochs + 1], ylim=[0.22, 0.45])

    #----#
    # len_data_iter: I found it convenient to cache this value because, firstly, the
    #                condition that adds points to the animator would otherwise evaluate
    #                the length of data_iter on every iteration (so if batch_size=1 it
    #                would compute the length of a 1500-element iterator 1500 times), and
    #                secondly it helps standardize the code for the tensorflow framework.
    #----#
    len_data_iter = len(data_iter)

    #----#
    # len_data_iter = | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16|...
    #             k = | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 | 4 |...
    #
    # k: The goal behind k, and the condition on it that adds points to the animator
    #    (which I designed together with k), is to display at most 5 points per epoch.
    #    So for each epoch, if index i goes from 0 to len_data_iter - 1, then i % k == 0
    #    is true at most 5 times.
    #----#
    k, timer = np.floor((len_data_iter - 1) / 5) + 1, d2l.Timer()
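    # A quick sanity check, assuming the chapter's airfoil dataset with 1500 examples
    # (so batch_size=1 gives len_data_iter = 1500):
    # k = floor((1500 - 1) / 5) + 1 = 299 + 1 = 300, and within an epoch i % k == 0
    # holds only for i in {0, 300, 600, 900, 1200}, i.e. at most 5 additions to the
    # animator.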

    #----#
    # Introducing the epoch loop variable helps me keep track of the current epoch when
    # adding points to the animator.
    #----#
    # Train
    for epoch in range(num_epochs):
        #----#
        # Keeping track of the index i via enumerate helps me enforce the goal of adding
        # at most 5 points per epoch.
        #----#
        for i, (X, y) in enumerate(data_iter):

            l = loss(net(X), y).mean()
            l.backward()
            trainer_fn([w, b], states, hyperparams)

            #----#
            # Why add the condition i+1 == len_data_iter?
            # It lets us enter the branch as late as possible within an epoch, either to
            # minimize the gap between the last point of the current epoch and the first
            # point of the next epoch or, if there is no next epoch, to complete the
            # animator's display. To illustrate the last point: if batch_size=1 then
            # k = 300, and if num_epochs=1 the last index i for which i % 300 == 0 is
            # i = 1200, so epoch+1 + i/1500 = 0+1 + 1200/1500 = 1.8.
            # Without the extra condition, the animator would therefore display the loss
            # from 1 to 1.8 on the x axis, never reaching 2. With it, the display goes
            # as far as epoch+1 + i/1500 = 0+1 + 1499/1500 = 1.999.
            #
            # A by-product of adding this condition is that the number of points per
            # epoch can now be at most 6; the case above is an example.
            #----#
            if i % k == 0 or i + 1 == len_data_iter:
                timer.stop()
                animator.add(epoch + 1 + i / len_data_iter,
                             (d2l.evaluate_loss(net, data_iter, loss),))
                timer.start()

    #----#
    # The new implementation no longer needs to close a time interval at the end of the
    # training phase. Doing so would also have forced me to return timer.cumsum()[:-1]
    # to avoid an unnecessary mismatch later, when plotting the course of the four
    # experiments in
    # d2l.plot(*list(map(list, zip(gd_res, sgd_res, mini1_res, mini2_res))), ...).
    #
    # An additional benefit of the condition i+1 == len_data_iter is that the cumulative
    # time stays as close as possible to the real execution time of the training phase,
    # because the condition is satisfied just before training ends.
    #----#
    # DELETED: timer.stop()

    print(f'loss: {animator.Y[0][-1]:.3f}, {timer.avg():.3f} sec/epoch')

    return timer.cumsum(), animator.Y[0]
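For completeness, here is a usage sketch showing how the patched train_ch11 would be exercised on the corner case from the issue. It mirrors the chapter's train_sgd helper and assumes d2l.get_data_ch11 and the chapter's sgd trainer, so the exact names and signatures should be double-checked against the book:

from d2l import torch as d2l

def sgd(params, states, hyperparams):
    # Plain minibatch SGD step, as in the chapter.
    for p in params:
        p.data.sub_(hyperparams['lr'] * p.grad)
        p.grad.data.zero_()

def train_sgd(lr, batch_size, num_epochs=2):
    # get_data_ch11 returns the airfoil data iterator and its feature dimension.
    data_iter, feature_dim = d2l.get_data_ch11(batch_size)
    return train_ch11(sgd, None, {'lr': lr}, data_iter, feature_dim, num_epochs)

# The corner case from the issue: batch size 1500 with a single epoch. With the
# patched loop this neither divides by zero nor leaves the animator empty.
train_sgd(1, 1500, 1)
# The call mentioned above when choosing ylim=[0.22, 0.45]:
train_sgd(0.005, 1)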

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

@d2l-bot
Member

d2l-bot commented Aug 25, 2022

Job d2l-en/PR-2261/2 is complete.
Check the results at http://preview.d2l.ai/d2l-en/PR-2261/

@d2l-bot
Member

d2l-bot commented Aug 29, 2022

Job d2l-en/PR-2261/3 is complete.
Check the results at http://preview.d2l.ai/d2l-en/PR-2261/

@d2l-bot
Member

d2l-bot commented Aug 29, 2022

Job d2l-en/PR-2261/4 is complete.
Check the results at http://preview.d2l.ai/d2l-en/PR-2261/

@AnirudhDagar
Member

Hi @Excelsior7! Thanks for the very detailed explanation of the code changes. But I personally feel it makes the code a tad too complex. The code looks correct to me and should fix the issue you raised, but it would definitely make it a bit harder to understand.

I'll leave this to @astonzhang; please share what you think. Do you think it is necessary to fix the corner case, or should we just keep it as is?
