This repository was archived by the owner on Feb 28, 2025. It is now read-only.
GPU training controller fails to upload model #1821
Open
Description
2023-11-02 19:17:13,291 - INFO - Model finished training predicting 3 logs correctly out of 3 total logs for an accuracy of 1.0 on eval dataset.
2023-11-02 19:17:23,761 - ERROR - OpniLog model was not able to be trained. Failed to upload output/nulog_model_latest.pt to opni-nulog-models/nulog_model_latest.pt: An error occurred (InternalError) when calling the UploadPart operation (reached max retries: 4): We encountered an internal error, please try again.
This upload failure will cause the deadlock described in #1815.
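The log above shows the upload giving up after its built-in retries. As a rough illustration only, here is a minimal sketch of an outer retry wrapper that reports failure to the caller instead of raising. The name `upload_with_retry` and its parameters are hypothetical and not taken from the Opni code base, and whether this would avoid the deadlock in #1815 is not established here.

```python
import time


def upload_with_retry(upload, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call upload() until it succeeds or the attempts run out.

    `upload` is any zero-argument callable that raises on failure
    (hypothetical: it could wrap the S3 upload of the model file).
    Returns True on success, False if every attempt raised.
    """
    for attempt in range(attempts):
        try:
            upload()
            return True
        except Exception:
            # Last attempt failed: give up and report it to the caller.
            if attempt == attempts - 1:
                return False
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return False
```

With a boolean result, the caller can record the training run as failed rather than leaving it pending. How the controller should react to that is an open design question for this issue.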