This repository was archived by the owner on Feb 28, 2025. It is now read-only.

GPU training controller fails to upload model #1821

Open
@alexandreLamarre

Description

2023-11-02 19:17:13,291 - INFO - Model finished training predicting 3 logs correctly out of 3 total logs for an accuracy of 1.0 on eval dataset.
2023-11-02 19:17:23,761 - ERROR - OpniLog model was not able to be trained. Failed to upload output/nulog_model_latest.pt to opni-nulog-models/nulog_model_latest.pt: An error occurred (InternalError) when calling the UploadPart operation (reached max retries: 4): We encountered an internal error, please try again.

This upload failure will cause the deadlock described in #1815.
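The log shows botocore already retried `UploadPart` 4 times before giving up, and the controller then stops with the model never reaching the bucket. One possible mitigation is an outer bounded retry whose final result is returned to the caller, so the controller can report the failure explicitly instead of leaving other components waiting. The sketch below is hypothetical and is not Opni's code: `upload_with_retry` and its parameters are illustrative names, and `upload` is assumed to be any zero-argument callable that raises on failure.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger(__name__)


def upload_with_retry(
    upload: Callable[[], None],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical helper: call `upload` up to `attempts` times.

    Waits base_delay * 2**n seconds between attempts (exponential backoff).
    Returns True on success, or False once every attempt has raised, so the
    caller can surface the failure rather than block on a missing model.
    """
    for attempt in range(attempts):
        try:
            upload()
            return True
        except Exception as exc:  # e.g. a botocore ClientError (InternalError)
            if attempt == attempts - 1:
                logger.error("upload failed after %d attempts: %s", attempts, exc)
                return False
            delay = base_delay * (2 ** attempt)
            logger.warning(
                "upload attempt %d failed (%s); retrying in %.1fs",
                attempt + 1, exc, delay,
            )
            sleep(delay)
    return False  # only reached when attempts <= 0
```

With boto3, `upload` could be something like `lambda: s3.upload_file("output/nulog_model_latest.pt", "opni-nulog-models", "nulog_model_latest.pt")`, assuming `opni-nulog-models` is the bucket name and `s3` is a boto3 S3 client. The `sleep` parameter is injectable only so the backoff can be exercised without real delays.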

Metadata


Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
