kohya_ss flux #5

@dimonnwc3

I was trying to fine-tune a Flux model using the sd3-flux.1 branch by setting the KOHYA_REF=sd3-flux.1 environment variable.

The container starts, but training fails immediately with the following error: Could not load library libnvrtc.so.12. Error: libnvrtc.so.12: cannot open shared object file: No such file or directory

I checked the /usr/lib/x86_64-linux-gnu directory, and both libnvrtc.so and libnvrtc.so.12 are missing from it for some reason.
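
For anyone reproducing this, the check above can also be run from Python inside the container. This is a minimal sketch, not something shipped with the image: `can_load` and `find_candidates` are hypothetical helper names, and the second search directory is an assumption about where a CUDA toolkit might live.

```python
import ctypes
import glob
import os


def can_load(name):
    """Return True if the dynamic loader can open the shared library `name`."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False


def find_candidates(pattern, search_dirs):
    """List files matching `pattern` directly inside each existing directory."""
    found = []
    for directory in search_dirs:
        if os.path.isdir(directory):
            found.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return found


if __name__ == "__main__":
    # Directories are assumptions; adjust them to the image's actual layout.
    dirs = ["/usr/lib/x86_64-linux-gnu", "/usr/local/cuda/lib64"]
    print("loadable:", can_load("libnvrtc.so.12"))
    print("files:", find_candidates("libnvrtc.so*", dirs))
```

If `can_load` reports False while `find_candidates` does list a file, the library exists but is outside the loader's search path, which is a different problem from the file being absent.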

Then I tried mounting /usr/lib/x86_64-linux-gnu from the host as a volume by changing my docker-compose file:

volumes:
    - aidock_workspace_dev:/workspace
    - /usr/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu  # added this line

Fine-tuning now starts, but some errors still appear during startup and training:

1.

ERROR: ld.so: object 'libtcmalloc.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
This one happens on startup and multiple times afterwards.

2.

E0000 00:00:1731487428.843877    2536 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1731487428.847053    2536 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

This one happens when fine-tuning starts.
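
To narrow down the first error, the entries in LD_PRELOAD can be tested one by one from inside the container. This is a rough sketch under assumptions: `parse_preload` and `unresolvable` are hypothetical helper names, and the check actually opens each library in the current Python process. One possible cause, not confirmed here, is that the bind mount replaces the container's own /usr/lib/x86_64-linux-gnu with the host's contents, so a library that existed in the image is no longer visible.

```python
import ctypes
import os
import re


def parse_preload(value):
    """Split an LD_PRELOAD value on whitespace or colons, dropping empties."""
    return [entry for entry in re.split(r"[\s:]+", value) if entry]


def unresolvable(entries):
    """Return the entries that the dynamic loader fails to open."""
    missing = []
    for entry in entries:
        try:
            ctypes.CDLL(entry)
        except OSError:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    entries = parse_preload(os.environ.get("LD_PRELOAD", ""))
    print("LD_PRELOAD entries:", entries)
    print("cannot be opened:", unresolvable(entries))
```

Running it once with the extra volume and once without should show whether libtcmalloc.so only becomes unresolvable after the host directory is mounted.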

It seems that mapping /usr/lib/x86_64-linux-gnu from the host is not the correct solution, and there must be another way to fix the original error: Could not load library libnvrtc.so.12. Error: libnvrtc.so.12: cannot open shared object file: No such file or directory.

Any ideas how to make it work?
