empty db, curious errors, empty output, long gen time, outputs noise #32

@baardev

I am writing here because the discord invite in the README.md is invalid.

I am not sure I am doing this "right". Using the dataset provided on Google Drive and the prompt "violins playing Tchaikovsky", it takes 10 minutes on an RTX 4070 Ti to generate tokens and create a 4-second clip of chaotic humming sounds. When I make a 30-second clip, generating the tokens takes over an hour, and the result is a 3 MB file that sounds like car horns under water :/

Is there a preferred prompt to use with the test data? What sounds were sampled to make the test data?

When I tried to sample my own sounds, the semantic encoding was less than 10% finished after 24 hours. Is it "normal" that it should take 10 days to sample a clip?
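For reference, the 10-day figure is just a linear extrapolation from the progress I observed. A minimal sketch of that arithmetic is below; the function name is mine and not part of the repo, and it assumes the encoding proceeds at a constant rate. Since progress was under 10%, the result is a lower bound.

```python
def estimate_total_hours(elapsed_hours: float, fraction_done: float) -> float:
    """Linearly extrapolate total run time from partial progress.

    Assumes a constant processing rate, which may not hold in practice.
    """
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_hours / fraction_done


if __name__ == "__main__":
    # 24 hours elapsed at (at most) 10% progress.
    total_hours = estimate_total_hours(24, 0.10)
    print(f"Estimated total: at least {total_hours / 24:.0f} days")
```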

Also, using the Google Drive data, and --model_config ./model/musiclm_large_small_context.json I get the errors...

`Some weights of RobertaModel were not initialized from the model checkpoint at roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']`
`You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.`

`You are using a model of type mert_model to instantiate a model of type hubert. This is not supported for all configurations of models and can yield errors.`

What are the correct settings for using the Google Drive data?

My current command is:

python scripts/infer_top_match.py \
    "violins playing Tchaikovsky" \
    --num_samples 4 \
    --num_top_matches 1 \
    --semantic_path   ./model/semantic.transformer.14000.pt \
    --coarse_path     ./model/coarse.transformer.18000.pt \
    --fine_path       ./model/fine.transformer.24000.pt \
    --rvq_path        ./model/clap.rvq.950_no_fusion.pt \
    --kmeans_path     ./model/kmeans_10s_no_fusion.joblib \
    --model_config    ./model/musiclm_large_small_context.json \
    --duration 4

I had to use the Google Drive data because the code, while not reporting any errors, generated a 0-byte preprocessed.db file in the semantic section, which caused errors in the generation section.
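Because the preprocessing step gave no error, the empty file only showed up later. A small check like the one below would catch it before starting generation. This is a rough sketch: the helper name and the `./preprocessed.db` path are my own assumptions, not something the repo provides, and it only looks at file size, not at whether the contents are valid.

```python
import os


def db_file_status(path: str) -> str:
    """Return "missing", "empty", or "ok" for the file at `path`.

    Only inspects existence and size; it does not validate the contents.
    """
    if not os.path.isfile(path):
        return "missing"
    if os.path.getsize(path) == 0:
        return "empty"
    return "ok"


if __name__ == "__main__":
    # Hypothetical location; adjust to wherever preprocessing writes the file.
    db_path = "./preprocessed.db"
    print(f"{db_path}: {db_file_status(db_path)}")
```

Anything other than "ok" means the semantic preprocessing step should be rerun or investigated before moving on.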

Is there a working example of this code somewhere with proper checkpoints?

Thanks
