This is interesting work, and the task it aims at is as exciting to me as SAM.
However, I am not familiar with audio research, and I have some questions about this work.
First, I checked the dataset, and it does not seem very complete for "sound separation" or "separate anything in audio".
In practice, I tried some samples for separating vocals from songs: no matter whether I used the prompt "Human Sounds" or "Vocal", the model could not separate the vocals, even from a very slow and simple sample of guitar playing with singing. Conversely, when I tried "acoustic guitar", the separated output still contained clearly audible vocals.
Am I misunderstanding the scope, i.e., do "songs" not belong to the kind of music this work is intended to cover?
Second, I would like to ask why this is called a foundation model. It seems that "multimodal or multiple types of inputs = foundation model", because I do not see what it provides for downstream tasks. Can someone share some insights?