
Proposal: Autoencoder Architecture #41

@xshley

Hi! 👋 I wasn’t quite sure where to make this suggestion, or whether I should have opened an issue on GitHub instead. I’ve been thinking about this for the past few days, and I feel like I’ve reasoned it through enough to actually suggest it.

TL;DR
Modify Mapperatorinator's model architecture to work as a transformer-based autoencoder, where beatmaps + audio are encoded into a shared "latent space". I believe this would allow semantic manipulation of beatmaps via vector arithmetic in that latent space, e.g. combining maps, modifying difficulty (or style, etc.), or improving the "quality" of beatmaps.

Right now, as far as I'm aware (correct me if I'm wrong), Mapperatorinator generates beatmaps from inputs like audio and metadata. However, I believe its controllability is still quite limited, as is its understanding of the abstract attributes it is given.

I believe that a latent space can act as a sort of semantic embedding space for beatmaps. Once the autoencoder has learned that space, we can blend maps (map A + map B), shift a map's style or difficulty, enhance its quality by moving it within the space, or interpolate between maps. Essentially, we could create vector editing tools that operate on these latent embeddings instead of on descriptors.

I want to propose an architecture (I'm not entirely sure how it would be structured, but this is what I've come up with). It would have three parts. First, an encoder that takes in a beatmap plus audio and metadata features and outputs a latent embedding. Second, a decoder that attempts to reconstruct the input beatmap from that latent embedding; it should also take the same audio features as the encoder as input, so that no information about the audio is lost. Third, a "Latent Generator" (another transformer) that generates latent embeddings from the audio embeddings plus any other optional conditioning that turns out to be useful. It can share the decoder from the autoencoder, which would construct the new beatmap from the generated latent embedding.
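To make the proposed data flow a bit more concrete, here is a minimal, untrained toy sketch in plain Python. It is not a transformer: each component is a single random linear projection standing in for one, and the names (ToyBeatmapAutoencoder, encode, decode, generate_latent, reconstruction_loss) and dimensions are all hypothetical. It only shows which inputs each component sees and the shapes involved: the encoder pools beatmap + audio frames into one latent vector, the decoder rebuilds per-frame beatmap features from that latent plus the same audio, and the latent generator produces a latent from audio alone that the shared decoder can turn into a new map.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    # Small random weights; a real model would learn these by training.
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(m: Matrix, v: Vector) -> Vector:
    # m has shape (out_dim, in_dim); v has length in_dim.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def mean_pool(frames: List[Vector]) -> Vector:
    # Average over the time axis to get one fixed-size vector.
    return [sum(col) / len(frames) for col in zip(*frames)]


class ToyBeatmapAutoencoder:
    """Hypothetical stand-in: each 'transformer' is one linear map."""

    def __init__(self, beatmap_dim: int, audio_dim: int, latent_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.w_enc = rand_matrix(latent_dim, beatmap_dim + audio_dim, rng)
        self.w_dec = rand_matrix(beatmap_dim, latent_dim + audio_dim, rng)
        self.w_gen = rand_matrix(latent_dim, audio_dim, rng)

    def encode(self, beatmap: List[Vector], audio: List[Vector]) -> Vector:
        # Encoder: beatmap + audio frames -> a single latent embedding.
        return mean_pool([matvec(self.w_enc, b + a) for b, a in zip(beatmap, audio)])

    def decode(self, latent: Vector, audio: List[Vector]) -> List[Vector]:
        # Decoder: latent + the same audio features -> per-frame beatmap features.
        return [matvec(self.w_dec, latent + a) for a in audio]

    def generate_latent(self, audio: List[Vector]) -> Vector:
        # Latent generator: audio only -> a latent the shared decoder can use.
        return matvec(self.w_gen, mean_pool(audio))


def reconstruction_loss(original: List[Vector], rebuilt: List[Vector]) -> float:
    # Mean squared error: the quantity autoencoder training would minimise.
    diffs = [(x - y) ** 2 for o, r in zip(original, rebuilt) for x, y in zip(o, r)]
    return sum(diffs) / len(diffs)


# Usage with made-up data: 10 frames, 8 audio features, 4 beatmap features.
rng = random.Random(1)
audio = [[rng.random() for _ in range(8)] for _ in range(10)]
beatmap = [[rng.random() for _ in range(4)] for _ in range(10)]

model = ToyBeatmapAutoencoder(beatmap_dim=4, audio_dim=8, latent_dim=16)
z = model.encode(beatmap, audio)                            # beatmap -> latent
recon = model.decode(z, audio)                              # latent -> beatmap
new_map = model.decode(model.generate_latent(audio), audio)  # audio -> latent -> new beatmap
print(len(z), reconstruction_loss(beatmap, recon))
```

Because the decoder always receives the audio alongside the latent, the latent only has to carry the "mapping" information rather than re-encoding the song.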

I had to reason out for myself why the latent space might be better, and here are some reasons I came up with. Our beatmaps become vectors in a meaningful space that represents their core ideas. This lets us add a style (this should also apply to any other attribute, not just style) with latent + [style vector], do style transfer with latent - [style a] + [style b], or interpolate in that space with alpha * latent A + (1 - alpha) * latent B. We could find the vectors for attributes like style by averaging the deltas between paired groups (e.g. 'bad' vs 'good' maps), which allows generalised "map editing" without retraining the entire model.
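The three edits above, plus the "average the deltas between paired groups" idea, are all plain element-wise vector arithmetic. A minimal sketch on latents represented as Python lists of floats; the function names and the optional strength factor are hypothetical, and whether the results decode into meaningful beatmaps depends entirely on how well-structured the learned latent space turns out to be.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def add(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x + y for x, y in zip(a, b)]


def sub(a: Sequence[float], b: Sequence[float]) -> Vector:
    return [x - y for x, y in zip(a, b)]


def scale(a: Sequence[float], k: float) -> Vector:
    return [k * x for x in a]


def apply_style(latent: Sequence[float], style_vector: Sequence[float], strength: float = 1.0) -> Vector:
    # latent + [style vector]; strength (an assumed extra knob) scales the edit.
    return add(latent, scale(style_vector, strength))


def transfer_style(latent: Sequence[float], style_a: Sequence[float], style_b: Sequence[float]) -> Vector:
    # latent - [style a] + [style b]
    return add(sub(latent, style_a), style_b)


def interpolate(latent_a: Sequence[float], latent_b: Sequence[float], alpha: float) -> Vector:
    # alpha * latent A + (1 - alpha) * latent B; alpha = 1 gives A, alpha = 0 gives B.
    return add(scale(latent_a, alpha), scale(latent_b, 1.0 - alpha))


def direction_from_pairs(pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]) -> Vector:
    # Mean of (after - before) over paired latents, e.g. (bad, good) versions of a map.
    deltas = [sub(after, before) for before, after in pairs]
    return [sum(col) / len(deltas) for col in zip(*deltas)]
```

For example, a "quality" direction would be direction_from_pairs over (bad, good) latent pairs, and apply_style(latent, quality_direction) would then nudge a new map's latent toward the "good" side.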

Some of the use cases I can think of off the top of my head are: style transfer, difficulty scaling, hybrid mapping (combining mappers' styles), quality enhancement, and more.

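The latent vectors should be regularized, because we want to make sure the space is smooth and continuous, so that nearby points decode to similar maps and the vector edits above land somewhere meaningful.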
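The proposal doesn't pin down a specific regularizer. One common choice, swapped in here purely as an example, is the variational autoencoder approach: the encoder outputs a mean and log-variance per latent dimension, a latent is sampled with the reparameterization trick, and a KL-divergence penalty pulls each encoded distribution toward a standard normal. A minimal sketch with hypothetical function names; the beta weight is an assumed tuning knob trading reconstruction quality against smoothness.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def reparameterize(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> Vector:
    # z = mu + sigma * eps, with sigma = exp(0.5 * logvar) and eps ~ N(0, 1).
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    # KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions.
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


def vae_loss(recon_error: float, mu: Sequence[float], logvar: Sequence[float], beta: float = 1.0) -> float:
    # Reconstruction term plus the weighted KL regularizer.
    return recon_error + beta * kl_to_standard_normal(mu, logvar)
```

The KL term is zero only when an encoding is exactly N(0, 1), so it discourages the encoder from scattering maps into isolated, far-apart points with gaps in between.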

I'm not quite sure how you would go about testing how structured that latent space actually is, or which methods you would need to disentangle and interpret the space to find these editing functions. I'm also not completely sure whether this would actually make beatmaps semantically composable (i.e. whether adding two beatmaps would really give a meaningful "hybrid map").
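One cheap sanity check (an assumption on my part, not an established recipe for this model) is to estimate an attribute direction from two disjoint halves of the paired examples and compare the two estimates with cosine similarity. If the space encodes that attribute consistently, both halves should point in roughly the same direction (cosine near 1); noisy or entangled attributes would give lower values. The helper names below are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[Sequence[float], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-length direction")
    return dot / (norm_a * norm_b)


def mean_delta(pairs: Sequence[Pair]) -> Vector:
    # Average (after - before) over the given pairs.
    deltas = [[y - x for x, y in zip(before, after)] for before, after in pairs]
    return [sum(col) / len(deltas) for col in zip(*deltas)]


def direction_consistency(pairs: Sequence[Pair], seed: int = 0) -> float:
    # Shuffle, split into two halves, and compare the direction each half gives.
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to split")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return cosine(mean_delta(shuffled[:half]), mean_delta(shuffled[half:]))
```

Running this over several random splits (different seeds) and several attributes would give a rough picture of which attributes the space actually captures as stable directions.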

I didn't quite know where to suggest this, and I would have tried to implement it myself, but I'm still learning ML and don't really have the experience to build it. I wanted to share the idea anyway, because I believe it's a direction that could be worth exploring and could make Mapperatorinator better.

I was partially inspired by an existing model I found doing this with text: https://huggingface.co/thesephist/contra-bottleneck-t5-base-wikipedia

I'm not sure if this is worth exploring, but thank you for reading this if you did.
