
Conversation

@WojciechMat

Initial version of the blog post, without the final sections.
I was unable to change the width of the audio controller, so I can only fit two columns of samples in a row.
Let me know your thoughts and suggestions :))

Comment on lines 30 to 31
{% algrtmImg MIDI-velocity-transformer/samples/9115-pred-untrained.png pianoroll 170px %}
{% algrtmAudio MIDI-velocity-transformer/samples/9115-pred-untrained.mp3 %}
Member

Missing files 🚨

Author

should be ok now 👍

@WojciechMat
Author

I have added some more text about tokenization, as well as the conclusion, references, and the missing file from before. Everything should work now. I'm not sure about the contact info at the end of the post; maybe you know how you'd like it to look?
I'd love to hear your thoughts on what more to include and what to change in the text. If I should add more clarifications or make the content more engaging, let me know! 🔥

@@ -0,0 +1,266 @@
---
title: MIDI Velocity Prediction with Transformer
Member

Suggested change:
- title: MIDI Velocity Prediction with Transformer
+ title: Modelling dynamic expression in piano performance

I think we shouldn't mention MIDI or velocity in the title, as it's too technical; my suggestion aims to give a broader goal for what we're doing. You can add "with Transformers" if you like.

Comment on lines 12 to 13
MIDI velocity is a crucial element in music dynamics, determining the force with which a note is played,
which profoundly influences the emotional quality of music.
Member

I think you should try to explain the problem without referencing MIDI, and then you could introduce MIDI as a data structure related to this problem 🤔


If you were to take a sequence of notes and have an untrained model predict their velocities, this is what you would end up with:
Member

Those samples could be a good demonstration of what velocity is and how it affects the music - you could say that the velocities were randomized here.

Saying it's generated by an untrained model is confusing here, because you haven't introduced the problem that the model has to solve yet.
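
If you go with that framing, a minimal sketch of the randomization could look like this (hypothetical helper, not from the post):

import random

# Hypothetical sketch: keep each note's pitch and timing,
# but draw its velocity uniformly at random.
def randomize_velocities(velocities: list[int]) -> list[int]:
    return [random.randint(0, 127) for _ in velocities]

print(randomize_velocities([64, 80, 96]))  # e.g. [13, 127, 42]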

within quantized MIDI data.


### Model Overview
Member

I think we don't need a model overview at all here. We can assume that the readers know what a transformer is, and we can include some references in the text.

What this and the next section are missing for me is a description of how we convert the piano performance problem into a list-of-tokens problem.
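
For example, a minimal sketch of how velocities could be quantized into a small target vocabulary (the bin count and helper name are hypothetical, not taken from the post):

# Hypothetical sketch: quantize 0-127 velocities into a small
# target vocabulary of dynamic-level tokens.
def velocity_to_token(velocity: int, n_bins: int = 10) -> int:
    return min(velocity * n_bins // 128, n_bins - 1)

print([velocity_to_token(v) for v in [34, 67, 101, 127]])  # [2, 5, 7, 9]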

Comment on lines 60 to 65
MIDI data describes notes using 5 features:
1. Pitch - Represented as a number between 0 and 127 (or 21 to 108 for piano keys, reflecting the standard 88-key keyboard).
2. Start - The moment a key is pressed, measured in seconds.
3. End - The moment the key is released, measured in seconds.
4. Duration - The time elapsed between the key's press and release.
5. Velocity - A number from 0 to 127 indicating the intensity of the key press.
Member

A list of points is not the best format for describing a data structure. One alternative would be a code snippet showing a "note" class design, referring to a piano performance as a list of notes:

from dataclasses import dataclass

@dataclass
class Note:
    pitch: int      # 0-127 (21-108 for piano keys)
    start: float    # key press time, in seconds
    end: float      # key release time, in seconds
    velocity: int   # key press intensity, 0-127

You can also use https://mermaid.live/ to make a class diagram; I think our blog should support it just like GitHub:

classDiagram
    class Note{
      pitch: int
      velocity: int
      start: float
      end: float
    }

(this does not look great, but you could play around with it)

Author

I don't think it supports mermaid :(

Member

I just updated master, and now it does 🎉

Author

🔥🔥🔥

Comment on lines 133 to 142
### Model Architecture
{% algrtmImgBanner MIDI-velocity-transformer/transformer.png transformer%}
A transformer built as described in the [Attention Is All You Need](https://arxiv.org/abs/1706.03762) paper was used for this task.
The important hyperparameters:
| hyperparameter | number |
| -------------- | :-----: |
| Number of layers in encoder and decoder | **6** |
| Number of heads in attention layers | **8** |
| Dimension of encoder and decoder outputs | **512** |
| Dimension of the hidden layer of the position-wise feed-forward network in each layer of encoder and decoder | **2048** |
Member

Like I mentioned, we don't need to explain what transformers are. I think the only relevant information we should share about our architecture is the number of trainable parameters.

Also, I'm against using diagrams from external sources; either make your own, or just link to the source :)
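
For reference, one quick way to get that single number from the hyperparameters in the table (a PyTorch sketch; nn.Transformer excludes the embedding and output layers, so the real count would be somewhat higher):

import torch.nn as nn

# Encoder-decoder transformer with the hyperparameters from the table above.
model = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=2048,
)
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{n_params:,} trainable parameters")  # roughly 44M for this config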

Comment on lines 149 to 150
Our training dataset comprised approximately 200 hours of musical data sourced from the
[roszcz/maestro-v1](https://huggingface.co/datasets/roszcz/maestro-v1) dataset, which includes 1276 pieces of classical music performed during piano competitions. Each musical piece was segmented into 128-note sequences, with a 64-note overlap between adjacent samples. These sequences were quantized, and each note was mapped to its corresponding index in the source and target vocabularies.
Member

I'm not the owner of maestro; you should cite the original source: https://magenta.tensorflow.org/datasets/maestro
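
As an aside, the segmentation described in those lines is easy to show concretely; a minimal sketch, assuming a piece is a plain list of notes (the helper name is hypothetical):

# Hypothetical sketch: slice a piece into 128-note windows
# with a 64-note overlap (i.e. a hop of 64 notes).
def segment(notes: list, window: int = 128, hop: int = 64) -> list[list]:
    return [
        notes[i : i + window]
        for i in range(0, len(notes) - window + 1, hop)
    ]

piece = list(range(300))               # stand-in for a list of Note objects
samples = segment(piece)
print(len(samples), len(samples[0]))   # 3 128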

@WojciechMat
Author

Mermaid does work, thanks :))
I've added a notes dataframe representation and a code snippet, and removed the redundant info on the transformer and its architecture.
I invite you to read, suggest changes, and modify the text too.
