Midi 82/vel transformer blogpost #2
Conversation
> {% algrtmImg MIDI-velocity-transformer/samples/9115-pred-untrained.png pianoroll 170px %}
> {% algrtmAudio MIDI-velocity-transformer/samples/9115-pred-untrained.mp3 %}
Missing files 🚨
should be ok now 👍
I have added some more text about tokenization, as well as a conclusion, references, and the file that was missing before. Everything should work now. I'm not sure about the contact info at the end of the post; maybe you know how you would like it to look?
> @@ -0,0 +1,266 @@
> ---
> title: MIDI Velocity Prediction with Transformer
Suggested change:

```diff
- title: MIDI Velocity Prediction with Transformer
+ title: Modelling dynamic expression in piano performance
```
I think we shouldn't mention MIDI or velocity in the title, as it's too technical; my suggestion aims to express the broader goal of what we're doing. I think you can add "with Transformers" if you like.
> MIDI velocity is a crucial element in music dynamics, determining the force with which a note is played, which profoundly influences the emotional quality of music.
I think you should try to explain the problem without referencing MIDI, and then you could introduce MIDI as a data structure related to this problem 🤔
> MIDI velocity is a crucial element in music dynamics, determining the force with which a note is played, which profoundly influences the emotional quality of music.
>
> If you were to take a sequence of notes and predict their velocities with an untrained model, this is what you would have ended up with:
Those samples could be a good demonstration of what velocity is and how it affects the music; you could say that the velocities were randomized here.
Saying it's generated by an untrained model is confusing, because you haven't introduced the problem that the model has to solve yet.
> within quantized MIDI data.
>
> ### Model Overview
I think we don't need a model overview at all here. We can assume that the readers know what a transformer is, and we can include some references in the text.
What this and the next section are missing for me is a description of how we convert the piano-performance problem into a list-of-tokens problem.
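For illustration, one plausible shape for that conversion is to quantize each note's continuous features into bins, so the encoder sees one discrete symbol per note and the decoder predicts one velocity bucket per note. This is only a sketch under assumed bin sizes and feature choices, not necessarily the post's actual scheme:

```python
import numpy as np

# Illustrative only: bin counts and feature choices are assumptions.
DSTART_BINS = np.linspace(0.0, 2.0, num=32)    # time until next note, seconds
DURATION_BINS = np.linspace(0.0, 2.0, num=32)  # note length, seconds
N_VELOCITY_BINS = 32

def note_to_src_token(pitch: int, dstart: float, duration: float) -> tuple:
    """Quantize a note's features into a discrete source-side symbol."""
    return (
        pitch,
        int(np.digitize(dstart, DSTART_BINS)),
        int(np.digitize(duration, DURATION_BINS)),
    )

def velocity_to_tgt_token(velocity: int) -> int:
    """Bucket a 0-127 velocity into one of N_VELOCITY_BINS target tokens."""
    return velocity * N_VELOCITY_BINS // 128
```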
> MIDI data describes notes by 5 features:
>
> 1. Pitch - represented as a number between 0 and 127 (or 21 to 108 for piano keys, reflecting the standard 88-key keyboard).
> 2. Start - the moment a key is pressed, measured in seconds.
> 3. End - the moment the key is released, in seconds.
> 4. Duration - the time elapsed between the key's press and release.
> 5. Velocity - an integer from 0 to 127, indicating the intensity of the key press.
A list of points is not the best format to describe a data structure. One alternative would be to make a code snippet showing a "note" class design, and refer to a piano performance as a list of notes:

```python
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int
    start: float
    end: float
    velocity: int

# a piano performance is then simply a list[Note]
```

You can also use https://mermaid.live/ to make a class diagram; I think our blog should support it just like GitHub:

```mermaid
classDiagram
    class Note{
        pitch: int
        velocity: int
        start: float
        end: float
    }
```

(this does not look great, but you could play around with it)
I do not think it does support mermaid :(
I just updated master, and now it does 🎉
🔥🔥🔥
> ### Model Architecture
>
> {% algrtmImgBanner MIDI-velocity-transformer/transformer.png transformer %}
>
> A transformer built as described in the [Attention is all you need](https://arxiv.org/abs/1706.03762) paper was used for this task. The important hyperparameters:
>
> | hyperparameter | value |
> | -------------- | :---: |
> | Number of layers in encoder and decoder | **6** |
> | Number of heads in attention layers | **8** |
> | Dimension of encoder and decoder outputs | **512** |
> | Hidden-layer dimension of the position-wise feed-forward network in each encoder and decoder layer | **2048** |
Like I mentioned, we don't need to explain what transformers are. I think the only relevant information we should share about our architecture is the number of trainable parameters.
Also, I'm against using diagrams from external sources: either make your own, or just link to the source :)
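For reference, the hyperparameters in the table match the base configuration from the paper, and the trainable-parameter count the comment above asks for is easy to compute. A minimal sketch, assuming PyTorch (the post's actual training code may differ):

```python
import torch.nn as nn

# Base "Attention Is All You Need" configuration, matching the table above.
model = nn.Transformer(
    d_model=512,            # encoder/decoder output dimension
    nhead=8,                # attention heads per layer
    num_encoder_layers=6,
    num_decoder_layers=6,
    dim_feedforward=2048,   # hidden size of the position-wise FFN
)

# The one number worth reporting, per the review comment above.
n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"{n_params:,} trainable parameters")
```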
> Our training dataset comprised approximately 200 hours of musical data sourced from the [roszcz/maestro-v1](https://huggingface.co/datasets/roszcz/maestro-v1) dataset, which includes 1276 pieces of classical music performed during piano competitions. Each musical piece was segmented into 128-note sequences, with a 64-note overlap between adjacent samples. These sequences were quantized, and each note was mapped to its corresponding index in the source and target vocabularies.
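To make the windowing concrete: a minimal sketch of 128-note segments with a 64-note overlap, i.e. a stride of 64 notes (the names and stand-in data below are illustrative, not the post's actual pipeline):

```python
# Split a piece into overlapping fixed-length note sequences:
# 128-note windows advancing by 64 notes, as described above.
def segment(notes: list, window: int = 128, stride: int = 64) -> list:
    return [
        notes[i : i + window]
        for i in range(0, len(notes) - window + 1, stride)
    ]

piece = list(range(1000))             # stand-in for a list of notes
samples = segment(piece)
print(len(samples), len(samples[0]))  # -> 14 128
```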
I'm not the owner of maestro; you should cite the original source: https://magenta.tensorflow.org/datasets/maestro
Mermaid does work, thanks :))
…/blog into MIDI-82/vel-transformer-blogpost
Initial version of a blog post without the final sections.
I was unable to change the width of the audio controller, so I can fit only two columns of samples in a row.
Let me know your thoughts and suggestions :))