The user is not told what the default mappings are. We could make this clearer, and perhaps also explain how to change the mapping.
Perhaps we could show a plain-text description of the current mapping at all times? E.g.:
X maps to panning and time. Y maps to pitch. Duration is fixed.
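
One way to keep such a description in sync with the actual settings would be to generate it from whatever structure holds the mapping, rather than hard-coding the sentence. A rough sketch in Python, assuming the mapping can be expressed as a dict from axis name to the audio parameters it drives (the `describe_mapping` function, the dict layout, and the `fixed` argument are all hypothetical and not taken from the existing code):

```python
def describe_mapping(mapping, fixed=()):
    """Build a one-line, plain-text summary of a mapping.

    `mapping` maps an axis name (e.g. "X") to the list of audio
    parameters it drives; `fixed` lists parameters no axis drives.
    Both structures are assumptions made for this sketch.
    """
    sentences = []
    for axis, params in mapping.items():
        if not params:
            continue  # an axis that drives nothing is left out
        sentences.append(f"{axis} maps to {' and '.join(params)}.")
    for param in fixed:
        sentences.append(f"{param.capitalize()} is fixed.")
    return " ".join(sentences)


# Hypothetical default mapping, mirroring the example sentence above.
default_mapping = {"X": ["panning", "time"], "Y": ["pitch"]}
summary = describe_mapping(default_mapping, fixed=["duration"])
print(summary)
```

If the user changes the mapping, calling the same function again on the updated dict would refresh the text, so the description cannot drift from the real state.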