Does radial-attention support training? If so, is any training data available? If not, are there plans to add training support?