In my application, the attention weights are centering on locations which are indicative of a subset of the classes. Therefore, while the algorithm performs well on this subset, it sometimes misclassifies on the other classes because the attention weights cause the obvious differences to be considered "residual".
Is there a documented way of restricting the attention weights to a certain value or index domain to enforce constraints on its focus? This question makes me think of NLP problems where frameworks commonly pair ML methodologies with a set of predetermined rules (usually defined with spacy).
Any thoughts? Thanks in advance.
In my application, the attention weights are centering on locations which are indicative of a subset of the classes. Therefore, while the algorithm performs well on this subset, it sometimes misclassifies on the other classes because the attention weights cause the obvious differences to be considered "residual".
Is there a documented way of restricting the attention weights to a certain value or index domain to enforce constraints on its focus? This question makes me think of NLP problems where frameworks commonly pair ML methodologies with a set of predetermined rules (usually defined with spacy).
Any thoughts? Thanks in advance.