NLP Results and CCT size #69

@markNZed

Description

Thanks for making transformers much more approachable! The downside of this may be stupid questions from beginners like me (still, I hope this is not one). In the NLP results, each of the five datasets achieves its best accuracy with a different CCT model, and for the Transformer, ViT-Lite, and CVT models accuracy is almost inversely correlated with size. My "intuition" is that bigger models should be better (for example, LLMs often give the best results). Maybe the small size of the datasets means larger models can't be trained as well, or maybe the embedding is not optimized for transformers. Could you please offer some insight into this?

The CCT is an encoder architecture. Are there small transformers that demonstrate an encoder/decoder or decoder-only architecture? How would you expect a decoder implementation of CCT to perform on generative tasks?
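For context, here is a rough sketch of what I mean by a decoder-style block: the same multi-head self-attention, but with a causal mask so each token can only attend to earlier positions. This is not from the CCT codebase; the class and shapes are hypothetical and only meant to illustrate the question.

```python
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    """Plain multi-head self-attention plus a causal mask --
    the main change that would make an encoder block decoder-style."""
    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        # Boolean mask: True above the diagonal, so token i cannot attend to tokens j > i
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
            diagonal=1,
        )
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out

# Toy input standing in for the output of a convolutional tokenizer
x = torch.randn(2, 16, 128)                     # (batch, seq_len, embed_dim)
block = CausalSelfAttention(dim=128, num_heads=4)
print(block(x).shape)                           # torch.Size([2, 16, 128])
```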
