The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
…t if we bash h,w key into an int or str
… add, classic vit weight loading for naflex model
…king loader based patch compatible RandomErasing for NaFlex mode.
…w. Remove subregion mode, not going to be worth it.
… embeds and 'aspect preserving mode' to Flex Embeds. Some more docstrings and typing.
… creating classic vits as naflex. Cleanup, improvements.
…ndling from train.py onwards. Add docstrings and type annotations (thanks Claude).
@stas-sl if you train or fine-tune with a different patch size using basic interpolation, as you say, then yeah, I imagine it will be fine. If you train while resizing to different patch sizes with the simple interpolation, and don't get crazy with the range of sizes covered, I expect it'd be robust at inference time to sizes within the range used. I haven't tried this extensively, though. However, applying the simple resize to existing model weights yields pretty poor results compared to the PI method. The PI implementation I originally had, based on the original JAX impl, was damned slow, but I completely redid it with native torch tensors and a WAY faster basis vector computation, and it now runs quite nicely at train time. That's why I decided to just support the PI mode. I was testing this just yesterday, and the NaFlex pipeline appears to be working well when randomizing both sequence length AND patch size at train time, neat.
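To make the difference between the two resize strategies concrete, here is a minimal NumPy sketch, assuming "PI" refers to a pseudo-inverse resize in the FlexiViT style. It is not the code from this PR: the function names are hypothetical, it uses plain linear interpolation on a single-channel kernel, and it builds the resize operator with a Kronecker product purely for illustration. The idea is that the PI kernel is chosen so that a resized patch dotted with the resized kernel gives the same response as the original patch with the original kernel, which naive interpolation of the weights does not guarantee.

```python
import numpy as np


def linear_resize_matrix(old: int, new: int) -> np.ndarray:
    """Return a (new, old) matrix that linearly resamples a 1D signal.

    Uses half-pixel sample centers and clamps at the edges. Hypothetical
    helper for illustration only.
    """
    mat = np.zeros((new, old))
    for i in range(new):
        src = (i + 0.5) * old / new - 0.5
        src = min(max(src, 0.0), old - 1.0)
        lo = int(np.floor(src))
        hi = min(lo + 1, old - 1)
        frac = src - lo
        mat[i, lo] += 1.0 - frac
        mat[i, hi] += frac
    return mat


def resize_kernel_simple(w: np.ndarray, new: int) -> np.ndarray:
    """Naive approach: interpolate the (old, old) kernel like an image."""
    r = linear_resize_matrix(w.shape[0], new)
    return r @ w @ r.T


def resize_kernel_pi(w: np.ndarray, new: int) -> np.ndarray:
    """Pseudo-inverse approach: solve B^T w' = w for the new kernel w'.

    B is the linear operator that resizes a flattened (old, old) patch to
    (new, new), so <B x, w'> == <x, w> holds for every patch x whenever the
    system is solvable (e.g. when upsampling).
    """
    r = linear_resize_matrix(w.shape[0], new)
    b = np.kron(r, r)  # 2D separable resize acting on row-major flattened patches
    w_new = np.linalg.pinv(b.T) @ w.reshape(-1)
    return w_new.reshape(new, new)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    old, new = 4, 8
    x = rng.normal(size=(old, old))  # a patch at the original patch size
    w = rng.normal(size=(old, old))  # one channel of a patch-embed kernel
    r = linear_resize_matrix(old, new)
    x_big = r @ x @ r.T              # the same patch resized to the new size

    print("original response:", np.sum(x * w))
    print("simple resize    :", np.sum(x_big * resize_kernel_simple(w, new)))
    print("PI resize        :", np.sum(x_big * resize_kernel_pi(w, new)))
```

In this toy setup the PI response matches the original up to floating-point error, while the simple resize changes the response scale, which is consistent with the observation above that naive resizing of existing weights performs poorly without further training.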

Working:
use_naflex=True flag in create_model())

Not tested / not completed: