A suite of neural networks used as pretraining for contrastive font-and-query learning. The project started as an experiment in producing uppercase versions of lowercase glyphs, inspired by Tom7's lowercasing work. The current iteration uses a U-Net architecture operating on pre-rendered bitmap glyphs. Initial runs have used ~3,000 fonts, achieving strong accuracy on the pretraining tasks and showing reasonable transferability.
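To make the architecture concrete, here is a minimal U-Net sketch for glyph-to-glyph tasks such as uppercasing. The 1-channel 64x64 input size, layer widths, and depth are illustrative assumptions (PyTorch), not the exact model used in this repo:

```python
# Minimal U-Net sketch: encode, bottleneck, decode with skip connections.
# Shapes/widths are hypothetical; the real model's config may differ.
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
        nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = block(1, 16)
        self.enc2 = block(16, 32)
        self.bott = block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.up2 = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec2 = block(64, 32)
        self.up1 = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec1 = block(32, 16)
        self.out = nn.Conv2d(16, 1, 1)

    def forward(self, x):
        e1 = self.enc1(x)              # 64x64 feature map
        e2 = self.enc2(self.pool(e1))  # 32x32
        b = self.bott(self.pool(e2))   # 16x16 bottleneck
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))  # skip from e2
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip from e1
        return torch.sigmoid(self.out(d1))  # predicted glyph bitmap in [0, 1]
```

The skip connections let the decoder reuse fine stroke detail from the encoder, which matters when the output glyph shares most of its geometry with the input.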
Running this project is only a matter of cloning it, creating a Python environment, and installing the packages in requirements.txt. I recommend playing with draw.py first for fun.
To train a model, you first need fonts to train on. I am in no position to manage copyright for literally thousands of fonts, so you will need to supply your own; the fonts installed on your system are a good starting place. Place them in data/fonts. From there, pretrain a model by first running vis.py (to visualize training), then pretrain.ipynb. You can stop early by interrupting the training cell in the notebook; a checkpoint is created automatically. Hopefully the config files are self-explanatory enough if you want to adjust training. draw.py currently loads a pretrained uppercasing model, but this can be changed quickly via the testPath variable at the top of the file.
Part of this project consisted of evaluating how well the pretrained models transfer to a contrastive learning task (treated here as regression) using measures such as LogME, TransRate, and H-Alpha. I also compare against two deliberately poor baselines: one that outputs random Gaussian noise and one that outputs only zeros. You can reproduce this yourself with estimation.py, but here are the results from the models I pretrained.
| | Lowercase | Uppercase | Masked Autoencoder | CLIP (Image) | Random | Zeroes |
|---|---|---|---|---|---|---|
| CLIP (Text) | -0.621 | -0.619 | -0.641 | -0.647 | -1.118 | null |
| BERT | 1.601 | 1.605 | 1.602 | 1.570 | -0.628 | null |
| | Lowercase | Uppercase | Masked Autoencoder | CLIP (Image) | Random | Zeroes |
|---|---|---|---|---|---|---|
| CLIP (Text) | 4.679 | 5.308 | 5.832 | 9.220 | 5.933 | 0.0 |
| BERT | 6.327 | 9.193 | 7.798 | 15.86 | 17.89 | 0.0 |
| | Lowercase | Uppercase | Masked Autoencoder | CLIP (Image) | Random | Zeroes |
|---|---|---|---|---|---|---|
| CLIP (Text) | 1.671 | 1.750 | 1.582 | 1.463 | 0.0 | 0.506 |
| BERT | 1.048 | 1.007 | 0.887 | 0.689 | 0.0 | 0.506 |
- Scores select the best layer from each model where applicable; CLIP only features the pooled output
- Contrastive tasks are not regression tasks; these numbers are exploratory
- H-Alpha and TransRate require classification-style labels. Several labeling methods were attempted, including clustering; all returned similar results
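For readers unfamiliar with these measures, here is an exploratory NumPy sketch of one of them, TransRate: the coding rate of all features minus the label-conditioned coding rate. The epsilon value and the cluster-style labels are assumptions, and this is a common formulation rather than the exact code in estimation.py:

```python
# Sketch of TransRate: higher means features separate the (pseudo-)classes better.
import numpy as np

def coding_rate(Z, eps=1e-4):
    """R(Z, eps) = 1/2 * logdet(I + d/(n*eps) * Z^T Z) for n x d features Z."""
    n, d = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps)) * Z.T @ Z)
    return 0.5 * logdet

def transrate(Z, y, eps=1e-4):
    Z = Z - Z.mean(axis=0)  # center features
    n = len(y)
    rate = coding_rate(Z, eps)
    # Subtract the rate of each class's features, weighted by class size.
    for c in np.unique(y):
        Zc = Z[y == c]
        rate -= (len(Zc) / n) * coding_rate(Zc, eps)
    return rate
```

Note that all-zero features score exactly 0 under this formulation, consistent with a degenerate baseline contributing no information.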
After creating a pretrained model, the scripts in finetune.ipynb can be run to train a set of contrastive models for querying fonts with a style-based text query. Font tags are taken from the Google Fonts repository, and the QueryData dataset is designed explicitly around that repository's data; any other source will require rewrites. After training a text and an image model, search.py offers a basic GUI for searching the fonts in a directory of your choosing.
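The core of such contrastive finetuning is a symmetric CLIP-style loss pairing font-image embeddings with tag-text embeddings. This PyTorch sketch shows the general shape; the function name and temperature value are illustrative, not taken from finetune.ipynb:

```python
# Sketch of a symmetric InfoNCE loss over matched (font image, tag text) pairs.
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Pull each font toward its own tag text and away from the rest of the batch."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature  # batch x batch cosine similarities
    targets = torch.arange(len(img))      # i-th font matches i-th tag text
    # Average the image->text and text->image cross-entropies.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

At search time the same normalized embeddings can be compared directly with cosine similarity, which is what makes a GUI like search.py cheap to run over a whole font directory.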
Note: a glyph classifier was added to force differentiation between characters.
More font labels. The Google Fonts repo is big, but it only describes a total of ~3,600 usable fonts.