Yoyodyne provides small-vocabulary sequence-to-sequence generation with and without feature conditioning.
These models are implemented using PyTorch and Lightning.
Yoyodyne is inspired by FairSeq (Ott et al. 2019) but differs on several key points of design:
- It is for small-vocabulary sequence-to-sequence generation, and therefore includes no affordances for machine translation or language modeling. Because of this:
    - The architectures provided are intended to be reasonably exhaustive.
    - There is little need for data preprocessing; it works with TSV files.
- It has support for using features to condition decoding, with architecture-specific code for handling feature information.
- It supports the use of validation accuracy (not just loss) for model selection and early stopping.
- Models are specified using YAML configuration files.
- Releases are made regularly and bugs addressed.
- It has exhaustive test suites.
- 🚧 UNDER CONSTRUCTION 🚧: It has performance benchmarks.
Yoyodyne was created by Adam Wiemerslage, Kyle Gorman, Travis M. Bartley, and other contributors like yourself.
To install Yoyodyne and its dependencies, run the following command:

```
pip install .
```

Then, optionally install additional dependencies for developers and testers:

```
pip install -r requirements.txt
```
Yoyodyne is also compatible with Google Colab GPU runtimes.
- Click "Runtime" > "Change runtime type".
- Under the "Hardware accelerator", select a "GPU", then click "Save".
- You may be prompted to delete the old runtime. Do so if you wish.
- Then install and run using the
!as a prefix to shell commands.
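A minimal sketch of what a Colab cell might look like, assuming the current release is installed from PyPI under the name yoyodyne (if you are working from a clone instead, adjust the install line accordingly):

```
!pip install yoyodyne
!yoyodyne fit --config config.yaml
```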
Yoyodyne uses YAML for configuration; see the example configuration files in the configs directory.
Yoyodyne supports OmegaConf's variable interpolation syntax, which is useful for linking hyperparameters, particularly for setting the hyperparameters of the source and/or features encoders in a way that is compatible with the outer-level model arguments for the decoder. For instance, if one wants to use the same hidden size for encoders and decoders, one can simply set one value and then use variable interpolation for the others, as in the following configuration snippet:
```yaml
...
model:
  init_args:
    ...
    decoder_hidden_size: 512
    source_encoder:
      init_args:
        hidden_size: ${model.init_args.decoder_hidden_size}
    features_encoder:
      init_args:
        hidden_size: ${model.init_args.decoder_hidden_size}
...
```
Occasionally one may wish to set one hyperparameter as some (non-identity) function of another. For example, if one is using a bidirectional RNN source encoder and a linear features encoder, the latter's output size must be set to twice the source encoder's hidden size. For this, Yoyodyne registers the `multiply` custom resolver, as shown in the following snippet:
```yaml
...
model:
  class_path: yoyodyne.models.SoftAttentionLSTMModel
  init_args:
    decoder_hidden_size: 512
    source_encoder:
      class_path: yoyodyne.models.modules.LSTMEncoder
      init_args:
        hidden_size: ${model.init_args.decoder_hidden_size}
    features_encoder:
      class_path: yoyodyne.models.modules.LinearEncoder
      init_args:
        hidden_size: ${multiply:${model.init_args.decoder_hidden_size}, 2}
...
```
Other custom resolvers can be registered in the `main` method if desired.
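As a minimal sketch, registering a resolver uses OmegaConf's register_new_resolver; the resolver name halve below is hypothetical, not one Yoyodyne provides:

```python
from omegaconf import OmegaConf

# Hypothetical resolver that halves an integer hyperparameter; it must be
# registered before the YAML configuration is parsed.
OmegaConf.register_new_resolver("halve", lambda x: int(x) // 2)
```

It could then be used in a configuration file as, e.g., `${halve:${model.init_args.decoder_hidden_size}}`.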
Yoyodyne operates on basic tab-separated values (TSV) data files. The user can specify source, features, and target columns, and separators used to parse them.
The default data format is a two-column TSV file in which the first column is the source string and the second the target string.
```
source	target
```
To enable the use of a features column, one specifies a (non-zero) `data: features_col:` argument, and optionally also a `data: features_sep:` argument (the default features separator is ";"). For instance, the SIGMORPHON 2016 shared task data takes the form:

```
source	feat1,feat2,...	target
```

so the format is specified as:

```yaml
...
data:
  ...
  features_col: 2
  features_sep: ","
  target_col: 3
...
```
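For concreteness, a row in this format might look like the following (an invented Spanish example in the spirit of the task data, not copied from it):

```
soñar	pos=V,mood=IND,tense=FUT,per=3,num=SG	soñará
```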
Alternatively, in the CoNLL-SIGMORPHON 2017 shared task data, the first column is the source (a lemma), the second is the target (the inflection), and the third contains semicolon-delimited features strings:

```
source	target	feat1;feat2;...
```

so the format is specified as simply:

```yaml
...
data:
  ...
  features_col: 3
...
```
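Again for concreteness, a row in this format might look like the following (an invented German example in the spirit of the task data, not copied from it):

```
aalen	aalte	V;IND;PST;3;SG
```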
Yoyodyne reserves symbols of the form `<...>` for internal use. Feature-conditioned models also use `[...]` to avoid clashes between features symbols and source and target symbols, and in some cases, `{...}` to avoid clashes between source and target symbols. Therefore, users should not provide any symbols of the form `<...>`, `[...]`, or `{...}`.
The `yoyodyne` command-line tool uses a subcommand interface, with four different modes. To see the full set of options available for each subcommand, use the `--print_config` flag. For example:

```
yoyodyne fit --print_config
```

will show all configuration options (and their default values) for the `fit` subcommand.
For more detailed examples, see the configs directory.
In fit mode, one trains a Yoyodyne model, either from scratch or, optionally,
resuming from a pre-existing checkpoint. Naturally, most configuration options
need to be set at training time. E.g., it is not possible to switch modules
after training a model.
This mode is invoked using the `fit` subcommand, like so:

```
yoyodyne fit --config path/to/config.yaml
```
Alternatively, one can resume training from a pre-existing checkpoint, so long as it matches the specification of the configuration file:

```
yoyodyne fit --config path/to/config.yaml --ckpt_path path/to/checkpoint.ckpt
```
Setting the `seed_everything:` argument to some fixed value ensures a reproducible experiment (modulo hardware non-determinism).
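For instance, the following top-level snippet (a standard Lightning CLI key) fixes the seed at an arbitrary value:

```yaml
seed_everything: 49
```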
A specification for a model includes a specification of the overall architecture and, for most models, a specification of the source encoder. One may also specify a separate features encoder, or use `model: features_encoder: true` to indicate that the source and features encoders should share parameters, as in the sketch below.
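A minimal sketch of the shared-parameter option; this assumes the features_encoder flag is passed like the other model arguments, and the class paths shown are merely illustrative:

```yaml
model:
  class_path: yoyodyne.models.SoftAttentionLSTMModel
  init_args:
    source_encoder:
      class_path: yoyodyne.models.modules.LSTMEncoder
    features_encoder: true
```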
Each model exposes its own hyperparameters; consult the example configuration files and model docstrings for more information.
The following are general-purpose models:
- `yoyodyne.models.SoftAttentionGRUModel`: a GRU decoder with an attention mechanism; the initial hidden state is treated as a learned parameter. This is most commonly used with `yoyodyne.models.modules.GRUEncoder`s.
- `yoyodyne.models.SoftAttentionLSTMModel`: the same as `yoyodyne.models.SoftAttentionGRUModel` but with an LSTM decoder instead. This is most commonly used with `yoyodyne.models.modules.LSTMEncoder`s.
- `yoyodyne.models.TransformerModel`: a transformer decoder; sinusoidal positional encodings and layer normalization are used. This is most commonly used with `yoyodyne.models.modules.TransformerEncoder`s.
- `yoyodyne.models.CausalTransformerModel`: a transformer decoder without separate encoder modules, also known as a prefix LM.
The following models are particularly appropriate for when source and target share symbols:
- `yoyodyne.models.PointerGeneratorGRUModel`: a GRU decoder with a pointer-generator mechanism; the initial hidden state is treated as a learned parameter. This is most commonly used with `yoyodyne.models.modules.GRUEncoder`s.
- `yoyodyne.models.PointerGeneratorLSTMModel`: the same as `yoyodyne.models.PointerGeneratorGRUModel` but with an LSTM decoder instead. This is most commonly used with `yoyodyne.models.modules.LSTMEncoder`s.
- `yoyodyne.models.PointerGeneratorTransformerModel`: a transformer decoder with a pointer-generator mechanism. This is most commonly used with `yoyodyne.models.modules.TransformerEncoder`s.
The following models are particularly appropriate for transductions which are largely monotonic:
- `yoyodyne.models.HardAttentionGRUModel`: a GRU decoder which models generation as a Markov process. By default it assumes a non-monotonic progression over the source, but with `model: enforce_monotonic: true` the model is made to progress over each source character in linear order. By specifying `model: attention_context: 1` (or larger values) one can widen the context window for state transitions; see the sketch after this list. This is most commonly used with `yoyodyne.models.modules.GRUEncoder`s.
- `yoyodyne.models.HardAttentionLSTMModel`: the same as `yoyodyne.models.HardAttentionGRUModel` but with an LSTM decoder instead. This is most commonly used with `yoyodyne.models.modules.LSTMEncoder`s.
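A minimal configuration sketch for the monotonicity and context options just described, assuming they are passed as ordinary model arguments like the other hyperparameters:

```yaml
model:
  class_path: yoyodyne.models.HardAttentionGRUModel
  init_args:
    enforce_monotonic: true
    attention_context: 1
```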
The following models are also appropriate for transductions which are largely monotonic, but require additional precomputation with the maxwell library:

- `yoyodyne.models.TransducerGRU`: a GRU decoder with a neural transducer mechanism trained with imitation learning. This is most commonly used with `yoyodyne.models.modules.GRUEncoder`s.
- `yoyodyne.models.TransducerLSTM`: the same as `yoyodyne.models.TransducerGRU` but with an LSTM decoder instead. This is most commonly used with `yoyodyne.models.modules.LSTMEncoder`s.
The following models are not recommended for most users; they generally perform poorly and are present only for historical or testing reasons:
- `yoyodyne.models.GRUModel`: a GRU decoder which uses the last non-padding hidden state(s) of the encoder(s) in lieu of attention; the initial hidden state is treated as a learned parameter. This is most commonly used with `yoyodyne.models.modules.GRUEncoder`s.
- `yoyodyne.models.LSTMModel`: the same as `yoyodyne.models.GRUModel` but with an LSTM decoder instead. This is most commonly used with `yoyodyne.models.modules.LSTMEncoder`s.
In RNN (e.g., GRU and LSTM) models and modules, information is passed between adjacent source, features, and target symbols, providing a sort of inductive bias towards locality. In contrast, transformer models and modules are in some sense global, and any bias towards locality must be injected via positional encodings.
For core transformer modules (including causal and pointer-generator variants), the user can specify the following positional encodings:
- `yoyodyne.models.modules.AbsolutePositionalEncoding`: a trainable positional encoding scheme with a unique representation for each position $i$ in a sequence.
- `yoyodyne.models.modules.NullPositionalEncoding`: this dummy module disables positional encoding; it has no parameters.
- `yoyodyne.models.modules.SinusodialPositionalEncoding`: a parameter-free (i.e., non-trainable) positional encoding; this is the default for most modules.
The following snippet, for example, enables absolute positional encoding for the source encoder and decoder of a transformer model:
```yaml
model:
  class_path: yoyodyne.models.TransformerModel
  init_args:
    source_encoder:
      class_path: yoyodyne.models.modules.TransformerEncoder
      init_args:
        positional_encoding:
          class_path: yoyodyne.models.modules.AbsolutePositionalEncoding
    decoder_positional_encoding:
      class_path: yoyodyne.models.modules.AbsolutePositionalEncoding
```
There is one additional positional encoding option: variants of the core transformer models and modules support rotary positional encoding (RoPE). RoPE is implemented as a variant form of multihead attention deep within the transformer model and cannot be selected using the `positional_encoding` or `decoder_positional_encoding` arguments. Rather, it gives rise to the following models and modules:

- `yoyodyne.models.RotaryCausalTransformerModel`
- `yoyodyne.models.RotaryPointerGeneratorTransformerModel`
- `yoyodyne.models.RotaryTransformerModel`
- `yoyodyne.models.modules.RotaryCausalTransformerDecoder`
- `yoyodyne.models.modules.RotaryFeatureInvariantTransformerEncoder`
- `yoyodyne.models.modules.RotaryPointerGeneratorTransformerDecoder`
- `yoyodyne.models.modules.RotaryTransformerDecoder`
- `yoyodyne.models.modules.RotaryTransformerEncoder`
Mixing rotary and non-rotary positional encodings within a single model is not recommended.
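Thus RoPE is selected by naming the rotary classes directly; a minimal sketch (with nesting as in the other model snippets above):

```yaml
model:
  class_path: yoyodyne.models.RotaryTransformerModel
  init_args:
    source_encoder:
      class_path: yoyodyne.models.modules.RotaryTransformerEncoder
```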
Yoyodyne requires an optimizer and a learning rate scheduler. The default optimizer is `yoyodyne.optimizers.Adam`, and the default scheduler is `yoyodyne.schedulers.Dummy`, which keeps the learning rate fixed at its initial value and takes no explicit configuration arguments.
The following YAML snippet shows the use of the Adam optimizer with a non-default initial learning rate and the `yoyodyne.schedulers.WarmupInverseSquareRoot` LR scheduler:

```yaml
...
model:
  ...
  optimizer:
    class_path: yoyodyne.optimizers.Adam
    init_args:
      lr: 1.0e-5
      beta2: 0.9
  scheduler:
    class_path: yoyodyne.schedulers.WarmupInverseSquareRoot
    init_args:
      warmup_epochs: 10
...
```
The ModelCheckpoint callback is used to control the generation of checkpoint files. A sample YAML snippet is given below:
```yaml
...
checkpoint:
  filename: "model-{epoch:03d}-{val_accuracy:.4f}"
  mode: max
  monitor: val_accuracy
  verbose: true
...
```
Alternatively, one can specify checkpointing that minimizes validation loss, as follows:
```yaml
...
checkpoint:
  filename: "model-{epoch:03d}-{val_loss:.4f}"
  mode: min
  monitor: val_loss
  verbose: true
...
```
A checkpoint config must be specified or Yoyodyne will not generate any checkpoints.
The user will likely want to configure additional callbacks. Some useful examples are given below.
The LearningRateMonitor callback records learning rates:

```yaml
...
trainer:
  callbacks:
    - class_path: lightning.pytorch.callbacks.LearningRateMonitor
      init_args:
        logging_interval: epoch
...
```
The EarlyStopping callback enables early stopping based on a monitored quantity and a fixed patience:

```yaml
...
trainer:
  callbacks:
    - class_path: lightning.pytorch.callbacks.EarlyStopping
      init_args:
        monitor: val_loss
        patience: 10
        verbose: true
...
```
By default, Yoyodyne performs some minimal logging to standard error and uses progress bars to keep track of progress during each epoch. However, one can enable additional logging facilities during training, using a syntax similar to the one shown above for callbacks.
The CSVLogger is enabled by default, and logs all monitored quantities to a CSV file.
The WandbLogger works similarly to the CSVLogger, but sends the data to the third-party website Weights & Biases, where it can be used to generate charts or share artifacts:

```yaml
...
trainer:
  logger:
    - class_path: lightning.pytorch.loggers.WandbLogger
      init_args:
        project: unit1
        save_dir: /Users/Shinji/models
...
```
Note that this functionality requires a working account with Weights & Biases.
Dropout probability and/or label smoothing are specified as arguments to the model and its encoders:

```yaml
...
model:
  init_args:
    source_encoder:
      class_path: ...
      init_args:
        ...
        dropout: 0.5
    decoder_dropout: 0.5
    label_smoothing: 0.1
...
```
Batch size is specified using `data: batch_size: ...` and defaults to 32.
By default, the source and target vocabularies share embeddings, so identical source and target symbols will have the same embedding. This can be disabled with `data: tie_embeddings: false`.
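For instance, the following sketch combines both options (both keys sit directly under data:, as indicated above; the batch size here is arbitrary):

```yaml
data:
  batch_size: 64
  tie_embeddings: false
```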
By default, training uses 32-bit precision. However, the `trainer: precision:` flag allows the user to perform training with half precision (16) or with mixed-precision formats like bf16-mixed, if supported by the accelerator. This may reduce the size of the model and batches in memory, allowing one to use larger batches, or it may simply provide small speed-ups.
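For instance (this is the standard Lightning trainer key; bf16-mixed requires hardware support):

```yaml
trainer:
  precision: bf16-mixed
```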
There are a number of ways to specify how long a model should train for. For example, the following YAML snippet specifies that training should run for 100 epochs or 6 wall-clock hours, whichever comes first:
```yaml
...
trainer:
  max_epochs: 100
  max_time: 00:06:00:00
...
```
In validation mode, one runs the validation step over labeled validation data (specified as `data: val: path/to/validation.tsv`) using a previously trained checkpoint (`--ckpt_path path/to/checkpoint.ckpt` from the command line), recording loss and other statistics for the validation set. In practice this is mostly useful for debugging.
This mode is invoked using the `validate` subcommand, like so:

```
yoyodyne validate --config path/to/config.yaml --ckpt_path path/to/checkpoint.ckpt
```
In test mode, one computes accuracy over held-out test data (specified as `data: test: path/to/test.tsv`) using a previously trained checkpoint (`--ckpt_path path/to/checkpoint.ckpt` from the command line); it differs from validation mode in that it uses the test file rather than the validation file.
This mode is invoked using the `test` subcommand, like so:

```
yoyodyne test --config path/to/config.yaml --ckpt_path path/to/checkpoint.ckpt
```
In predict mode, a previously trained model checkpoint (`--ckpt_path path/to/checkpoint.ckpt` from the command line) is used to label an input file. One must also specify the path where the predictions will be written:

```yaml
...
predict:
  path: path/to/predictions.txt
...
```

This mode is invoked using the `predict` subcommand, like so:

```
yoyodyne predict --config path/to/config.yaml --ckpt_path path/to/checkpoint.ckpt
```
The examples directory contains interesting examples, including:

- `concatenate` provides sample code for concatenating source and features symbols à la Kann & Schütze (2016).
- `wandb_sweeps` shows how to use Weights & Biases to run hyperparameter sweeps.
- Maxwell is used to learn a stochastic edit distance model for the transducer models.
- Yoyodyne Pretrained provides a similar interface but uses large pre-trained models to initialize the encoder and decoder modules.
Yoyodyne is distributed under an Apache 2.0 license.
We welcome contributions using the fork-and-pull model.
In addition to releases available via GitHub and PyPI, the 0.3.3 version is available as the `legacy` branch.
Yoyodyne is beholden to the heavily object-oriented design of Lightning, and wherever possible uses Torch to keep computations on the user-selected accelerator. Furthermore, since it is developed at "low-intensity" by a geographically-dispersed team, consistency is particularly important. Some consistency decisions made thus far:
- Abstract class overrides are enforced using PEP 3119.
A model in Yoyodyne is a sequence-to-sequence architecture and inherits from
yoyodyne.models.BaseModel. These models in turn consist of ("have-a") one or
more encoders responsible for encoding the source (and features, where
appropriate), and a decoder responsible for predicting the target sequence
using the representation generated by the encoders. The encoders and decoder are
themselves Torch modules.
The model is responsible for constructing the encoders and decoders. The model
dictates the type of decoder. The model communicates with its modules by calling
them as functions (which invokes their forward methods); however, in some
cases it is also necessary for the model to call ancillary members or methods of
its modules.
When features are present, models are responsible for fusing the source and features encodings, and do so in a model-specific fashion. For example, ordinary RNNs and transformers concatenate the source and features encodings along the length dimension (and thus require that the encodings be the same size), whereas hard attention and transducer models average the features encoding across the length dimension and then concatenate the resulting tensor with the source encoding along the encoding dimension; by doing so they preserve the source length and make it impossible to attend directly to features symbols.
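As a schematic illustration of the two fusion strategies (this is not Yoyodyne's actual code; the shapes and names are invented):

```python
import torch

# Invented shapes: batch=8, source length=12, features length=3, hidden=64.
source = torch.randn(8, 12, 64)    # source encoding
features = torch.randn(8, 3, 64)   # features encoding

# RNN/transformer-style fusion: concatenate along the length dimension; the
# encodings must have the same hidden size, and the fused length is 12 + 3,
# so the decoder can attend directly to the features positions.
fused = torch.cat([source, features], dim=1)  # 8 x 15 x 64

# Hard-attention/transducer-style fusion: average the features encoding
# across the length dimension, then concatenate with the source encoding
# along the encoding dimension; the source length (12) is preserved, so no
# features positions remain to be attended to.
pooled = features.mean(dim=1, keepdim=True).expand(-1, source.size(1), -1)
fused = torch.cat([source, pooled], dim=2)  # 8 x 12 x 128
```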
Each model supports greedy decoding, implemented via a `greedy_decode` method; many models (vanilla RNNs, pointer-generator RNNs, and all transformers) support beam search during prediction (though not during training, validation, or testing) via a `beam_decode` method. Beam search decoding is enabled by setting `beam_width` to some value greater than 1; `batch_size` must also be set to 1:
```yaml
...
data:
  ...
  batch_size: 1
  ...
model:
  class_path: yoyodyne.models.SoftAttentionLSTMModel
  init_args:
    ...
    beam_width: 5
...
prediction:
  path: /Users/Shinji/predictions.tsv
...
```
The resulting predictions file will be a 10-column TSV file consisting of the top 5 target hypotheses and their log-likelihoods (collated together), rather than a one-column text file containing just the top hypothesis.
Some models can only be trained with teacher forcing, but others can also be trained with student forcing by setting `model: teacher_forcing: false`. When using student forcing with transformer models, one should set `data: max_target_length: ...` to a value appropriate for the data, to avoid unnecessary attention computations, which are quadratic in the maximum target length.
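For instance, the following sketch combines the two settings just named (the maximum target length here is arbitrary):

```yaml
model:
  init_args:
    teacher_forcing: false
data:
  max_target_length: 128
```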
The "units" of tests/yoyodyne_test.py are
essentially small integration tests running through training, prediction, and
evaluation.
There are two kinds of data sets here. "Toy" data sets consist of simple transductions over a small alphabet:

- `copy` (i.e., repeat the input string twice)
- `identity`
- `reverse`
- `upper` (i.e., map to uppercase)
These are configured to train for 20 epochs, training for no more than 2 minutes.
In contrast, the two "real" data sets target existing problems:
- `ice_g2p`: Icelandic G2P data from the 2021 SIGMORPHON shared task
- `tur_inflection`: Turkish inflection generation data from the CoNLL-SIGMORPHON 2017 shared task
These are instead configured to train for up to 50 epochs (with early stopping), training for no more than 10 minutes.
There are also a few tests which confirm that specific misconfigurations raise exceptions.
To run all tests, run the following:
```
pytest -vvv tests
```
Given this large number of units and the allotted amount of training time, which accounts for the vast majority of compute time, running the full set of tests could take as long as a few hours. Thus one may wish instead to specify a subset of tests using the `-k` flag. For example, to run all the "toy" tests, run the following:

```
pytest -vvv -k toy tests
```

Or, to just run the Icelandic G2P tests, run the following:

```
pytest -vvv -k g2p tests
```

Or, to just run the misconfiguration tests, run the following:

```
pytest -vvv -k misconfiguration tests
```
See the pytest
documentation for more
information on the test runner.
- Create a new branch; e.g., if you want to call this branch "release": `git checkout -b release`
- Sync your fork's branch to the upstream master branch; e.g., if the upstream remote is called "upstream": `git pull upstream master`
- Increment the version field in `pyproject.toml`.
- Stage your changes: `git add pyproject.toml`
- Commit your changes: `git commit -m "your commit message here"`
- Push your changes; e.g., if your branch is called "release": `git push origin release`
- Submit a PR for your release and wait for it to be merged into `master`.
- Tag the `master` branch's last commit. The tag should begin with `v`; e.g., if the new version is 3.1.4, the tag should be `v3.1.4`. This can be done:
    - on GitHub itself: click the "Releases" (or "Create a new release") link on the right-hand side of the Yoyodyne GitHub page and follow the dialogues, or
    - from the command line using `git tag`.
- Build the new release: `python -m build`
- Upload the result to PyPI: `twine upload dist/*`
Kann, K. and Schütze, H. 2016. Single-model encoder-decoder with explicit morphological representation for reinflection. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 555-560.
Ott, M., Edunov, S., Baevski, A., Fan, A., Gross, S., Ng, N., Grangier, D., and Auli, M. 2019. fairseq: a fast, extensible toolkit for sequence modeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), pages 48-53.
(See also yoyodyne.bib for more work used during the
development of this library.)