
GECToR 2.0.0


@damien2012eng released this 26 Oct 18:14
· 28 commits to master since this release
  1. Write a predictor class that applies the model's forward() multiple times to replicate gector's iterative error correction.
  2. Manually loading the model onto a CUDA device should not happen inside gec_model; this is handled by allennlp's predictor base class.
  3. Modify the pretrained gec model archive to use parameters for the updated token embedder, token indexer, model, etc.
  4. Gector doesn't provide a model archive, only the weights file. We should definitely make a model archive file.
  5. Make a model archive file for use with allennlp's Predictor.from_path() method and the allennlp predict command.
  6. A config.json file will need to be written. It specifies all the parameters used during training, which we will need to extract from hardcoded values in Gec_model.
  7. Write a unit test verifying that the modified gec model archive can be used to do error correction on plaintext sentences.
  8. Override json_to_instance behavior in the predictor class so that gector's bespoke tokenization module is used.
  9. Make the output from the predictor match the output from gec_model. The following gec_model behavior isn't accounted for yet:
    • Not correcting short sequences (<4 tokens).
    • Adding gector's start tokens to the input before correcting.
    • Stopping the iterations if no corrections were made in the previous iteration.
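
The control flow described in items 1 and 9 can be sketched roughly as below. This is a minimal illustration in plain Python and not the predictor's actual code: `iterative_correct`, `toy_correct_once`, the `"$START"` token string, and the default of 5 iterations are all assumptions made for the example. `correct_once` stands in for one application of the model's forward() plus applying the predicted edits.

```python
from typing import Callable, List, Tuple

# Assumed value of gector's start token; check it against the gector source.
START_TOKEN = "$START"


def iterative_correct(
    tokens: List[str],
    correct_once: Callable[[List[str]], List[str]],
    min_len: int = 4,          # sequences with fewer than 4 tokens are skipped
    max_iterations: int = 5,   # hypothetical cap on the number of passes
) -> Tuple[List[str], int]:
    """Return the corrected tokens and the number of passes that were run."""
    # Short sequences are returned unchanged, without running the model.
    if len(tokens) < min_len:
        return list(tokens), 0

    # Prepend the start token before correcting.
    current = [START_TOKEN] + list(tokens)
    passes = 0
    for _ in range(max_iterations):
        updated = correct_once(current)
        passes += 1
        # Stop as soon as a pass makes no corrections.
        if updated == current:
            break
        current = updated

    # Drop the start token again before returning.
    return current[1:], passes


def toy_correct_once(tokens: List[str]) -> List[str]:
    """Stand-in for one model pass: fixes at most one token per call."""
    fixes = {"she": "She", "go": "goes", "school": "school."}
    out = list(tokens)
    for i, tok in enumerate(out):
        if tok in fixes:
            out[i] = fixes[tok]
            break
    return out


if __name__ == "__main__":
    print(iterative_correct("she go to school".split(), toy_correct_once))
    print(iterative_correct(["hi", "there"], toy_correct_once))
```

Because the toy corrector fixes only one token per pass, the first call needs several passes before a pass produces no change, which is the stopping condition listed above. In the real predictor, `correct_once` would wrap the model call rather than a lookup table.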