To Dos #1

@devonzuegel

To Do

Programming

  • implement IBM Model 1 on train/ folder (Zoe, Tuesday/Wednesday)
  • fix ascii-unicode-latin1 encoding issues (Devon, Tuesday night)
  • implement caching with pickle in transl_probs.pickle (Devon, Wednesday night)
  • multiple iterations
  • self.sp_word_indices dictionary instead of binary search
  • optimize algorithm ( @luttigdev )
    • just reset at each iteration or completely recreate? (applies to multiple data structures)
  • lowercase? DOESN'T MATTER
  • implement evaluation/run-through on dev/ folder with BLEU ASAP
  • Viterbi + nltk (part-of-speech tagging » reordering Sp-Eng verbs, for example)
    • didn't help much :(
    • NOTE: can't tag single English words because not enough context
  • add english language model (single-word probabilities) (Zoe, Thursday)
    • bigrams
      • translating 2 words ??
  • decide new priorities once we get BLEU working
  • conjugations in Spanish indicate subject ("Tengo" == "Yo tengo") ... deal with this!
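
For reference, the IBM Model 1 training loop, the "multiple iterations" item, and the pickle cache could look roughly like the sketch below. It is only a sketch under assumptions, not the code in this repo: the function names (`train_ibm_model1`, `load_or_train`), the input format (a list of Spanish/English token-list pairs), and the default of 5 iterations are all made up for illustration. Only the cache filename `transl_probs.pickle` comes from the list above. The sketch recreates the count tables on every iteration, which is one possible answer to the "reset or recreate?" question.

```python
import os
import pickle
from collections import defaultdict


def train_ibm_model1(pairs, iterations=5):
    """Estimate t[(e, s)] = P(english word e | spanish word s) with EM.

    pairs: list of (spanish_tokens, english_tokens) tuples (assumed format).
    """
    en_vocab = {e for _, en in pairs for e in en}
    uniform = 1.0 / max(len(en_vocab), 1)  # uniform start; guards empty input
    t = {}

    for _ in range(iterations):
        # Expected counts are recreated from scratch on every iteration.
        count = defaultdict(float)
        total = defaultdict(float)
        for sp, en in pairs:
            sp_null = [None] + list(sp)  # None stands in for the NULL word
            for e in en:
                # E-step: spread one count for e over all candidate sources.
                z = sum(t.get((e, s), uniform) for s in sp_null)
                for s in sp_null:
                    delta = t.get((e, s), uniform) / z
                    count[(e, s)] += delta
                    total[s] += delta
        # M-step: renormalise the expected counts per Spanish word.
        t = {(e, s): c / total[s] for (e, s), c in count.items()}
    return t


def load_or_train(pairs, cache_path="transl_probs.pickle", iterations=5):
    """Load cached probabilities if the file exists, otherwise train and save."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    t = train_ibm_model1(pairs, iterations)
    with open(cache_path, "wb") as f:
        pickle.dump(t, f)
    return t
```

Note that a cache like this goes stale whenever the training data or the iteration count changes, so the pickle file has to be deleted by hand in that case.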

Questions

  • logs
  • COGNATES (not a good idea)
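
If the "logs" question is about whether to work with log probabilities (an assumption, the note doesn't say), the usual argument is that multiplying many small probabilities underflows to zero while summing their logs does not. A minimal sketch, with a hypothetical helper name `log_score`:

```python
import math


def log_score(probs):
    """Sum of log-probabilities; -inf if any probability is zero."""
    return sum(math.log(p) if p > 0 else float("-inf") for p in probs)


# 100 word probabilities of 1e-5 each: the plain product underflows,
# the log-space score stays finite and comparable across candidates.
word_probs = [1e-5] * 100
product = math.prod(word_probs)
score = log_score(word_probs)
```

Candidates can then be ranked by `score` directly, since log is monotonic.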

Misc

  • shouldn't remove commas from the translation
  • shouldn't remove quotation marks (") and similar punctuation
    • spaces around them still seem to get removed
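
One way to keep commas and quotation marks is to split them off as their own tokens instead of stripping them. The sketch below assumes a simple regex tokenizer (the name `tokenize` is hypothetical, and this is not the project's current preprocessing). In Python 3, `\w` matches accented Spanish letters as well.

```python
import re


def tokenize(text):
    """Split into word tokens and single punctuation tokens, dropping only whitespace."""
    return re.findall(r"\w+|[^\w\s]", text)


tokens = tokenize('Dijo "hola", y se fue.')
```

Here `tokens` keeps each quotation mark, the comma, and the full stop as separate items, so they can be aligned or copied through to the output rather than lost.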

Report

  • all the things

Useful resources & links
