Description
We need to be able to transform token sequences like "seven hundred and sixty-five thousand, four hundred and thirty-two" into "765432". Different languages may need different handling of such tokens (for example, Roman numerals involve subtraction). So for now, let's focus only on how English tokens are transformed into numbers. Let's call this approach "general" (later we will define in the languages.yaml file which approach each language should use).
The initial idea is to iterate through the list of tokens, skipping tokens that are in skip or that match [\W_]+. Every remaining token should be present in the dictionary (the numbers section of the language).
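The filtering step could be sketched roughly as follows. This is only an illustration: SKIP and NUMBERS are hypothetical stand-ins for the language's skip list and numbers section, the function name token_values is made up, and lowercasing the input and raising an error on unknown tokens are assumptions rather than settled behaviour.

```python
import re

# Hypothetical stand-ins for the language's "skip" list and "numbers" section.
SKIP = {"and"}
NUMBERS = {
    "two": 2, "four": 4, "five": 5, "seven": 7,
    "thirty": 30, "sixty": 60, "hundred": 100, "thousand": 1000,
}
SEPARATOR = re.compile(r"[\W_]+")


def token_values(text, numbers=NUMBERS, skip=SKIP):
    """Return the numeric value of every token that is not skipped."""
    values = []
    # The capturing group keeps separators as tokens so they can be skipped explicitly.
    for token in re.split(r"([\W_]+)", text.lower()):
        if not token or token in skip or SEPARATOR.fullmatch(token):
            continue
        if token not in numbers:
            # Assumption: an unknown token is an error, not something to ignore.
            raise ValueError(f"unknown number token: {token!r}")
        values.append(numbers[token])
    return values


print(token_values("seven hundred and sixty-five thousand, four hundred and thirty-two"))
# [7, 100, 60, 5, 1000, 4, 100, 30, 2]
```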
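If the number represented by the current token is less than the previous one, we use addition. If it is greater than several of the nearby preceding numbers, then those smaller numbers describe the bigger one, and we use multiplication. Be sure to multiply only with preceding numbers that are 1) less than the current number and 2) directly chained with the current number. For example, in "seven hundred and sixty-five thousand", "thousand" multiplies the whole preceding run (765), while in "one thousand four hundred", "hundred" multiplies only "four".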
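One possible reading of this rule, as a minimal sketch: keep a stack of partial values and, for each new number, fold in the unbroken run of immediately preceding values that are smaller than it. Treating "directly chained" as "contiguous at the top of the stack" is an assumption, and the function name combine is hypothetical. The input is the list of numeric values that remain after skipped tokens are dropped.

```python
def combine(values):
    """Combine per-token numeric values into a single number."""
    stack = []
    for value in values:
        # Collect the contiguous run of preceding values smaller than the current one.
        smaller = 0
        while stack and stack[-1] < value:
            smaller += stack.pop()
        # A non-empty run describes the current number, so multiply;
        # otherwise the value stands on its own and is added at the end.
        stack.append(smaller * value if smaller else value)
    return sum(stack)


# seven hundred (and) sixty five thousand four hundred (and) thirty two
print(combine([7, 100, 60, 5, 1000, 4, 100, 30, 2]))  # 765432
```

The stack is what limits multiplication to directly chained numbers: once a larger value sits on top, nothing before it can be reached by a later, smaller multiplier.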
This approach should of course be properly tested.