Add spoken number and year conversion toggles #2
Merged
Adds a new toggle in Transcript Modifications that converts spoken cardinal numbers to numeric digits during transcription post-processing.
Examples:
- "twenty five" → "25"
- "one thousand three hundred thirty six" → "1336"
- "three point one four" → "3.14"
The conversion runs after word removals and before word remappings, allowing users to further customize the output.
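The stage ordering described above can be sketched as follows. This is a minimal illustration, not the project's actual pipeline: `TranscriptStage`, `postProcess`, and the closure-based stages are hypothetical names, and the stub stages only do literal string replacement.

```swift
import Foundation

// Hypothetical stage type for this sketch: each stage maps text to text.
typealias TranscriptStage = (String) -> String

/// Applies stages in the order the description gives:
/// word removals, then number conversion (if enabled), then word remappings.
func postProcess(
    _ text: String,
    removeWords: TranscriptStage,
    convertNumbers: TranscriptStage?,  // nil models the toggle being off
    remapWords: TranscriptStage
) -> String {
    var result = removeWords(text)
    if let convert = convertNumbers {
        result = convert(result)
    }
    return remapWords(result)
}

// Stub stages: because remapping runs last, it can match the digits
// produced by the number stage.
let output = postProcess(
    "um twenty five apples",
    removeWords: { $0.replacingOccurrences(of: "um ", with: "") },
    convertNumbers: { $0.replacingOccurrences(of: "twenty five", with: "25") },
    remapWords: { $0.replacingOccurrences(of: "25", with: "#25") }
)
print(output)  // prints "#25 apples"
```

Running remappings last is what lets a user-defined remapping target the converted digits rather than the spoken words.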
- When in decimal mode, peek past whitespace to find digit tokens
- Accept valid decimals even when integer part is zero (e.g., 'zero point five' → '0.5')
- Fixes 'three point one four' → '3.14' (was stopping at whitespace after 'point')
- Fixes 'zero point five' → '0.5' (was rejecting due to total == 0)
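A rough sketch of the two fixes above, assuming a tokenizer that emits separate word and whitespace tokens. The `Token` type, `tokenize`, and `parseDecimalSketch` are hypothetical names for illustration; the real NumberWordConverter may be structured differently, and this sketch only handles a single-digit integer part.

```swift
import Foundation

// Hypothetical token model: words and whitespace are separate tokens.
enum Token: Equatable {
    case word(String)
    case space
}

let digitWords: [String: Int] = [
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
]

/// Splits lowercased text into word tokens and collapsed whitespace tokens.
func tokenize(_ text: String) -> [Token] {
    var tokens: [Token] = []
    var current = ""
    for character in text.lowercased() {
        if character.isWhitespace {
            if !current.isEmpty { tokens.append(.word(current)); current = "" }
            if tokens.last != Token.space { tokens.append(.space) }
        } else {
            current.append(character)
        }
    }
    if !current.isEmpty { tokens.append(.word(current)) }
    return tokens
}

/// Parses "<digit> point <digit> <digit> ..." from the start of `tokens`.
func parseDecimalSketch(_ tokens: [Token]) -> String? {
    guard let firstToken = tokens.first,
          case .word(let firstWord) = firstToken,
          let integerPart = digitWords[firstWord] else { return nil }
    var index = 1
    if index < tokens.count, tokens[index] == .space { index += 1 }
    guard index < tokens.count, tokens[index] == .word("point") else {
        return String(integerPart)
    }
    index += 1
    var fraction = ""
    while true {
        // Peek past one whitespace token instead of stopping at it.
        var peek = index
        if peek < tokens.count, tokens[peek] == .space { peek += 1 }
        guard peek < tokens.count,
              case .word(let nextWord) = tokens[peek],
              let digit = digitWords[nextWord] else { break }
        fraction += String(digit)
        index = peek + 1
    }
    // Validity depends on having fraction digits, not on a non-zero total,
    // so a zero integer part is accepted.
    return fraction.isEmpty ? String(integerPart) : "\(integerPart).\(fraction)"
}

print(parseDecimalSketch(tokenize("three point one four")) ?? "nil")  // prints "3.14"
print(parseDecimalSketch(tokenize("zero point five")) ?? "nil")       // prints "0.5"
```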
Tighten NumberWordConverter parsing to avoid over-eager merges in natural
language text while preserving existing cardinal conversions.
Improvements:
- Restrict "and" connector behavior to scale contexts so phrases like
"one and two" and "between one and two" convert to separate numbers
instead of being summed.
- Support leading decimal forms ("point five" -> "0.5").
- Handle trailing decimal markers safely ("one point" -> "1").
- Prevent accidental token gluing by trimming consumed trailing whitespace.
- Keep adjacent tens conservative by default ("twenty twenty one" ->
"20 21").
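The connector and adjacent-tens rules above can be illustrated with a simplified word-level parser. This is only a sketch under stated assumptions: `convertCardinalsSketch` and its tables are hypothetical, it splits on single spaces, and it ignores punctuation and decimals, so it is not the NumberWordConverter implementation.

```swift
import Foundation

let smallNumbers: [String: Int] = [
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
]
let tensNumbers: [String: Int] = [
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
]
let largeScales: [String: Int] = ["thousand": 1_000, "million": 1_000_000]

// State of the 0...99 part of the current group.
enum SubHundred { case empty, tens, full }

func convertCardinalsSketch(_ text: String) -> String {
    let words = text.split(separator: " ").map(String.init)
    var output: [String] = []
    var i = 0
    while i < words.count {
        var total = 0, group = 0, j = i
        var sub = SubHundred.empty
        var lastWasScale = false
        while j < words.count {
            let word = words[j].lowercased()
            if let value = smallNumbers[word] {
                // A unit may follow tens ("twenty one"), but nothing follows a full slot.
                let fitsAfterTens = sub == .tens && (1...9).contains(value)
                guard sub == .empty || fitsAfterTens else { break }
                group += value; sub = .full; lastWasScale = false
            } else if let value = tensNumbers[word] {
                // Adjacent tens stay separate: "twenty twenty" ends the first number.
                guard sub == .empty else { break }
                group += value; sub = .tens; lastWasScale = false
            } else if word == "hundred" {
                guard sub == .full, (1...9).contains(group) else { break }
                group *= 100; sub = .empty; lastWasScale = true
            } else if let scale = largeScales[word] {
                guard group > 0 else { break }
                total += group * scale; group = 0; sub = .empty; lastWasScale = true
            } else if word == "and" {
                // "and" joins only right after a scale word and before a number word.
                guard lastWasScale, j + 1 < words.count else { break }
                let next = words[j + 1].lowercased()
                guard smallNumbers[next] != nil || tensNumbers[next] != nil else { break }
                lastWasScale = false
            } else {
                break
            }
            j += 1
        }
        if j == i {
            output.append(words[i]); i += 1      // not a number word: copy through
        } else {
            output.append(String(total + group)); i = j
        }
    }
    return output.joined(separator: " ")
}

print(convertCardinalsSketch("between one and two"))   // prints "between 1 and 2"
print(convertCardinalsSketch("one hundred and five"))  // prints "105"
print(convertCardinalsSketch("twenty twenty one"))     // prints "20 21"
```

In this sketch, "and" is consumed only when the previous word was a scale such as "hundred" or "thousand", which is why "one and two" stays as two separate numbers.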
Tests:
- Add regression tests for ambiguous connector phrases and decimal edge cases.
- Add mixed-context tests (version/chapter/list punctuation).
- Add article/standalone scale and incomplete phrase coverage.
All HexCore tests pass after these changes.
Introduce a conservative year conversion pass (1900-2099) that can be enabled independently from number-word conversion.
What this adds:
- New YearWordConverter that converts common spoken forms such as:
  - nineteen eighty four -> 1984
  - twenty twenty one -> 2021
  - twenty ten -> 2010
  - twenty oh five -> 2005
- New setting: convertSpokenYearsToDigits (default false).
- New settings UI toggle in Transcript Modifications -> Word Remappings, next to the existing number-word conversion toggle.
- Transcription and scratchpad preview pipeline now apply year conversion before number-word conversion when enabled.
- Dedicated YearWordConverter tests covering positive cases, ambiguity guards, punctuation/mixed text behavior.
Safety/behavior:
- Keeps conversion conservative and avoids short ambiguous phrases like 'twenty one' to reduce false positives in general dictation text.
All HexCore tests pass.
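One way to picture the conservative year pass is a century word ("nineteen" or "twenty") followed by a recognizable two-digit suffix. The sketch below makes that assumption explicit; `convertYearsSketch`, `parseYearSuffix`, and the lookup tables are hypothetical names, and the real YearWordConverter adds further ambiguity guards and punctuation handling that are not modeled here.

```swift
import Foundation

let yearCenturies: [String: Int] = ["nineteen": 19, "twenty": 20]
let yearUnits: [String: Int] = [
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
]
let yearTeens: [String: Int] = [
    "ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
]
let yearTens: [String: Int] = [
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
]

/// Parses the last two digits of a year starting at `index`.
/// Accepts "oh <unit>", a teen, or tens with an optional unit.
/// A bare unit is rejected, so "twenty one" is never treated as a year.
func parseYearSuffix(_ words: [String], at index: Int) -> (value: Int, length: Int)? {
    guard index < words.count else { return nil }
    let first = words[index].lowercased()
    let second = index + 1 < words.count ? words[index + 1].lowercased() : nil
    if first == "oh", let next = second, let unit = yearUnits[next] {
        return (unit, 2)
    }
    if let teen = yearTeens[first] {
        return (teen, 1)
    }
    if let tens = yearTens[first] {
        if let next = second, let unit = yearUnits[next] {
            return (tens + unit, 2)
        }
        return (tens, 1)
    }
    return nil
}

func convertYearsSketch(_ text: String) -> String {
    let words = text.split(separator: " ").map(String.init)
    var output: [String] = []
    var i = 0
    while i < words.count {
        if let century = yearCenturies[words[i].lowercased()],
           let suffix = parseYearSuffix(words, at: i + 1) {
            // Century 19 or 20 plus a two-digit suffix always lands in 1900-2099.
            output.append(String(century * 100 + suffix.value))
            i += 1 + suffix.length
        } else {
            output.append(words[i])
            i += 1
        }
    }
    return output.joined(separator: " ")
}

print(convertYearsSketch("in nineteen eighty four"))  // prints "in 1984"
print(convertYearsSketch("twenty oh five"))           // prints "2005"
print(convertYearsSketch("twenty one"))               // prints "twenty one"
```

Because this pass leaves "twenty one" untouched, a later number-word pass is still free to convert it to "21".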
- Backfill missing minor changeset for prior AI transforms (d53b16c)
- Add minor changeset for spoken year conversion mode
- Restore original number-word changeset summary for PR kitlangton#155 context
Resolve HexSettings conflicts by keeping both AI transform settings from main and spoken number/year conversion settings from this branch.
Summary
Notable behavior changes
- `one and two` now converts to `1 and 2` (not `3`)
- `point five` converts to `0.5`
- `one point` converts to `1`
- `twenty twenty one` remains conservative under number mode (`20 21`) but converts to `2021` when year mode is enabled
Testing
- `cd HexCore && swift test` (passes)