
Add spoken number and year conversion toggles #2

Merged: alasano merged 9 commits into main from feature/number-word-to-digits on Feb 26, 2026

@alasano commented Feb 26, 2026

Summary

  • port spoken number-word conversion from upstream PR kitlangton/Hex#155 ("Add option to convert spoken number words to digits")
  • harden number parsing to avoid over-eager merges in ambiguous phrases (e.g. "one and two")
  • add optional spoken year conversion mode (e.g. "twenty twenty one" -> "2021")
  • add settings toggles for both number and year conversion in Transcript Modifications
  • add/expand tests for number and year conversion edge cases
  • add backfilled changeset for prior AI transforms and new changesets for this work

Notable behavior changes

  • "one and two" now converts to "1 and 2" (not "3")
  • "point five" converts to "0.5"
  • "one point" converts to "1"
  • "twenty twenty one" stays conservative under number mode ("20 21") but converts to "2021" when year mode is enabled
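
The interaction between the two toggles can be sketched as follows. This is a minimal illustration, not the PR's code: `ConversionSettings`, `postProcess`, the toy stages, and the name `convertSpokenNumbersToDigits` are hypothetical; only `convertSpokenYearsToDigits` and the year-before-number ordering come from the commit notes in this PR. The toy stages handle a single hard-coded phrase each, and the surrounding word-removal and word-remapping steps are omitted.

```swift
import Foundation

// Hypothetical settings container. Only convertSpokenYearsToDigits is named in this PR.
struct ConversionSettings {
    var convertSpokenNumbersToDigits = false  // assumed name for the number toggle
    var convertSpokenYearsToDigits = false    // default false, per the commit notes
}

typealias Stage = (String) -> String

// Toy stand-ins for the real converters: each knows one hard-coded phrase.
let toyYearStage: Stage = {
    $0.replacingOccurrences(of: "twenty twenty one", with: "2021")
}
let toyNumberStage: Stage = {
    $0.replacingOccurrences(of: "twenty one", with: "21")
      .replacingOccurrences(of: "twenty", with: "20")
}

// Year conversion runs first so that number mode cannot split the year into "20 21".
func postProcess(_ transcript: String, settings: ConversionSettings) -> String {
    var stages: [Stage] = []
    if settings.convertSpokenYearsToDigits { stages.append(toyYearStage) }
    if settings.convertSpokenNumbersToDigits { stages.append(toyNumberStage) }
    return stages.reduce(transcript) { text, stage in stage(text) }
}
```

With only the number toggle on, "in twenty twenty one" yields "in 20 21"; with both toggles on, the year stage consumes the phrase first and the result is "in 2021".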

Testing

  • cd HexCore && swift test (passes)

davaucl and others added 9 commits January 16, 2026 22:09
Adds a new toggle in Transcript Modifications that converts spoken cardinal numbers to numeric digits during transcription post-processing.

Examples:
- "twenty five" → "25"
- "one thousand three hundred thirty six" → "1336"
- "three point one four" → "3.14"

The conversion runs after word removals and before word remappings, allowing users to further customize the output.

- When in decimal mode, peek past whitespace to find digit tokens
- Accept valid decimals even when integer part is zero (e.g., 'zero point five' → '0.5')
- Fixes 'three point one four' → '3.14' (was stopping at whitespace after 'point')
- Fixes 'zero point five' → '0.5' (was rejecting due to total == 0)
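
A minimal sketch of the decimal rules described in this PR (digits after "point", a zero or missing integer part, and a trailing "point"). The helper `convertDecimalPhrase` and the `digitWords` table are hypothetical and are not the PR's NumberWordConverter. As a simplification, the integer part is limited to a single digit word and the input is already split on spaces.

```swift
// Hypothetical lookup table for single digit words.
let digitWords: [String: Int] = [
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4,
    "five": 5, "six": 6, "seven": 7, "eight": 8, "nine": 9,
]

/// Converts a whole phrase such as "three point one four" to "3.14".
/// Returns nil when the phrase is not a number this sketch understands.
func convertDecimalPhrase(_ phrase: String) -> String? {
    let words = phrase.lowercased().split(separator: " ").map(String.init)
    var index = 0

    // Optional integer part ("point five" has none).
    var integerPart: Int? = nil
    if index < words.count, let value = digitWords[words[index]] {
        integerPart = value
        index += 1
    }

    // Without "point", only a lone integer word is accepted.
    guard index < words.count, words[index] == "point" else {
        if let value = integerPart, index == words.count { return String(value) }
        return nil
    }
    index += 1

    // Collect every digit word after "point".
    var fraction = ""
    while index < words.count, let digit = digitWords[words[index]] {
        fraction += String(digit)
        index += 1
    }
    guard index == words.count else { return nil }

    // A trailing "point" with no digits falls back to the integer part.
    if fraction.isEmpty { return integerPart.map { String($0) } }

    // A zero or missing integer part is still a valid decimal.
    return "\(integerPart ?? 0).\(fraction)"
}
```

Accepting the result whenever fraction digits were collected, rather than requiring a non-zero total, is what lets "zero point five" through in this sketch.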

Tighten NumberWordConverter parsing to avoid over-eager merges in natural
language text while preserving existing cardinal conversions.

Improvements:
- Restrict "and" connector behavior to scale contexts so phrases like
  "one and two" and "between one and two" convert to separate numbers
  instead of being summed.
- Support leading decimal forms ("point five" -> "0.5").
- Handle trailing decimal markers safely ("one point" -> "1").
- Prevent accidental token gluing by trimming consumed trailing whitespace.
- Keep adjacent tens conservative by default ("twenty twenty one" ->
  "20 21").

Tests:
- Add regression tests for ambiguous connector phrases and decimal edge cases.
- Add mixed-context tests (version/chapter/list punctuation).
- Add article/standalone scale and incomplete phrase coverage.

All HexCore tests pass after these changes.
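
The connector and adjacency rules above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: `convertCardinals`, `LastToken`, and the lookup tables are hypothetical names, the input is split on single spaces with no punctuation handling, and the "one hundred and five" case is an assumed example of a scale context.

```swift
// Hypothetical lookup tables; the real converter may be organized differently.
let smallNumbers: [String: Int] = [
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
]
let tensWords: [String: Int] = [
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
]

enum LastToken { case idle, small, tens, scale }

func isNumberWord(_ word: String) -> Bool {
    let key = word.lowercased()
    return smallNumbers[key] != nil || tensWords[key] != nil
}

func convertCardinals(in text: String) -> String {
    let words = text.split(separator: " ").map(String.init)
    var output: [String] = []
    var total = 0              // completed "thousand" groups
    var current = 0            // group currently being built
    var last = LastToken.idle  // kind of the previous number token

    // Emit the number built so far (if any) and reset the parser state.
    func flush() {
        if last != .idle { output.append(String(total + current)) }
        total = 0; current = 0; last = .idle
    }

    for (index, word) in words.enumerated() {
        let key = word.lowercased()
        if let value = smallNumbers[key] {
            // A small number extends the phrase only after a scale word,
            // or as the units digit after a tens word ("twenty one").
            let extends = last == .scale || (last == .tens && (1...9).contains(value))
            if !extends { flush() }
            current += value
            last = .small
        } else if let value = tensWords[key] {
            // Adjacent tens stay separate: "twenty twenty one" -> "20 21".
            if last != .scale { flush() }
            current += value
            last = .tens
        } else if key == "hundred", last == .small || last == .tens {
            current *= 100
            last = .scale
        } else if key == "thousand", last != .idle, current > 0 {
            total += current * 1000
            current = 0
            last = .scale
        } else if key == "and", last == .scale, index + 1 < words.count,
                  isNumberWord(words[index + 1]) {
            continue  // connector absorbed only right after a scale word
        } else {
            flush()
            output.append(word)
        }
    }
    flush()
    return output.joined(separator: " ")
}
```

Because "and" is absorbed only when the previous token was a scale word, "one and two" keeps its connector and produces two separate numbers instead of a sum.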

Introduce a conservative year conversion pass (1900-2099) that can be
enabled independently from number-word conversion.

What this adds:
- New YearWordConverter that converts common spoken forms such as:
  - nineteen eighty four -> 1984
  - twenty twenty one -> 2021
  - twenty ten -> 2010
  - twenty oh five -> 2005
- New setting: convertSpokenYearsToDigits (default false).
- New settings UI toggle in Transcript Modifications -> Word Remappings,
  next to the existing number-word conversion toggle.
- Transcription and scratchpad preview pipeline now apply year conversion
  before number-word conversion when enabled.
- Dedicated YearWordConverter tests covering positive cases,
  ambiguity guards, punctuation/mixed text behavior.

Safety/behavior:
- Keeps conversion conservative and avoids short ambiguous phrases like
  'twenty one' to reduce false positives in general dictation text.

All HexCore tests pass.
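
A minimal sketch of the conservative year matching described above, assuming a phrase-level helper. `spokenYear(from:)` and the lookup tables are hypothetical and do not reflect the actual YearWordConverter, which works on running text with punctuation. Accepting a bare decade such as "nineteen eighty" is an assumption of this sketch, not something the PR states.

```swift
// Hypothetical lookup tables for the sketch.
let centuryWords: [String: Int] = ["nineteen": 19, "twenty": 20]
let onesWords: [String: Int] = [
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
]
let teenWords: [String: Int] = [
    "ten": 10, "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
]
let decadeWords: [String: Int] = [
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
]

/// Returns a year between 1900 and 2099 when the whole phrase is a spoken year.
func spokenYear(from phrase: String) -> Int? {
    let words = phrase.lowercased().split(separator: " ").map(String.init)
    guard let first = words.first, let century = centuryWords[first] else { return nil }
    let rest = Array(words.dropFirst())

    let suffix: Int
    switch rest.count {
    case 1:
        // "twenty ten" or "nineteen eighty". A bare unit such as the "one"
        // in "twenty one" is rejected to avoid false positives.
        guard let value = teenWords[rest[0]] ?? decadeWords[rest[0]] else { return nil }
        suffix = value
    case 2:
        guard let ones = onesWords[rest[1]] else { return nil }
        if rest[0] == "oh" {
            suffix = ones               // "twenty oh five" -> 05
        } else if let decade = decadeWords[rest[0]] {
            suffix = decade + ones      // "nineteen eighty four" -> 84
        } else {
            return nil
        }
    default:
        return nil
    }
    return century * 100 + suffix
}
```

Requiring a two-digit suffix after the century word is what keeps "twenty one" out: it parses as a century followed by a lone unit, which the sketch refuses.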

- Backfill missing minor changeset for prior AI transforms (d53b16c)
- Add minor changeset for spoken year conversion mode
- Restore original number-word changeset summary for PR kitlangton/Hex#155 context

Resolve HexSettings conflicts by keeping both the AI transform settings from main and the spoken number/year conversion settings from this branch.
@alasano merged commit a5d0ab5 into main Feb 26, 2026
@alasano deleted the feature/number-word-to-digits branch February 26, 2026 03:17

2 participants