
Ian-Grant

  • Added UTF8 Unit in mosmllib

  • Modified PP.sml to format multi-octet UTF8 strings correctly (to do this it has to parse the UTF8 representation)

  • Changed Lexer.lex to check UTF8 encodings in string literals, and to allow the full ISO/IEC 10646 UCS range to be used in numeric escapes:

    \U+XXXXXX

    as well as

    \uXXXX

    as specified in the Standard ML Definition.
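The list above relies on three byte-level operations: decoding a `\uXXXX` or `\U+XXXXXX` escape to a code point, encoding that code point as UTF-8 octets, and counting code points rather than octets when measuring a string, as a pretty printer must. The Python below is a minimal sketch of these, not the SML code in this PR. The names `parse_escape`, `encode_utf8` and `display_length` are hypothetical. It also assumes `\U+` takes exactly six hex digits and restricts values to the RFC 3629 range U+0000..U+10FFFF, which may be narrower than what the lexer actually accepts for "full ISO/IEC 10646".

```python
import re

# Hypothetical escape syntax, following the PR description:
#   \uXXXX     exactly 4 hex digits (Standard ML Definition)
#   \U+XXXXXX  assumed here to be exactly 6 hex digits (the extension)
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U\+([0-9A-Fa-f]{6})")


def parse_escape(text: str) -> int:
    """Return the code point denoted by a single numeric escape."""
    m = _ESCAPE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a numeric escape: {text!r}")
    return int(m.group(1) or m.group(2), 16)


def encode_utf8(cp: int) -> bytes:
    """Encode one code point as 1 to 4 UTF-8 octets (RFC 3629 range only)."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def display_length(data: bytes) -> int:
    """Count code points in well-formed UTF-8 by skipping continuation
    octets (10xxxxxx); assumes one output column per code point."""
    return sum(1 for b in data if b & 0xC0 != 0x80)
```

For example, `encode_utf8(parse_escape(r"\u20AC"))` yields the three octets `E2 82 AC`, and `display_length` of those octets is 1. A byte-counting printer would treat the same string as three columns wide. Counting one column per code point is itself an approximation, because it ignores combining and double-width characters.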

UTF8 checking within character strings is non-standard compiler behavior, so the switch Meta.utf8 is provided to turn this checking on. Thinking about it now, the extended syntax for numeric character escapes should probably be conditional on that switch too.
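The kind of check the lexer might run on a string literal's octets, gated by a switch, can be sketched in Python as follows. It follows the table of well-formed UTF-8 byte sequences in chapter 3 of the Unicode Standard, so it rejects overlong forms, surrogates and values above U+10FFFF. This is an illustrative sketch, not the PR's SML implementation. The `utf8_checking` parameter is a hypothetical stand-in for `Meta.utf8`, and defaults to off.

```python
from typing import Optional


def first_invalid_offset(data: bytes) -> Optional[int]:
    """Return the offset of the first ill-formed UTF-8 sequence, or None."""
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b < 0x80:                      # ASCII: one octet
            i += 1
            continue
        if 0xC2 <= b <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b == 0xE0:                   # exclude overlong 3-octet forms
            need, lo, hi = 2, 0xA0, 0xBF
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b == 0xED:                   # exclude surrogates U+D800..U+DFFF
            need, lo, hi = 2, 0x80, 0x9F
        elif b == 0xF0:                   # exclude overlong 4-octet forms
            need, lo, hi = 3, 0x90, 0xBF
        elif 0xF1 <= b <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF4:                   # exclude values above U+10FFFF
            need, lo, hi = 3, 0x80, 0x8F
        else:                             # C0, C1, F5..FF or stray continuation
            return i
        # Truncated sequence, or second octet outside its allowed range.
        if i + need >= n or not lo <= data[i + 1] <= hi:
            return i
        # Any remaining octets must be plain continuation octets.
        if any(not 0x80 <= c <= 0xBF for c in data[i + 2:i + need + 1]):
            return i
        i += need + 1
    return None


def check_string_literal(data: bytes, utf8_checking: bool = False) -> None:
    """Reject ill-formed UTF-8 only when checking is switched on."""
    if not utf8_checking:
        return
    bad = first_invalid_offset(data)
    if bad is not None:
        raise ValueError(f"ill-formed UTF-8 in string literal at byte {bad}")
```

With checking off, arbitrary octets pass through unchanged, which matches the standard behavior of treating a string as a sequence of 8-bit characters. The error offset points at the lead octet of the bad sequence, which is usually the most useful position to report.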

Some of the logic came from the HOL theorem prover, which now does this differently; see src/portableML/UTF8.sml in https://github.com/HOL-Theorem-Prover/HOL
