Enable string conversion in EUC-JP.#1296
Conversation
|
@swift-ci Please test |
7ae3f7f to
b0a8981
Compare
|
@swift-ci Please test |
|
@parkera @jmschonfeld Thank you for reviewing. In response to your feedback, I've made some changes:
|
|
Please let me request reviews again as per changes above. |
parkera
left a comment
There was a problem hiding this comment.
I think this looks good to me, but also want @jmschonfeld to chime in.
jmschonfeld
left a comment
There was a problem hiding this comment.
LGTM, thanks for putting this together!
96de131 to
bc1656d
Compare
Background: EUC-JP is not supported by OSS CoreFoundation, while it is supported by macOS Foundation Framework. See swiftlang#1016 This commit resolves the issue by calling ICU API if necessary.
bc1656d to
844479e
Compare
|
@swift-ci Please test |
jmschonfeld
left a comment
There was a problem hiding this comment.
Thanks - that resolves the build failures, but I have a comment about one issue in the tests that I've discovered as well (this should hopefully be the last thing, thanks for bearing with me)
| XCTAssertNil(sushi.data(using: String._Encoding.japaneseEUC)) | ||
| XCTAssertEqual( | ||
| sushi.data(using: String._Encoding.japaneseEUC, allowLossyConversion: true), | ||
| "Sushi?".data(using: .utf8) |
There was a problem hiding this comment.
This test unfortunately fails in the FOUNDATION_FRAMEWORK configuration because Foundation.framework's implementation of this conversion replaces both UTF-16 scalars of the emoji with ? rather than replacing the entire grapheme cluster with a single ?. I suspect that maybe ICU is replacing only with a single ? - do you happen to know if that's intentional? If so, for now we might need to vary the expected value to be Sushi?? in FOUNDATION_FRAMEWORK vs. Sushi? in !FOUNDATION_FRAMEWORK to account for the discrepancy between Foundation.framework and ICU (assuming we think this result from ICU is indeed correct and not a bug in how we're calling ICU)
There was a problem hiding this comment.
Sorry for my overlooking again, but I'm not sure if we should implement conversion just like Foundation framework.
I mean the behavior of Foundation framework looks buggy because it doesn't seem to handle surrogate pairs correctly. (🍣 is non-BMP but consists of only one scalar U+1F363. Does that "buggy" behavior come from the historical reason that NSString has been represented as a sequence of UTF–16??)
Of course, we can imitate such behavior by replacing UCNV_FROM_U_CALLBACK_SUBSTITUTE with a custom callback function.
However, let me point out that current swift-foundation's implementation for .ascii is code-point-basis, not UTF-16-value-basis:
import Foundation
let sushiEmoji = "🍣"
print(sushiEmoji.data(using: .ascii, allowLossyConversion: true)!.count)
// -> Prints "1"Options...?
- To keep consistent with current (e.g.)
.asciiimplementation:- Divide the test case for
FOUNDATION_FRAMEWORKand!FOUNDATION_FRAMEWORK.
- Divide the test case for
- To make it compatible with Foundation framework's behavior:
- Implement a custom callback for
ucnv_setFromUCallBack.
- Implement a custom callback for
There was a problem hiding this comment.
Yeah I suspect this behavior comes from the fact that NSString is natively stored as UTF-16. For now, let's just divide the test case with a FOUNDATION_FRAMEWORK/!FOUNDATION_FRAMEWORK value for this expectation since I think enabling the conversion at all on non-Darwin is a good step even if the behavior is slightly different than Foundation.framework, and I can come back with a separate change to look into making this behavior the same across both platforms (while ensuring we don't break compatibility with existing clients).
…ersion. In response to: swiftlang#1296 (comment)
|
@jmschonfeld (I didn't know CI didn't execute test cases with |
|
@swift-ci Please test |
|
Thanks for all of your updates - looks great to me now! |
* Enable string conversion in EUC-JP. Background: EUC-JP is not supported by OSS CoreFoundation, while it is supported by macOS Foundation Framework. See swiftlang#1016 This commit resolves the issue by calling ICU API if necessary. * ICU: Omit encodings that should be supported by FoundationEssentials. In response to: swiftlang#1296 (comment) * ICU: Remove unnecessary `nonisolated(unsafe)` from static property. In response to: swiftlang#1296 (comment) * Add comment to `func _icuMakeStringFromBytes_impl`. In response to: swiftlang#1296 (comment) * Delegate string conversion to ICU only when encoding is EUC-JP. In response to: swiftlang#1296 (comment) * Replace dynamic `_icu*` functions only if `!FOUNDATION_FRAMEWORK`. In response to: swiftlang#1296 (comment) * Divide test cases depending on `FOUNDATION_FRAMEWORK` for EUC_JP conversion. In response to: swiftlang#1296 (comment)
Background: EUC-JP is not supported by OSS CoreFoundation, while it is supported by macOS Foundation Framework.
See #1016
This PR resolves the issue by calling ICU API if necessary.