Linkification Data files and tooling #961
TODO:
…mple test cases, match spec better
…p wikipedia pages), gather statistics on mismatches.
Plus fixed ICANN file
eebcf83 to 992ff07
markusicu
left a comment
Review of only the data files so far.
markusicu
left a comment
I only skimmed chunks of the Java code.
unicodetools/src/main/resources/org/unicode/tools/LinkDetectionTestSourceICANN.txt
unicodetools/src/main/resources/org/unicode/tools/testUrlsStats.txt
unicodetools/src/main/resources/org/unicode/tools/tlds-alpha-by-domain.txt
unicodetools/src/main/java/org/unicode/utilities/LinkUtilities.java
Added a note about the + in queries in minimal escaping:
markusicu
left a comment
One more comment for now; at least one of the previous comments still has no discussion. I am also going over your discussion doc.
… as per email. Also cleaned up the code for the headers to be more uniform, and allowed the Wikipedia URL with ; in it.
# Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries.
# For terms of use and license, see https://www.unicode.org/terms_of_use.html
#
#
For later: these files now have trailing spaces on empty comment lines, from the JOIN_N_HASH in the generator. We should clean that up at some point.
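One possible shape for that cleanup, as a minimal sketch only. It assumes the header is built by joining lines with a newline-plus-`# ` separator, which is what the JOIN_N_HASH name suggests; the generator code itself is not quoted here. `CommentHeader` and `join` are hypothetical names for illustration, not existing unicodetools code.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of unicodetools.
final class CommentHeader {
    private CommentHeader() {}

    /**
     * Turns each line into a '#' comment line.
     * Non-empty lines get "# " as a prefix; empty lines become a bare "#",
     * so no trailing space is emitted on empty comment lines.
     */
    static String join(List<String> lines) {
        return lines.stream()
                .map(line -> line.isEmpty() ? "#" : "# " + line)
                .collect(Collectors.joining("\n"));
    }
}
```

Calling `CommentHeader.join(List.of("first", "", "second"))` would produce three comment lines, the middle one being just `#`. Prefixing each line, rather than joining with a fixed separator, is what lets the empty-line case be handled separately.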
https://github.com/unicode-org/properties/issues/507
See also https://github.com/unicode-org/unicode-reports/pull/247
Code to generate the property files and test data for UTS 58
See also the related spec changes in https://github.com/unicode-org/unicode-reports/pull/247