Issue #8, convert hunspell output to spellchecker JSON format #9
backend/spellchecker/huntojson.sh
Outdated
| done | ||
| shift $((OPTIND-1)) | ||
|
|
||
| if [ "$#" != 1 ] |
Does it make any sense to surround 1 with double quotes, to compare a string with a string?
fixed hunspell options to interpret input as latex file
backend/spellchecker/huntojson.sh
Outdated
| INFILE=$1 | ||
|
|
||
| JSON=$( | ||
| cat $INFILE | |
Quote $INFILE, otherwise you're going to have trouble with spaces in the path.
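The effect of leaving the expansion unquoted can be sketched outside the shell. The snippet below uses Python's `shlex.split`, which follows POSIX-like word-splitting rules, as a stand-in for what the shell does to `cat $INFILE`. The file name is a made-up example, and the real shell also applies `IFS` and globbing, which this sketch ignores.

```python
import shlex

# Hypothetical input path containing a space.
infile = "my thesis.tex"

# Unquoted: the path is split into two separate arguments.
unquoted = shlex.split("cat " + infile)

# Quoted: the path survives as a single argument.
quoted = shlex.split('cat "' + infile + '"')

print(unquoted)
print(quoted)
```

In the unquoted case `cat` would be asked to open two files, neither of which exists.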
|
| One interesting question is duplicate keys: this script will produce JSON with three equal keys. I suggest filtering duplicates with |
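Why duplicate keys matter to a consumer can be illustrated with Python's `json` module. This is only a sketch of one parser's behaviour: the sample document and the `reject_duplicates` helper are made up for illustration, and the script under review is shell and awk, not Python.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises on a repeated key instead of overwriting."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError("duplicate key: " + key)
        seen[key] = value
    return seen

# Made-up document with the same misspelled word reported twice.
text = '{"teh": ["the"], "teh": ["ten", "tea"]}'

# Default behaviour: the last occurrence silently wins.
print(json.loads(text))

# Strict behaviour: the duplicate is reported.
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)
```

With the default parser the first list of suggestions is lost without any warning, which is one reason to drop duplicates before serializing.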
backend/spellchecker/huntojson.sh
Outdated
| options_number=split(split_string[2], options, ", "); | ||
|
|
||
| for (i = 1; i <= options_number; i++) | ||
| {print "\t\t\""options[i]"\","} |
A comma after the last element of the array produces invalid JSON, at least from jq's perspective. The same applies to the comma after the last entry.
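The same check can be reproduced with Python's `json` module, which is also strict about trailing commas. The helper name `is_valid_json` and the sample strings are illustrative assumptions, not part of the script.

```python
import json

def is_valid_json(text):
    """Return True if text parses under Python's strict JSON parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

# Trailing comma after the last array element.
print(is_valid_json('{"teh": ["the", "ten",]}'))

# Trailing comma after the last entry of the object.
print(is_valid_json('{"teh": ["the", "ten"],}'))

# No trailing commas.
print(is_valid_json('{"teh": ["the", "ten"]}'))
```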
removed "pretty-printing efforts"
fixed it with an additional variable in awk
backend/spellchecker/huntojson.sh
Outdated
|
|
||
| echo $JSON | ||
| echo "$JSON" | | ||
| sed ':a;N;$!ba;s/,\n}/\n}/g' |
This can probably be better solved by introducing a global variable which is empty at the beginning and "," after the first use. Use it as a prefix of the serialized array:
print $2 + PREFIX + [
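A minimal sketch of that prefix idea, written in Python rather than awk for illustration. The function name `serialize` and the sample data are assumptions; the point is only that the separator is written before every entry except the first, so nothing has to be stripped afterwards.

```python
import json

def serialize(suggestions):
    """Emit a JSON object, writing the separator before each entry, not after."""
    parts = ["{"]
    prefix = ""  # empty before the first entry, "," afterwards
    for word, options in suggestions:
        parts.append(prefix + json.dumps(word) + ":" + json.dumps(options))
        prefix = ","
    parts.append("}")
    return "".join(parts)

# Made-up hunspell-style results: misspelled word and its suggestions.
sample = [("teh", ["the", "ten"]), ("wrod", ["word"])]
output = serialize(sample)
print(output)
```

The empty input case also works without special handling, since the prefix is never emitted.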
Heck, that's a really good point. Shame on me for such a dummy solution.
Merge branch 'tkt_8_convert_hunspell_to_json' of https://github.com/bardsoftware/papeeria into tkt_8_convert_hunspell_to_json
| } | ||
|
|
||
| message Suggestions { | ||
| string json = 1; |
One of the purposes of protocol buffers is to provide a typed interface for data exchange between servers. Please replace this JSON with a typed interface, like map<string, Replacements> suggestions, use it in the spellchecker, and remove your own JSON serialization.
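The shape of such a typed reply can be sketched with plain Python dataclasses. This is only an analogy: the real message would be declared in the `.proto` file and generated by `protoc`, and the field name `values` inside `Replacements` is an assumption, since the review does not show that message.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Replacements:
    # Assumed field: the candidate replacements for one misspelled word.
    values: List[str] = field(default_factory=list)

@dataclass
class Suggestions:
    # Mirrors `map<string, Replacements> suggestions` from the comment above.
    suggestions: Dict[str, Replacements] = field(default_factory=dict)

reply = Suggestions()
reply.suggestions["teh"] = Replacements(values=["the", "ten"])
print(reply.suggestions["teh"].values)
```

Because the misspelled word is the map key, no hand-written serialization or string parsing is needed on either side.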
| # note: these fields' names doesn't start with "_" since deleting of this | ||
| # object isn't so straightforward -- for some reason interpreter destroys | ||
| # these "private" fields before __del__ is invoked. | ||
| self.parser_lib_ = cdll.LoadLibrary("../libparser/build/libparser.so") |
what's the purpose of the trailing underscores?
also, please parameterize this class with the path to the shared library
Oh, I thought I removed them.
Yep, that's what I forgot to do.
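One way the two requests in this thread could look together, as a sketch under assumptions: the class name `Parser` and the `loader` argument are hypothetical, and the latter exists only so the constructor can be exercised without the real shared library.

```python
from ctypes import cdll

class Parser:
    """Hypothetical wrapper around the parser shared library."""

    def __init__(self, library_path, loader=cdll.LoadLibrary):
        # The path comes from the caller instead of being hard-coded,
        # and the attribute has no trailing underscore.
        self.library_path = library_path
        self.parser_lib = loader(library_path)

# Intended use with the path from the diff (requires the library to be built):
# parser = Parser("../libparser/build/libparser.so")
```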
|
|
||
| class SpellcheckerServicer(spellchecker_pb2.SpellcheckServicer): | ||
| """ | ||
| gRPC service to check text that comes from stubs. |
It is the service client who sends you the text, not a stub. A stub is a purely technical thing which provides a typed interface on the client side.
| message Suggestions { | ||
| string json = 1; | ||
| message Suggestion { | ||
| string key = 1; |
you don't need the key anymore
[bard@bardtop3 build (tkt_8_convert_hunspell_to_json)]$ make
@dbarashev meh, I didn't realize that CMake generates such Makefiles; I'll remove the Makefile. Of course you want to use CMake.
| type=str, | ||
| metavar="PATH", | ||
| help="path to .dic and .aff files") | ||
| required.add_argument("-L", "--language", |
language is defined in the request, no?
A request from the client? Nope, in the request we ask the spellchecker to use those languages, and that demand is satisfied only if the spellchecker already has a hunspell instance initialized with dictionaries for those languages.
Which means that we're restricted to one language, since this argument is single-valued.
Why don't you initialize a hunspell instance on demand with the language requested by the client?
Alrighty then, I'll work that out.
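On-demand initialization could be sketched as a small cache keyed by language. Everything here is hypothetical: `CheckerPool`, the `factory` callable, and the language codes are illustrative, and the stand-in factory returns a string instead of a real hunspell instance.

```python
class CheckerPool:
    """Creates one checker per language the first time it is requested."""

    def __init__(self, factory):
        self._factory = factory  # hypothetical: builds a checker for a language code
        self._checkers = {}

    def get(self, language):
        # Initialize lazily, then reuse the instance for later requests.
        if language not in self._checkers:
            self._checkers[language] = self._factory(language)
        return self._checkers[language]

created = []

def fake_factory(language):
    """Stand-in for loading the .dic and .aff files of one language."""
    created.append(language)
    return "checker for " + language

pool = CheckerPool(fake_factory)
pool.get("en_US")
pool.get("ru_RU")
pool.get("en_US")
print(created)
```

A gRPC server usually handles requests on several threads, so a real version would likely need a lock around the dictionary.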