Python script to generate a .json.gz file per each locale #5
dstillman
left a comment
Let's also have this output the necessary lines to copy into .htaccess to map locale parameter values to the files being written. We mostly won't need to update them, but it's good to keep them coupled to the available locales, and the logic might need tweaking.
And for that logic, we basically want to implement this code:
…but entirely with rewrite rules. (I haven't totally thought through how possible this is, but I think it's mostly doable by writing out a lot of rules in a given order. Let's see what we can do.)
en-US → en-US (exact match)
ar → ar (exact match)
de-DE → de (matching language part)
ca → ca-AD (matching language part)
en-NZ → en-US (prefer en-US over other available en locales for inexact match)
pt → pt-PT (prefer country code matching language code if unspecified — this one's debatable, and 217 million Brazilians would presumably disagree, but it's what we do elsewhere)
zh → zh-CN (this we just get from sorting the country codes, but it makes sense for our userbase)
zz → full file (ignore locale parameter if language code is unknown)
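The mapping above could be sketched in Python roughly as follows, before translating it into ordered rewrite rules. This is only an illustration of the listed cases: `resolve_locale` and the `AVAILABLE` list are hypothetical names and sample data, not the code referenced in the comment.

```python
def resolve_locale(requested, available):
    """Return the best available locale for `requested`, or None if unknown."""
    if requested in available:
        return requested  # exact match, e.g. en-US or ar
    lang = requested.split('-')[0].lower()
    if lang in available:
        return lang  # matching language part, e.g. de-DE -> de
    # Sorted so that the fallback at the end is deterministic
    candidates = sorted(loc for loc in available if loc.split('-')[0] == lang)
    if not candidates:
        return None  # unknown language: caller serves the full file
    if lang == 'en' and 'en-US' in candidates:
        return 'en-US'  # prefer en-US over other en locales
    same_country = f"{lang}-{lang.upper()}"
    if same_country in candidates:
        return same_country  # e.g. pt -> pt-PT
    return candidates[0]  # e.g. zh -> zh-CN, from sorting


# Hypothetical sample of available locales, for illustration only
AVAILABLE = ['ar', 'ca-AD', 'de', 'en-GB', 'en-US',
             'pt-BR', 'pt-PT', 'zh-CN', 'zh-TW']
```

Calling `resolve_locale('en-NZ', AVAILABLE)` walks the same order of preference as the list: exact match, language part, then the en-US and same-country preferences, then sort order.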
schema_text = f.read()
schema = json.loads(schema_text)

if not os.path.exists('../locales'):
Script should be runnable from any folder. You're already getting the parent folder above, so you should just use that for other paths.
You should also wipe locales to remove any existing files. (Locales will pretty much never be removed, but just to fix possible bugs, etc.)
Understood, this has been done.
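For reference, one way the two requests above (paths independent of the working directory, and a wiped locales folder) might look. A minimal sketch, assuming the script sits one level below the folder that holds `locales`; `reset_locales_folder` is a hypothetical helper name, not something from this PR.

```python
import os
import shutil


def reset_locales_folder(script_path):
    """Recreate an empty 'locales' folder in the parent of the script's folder."""
    script_dir = os.path.dirname(os.path.abspath(script_path))
    root = os.path.dirname(script_dir)  # parent folder of the script's folder
    locales_folder = os.path.join(root, 'locales')
    # Remove anything left over from earlier runs, then start empty
    shutil.rmtree(locales_folder, ignore_errors=True)
    os.makedirs(locales_folder)
    return locales_folder
```

The script would call it as `reset_locales_folder(__file__)`, so the result no longer depends on where the script is run from.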
for creator_type in item_type['creatorTypes']:
    if creator_type['creatorType'] in current_locale["itemTypes"]:
        creator_type['creatorType'] = current_locale["itemTypes"][creator_type['creatorType']]
del schema_with_one_locale['locales']
Definitely don't need all of this — sorry for not specifying. Can just keep the one locale keyed properly in locales, as below.
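A rough sketch of what keeping only one locale could look like, assuming the schema is a dict with a top-level `locales` mapping (as the `schema['locales']` usage elsewhere in the script suggests). `schema_for_locale` is a hypothetical name.

```python
def schema_for_locale(schema, locale):
    """Return a copy of `schema` whose 'locales' holds only `locale`."""
    # Shallow copy is enough here: only the top-level 'locales' key is replaced
    single = dict(schema)
    single['locales'] = {locale: schema['locales'][locale]}
    return single
```

The original `schema` is left untouched, so the same object can be reused for each locale in the loop.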
Added the generation of the .htaccess file. The example output is as below:
Each of these rules matches
os.mkdir(locales_folder)

# String that accumulates the rules to paste into htaccess
htaccess_rules = f"RewriteRule ^schema/({'|'.join(schema['locales'].keys())})$ /zotero-schema/locales/$1.gz [L]"
It should be a locale query string parameter, not a path component.
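To illustrate matching on a query string parameter instead of a path component, here is a small sketch that builds a pattern for a `RewriteCond` on `%{QUERY_STRING}` and lets it be tried out with Python's `re` first. The helper name `locale_query_pattern` is made up, and it assumes locale codes contain only letters and hyphens, so no escaping is done.

```python
import re


def locale_query_pattern(locales):
    """Regex matching locale=<one of locales> as a whole query parameter."""
    alternatives = '|'.join(locales)
    # Anchor on start-or-& and &-or-end so 'xlocale=' and 'en-USA' don't match
    return f"(?:^|&)locale=({alternatives})(?:&|$)"


pattern = locale_query_pattern(['en-US', 'ar'])
rewrite_cond = f"RewriteCond %{{QUERY_STRING}} {pattern}"
match = re.search(pattern, 'foo=1&locale=ar&x=2')
```

Apache uses PCRE rather than Python's engine, so this only checks the pattern's logic, not the server behavior.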
# Catch all for default schema with all locales
htaccess_rules += f'\nRewriteRule ^schema/* /zotero-schema/schema.json.gz [L]'

print("--- .htacess rules --- \n" + htaccess_rules + "\n--- ---")
Can skip the header and footer
Can replace update-gz with update-gz.py, no extension, mode 755
I think we'll need some additional conditions/rules for the locale gz files, like we already have for schema.json.gz: E.g., checking
Let's just generate the whole block in this script. And we don't need newlines; better to keep this as a single block.
Not sure it matters, but since we're generating this in a script anyway and know the total count, we can probably add a skip flag (
(Will this all be meaningfully faster than just having a single schema.php file that generated a locale-specific file and dumped it in memcached? Unclear. But this is certainly the more fun way to do it…)
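Since the script knows how many rules it emits, the skip count could be filled in once the block is assembled. A rough sketch, assuming `S=N` counts `RewriteRule` directives (each with its `RewriteCond` lines) rather than physical lines, which is worth checking against the mod_rewrite docs. `fill_skip_count` is a hypothetical helper, and `LINES_TO_SKIP` is just a placeholder token in the generated text.

```python
def fill_skip_count(block, placeholder='LINES_TO_SKIP'):
    """Replace `placeholder` with the number of RewriteRule lines after the first."""
    lines = block.splitlines()
    rule_count = sum(
        1 for line in lines if line.lstrip().startswith('RewriteRule')
    )
    # The guard rule carrying the S flag is not itself skipped
    return block.replace(placeholder, str(rule_count - 1))
```

Counting from the generated text keeps the number in sync if rules are added or removed later.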
…rect content type, and filematch
After testing it out with a version of dataserver I ran locally, I had to add one more condition https://gist.github.com/abaevbog/23a986e6966000325f609652cd25e6ce
I think we can do a slightly simpler block:
Specifically:
Yes, you are right...
htaccess_rules = f'''RewriteCond %{{REQUEST_URI}} !^/(schema|zotero-schema)
RewriteRule ".?" "-" [S=LINES_TO_SKIP]
htaccess_rules = f'''RewriteCond %{{REQUEST_URI}} !^/schema
RewriteRule ".?" "-" [S=LINES_TO_SKIP,L]
L isn't right here, though — that breaks the dataserver completely by skipping the main redirect at the bottom of the file.
I see what you mean - I removed it.
For some reason, when I was testing it on my local dataserver setup, all non-/schema requests reached index.php as they were supposed to, even with that L flag. I can probably blame it on my local dataserver or Apache setup.
RewriteCond %{{HTTP:Accept-Encoding}} !gzip
RewriteRule ^schema(/.*)?$ /zotero-schema/schema.json [QSD,L]
RewriteCond %{{QUERY_STRING}} (?:^|&)locale=({'|'.join(schema['locales'].keys())})(?:&|$)
RewriteRule ^schema(/.*)?$ /zotero-schema/locales/%1.json.gz [QSD]
Missing L here, so exact matches (e.g., locale=en-US) don't work
# For every country code, sort locale candidates and add rule to htaccess
for country_code in htaccess_mapings.keys():
    htaccess_mapings[country_code].sort(key=locale_sort_key)
    # Each rule is only applied if gzip encoding is accepted
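The definition of `locale_sort_key` isn't shown in this excerpt. One possible shape for it, purely as an assumption based on the preferences listed earlier in the thread (en-US first, then a country code matching the language code, then alphabetical order):

```python
def locale_sort_key(locale):
    """Sort key: en-US first, then lang-LANG (e.g. pt-PT), then alphabetical."""
    lang, _, country = locale.partition('-')
    # False sorts before True, so preferred locales come first
    return (locale != 'en-US', country != lang.upper(), locale)
```

With a key like this, the first element of each sorted candidate list would be the locale to use for an inexact match.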