Open
Labels: enhancement (New feature or request)
Description
Goal
Introduce a Continuous Integration (CI) pipeline that automatically validates the quality, formatting, and structural consistency of the dictionary entries in `translations.toml` on every code change. This protects data integrity and catches common data entry errors before they are merged.
Implementation Details (Tasks)
The CI pipeline must include an automated script (e.g., a Python or JavaScript script run by GitHub Actions) to perform the following checks:
1. Data Structure and Completeness Checks
- Field Presence: Check that all expected fields (keys) are present in every dictionary entry. Every entry must have the following fields: `en`, `uz`, `part_of_speech`, `description`, `pronunciation_uz`, `similar`, `status`.
- Field Completion: Check that all fields that require translation are filled out.
- Unique Keys: Verify that no two dictionary entries (keys) are identical.
- Conditional Completion: If the `status` field is set to `"Needs translation"`, the corresponding translation fields are allowed to be empty.
- Multiple Choice Fields: Validate that values for multiple-choice fields (e.g., `part_of_speech`, `status`) are selected from an approved, predefined list of values.
  - `part_of_speech` can only have these values: `"noun"`, `"verb"`, `"adjective"`, `"adverb"`, `"interjection"`.
  - `status` can only have these values: `"Needs translation"`, `"Pending review"`, `"Obsolete"`, `"Approved"`, `"Do not translate"`.
2. Content and Linguistic Quality Checks
- Punctuation/Typographical Use: Verify the correct usage of the diacritical marks and apostrophes common in the language, specifically checking for the proper use of the tutuq belgisi (ʼ) or okina (ʻ), and flagging incorrect symbols such as the straight apostrophe ('), fancy quotes (’, ‘), or the grave accent (`) in places where the correct mark is required.
- Case Rule: All text in the `en` and `uz` fields must be in lowercase, unless the word is an abbreviation or a proper name.
- Leading/Trailing Whitespace: Check that there are no unnecessary blank spaces at the beginning or end of any translation string.
- Empty String Check: Validate that no required field contains an empty string (`""`).
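A minimal sketch of the apostrophe, case, and whitespace checks might look like the following. Several things here are assumptions rather than part of the issue: the apostrophe rule is applied only to the `uz` field, abbreviations and proper names are handled with a hand-maintained allowlist (`CASE_EXCEPTIONS`, with made-up sample values), and the name `check_content` is hypothetical.

```python
# Characters that are often typed in place of U+02BB (okina) or
# U+02BC (tutuq belgisi).
WRONG_MARKS = {
    "\u0027": "straight apostrophe",
    "\u2019": "right single quotation mark",
    "\u2018": "left single quotation mark",
    "\u0060": "grave accent",
}
# Hypothetical allowlist of abbreviations and proper names that may keep
# their capital letters; the real list would live with the project.
CASE_EXCEPTIONS = frozenset({"API", "Toshkent"})


def check_content(key: str, entry: dict) -> list[str]:
    """Return content problems for a single dictionary entry."""
    errors = []
    for field in ("en", "uz"):
        value = entry.get(field)
        if not isinstance(value, str):
            continue
        # Leading/trailing whitespace
        if value != value.strip():
            errors.append(f"{key}.{field}: leading or trailing whitespace")
        # Case rule, checked word by word so one allowed proper name does
        # not excuse the rest of the string
        for word in value.split():
            if word != word.lower() and word not in CASE_EXCEPTIONS:
                errors.append(f"{key}.{field}: '{word}' is not lowercase")

    # Assumption: only Uzbek text is held to the apostrophe rule, since
    # English text may legitimately contain a straight apostrophe.
    uz = entry.get("uz")
    if isinstance(uz, str):
        for char, name in WRONG_MARKS.items():
            if char in uz:
                errors.append(
                    f"{key}.uz: contains {name} (U+{ord(char):04X}); "
                    "use U+02BB or U+02BC instead"
                )
    return errors
```

An allowlist is the simplest way to express "unless the word is an abbreviation or a proper name", because that distinction cannot be detected reliably from the text alone.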
3. Optional Formatting and Utility
- Entry Sorting (Optional): Check if the dictionary entries (keys) are sorted alphabetically. This is optional but highly recommended for maintainability.
- Sorting Script: Create an accompanying script (e.g., `scripts/sort_dictionary.py`) that can be run locally or within the CI to automatically sort the dictionary entries based on their primary key, allowing maintainers to easily fix sorting issues.
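One possible shape for the sorting logic is sketched below. It works on the raw text instead of parsing and re-serializing, because the Python standard library can read TOML but not write it. It assumes every entry is a top-level `[key]` table on its own line; comments placed directly above a header would travel with the preceding entry, and a multi-line string that contains a header-like line would confuse it. The function names are hypothetical.

```python
import re

# Matches a plain table header such as [apple]; array-of-tables headers
# like [[apple]] are deliberately not matched.
HEADER = re.compile(r"^\[(?P<key>[^\[\]]+)\]\s*$")


def split_tables(toml_text: str) -> tuple[list[str], list[tuple[str, list[str]]]]:
    """Split the text into preamble lines and (key, lines) table blocks."""
    preamble, blocks = [], []
    for line in toml_text.splitlines():
        match = HEADER.match(line)
        if match:
            blocks.append((match["key"].strip().strip('"'), [line]))
        elif blocks:
            blocks[-1][1].append(line)
        else:
            preamble.append(line)
    return preamble, blocks


def is_sorted(toml_text: str) -> bool:
    """True if table keys already appear in case-insensitive order."""
    keys = [key.casefold() for key, _ in split_tables(toml_text)[1]]
    return keys == sorted(keys)


def sort_tables(toml_text: str) -> str:
    """Return the text with table blocks sorted by key."""
    preamble, blocks = split_tables(toml_text)
    blocks.sort(key=lambda block: block[0].casefold())
    parts = []
    if any(line.strip() for line in preamble):
        parts.append("\n".join(preamble).strip())
    parts.extend("\n".join(lines).strip() for _, lines in blocks)
    return "\n\n".join(parts) + "\n"
```

With this split, the CI job could call `is_sorted` as the optional check, while maintainers run `sort_tables` locally to fix the file.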
Acceptance Criteria
- A new CI job is added to the pipeline.
- This job is triggered on pushes to the main branch and on pull requests.
- The script executes all specified checks.
- If any check fails (e.g., a field is missing, or an incorrect apostrophe is used), the CI pipeline fails, preventing changes that would break data integrity from being merged.
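To make the pipeline fail when a check fails, the script only needs to exit with a non-zero status, which GitHub Actions treats as a failed step. A small sketch, assuming the individual checks collect their findings into a list of strings (the `exit_code` helper is a hypothetical name):

```python
import sys


def exit_code(errors: list[str]) -> int:
    """Print each error to stderr and return a process exit status."""
    for message in errors:
        print(f"error: {message}", file=sys.stderr)
    if errors:
        print(f"{len(errors)} check(s) failed", file=sys.stderr)
        return 1
    return 0
```

The script's entry point would then end with something like `sys.exit(exit_code(errors))`, where `errors` is the combined output of all checks.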