@marcoagpinto marcoagpinto commented Nov 27, 2025

Small improvements and clean-up.

Summary by CodeRabbit

  • Bug Fixes
    • Improved Portuguese language style rule detection to better recognize word variations and provide more accurate formal language suggestions for style checking.

coderabbitai bot commented Nov 27, 2025

Walkthrough

A Portuguese style rule is updated to better detect wording that has a more formal alternative. The rule "[Científico] 'usar/utilizar' termo → empregar" moves from a simple pattern to a regex token (usar|utilizar) matched with inflected='yes', so that inflected verb forms are also caught. Rule attributes are now declared explicitly: id, name, type, tags, tone_tags, and is_goal_specific. The examples are expanded to cover more contexts.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Portuguese style rule enhancement: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml | Updated "[Científico] 'usar/utilizar' termo → empregar" rule with explicit attributes (id, name, type, tags, tone_tags, is_goal_specific), regex pattern token `usar |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify regex pattern usar|utilizar correctness and inflection handling (inflected='yes')
  • Validate all example cases properly trigger and resolve correctly
  • Confirm rule attributes and tone tags align with language tool conventions
  • Test that formal context suggestions work as intended

Suggested reviewers

  • p-goulart
  • susanaboatto

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title '[pt] Small improvement/clean-up in rule ID:CIENTÍFICO_EMPREGAR_TERMO' clearly identifies the specific rule being modified and indicates the nature of the change (improvement/clean-up), aligning well with the actual modifications to enhance pattern matching for Portuguese language rules. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8afe8c2 and 78cbe6d.

📒 Files selected for processing (1)
  • languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml (1 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11433
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3716-3716
Timestamp: 2025-07-09T06:30:58.965Z
Learning: marcoagpinto uses temporary placeholder values like "temp_off" in LanguageTool rule attributes while waiting for nightly test results before enabling rules, as part of his testing methodology to ensure rules don't require minor adjustments.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11196
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:3884-3892
Timestamp: 2025-01-17T08:46:06.456Z
Learning: In Portuguese disambiguation rules, when handling multiple verb forms for the same pattern, use separate rules for each verb form tag instead of combining them with multiple <wd> tags in a single rule.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11345
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3801-3804
Timestamp: 2025-04-26T11:44:57.044Z
Learning: In Portuguese rules for LanguageTool, using `<match no='X' postag='V.+' postag_regexp='yes'>verb</match>` pattern with infinitive verbs (like "zangar") is preferred over direct adjective forms because it allows proper handling of gender and number inflections, while common gender adjectives like "descontente" can use the `postag_replace='AQ0C$10'` pattern.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11515
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3714-3717
Timestamp: 2025-09-19T05:58:04.682Z
Learning: In Portuguese LanguageTool rules, using `<match no='X' postag='AQ.+' postag_regexp='yes'>adjective</match>` automatically preserves gender and number inflection from the matched token without requiring postag_replace, allowing adjectives like "qualificado" to properly inflect to "qualificado/qualificada/qualificados/qualificadas" based on the original matched form.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11557
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:4068-4078
Timestamp: 2025-10-08T06:41:55.119Z
Learning: In Portuguese disambiguation rules, when a pattern targets tokens with multiple verb readings (e.g., VMIP3S0 and VMM02S0), including an exception like `<exception postag_regexp='yes' postag='AQ.+'//>` on subsequent participle tokens is necessary to prevent false positives, even though it reduces the number of matches. The rule can still fire successfully for cases where the participle doesn't have an adjective reading, as confirmed by testing showing ~4683 matches across ~950k sentences.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11490
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/grammar.xml:40665-40665
Timestamp: 2025-08-25T03:54:09.419Z
Learning: In Portuguese LanguageTool rules, the word "porta" appears as both a verb and a noun, creating inherent POS tagging ambiguity. Rules targeting "porta" in compound constructions should not exclude verb tags as exceptions because this would break the rule's functionality when "porta" functions as a noun in valid compound word patterns.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11460
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/grammar.xml:14496-14497
Timestamp: 2025-07-31T07:46:39.805Z
Learning: In Portuguese LanguageTool antipatterns, when words are intended to be matched exactly as written (static forms), the `inflected='yes'` attribute is deliberately omitted, and the regex pattern matches literal word forms rather than inflected variants.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11648
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:4198-4199
Timestamp: 2025-11-18T06:43:24.213Z
Learning: In Portuguese disambiguation rules, when removing verb readings from words that have both verb and noun tags (e.g., "testes" with VMSP2S0 and NCMP000, "pacientes" with VMSP2S0 and NCCP000), combining similar verb form tags like VMP00PM and VMSP2S0 in a single pattern is appropriate because they create similar noun/verb ambiguities in the same grammatical contexts (after verbs/adjectives/pronouns). Adding AQ exceptions to such patterns would break functionality by filtering out valid cases where past participles should be treated as nouns (e.g., "enviados" in "devem ser enviados").
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11415
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3721-3723
Timestamp: 2025-06-28T05:00:46.342Z
Learning: In Portuguese LanguageTool rules, capture group references $1 through $9 work correctly in postag_replace patterns. The parsing issue only occurs when a single-digit group reference is immediately followed by digits (like $1000), creating ambiguity between "group 1 + literal 000" vs "group 1000". Using braces ${1}000 disambiguates this case.
Learnt from: inesakochur
Repo: languagetool-org/languagetool PR: 11509
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/barbarisms.txt:251-252
Timestamp: 2025-09-12T14:34:34.767Z
Learning: For Portuguese barbarisms file: "maçom" is correct Portuguese and should not be flagged as a barbarism. The French "maçon" maps to multiple Portuguese alternatives: "maçom|mação|maçónico".
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (java-kotlin)
🔇 Additional comments (1)
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml (1)

10141-10156: Pattern XML syntax is valid and follows established LanguageTool conventions.

The <marker> element wrapping a <token> inside a <pattern> section is a standard structure used throughout this codebase (see lines 211–213, 318–320, 551–553, etc.). The <marker> designates which tokens are highlighted as the matched issue, and the skip='1' attribute allows up to one intervening token (e.g., "o") between the verb and the noun.

The <match no='1' postag='V.+' postag_regexp='yes'>empregar</match> correctly references the first (marked) token and will properly inflect "empregar" based on the original verb's postag, allowing suggestions like "emprega-se", "empregamos", etc. The pattern and examples are consistent.
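
For readers without the diff at hand, the structure described above can be sketched roughly as follows. This is not the merged rule: the rule ID, name, regex token, `inflected='yes'`, `skip='1'`, `<marker>` and `<match>` element come from this conversation, while the attribute values (`type`, `tone_tags`, `is_goal_specific`), the second token, the message wording and the example sentence are assumptions made for illustration. The `tags` attribute is left out because its value is not given here.

```xml
<!-- Hypothetical sketch only; values marked "assumed" are not taken from the PR. -->
<rule id='CIENTÍFICO_EMPREGAR_TERMO'
      name="[Científico] 'usar/utilizar' termo → empregar"
      type='style'
      tone_tags='academic'
      is_goal_specific='true'> <!-- type, tone_tags, is_goal_specific values: assumed -->
  <pattern>
    <!-- Only the verb is marked, so only the verb is underlined and replaced. -->
    <marker>
      <!-- regexp + inflected: matches any inflected form of "usar" or "utilizar".
           skip='1': up to one token (e.g. the article "o") may follow before the next token. -->
      <token regexp='yes' inflected='yes' skip='1'>usar|utilizar</token>
    </marker>
    <!-- Assumed second token, inferred from the rule name. -->
    <token>termo</token>
  </pattern>
  <!-- Message wording is assumed. -->
  <message>Em textos científicos, prefira <suggestion><match no='1' postag='V.+' postag_regexp='yes'>empregar</match></suggestion>.</message>
  <!-- Assumed example; the suggestion reuses the POS tag of the matched verb. -->
  <example correction='emprega'>O autor <marker>utiliza</marker> o termo "corpus".</example>
</rule>
```

Keeping the verb inside `<marker>` while the noun stays outside means the suggestion replaces only the verb, and the `<match>` element re-inflects "empregar" using the part-of-speech tag of whichever form was matched.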


@marcoagpinto marcoagpinto merged commit e0e8ee3 into master Nov 27, 2025
6 checks passed
@marcoagpinto marcoagpinto deleted the lt_marcoagpinto_20251127_1944 branch November 27, 2025 20:41