Skip to content

Conversation

@marcoagpinto
Copy link
Member

@marcoagpinto marcoagpinto commented Nov 27, 2025

Replaced an old obsolete rule with a newer more powerful one, that simplifies text.

Summary by CodeRabbit

  • Style
    • Enhanced Portuguese (Portugal) style checking with refined rules for possessive phrase transformations, providing more targeted suggestions in various contexts.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 27, 2025

Walkthrough

This PR replaces a single Portuguese style rule (SIMPLIFICAR_A_XXX_DE_VOCÊS) with a new rulegroup (SIMPLIFICAR_DE_VOCÊS_VOSSO_SEU) containing four distinct rules for simplifying possessive phrases using "de você" and "de vocês" constructs into more concise vosso/seu equivalents in pt-PT.

Changes

Cohort / File(s) Summary
Portuguese pt-PT style rule refactoring
pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml
Removed antipattern-based rule SIMPLIFICAR_A_XXX_DE_VOCÊS and replaced with new rulegroup SIMPLIFICAR_DE_VOCÊS_VOSSO_SEU containing four targeted rules for transforming "é/são de você(s)" and "XXX de você(s)" constructs into possessive forms with explicit patterns, messages, examples, and suggestions.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

  • Verify Portuguese language correctness and pattern accuracy for all four new rules
  • Confirm that the new rulegroup properly replaces the old rule without breaking existing functionality
  • Validate POS constraints and marker-based suggestion logic for each transformation variant

Possibly related PRs

Suggested reviewers

  • p-goulart
  • susanaboatto

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: replacing an obsolete Portuguese style rule with a new, more powerful one, including the relevant rule ID.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch lt_marcoagpinto_20251127_1816

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml (3)

4076-4079: Misses “Você/Vocês” at sentence start

Tokens are case-sensitive by default; add case_sensitive="no" to catch “Você/Vocês”.

-                    <token>você</token>
+                    <token case_sensitive="no">você</token>
@@
-                    <token>vocês</token>
+                    <token case_sensitive="no">vocês</token>
@@
-                    <token>você</token>
+                    <token case_sensitive="no">você</token>
@@
-                    <token>vocês</token>
+                    <token case_sensitive="no">vocês</token>

Also applies to: 4094-4097, 4112-4114, 4130-4133


4115-4116: Rule #3 over‑restricts to sentence/punct end, missing “O carro de você é …”

Requiring SENT_END/_PUNCT/SPS00 immediately after “de você” drops common continuations with verbs. Make it optional (or remove).

-                <token postag='SENT_END|_PUNCT|SPS00' postag_regexp='yes'/>
+                <token min="0" max="1" postag='_PUNCT|SENT_END|SPS00' postag_regexp='yes'/>

4067-4070: Confirm staging flag plan

default='temp_off' matches your staged‑enable workflow. Confirm when you plan to flip to on and whether the old group will be disabled in the same PR or a follow‑up.

Based on learnings, this aligns with your “temp_off” methodology.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d6baa0 and fa493d6.

📒 Files selected for processing (1)
  • languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml (1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11433
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3716-3716
Timestamp: 2025-07-09T06:30:58.965Z
Learning: marcoagpinto uses temporary placeholder values like "temp_off" in LanguageTool rule attributes while waiting for nightly test results before enabling rules, as part of his testing methodology to ensure rules don't require minor adjustments.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11196
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:3884-3892
Timestamp: 2025-01-17T08:46:06.456Z
Learning: In Portuguese disambiguation rules, when handling multiple verb forms for the same pattern, use separate rules for each verb form tag instead of combining them with multiple <wd> tags in a single rule.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11515
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3714-3717
Timestamp: 2025-09-19T05:58:04.682Z
Learning: In Portuguese LanguageTool rules, using `<match no='X' postag='AQ.+' postag_regexp='yes'>adjective</match>` automatically preserves gender and number inflection from the matched token without requiring postag_replace, allowing adjectives like "qualificado" to properly inflect to "qualificado/qualificada/qualificados/qualificadas" based on the original matched form.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11648
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:4198-4199
Timestamp: 2025-11-18T06:43:24.213Z
Learning: In Portuguese disambiguation rules, when removing verb readings from words that have both verb and noun tags (e.g., "testes" with VMSP2S0 and NCMP000, "pacientes" with VMSP2S0 and NCCP000), combining similar verb form tags like VMP00PM and VMSP2S0 in a single pattern is appropriate because they create similar noun/verb ambiguities in the same grammatical contexts (after verbs/adjectives/pronouns). Adding AQ exceptions to such patterns would break functionality by filtering out valid cases where past participles should be treated as nouns (e.g., "enviados" in "devem ser enviados").
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11345
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3801-3804
Timestamp: 2025-04-26T11:44:57.044Z
Learning: In Portuguese rules for LanguageTool, using `<match no='X' postag='V.+' postag_regexp='yes'>verb</match>` pattern with infinitive verbs (like "zangar") is preferred over direct adjective forms because it allows proper handling of gender and number inflections, while common gender adjectives like "descontente" can use the `postag_replace='AQ0C$10'` pattern.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11557
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:4068-4078
Timestamp: 2025-10-08T06:41:55.119Z
Learning: In Portuguese disambiguation rules, when a pattern targets tokens with multiple verb readings (e.g., VMIP3S0 and VMM02S0), including an exception like `<exception postag_regexp='yes' postag='AQ.+'//>` on subsequent participle tokens is necessary to prevent false positives, even though it reduces the number of matches. The rule can still fire successfully for cases where the participle doesn't have an adjective reading, as confirmed by testing showing ~4683 matches across ~950k sentences.
📚 Learning: 2025-11-18T06:43:24.213Z
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11648
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:4198-4199
Timestamp: 2025-11-18T06:43:24.213Z
Learning: In Portuguese disambiguation rules, when removing verb readings from words that have both verb and noun tags (e.g., "testes" with VMSP2S0 and NCMP000, "pacientes" with VMSP2S0 and NCCP000), combining similar verb form tags like VMP00PM and VMSP2S0 in a single pattern is appropriate because they create similar noun/verb ambiguities in the same grammatical contexts (after verbs/adjectives/pronouns). Adding AQ exceptions to such patterns would break functionality by filtering out valid cases where past participles should be treated as nouns (e.g., "enviados" in "devem ser enviados").

Applied to files:

  • languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml
📚 Learning: 2025-01-17T08:46:06.456Z
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11196
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:3884-3892
Timestamp: 2025-01-17T08:46:06.456Z
Learning: In Portuguese disambiguation rules, when handling multiple verb forms for the same pattern, use separate rules for each verb form tag instead of combining them with multiple <wd> tags in a single rule.

Applied to files:

  • languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml
📚 Learning: 2025-04-26T11:44:57.044Z
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11345
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3801-3804
Timestamp: 2025-04-26T11:44:57.044Z
Learning: In Portuguese rules for LanguageTool, using `<match no='X' postag='V.+' postag_regexp='yes'>verb</match>` pattern with infinitive verbs (like "zangar") is preferred over direct adjective forms because it allows proper handling of gender and number inflections, while common gender adjectives like "descontente" can use the `postag_replace='AQ0C$10'` pattern.

Applied to files:

  • languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml
📚 Learning: 2025-09-19T05:58:04.682Z
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11515
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3714-3717
Timestamp: 2025-09-19T05:58:04.682Z
Learning: In Portuguese LanguageTool rules, using `<match no='X' postag='AQ.+' postag_regexp='yes'>adjective</match>` automatically preserves gender and number inflection from the matched token without requiring postag_replace, allowing adjectives like "qualificado" to properly inflect to "qualificado/qualificada/qualificados/qualificadas" based on the original matched form.

Applied to files:

  • languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml
📚 Learning: 2025-10-08T06:41:55.119Z
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11557
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:4068-4078
Timestamp: 2025-10-08T06:41:55.119Z
Learning: In Portuguese disambiguation rules, when a pattern targets tokens with multiple verb readings (e.g., VMIP3S0 and VMM02S0), including an exception like `<exception postag_regexp='yes' postag='AQ.+'//>` on subsequent participle tokens is necessary to prevent false positives, even though it reduces the number of matches. The rule can still fire successfully for cases where the participle doesn't have an adjective reading, as confirmed by testing showing ~4683 matches across ~950k sentences.

Applied to files:

  • languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml
📚 Learning: 2025-08-25T03:54:09.419Z
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11490
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/grammar.xml:40665-40665
Timestamp: 2025-08-25T03:54:09.419Z
Learning: In Portuguese LanguageTool rules, the word "porta" appears as both a verb and a noun, creating inherent POS tagging ambiguity. Rules targeting "porta" in compound constructions should not exclude verb tags as exceptions because this would break the rule's functionality when "porta" functions as a noun in valid compound word patterns.

Applied to files:

  • languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (java-kotlin)
🔇 Additional comments (1)
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml (1)

4067-4147: Based on my verification, the review comment's concern about overlap is valid and confirmed. Let me provide the rewritten review comment:


Overlap with existing group requires action before enabling

This new group overlaps with SIMPLIFICAR_DE_VOCÊS_VOSSA_VOSSAS_VOSSO_VOSSOS (lines 4195–4228). Both rules target the same semantic patterns:

  • OLD rule matches: NC[^C].+ (singular noun) + ser + de você(s)
  • NEW rules match: DA0[FM].+ (determiner) + noun/adjective + ser + de você(s)

Sentences like "A culpa é de você" would fire both rules, producing duplicate suggestions. Before enabling this group, either:

  • Set the older group to default='off', or
  • Keep this group as default='temp_off' until the older one is retired
⛔ Skipped due to learnings
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11648
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:4198-4199
Timestamp: 2025-11-18T06:43:24.213Z
Learning: In Portuguese disambiguation rules, when removing verb readings from words that have both verb and noun tags (e.g., "testes" with VMSP2S0 and NCMP000, "pacientes" with VMSP2S0 and NCCP000), combining similar verb form tags like VMP00PM and VMSP2S0 in a single pattern is appropriate because they create similar noun/verb ambiguities in the same grammatical contexts (after verbs/adjectives/pronouns). Adding AQ exceptions to such patterns would break functionality by filtering out valid cases where past participles should be treated as nouns (e.g., "enviados" in "devem ser enviados").
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11557
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:4068-4078
Timestamp: 2025-10-08T06:41:55.119Z
Learning: In Portuguese disambiguation rules, when a pattern targets tokens with multiple verb readings (e.g., VMIP3S0 and VMM02S0), including an exception like `<exception postag_regexp='yes' postag='AQ.+'//>` on subsequent participle tokens is necessary to prevent false positives, even though it reduces the number of matches. The rule can still fire successfully for cases where the participle doesn't have an adjective reading, as confirmed by testing showing ~4683 matches across ~950k sentences.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11196
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:3884-3892
Timestamp: 2025-01-17T08:46:06.456Z
Learning: In Portuguese disambiguation rules, when handling multiple verb forms for the same pattern, use separate rules for each verb form tag instead of combining them with multiple <wd> tags in a single rule.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11345
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3801-3804
Timestamp: 2025-04-26T11:44:57.044Z
Learning: In Portuguese rules for LanguageTool, using `<match no='X' postag='V.+' postag_regexp='yes'>verb</match>` pattern with infinitive verbs (like "zangar") is preferred over direct adjective forms because it allows proper handling of gender and number inflections, while common gender adjectives like "descontente" can use the `postag_replace='AQ0C$10'` pattern.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11515
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3714-3717
Timestamp: 2025-09-19T05:58:04.682Z
Learning: In Portuguese LanguageTool rules, using `<match no='X' postag='AQ.+' postag_regexp='yes'>adjective</match>` automatically preserves gender and number inflection from the matched token without requiring postag_replace, allowing adjectives like "qualificado" to properly inflect to "qualificado/qualificada/qualificados/qualificadas" based on the original matched form.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11490
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/grammar.xml:40665-40665
Timestamp: 2025-08-25T03:54:09.419Z
Learning: In Portuguese LanguageTool rules, the word "porta" appears as both a verb and a noun, creating inherent POS tagging ambiguity. Rules targeting "porta" in compound constructions should not exclude verb tags as exceptions because this would break the rule's functionality when "porta" functions as a noun in valid compound word patterns.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11662
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/grammar.xml:2891-2891
Timestamp: 2025-11-22T10:05:18.956Z
Learning: For the verb "precisar" followed by an infinitive in Portuguese: both "precisar de + infinitivo" and "precisar + infinitivo" are grammatically correct according to Napoleão Mendes de Almeida (Dicionário das Questões Vernáculas). However, there is a regional preference: European Portuguese (pt-PT) preferentially uses "precisar de + infinitivo" as the current norm (more frequent/expected), while Brazilian Portuguese (pt-BR) tends to omit "de" before infinitives. For pt-PT specific rules, the preference should be to enforce or suggest the pt-PT norm (with "de") rather than the pt-BR usage pattern (without "de").

Comment on lines +4071 to +4076
<pattern>
<token postag='DA0[FM].+' postag_regexp='yes'/>
<token skip='1' postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>
<token inflected='yes' regexp='no'>ser<exception scope='previous' postag_regexp='yes' postag='_PUNCT|V.+|SPS.+|Z0.+'/></token>
<marker>
<token>de</token>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Rules won’t match: skip='1' between determiner and noun blocks real cases

Between DA0… and NC/AQ the noun/adjective is adjacent (“A culpa…”, “Os carros…”). Using skip='1' requires an extra token and prevents firing. Drop skip or make it optional.

Apply:

-                <token skip='1' postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>
+                <token postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>
@@
-                <token skip='1' postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>
+                <token postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>
@@
-                    <token skip='1' postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>
+                    <token postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>
@@
-                    <token skip='1' postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>
+                    <token postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>

Also applies to: 4089-4096, 4109-4116, 4127-4134

🤖 Prompt for AI Agents
In
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml
around lines 4071-4076 the use of skip='1' between the determiner (DA0...) token
and the noun/adjective (NC.+|AQ.+) token prevents the rule from matching normal
adjacent sequences (e.g. "A culpa", "Os carros"); remove skip='1' or change it
to optional (e.g. remove the attribute or use skip='0,1' if supported) so the
pattern matches adjacent determiner+noun/adjective; apply the same change to the
other occurrences at 4089-4096, 4109-4116, and 4127-4134.

@marcoagpinto marcoagpinto merged commit 8afe8c2 into master Nov 27, 2025
6 checks passed
@marcoagpinto marcoagpinto deleted the lt_marcoagpinto_20251127_1816 branch November 27, 2025 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants