-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[pt] Replaced obsolete rule with new one ID:SIMPLIFICAR_DE_VOCÊS_VOSSO_SEU #11675
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis PR replaces a single Portuguese style rule ( Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (3)
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml (3)
4076-4079: Misses “Você/Vocês” at sentence startTokens are case-sensitive by default; add case_sensitive="no" to catch “Você/Vocês”.
- <token>você</token> + <token case_sensitive="no">você</token> @@ - <token>vocês</token> + <token case_sensitive="no">vocês</token> @@ - <token>você</token> + <token case_sensitive="no">você</token> @@ - <token>vocês</token> + <token case_sensitive="no">vocês</token>Also applies to: 4094-4097, 4112-4114, 4130-4133
4115-4116: Rule #3 over‑restricts to sentence/punct end, missing “O carro de você é …”Requiring SENT_END/_PUNCT/SPS00 immediately after “de você” drops common continuations with verbs. Make it optional (or remove).
- <token postag='SENT_END|_PUNCT|SPS00' postag_regexp='yes'/> + <token min="0" max="1" postag='_PUNCT|SENT_END|SPS00' postag_regexp='yes'/>
4067-4070: Confirm staging flag plandefault='temp_off' matches your staged‑enable workflow. Confirm when you plan to flip to on and whether the old group will be disabled in the same PR or a follow‑up.
Based on learnings, this aligns with your “temp_off” methodology.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml(1 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11433
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3716-3716
Timestamp: 2025-07-09T06:30:58.965Z
Learning: marcoagpinto uses temporary placeholder values like "temp_off" in LanguageTool rule attributes while waiting for nightly test results before enabling rules, as part of his testing methodology to ensure rules don't require minor adjustments.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11196
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:3884-3892
Timestamp: 2025-01-17T08:46:06.456Z
Learning: In Portuguese disambiguation rules, when handling multiple verb forms for the same pattern, use separate rules for each verb form tag instead of combining them with multiple <wd> tags in a single rule.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11515
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3714-3717
Timestamp: 2025-09-19T05:58:04.682Z
Learning: In Portuguese LanguageTool rules, using `<match no='X' postag='AQ.+' postag_regexp='yes'>adjective</match>` automatically preserves gender and number inflection from the matched token without requiring postag_replace, allowing adjectives like "qualificado" to properly inflect to "qualificado/qualificada/qualificados/qualificadas" based on the original matched form.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11648
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:4198-4199
Timestamp: 2025-11-18T06:43:24.213Z
Learning: In Portuguese disambiguation rules, when removing verb readings from words that have both verb and noun tags (e.g., "testes" with VMSP2S0 and NCMP000, "pacientes" with VMSP2S0 and NCCP000), combining similar verb form tags like VMP00PM and VMSP2S0 in a single pattern is appropriate because they create similar noun/verb ambiguities in the same grammatical contexts (after verbs/adjectives/pronouns). Adding AQ exceptions to such patterns would break functionality by filtering out valid cases where past participles should be treated as nouns (e.g., "enviados" in "devem ser enviados").
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11345
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3801-3804
Timestamp: 2025-04-26T11:44:57.044Z
Learning: In Portuguese rules for LanguageTool, using `<match no='X' postag='V.+' postag_regexp='yes'>verb</match>` pattern with infinitive verbs (like "zangar") is preferred over direct adjective forms because it allows proper handling of gender and number inflections, while common gender adjectives like "descontente" can use the `postag_replace='AQ0C$10'` pattern.
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11557
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:4068-4078
Timestamp: 2025-10-08T06:41:55.119Z
Learning: In Portuguese disambiguation rules, when a pattern targets tokens with multiple verb readings (e.g., VMIP3S0 and VMM02S0), including an exception like `<exception postag_regexp='yes' postag='AQ.+'//>` on subsequent participle tokens is necessary to prevent false positives, even though it reduces the number of matches. The rule can still fire successfully for cases where the participle doesn't have an adjective reading, as confirmed by testing showing ~4683 matches across ~950k sentences.
📚 Learning: 2025-11-18T06:43:24.213Z
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11648
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:4198-4199
Timestamp: 2025-11-18T06:43:24.213Z
Learning: In Portuguese disambiguation rules, when removing verb readings from words that have both verb and noun tags (e.g., "testes" with VMSP2S0 and NCMP000, "pacientes" with VMSP2S0 and NCCP000), combining similar verb form tags like VMP00PM and VMSP2S0 in a single pattern is appropriate because they create similar noun/verb ambiguities in the same grammatical contexts (after verbs/adjectives/pronouns). Adding AQ exceptions to such patterns would break functionality by filtering out valid cases where past participles should be treated as nouns (e.g., "enviados" in "devem ser enviados").
Applied to files:
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml
📚 Learning: 2025-01-17T08:46:06.456Z
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11196
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:3884-3892
Timestamp: 2025-01-17T08:46:06.456Z
Learning: In Portuguese disambiguation rules, when handling multiple verb forms for the same pattern, use separate rules for each verb form tag instead of combining them with multiple <wd> tags in a single rule.
Applied to files:
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml
📚 Learning: 2025-04-26T11:44:57.044Z
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11345
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3801-3804
Timestamp: 2025-04-26T11:44:57.044Z
Learning: In Portuguese rules for LanguageTool, using `<match no='X' postag='V.+' postag_regexp='yes'>verb</match>` pattern with infinitive verbs (like "zangar") is preferred over direct adjective forms because it allows proper handling of gender and number inflections, while common gender adjectives like "descontente" can use the `postag_replace='AQ0C$10'` pattern.
Applied to files:
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml
📚 Learning: 2025-09-19T05:58:04.682Z
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11515
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3714-3717
Timestamp: 2025-09-19T05:58:04.682Z
Learning: In Portuguese LanguageTool rules, using `<match no='X' postag='AQ.+' postag_regexp='yes'>adjective</match>` automatically preserves gender and number inflection from the matched token without requiring postag_replace, allowing adjectives like "qualificado" to properly inflect to "qualificado/qualificada/qualificados/qualificadas" based on the original matched form.
Applied to files:
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml
📚 Learning: 2025-10-08T06:41:55.119Z
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11557
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:4068-4078
Timestamp: 2025-10-08T06:41:55.119Z
Learning: In Portuguese disambiguation rules, when a pattern targets tokens with multiple verb readings (e.g., VMIP3S0 and VMM02S0), including an exception like `<exception postag_regexp='yes' postag='AQ.+'//>` on subsequent participle tokens is necessary to prevent false positives, even though it reduces the number of matches. The rule can still fire successfully for cases where the participle doesn't have an adjective reading, as confirmed by testing showing ~4683 matches across ~950k sentences.
Applied to files:
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml
📚 Learning: 2025-08-25T03:54:09.419Z
Learnt from: marcoagpinto
Repo: languagetool-org/languagetool PR: 11490
File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/grammar.xml:40665-40665
Timestamp: 2025-08-25T03:54:09.419Z
Learning: In Portuguese LanguageTool rules, the word "porta" appears as both a verb and a noun, creating inherent POS tagging ambiguity. Rules targeting "porta" in compound constructions should not exclude verb tags as exceptions because this would break the rule's functionality when "porta" functions as a noun in valid compound word patterns.
Applied to files:
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (java-kotlin)
🔇 Additional comments (1)
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml (1)
4067-4147: Based on my verification, the review comment's concern about overlap is valid and confirmed. Let me provide the rewritten review comment:
Overlap with existing group requires action before enabling
This new group overlaps with SIMPLIFICAR_DE_VOCÊS_VOSSA_VOSSAS_VOSSO_VOSSOS (lines 4195–4228). Both rules target the same semantic patterns:
- OLD rule matches:
NC[^C].+(singular noun) +ser+de você(s)- NEW rules match:
DA0[FM].+(determiner) + noun/adjective +ser+de você(s)Sentences like "A culpa é de você" would fire both rules, producing duplicate suggestions. Before enabling this group, either:
- Set the older group to
default='off', or- Keep this group as
default='temp_off'until the older one is retired⛔ Skipped due to learnings
Learnt from: marcoagpinto Repo: languagetool-org/languagetool PR: 11648 File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:4198-4199 Timestamp: 2025-11-18T06:43:24.213Z Learning: In Portuguese disambiguation rules, when removing verb readings from words that have both verb and noun tags (e.g., "testes" with VMSP2S0 and NCMP000, "pacientes" with VMSP2S0 and NCCP000), combining similar verb form tags like VMP00PM and VMSP2S0 in a single pattern is appropriate because they create similar noun/verb ambiguities in the same grammatical contexts (after verbs/adjectives/pronouns). Adding AQ exceptions to such patterns would break functionality by filtering out valid cases where past participles should be treated as nouns (e.g., "enviados" in "devem ser enviados").Learnt from: marcoagpinto Repo: languagetool-org/languagetool PR: 11557 File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:4068-4078 Timestamp: 2025-10-08T06:41:55.119Z Learning: In Portuguese disambiguation rules, when a pattern targets tokens with multiple verb readings (e.g., VMIP3S0 and VMM02S0), including an exception like `<exception postag_regexp='yes' postag='AQ.+'//>` on subsequent participle tokens is necessary to prevent false positives, even though it reduces the number of matches. The rule can still fire successfully for cases where the participle doesn't have an adjective reading, as confirmed by testing showing ~4683 matches across ~950k sentences.Learnt from: marcoagpinto Repo: languagetool-org/languagetool PR: 11196 File: languagetool-language-modules/pt/src/main/resources/org/languagetool/resource/pt/disambiguation.xml:3884-3892 Timestamp: 2025-01-17T08:46:06.456Z Learning: In Portuguese disambiguation rules, when handling multiple verb forms for the same pattern, use separate rules for each verb form tag instead of combining them with multiple <wd> tags in a single rule.Learnt from: marcoagpinto Repo: languagetool-org/languagetool PR: 11345 File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3801-3804 Timestamp: 2025-04-26T11:44:57.044Z Learning: In Portuguese rules for LanguageTool, using `<match no='X' postag='V.+' postag_regexp='yes'>verb</match>` pattern with infinitive verbs (like "zangar") is preferred over direct adjective forms because it allows proper handling of gender and number inflections, while common gender adjectives like "descontente" can use the `postag_replace='AQ0C$10'` pattern.Learnt from: marcoagpinto Repo: languagetool-org/languagetool PR: 11515 File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/style.xml:3714-3717 Timestamp: 2025-09-19T05:58:04.682Z Learning: In Portuguese LanguageTool rules, using `<match no='X' postag='AQ.+' postag_regexp='yes'>adjective</match>` automatically preserves gender and number inflection from the matched token without requiring postag_replace, allowing adjectives like "qualificado" to properly inflect to "qualificado/qualificada/qualificados/qualificadas" based on the original matched form.Learnt from: marcoagpinto Repo: languagetool-org/languagetool PR: 11490 File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/grammar.xml:40665-40665 Timestamp: 2025-08-25T03:54:09.419Z Learning: In Portuguese LanguageTool rules, the word "porta" appears as both a verb and a noun, creating inherent POS tagging ambiguity. Rules targeting "porta" in compound constructions should not exclude verb tags as exceptions because this would break the rule's functionality when "porta" functions as a noun in valid compound word patterns.Learnt from: marcoagpinto Repo: languagetool-org/languagetool PR: 11662 File: languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/grammar.xml:2891-2891 Timestamp: 2025-11-22T10:05:18.956Z Learning: For the verb "precisar" followed by an infinitive in Portuguese: both "precisar de + infinitivo" and "precisar + infinitivo" are grammatically correct according to Napoleão Mendes de Almeida (Dicionário das Questões Vernáculas). However, there is a regional preference: European Portuguese (pt-PT) preferentially uses "precisar de + infinitivo" as the current norm (more frequent/expected), while Brazilian Portuguese (pt-BR) tends to omit "de" before infinitives. For pt-PT specific rules, the preference should be to enforce or suggest the pt-PT norm (with "de") rather than the pt-BR usage pattern (without "de").
| <pattern> | ||
| <token postag='DA0[FM].+' postag_regexp='yes'/> | ||
| <token skip='1' postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token> | ||
| <token inflected='yes' regexp='no'>ser<exception scope='previous' postag_regexp='yes' postag='_PUNCT|V.+|SPS.+|Z0.+'/></token> | ||
| <marker> | ||
| <token>de</token> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rules won’t match: skip='1' between determiner and noun blocks real cases
Between DA0… and NC/AQ the noun/adjective is adjacent (“A culpa…”, “Os carros…”). Using skip='1' requires an extra token and prevents firing. Drop skip or make it optional.
Apply:
- <token skip='1' postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>
+ <token postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>
@@
- <token skip='1' postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>
+ <token postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>
@@
- <token skip='1' postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>
+ <token postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>
@@
- <token skip='1' postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>
+ <token postag='NC.+|AQ.+' postag_regexp='yes'><exception postag_regexp='yes' postag='V.+|SPS.+|Z0.+'/></token>Also applies to: 4089-4096, 4109-4116, 4127-4134
🤖 Prompt for AI Agents
In
languagetool-language-modules/pt/src/main/resources/org/languagetool/rules/pt/pt-PT/style.xml
around lines 4071-4076 the use of skip='1' between the determiner (DA0...) token
and the noun/adjective (NC.+|AQ.+) token prevents the rule from matching normal
adjacent sequences (e.g. "A culpa", "Os carros"); remove skip='1' or change it
to optional (e.g. remove the attribute or use skip='0,1' if supported) so the
pattern matches adjacent determiner+noun/adjective; apply the same change to the
other occurrences at 4089-4096, 4109-4116, and 4127-4134.
Replaced an old obsolete rule with a newer more powerful one, that simplifies text.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.