Fix: handle malformed UTF-8 in CatalogWidget cache key (#40824) #40825

Open
lbajsarowicz wants to merge 2 commits into magento:2.4-develop from lbajsarowicz:fix/40824-malformed-utf8-cache-key

@lbajsarowicz
Contributor

Description (*)

Magento\CatalogWidget\Block\Product\ProductsList::getCacheKeyInfo() passes raw $_GET ($this->getRequest()->getParams()) to Magento\Framework\Serialize\Serializer\Json::serialize(). The serializer calls json_encode() with no JSON_INVALID_UTF8_* flag, so any malformed UTF-8 byte sequence in the query string raises JSON_ERROR_UTF8 and the serializer rethrows InvalidArgumentException: Unable to serialize value. Error: Malformed UTF-8 characters, possibly incorrectly encoded.

Because getCacheKeyInfo() is called from AbstractBlock::getCacheKey(), which _loadCache() calls during toHtml(), the entire widget fails to render and Magento falls back to the generic "Sorry, we cannot generate the content..." template, while logging a CRITICAL entry to system.log on every hit.
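
The failure described above can be reproduced outside Magento with plain json_encode(). This is a minimal sketch; the `$params` array is a hypothetical stand-in for what getParams() would return after URL-decoding `?q=%C0%AF`.

```php
<?php
// Hypothetical request params: "q" holds the raw bytes of the overlong slash.
$params = ['q' => "\xC0\xAF"];

// No JSON_INVALID_UTF8_* flag is passed, matching the serializer call described above.
$encoded = json_encode($params);

var_dump($encoded === false);                      // encoding failed
var_dump(json_last_error() === JSON_ERROR_UTF8);   // the failure is the UTF-8 error
echo json_last_error_msg(), PHP_EOL;
```

Per the description, the Magento serializer turns this failure into an InvalidArgumentException instead of returning false.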

Malformed UTF-8 in query strings is common in production traffic: truncated fbclid / gclid / dclid / utm_* tracking parameters from Facebook & Google Ads, broken share buttons, and bot scanners. No authentication or special preconditions are needed — a single crafted GET parameter is enough to break any CMS page hosting the widget.

This change scrubs invalid UTF-8 byte sequences in the request params via mb_convert_encoding($value, 'UTF-8', 'UTF-8') before serialization. Valid UTF-8 input is unchanged (identity transform), so existing cache keys are preserved; only previously-broken requests now produce a stable cache key instead of an exception. Scope is intentionally narrow — the fix is local to the widget cache key, where substitution is safe, rather than changing the global Json::serialize() contract (which is also used for stored data that must round-trip exactly).
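
A minimal sketch of the scrubbing approach follows. It assumes the params may be nested arrays; `scrubInvalidUtf8Sketch()` is a hypothetical name for illustration and is not the PR's actual code.

```php
<?php
/**
 * Hypothetical helper: recursively replace invalid UTF-8 byte sequences
 * in string keys and values. Non-string scalars are returned unchanged.
 */
function scrubInvalidUtf8Sketch($value)
{
    if (is_array($value)) {
        $clean = [];
        foreach ($value as $key => $item) {
            $cleanKey = is_string($key) ? mb_convert_encoding($key, 'UTF-8', 'UTF-8') : $key;
            $clean[$cleanKey] = scrubInvalidUtf8Sketch($item);
        }
        return $clean;
    }

    return is_string($value) ? mb_convert_encoding($value, 'UTF-8', 'UTF-8') : $value;
}

// Hypothetical params: one malformed value, one nested array of valid values.
$params = ['q' => "\xC0\xAF", 'filters' => ['color' => 'zażółć', 'n' => 3]];
$clean = scrubInvalidUtf8Sketch($params);

var_dump(json_encode($clean) !== false);             // serializable after scrubbing
var_dump($clean['filters'] === $params['filters']);  // valid input is left untouched
```

Scrubbing keys as well as values is a choice made in this sketch; a malformed key would break json_encode() in the same way as a malformed value.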

Related Pull Requests

  • N/A

Fixed Issues (if relevant)

  1. Fixes #40824 (Regression: Non UTF-8 encoded URI component still throws "Unable to serialize value" in CatalogWidget on 2.4.7-p5)

Related (same root cause, different code paths):

Manual testing scenarios (*)

  1. Add a "Catalog Products List" widget to any CMS page (e.g. homepage).
  2. Open the page with a malformed UTF-8 byte sequence in a query parameter:
    • https://example.test/?q=%C0%AF (overlong UTF-8 slash)
    • https://example.test/?dclid=%EDclid
    • https://example.test/?utm_source=%E0
    • https://example.test/?fbclid=foo%
  3. Before the fix: widget is replaced by the generic CMS fallback message; var/log/system.log records CRITICAL: InvalidArgumentException: Unable to serialize value. Error: Malformed UTF-8 characters, possibly incorrectly encoded.
  4. After the fix: widget renders normally; no entry is added to system.log; cache key is stable across repeated requests with the same malformed parameter (block cache works).
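
As a quick sanity check before running the scenarios, three of the sample values can be decoded and tested locally. This is a sketch that assumes rawurldecode() approximates how the query string is decoded; it does not exercise Magento itself.

```php
<?php
// Percent-encoded sample values taken from the scenarios above.
$samples = ['%C0%AF', '%EDclid', '%E0'];

$results = [];
foreach ($samples as $raw) {
    $decoded = rawurldecode($raw);
    // mb_check_encoding() returns false for byte strings that are not valid UTF-8.
    $results[$raw] = mb_check_encoding($decoded, 'UTF-8');
    printf("%-8s valid UTF-8: %s\n", $raw, var_export($results[$raw], true));
}
```

Each of the three values should be reported as invalid, which is what triggers the serializer error before the fix.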

Questions or comments

Two alternative directions were considered:

  • Changing Json::serialize() to pass JSON_INVALID_UTF8_SUBSTITUTE — rejected. The serializer is also used for stored data (config, quote items, persistent state) where silently substituting bytes could mask real upstream bugs and corrupt round-trips. The defect is at the call site, not the serializer.
  • Hashing the params instead of serializing them — rejected for this PR. It would also help with cache fragmentation (every unique URL parameter creates a new cache entry), but it changes the cache key shape for every existing valid request and is out of scope for a regression fix.
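
For comparison, the two rejected directions might look roughly like the sketch below. The `$params` value and the choice of sha256 with serialize() are assumptions made for illustration, not something the PR proposes.

```php
<?php
// Hypothetical params with a truncated multi-byte sequence.
$params = ['utm_source' => "\xE0"];

// Alternative 1: let json_encode() substitute invalid bytes itself.
// JSON_INVALID_UTF8_SUBSTITUTE is available since PHP 7.2.
$json = json_encode($params, JSON_INVALID_UTF8_SUBSTITUTE);
var_dump($json !== false);

// Alternative 2: hash the params instead of serializing them to JSON.
// serialize() is byte-safe, so malformed UTF-8 does not matter here,
// but every existing cache key would change shape.
$hash = hash('sha256', serialize($params));
var_dump(strlen($hash)); // sha256 hex digest length
```

Either variant avoids the exception; the reasons for rejecting them are about scope and compatibility, as explained above.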

Contribution checklist (*)

  • Pull request has a meaningful description of its purpose
  • All commits are accompanied by meaningful commit messages
  • All new or changed code is covered with unit/integration tests (where applicable)
  • All automated tests passed successfully (all builds are green)

ProductsList::getCacheKeyInfo() passes raw $_GET to json_encode via
the JSON serializer. Truncated tracking parameters (fbclid, gclid,
utm_*) and bot scanner traffic frequently carry malformed UTF-8
byte sequences, which raise JSON_ERROR_UTF8 and replace the widget
output with the generic CMS fallback while spamming system.log with
CRITICAL "Unable to serialize value" entries.

Scrub invalid byte sequences via mb_convert_encoding before
serialization. Valid UTF-8 input is unchanged, so existing cache
keys are preserved; only previously-broken requests now produce a
stable cache key instead of an exception.

Fixes magento#40824
@m2-assistant

m2-assistant Bot commented May 24, 2026

Hi @lbajsarowicz. Thank you for your contribution!
Here are some useful tips on how you can test your changes using Magento test environment.
❗ Automated tests can be triggered manually with an appropriate comment:

  • @magento run all tests - run or re-run all required tests against the PR changes
  • @magento run <test-build(s)> - run or re-run specific test build(s)
    For example: @magento run Unit Tests

<test-build(s)> is a comma-separated list of build names.

Allowed build names are:
  1. Database Compare
  2. Functional Tests CE
  3. Functional Tests EE
  4. Functional Tests B2B
  5. Integration Tests
  6. Magento Health Index
  7. Sample Data Tests CE
  8. Sample Data Tests EE
  9. Sample Data Tests B2B
  10. Static Tests
  11. Unit Tests
  12. WebAPI Tests
  13. Semantic Version Checker

You can find more information about the builds here
ℹ️ Run only required test builds during development. Run all test builds before sending your pull request for review.


For more details, review the Code Contributions documentation.
Join Magento Community Engineering Slack and ask your questions in #github channel.

@ct-prd-pr-scan

The security team has been informed about this pull request due to the presence of risky security keywords. For security vulnerability reports, please visit Adobe's vulnerability disclosure program on HackerOne or email psirt@adobe.com.

@lbajsarowicz
Contributor Author

@magento run all tests

@ct-prd-projects-boards-automation Bot added the Priority: P2 label (A defect with this priority could have functionality issues which are not to expectations.) May 25, 2026
@github-project-automation Bot moved this to Pending Review in Pull Requests Dashboard May 25, 2026
…st index

Replace mb_convert_encoding($value, 'UTF-8', 'UTF-8') with mb_scrub($value, 'UTF-8')
in scrubInvalidUtf8(). mb_scrub (PHP 7.2+) has deterministic behavior for invalid byte
sequences (replaces them with the mbstring substitute character, "?" by default) and
avoids cross-version inconsistencies in PHP 8.1/8.2.
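
A small sketch of the mb_scrub() behavior this commit relies on. The input strings are hypothetical, and the exact replacement character is deliberately not asserted.

```php
<?php
// Hypothetical inputs: one with a truncated lead byte, one fully valid.
$malformed = "abc\xE0";
$valid = 'zażółć';

$scrubbed = mb_scrub($malformed, 'UTF-8');

var_dump(mb_check_encoding($scrubbed, 'UTF-8'));        // result is valid UTF-8
var_dump(mb_scrub($valid, 'UTF-8') === $valid);         // valid input is unchanged
var_dump(mb_scrub($malformed, 'UTF-8') === $scrubbed);  // same input, same output
```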

Replace hardcoded $info[10] in the unit test with a positional lookup relative to the
end of the array so the assertion stays correct if getCacheKeyInfo() gains new entries.
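
The end-relative lookup could be sketched as follows. The `$info` contents and the offset are hypothetical; the real entries of getCacheKeyInfo() and their order are not shown in this conversation.

```php
<?php
// Hypothetical stand-in for the array returned by getCacheKeyInfo().
$info = ['WIDGET_ID', 'USD', 1, '{"q":"?"}', 'grid.phtml', 'Title'];

// Assumed for illustration: the serialized params sit third from the end.
$offsetFromEnd = 3;
$serializedParams = $info[count($info) - $offsetFromEnd];

// Prepending a new entry shifts absolute indexes but not the end-relative one.
array_unshift($info, 'NEW_ENTRY');
$afterChange = $info[count($info) - $offsetFromEnd];

var_dump($serializedParams === $afterChange);
```

An end-relative index only survives entries added before the target; entries appended after it would still require updating the test.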
@lbajsarowicz
Contributor Author

@magento run all tests

Labels

  • Priority: P2 (A defect with this priority could have functionality issues which are not to expectations.)
  • Progress: pending review
  • Project: Community Picked (PRs upvoted by the community)

Projects

Status: Pending Review

Development

Successfully merging this pull request may close these issues.

Regression: Non UTF-8 encoded URI component still throws "Unable to serialize value" in CatalogWidget on 2.4.7-p5

2 participants