Fix: handle malformed UTF-8 in CatalogWidget cache key (#40824) #40825
lbajsarowicz wants to merge 2 commits into
ProductsList::getCacheKeyInfo() passes raw $_GET to json_encode via the JSON serializer. Truncated tracking parameters (fbclid, gclid, utm_*) and bot scanner traffic frequently carry malformed UTF-8 byte sequences, which raise JSON_ERROR_UTF8 and replace the widget output with the generic CMS fallback while spamming system.log with CRITICAL "Unable to serialize value" entries. Scrub invalid byte sequences via mb_convert_encoding before serialization. Valid UTF-8 input is unchanged, so existing cache keys are preserved; only previously-broken requests now produce a stable cache key instead of an exception. Fixes magento#40824
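The failure mode and the scrub described in this commit message can be reproduced outside Magento. The following is a minimal sketch in plain PHP (mbstring extension assumed), not the code from the PR: `scrubParams()` is a hypothetical helper name, the sample parameters are made up, and scrubbing the array keys as well as the values is an assumption of this sketch that the PR text does not confirm.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not the PR's code): recursively rewrites every string
 * key and value so that it is valid UTF-8. Valid input passes through unchanged.
 */
function scrubParams(array $params): array
{
    $clean = [];
    foreach ($params as $key => $value) {
        // Assumption of this sketch: keys come from the query string too, so scrub them.
        $cleanKey = is_string($key) ? mb_convert_encoding($key, 'UTF-8', 'UTF-8') : $key;
        if (is_array($value)) {
            $clean[$cleanKey] = scrubParams($value);
        } elseif (is_string($value)) {
            $clean[$cleanKey] = mb_convert_encoding($value, 'UTF-8', 'UTF-8');
        } else {
            $clean[$cleanKey] = $value;
        }
    }
    return $clean;
}

// "\xE0" is a lone UTF-8 lead byte, as produced by a truncated ?utm_source=%E0.
$params = ['utm_source' => "\xE0", 'q' => 'ok'];

// Without any JSON_INVALID_UTF8_* flag, json_encode() fails on the raw params.
$raw = json_encode($params);
$rawError = json_last_error();

// After scrubbing, the same params serialize to a string.
$scrubbed = scrubParams($params);
$encoded = json_encode($scrubbed);
```

Here `$raw` is `false` with `json_last_error()` reporting `JSON_ERROR_UTF8`, while `$encoded` is a usable string. The sketch deliberately does not depend on which substitute character mbstring emits for the invalid byte.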
The security team has been informed about this pull request due to the presence of risky security keywords. For security vulnerability reports, please visit Adobe's vulnerability disclosure program on HackerOne or email psirt@adobe.com.
@magento run all tests
…st index
Replace mb_convert_encoding($value, 'UTF-8', 'UTF-8') with mb_scrub($value, 'UTF-8') in scrubInvalidUtf8(). mb_scrub (available since PHP 7.2) has deterministic behavior for invalid byte sequences (replaces with U+FFFD) and avoids cross-version inconsistencies in PHP 8.1/8.2.
Replace hardcoded $info[10] in the unit test with a positional lookup relative to the end of the array so the assertion stays correct if getCacheKeyInfo() gains new entries.
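Both changes in this commit can be illustrated with a short sketch in plain PHP (mbstring assumed). The names `scrubValueSketch()` and `entryFromEnd()`, the contents of `$info`, and the offset of 1 from the end are all hypothetical. The real signature of `scrubInvalidUtf8()` and the real layout of the `getCacheKeyInfo()` array are not shown in this conversation.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the PR's scrubInvalidUtf8(): applies mb_scrub()
 * to every string in a (possibly nested) value and leaves other types alone.
 */
function scrubValueSketch($value)
{
    if (is_array($value)) {
        // array_map() with a single array preserves string keys.
        return array_map('scrubValueSketch', $value);
    }
    return is_string($value) ? mb_scrub($value, 'UTF-8') : $value;
}

/**
 * Hypothetical test helper: reads an entry counted from the end of the
 * array, so entries added in front of it do not shift the lookup.
 */
function entryFromEnd(array $info, int $offsetFromEnd)
{
    return $info[count($info) - $offsetFromEnd];
}

// Made-up cache key info; the serialized params are assumed to be the last entry.
$params = ['fbclid' => "foo\xED", 'page' => 2];
$info = ['EXAMPLE_WIDGET', 'store-1', json_encode(scrubValueSketch($params))];

$before = entryFromEnd($info, 1);

// Simulate getCacheKeyInfo() gaining a new entry ahead of the params.
array_unshift($info, 'NEW_LEADING_ENTRY');
$after = entryFromEnd($info, 1);
```

A hardcoded index such as `$info[2]` would point at `'store-1'` after the `array_unshift()` call, whereas the end-relative lookup still returns the serialized params.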
@magento run all tests
Description (*)
Magento\CatalogWidget\Block\Product\ProductsList::getCacheKeyInfo() passes raw $_GET ($this->getRequest()->getParams()) to Magento\Framework\Serialize\Serializer\Json::serialize(). The serializer calls json_encode() with no JSON_INVALID_UTF8_* flag, so any malformed UTF-8 byte sequence in the query string raises JSON_ERROR_UTF8 and the serializer rethrows InvalidArgumentException: Unable to serialize value. Error: Malformed UTF-8 characters, possibly incorrectly encoded.
Because getCacheKeyInfo() is called from AbstractBlock::getCacheKey() → _loadCache() → toHtml(), the entire widget fails to render and Magento falls back to the generic "Sorry, we cannot generate the content..." template, while logging a CRITICAL entry to system.log on every hit.
Malformed UTF-8 in query strings is common in production traffic: truncated fbclid/gclid/dclid/utm_* tracking parameters from Facebook & Google Ads, broken share buttons, and bot scanners. No authentication or special preconditions are needed: a single crafted GET parameter is enough to break any CMS page hosting the widget.
This change scrubs invalid UTF-8 byte sequences in the request params via mb_convert_encoding($value, 'UTF-8', 'UTF-8') before serialization. Valid UTF-8 input is unchanged (identity transform), so existing cache keys are preserved; only previously-broken requests now produce a stable cache key instead of an exception. Scope is intentionally narrow: the fix is local to the widget cache key, where substitution is safe, rather than changing the global Json::serialize() contract (which is also used for stored data that must round-trip exactly).
Related Pull Requests
Fixed Issues (if relevant)
Related (same root cause, different code paths):
MC-23846 Magento_ReCaptcha json_encode failure in getSerializedCheckoutConfig
Manual testing scenarios (*)
https://example.test/?q=%C0%AF (overlong UTF-8 slash)
https://example.test/?dclid=%EDclid
https://example.test/?utm_source=%E0
https://example.test/?fbclid=foo%
var/log/system.log records CRITICAL: InvalidArgumentException: Unable to serialize value. Error: Malformed UTF-8 characters, possibly incorrectly encoded.
system.log; cache key is stable across repeated requests with the same malformed parameter (block cache works).
Questions or comments
Two alternative directions were considered:
Json::serialize() to pass JSON_INVALID_UTF8_SUBSTITUTE: rejected. The serializer is also used for stored data (config, quote items, persistent state) where silently substituting bytes could mask real upstream bugs and corrupt round-trips. The defect is at the call site, not the serializer.
Contribution checklist (*)