Default behavior of json_encode is inappropriate for HTML/XML documents #1141

@totten

Background

There is a good write-up about using PHP to generate JSON that is embedded within an HTML document:

https://sirre.al/2025/08/06/safe-json-in-script-tags-how-not-to-break-a-site/

TLDR: json_encode() has lots of crazy-looking flags. These flags would be silly if you were focused on pure JSON. But PHP applications (and Smarty applications) are often used for HTML/XML documents with embedded bits of JSON. This context requires a slightly different flavor of JSON. ("Flavor of JSON" => it's valid/well-formed JSON with quirky choices.)

(This issue is inspired by @colemanw's filing on https://lab.civicrm.org/dev/core/-/issues/6080.)

Discussion

I think the important thing is this:

  • It is typical to use Smarty to generate HTML/XML.
  • If you generate an HTML document with user-supplied content processed by Smarty's json_encode modifier, then (by default) you will probably have a problem.
  • I think the blog makes a persuasive case that the JSON flags are not a matter of local taste. They follow from the output's document format:
    • If you are outputting an HTML document (as in a typical web-app), then you want one set of JSON flags (e.g. JSON_HEX_TAG | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_LINE_TERMINATORS).
    • If you are outputting a pure JSON document (as when composer.phar generates composer.json), then you want another set of JSON flags (e.g. JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT).
  • There's an asymmetry:
    • If you incorrectly have a pure JSON document generated with the flags for HTML-JSON, then the output is quirky but valid.
    • If you incorrectly have an HTML document generated with the flags for pure-JSON, then the output is unsafe.
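To make the difference concrete, here is a minimal sketch in plain PHP (no Smarty involved). The sample string is a made-up stand-in for user-supplied content, and the flag set is the HTML-oriented one quoted above; JSON_UNESCAPED_LINE_TERMINATORS assumes PHP 7.1 or later.

```php
<?php
// Hypothetical user-supplied value: harmless as pure JSON, risky inside <script>.
$value = '<!-- <script>';

// PHP's default flags leave "<" and ">" untouched.
$default = json_encode($value);

// The HTML-oriented flag set discussed above.
$htmlFlags = JSON_HEX_TAG | JSON_UNESCAPED_SLASHES
    | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_LINE_TERMINATORS;
$htmlSafe = json_encode($value, $htmlFlags);

echo $default, "\n";   // "<!-- <script>"
echo $htmlSafe, "\n";  // "\u003C!-- \u003Cscript\u003E"

// The asymmetry: the HTML-flavored output is still valid JSON
// and decodes back to the original value.
var_dump(json_decode($htmlSafe) === $value); // bool(true)
```

Both outputs are well-formed JSON; only the second avoids placing a literal "<!--" or "<script" inside the surrounding HTML.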

This is similar to #1011, in that both issues seek a default behavior for json_encode(). It differs in that #1011 talks about default character-set (Smarty::$_CHARSET), and this talks about neutralizing risky strings for HTML ("<!--" vs "\u003C!--").

This can also be related to #1048 -- e.g. if one can override json_encode, then it's possible to change the default JSON rules, and solve the issue for one app. However, outputting HTML is a normal thing for Smarty -- you shouldn't have to do anything complicated to produce well-formed markup.
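For reference, the per-app workaround might look roughly like the sketch below. The helper name html_json_encode and the modifier name html_json are hypothetical, not part of Smarty, and JSON_THROW_ON_ERROR assumes PHP 7.3 or later. The registration call is left as a comment so the snippet runs without Smarty loaded.

```php
<?php
// Hypothetical app-level helper; the name is illustrative only.
function html_json_encode($value): string
{
    // HTML-oriented flags from this issue, plus an exception on encoding failure.
    $flags = JSON_HEX_TAG | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE
        | JSON_UNESCAPED_LINE_TERMINATORS | JSON_THROW_ON_ERROR;

    return json_encode($value, $flags);
}

// An app could expose this to templates under its own modifier name, e.g.:
// $smarty->registerPlugin('modifier', 'html_json', 'html_json_encode');

echo html_json_encode(['title' => '</script><!--']), "\n";
// {"title":"\u003C/script\u003E\u003C!--"}
```

This works, but every app that outputs HTML would have to know to do it, which is the reason for asking about the default.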

I don't know the best way to indicate the appropriate codec, but you could imagine things like....

Smarty::$_JSON_FLAGS = JSON_HEX_TAG | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_LINE_TERMINATORS;
// and then `json_encode` consults this property

// or...

$smarty->setJsonDefaults(JSON_HEX_TAG | JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_LINE_TERMINATORS);
// and then `json_encode` consults this setting

// or...

$smarty->setOutputFormat('text/html');
// and then `json_encode` consults this setting
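As a rough illustration of the third option, the lookup from output format to flags could be sketched like this. The function name json_flags_for_format and the format strings are assumptions for illustration; nothing like this exists in Smarty today. Unknown formats fall back to the HTML-safe set, following the asymmetry described above (quirky-but-valid is the safer mistake).

```php
<?php
// Hypothetical mapping from an output format to json_encode() flags.
function json_flags_for_format(string $format): int
{
    // HTML-oriented flags, as listed earlier in this issue.
    $htmlSafe = JSON_HEX_TAG | JSON_UNESCAPED_SLASHES
        | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_LINE_TERMINATORS;

    switch ($format) {
        case 'application/json':
            // Pure JSON document: readability matters, HTML escaping does not.
            return JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT;
        case 'text/html':
        default:
            // Unknown formats get the HTML-safe set: valid JSON either way.
            return $htmlSafe;
    }
}

echo json_encode('<!--', json_flags_for_format('text/html')), "\n";
// "\u003C!--"
echo json_encode('<!--', json_flags_for_format('application/json')), "\n";
// "<!--"
```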
