Document intended emoji shortcode parsing behavior for clients #1850

@daprice

There is a lot of nuance in how custom emoji shortcodes are parsed that I don’t think is documented anywhere. As a result, I don’t think most clients have consistent behavior here, and as a client dev I have no idea what the intended behavior is, if any. The API docs should specify how these are intended to be parsed.

For example, say we have a custom emoji with the shortcode :blobcat:

  • Should shortcodes inside various HTML tags be replaced? My gut would say shortcodes inside formatting tags like <em>:blobcat:</em> should be replaced, but what about <a href="example.com">this is a link :blobcat:</a>? Or <code>:blobcat:</code>? Standard HTML behavior would suggest all instances of the shortcode should be replaced unless they’re escaped somehow, but Mastodon doesn’t have a documented way of escaping these shortcodes, does it?
  • If a shortcode spans across a formatting change, say the first half is italicized and the second half is not, should it be replaced with the custom emoji? (E.g. the HTML might look something like <em>:blob</em>cat:) The docs say the shortcodes are “plain text shortcodes”, but that doesn’t indicate whether clients should match against the plain text form before or after parsing HTML tags.
  • As noted in mastodon#7364 ("Inconsistent custom emoji shortcode behaviour when lacking whitespace"), there are some arcane rules about replacing multiple consecutive shortcodes to prevent things like IPv6 addresses from being unintentionally replaced with custom emoji. Client developers need to know what the intended behavior is here!
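
To make the ambiguity concrete, here is a rough Python sketch of the candidate interpretations a client could pick today. None of this is taken from Mastodon's documentation or source: the `NAIVE` and `STRICT` patterns, the `skip_tags` parameter, and all function names are hypothetical and only illustrate how the choices diverge.

```python
import re
from html.parser import HTMLParser

# Naive pattern (assumption): any run of 2+ shortcode characters between colons.
NAIVE = re.compile(r":([A-Za-z0-9_]{2,}):")
# Hypothetical stricter pattern: the shortcode may not touch an ASCII
# alphanumeric character or another colon on either side.
STRICT = re.compile(r"(?<![A-Za-z0-9:]):([A-Za-z0-9_]{2,}):(?![A-Za-z0-9:])")


class _TextNodes(HTMLParser):
    """Collects text nodes, optionally ignoring text inside some tags."""

    def __init__(self, skip_tags=()):
        super().__init__(convert_charrefs=True)
        self.skip_tags = set(skip_tags)
        self.skip_depth = 0
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.nodes.append(data)


def text_nodes(html, skip_tags=()):
    parser = _TextNodes(skip_tags)
    parser.feed(html)
    parser.close()  # flush any trailing text still buffered by the parser
    return parser.nodes


def shortcodes_per_node(html, pattern=NAIVE, skip_tags=()):
    """Interpretation A: match inside each text node separately."""
    found = []
    for node in text_nodes(html, skip_tags):
        found.extend(pattern.findall(node))
    return found


def shortcodes_flattened(html, pattern=NAIVE, skip_tags=()):
    """Interpretation B: strip the tags first, then match the joined text."""
    return pattern.findall("".join(text_nodes(html, skip_tags)))


split_across_tags = "<em>:blob</em>cat:"
print(shortcodes_per_node(split_across_tags))   # []
print(shortcodes_flattened(split_across_tags))  # ['blobcat']

# Whether <code> (or <a>) content is exempt is a separate client decision.
in_code = "<code>:blobcat:</code>"
print(shortcodes_per_node(in_code))                       # ['blobcat']
print(shortcodes_per_node(in_code, skip_tags=("code",)))  # []

ipv6 = "<p>2001:db8:85a3:0:0:8a2e:370:7334</p>"
print(shortcodes_per_node(ipv6, NAIVE))   # ['db8', '8a2e']
print(shortcodes_per_node(ipv6, STRICT))  # []
```

Note that the hypothetical `STRICT` rule also rejects adjacent shortcodes such as `:blobcat::blobcat:`, so protecting IPv6 addresses this way trades off against shortcodes written without whitespace. Which of these behaviors is intended is exactly what the docs would need to state.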
