# [RFC] Plugins, normalizers, and nodes

_Thanks for the feedback in #6025! Some very opinionated thoughts... (cc @acywatson @fantactuka @ivailop7 @etrepum @GermanJablo @abelsj60 @StyleT)_

Lexical Core doesn't do much when it comes to product-specific features. Facebook Comments has over 20 named features on the editor, most notably Mentions, Hashtags and Auto-reply prefixes. These features are informally called "plugins", but there is no formal definition of what they are, and the encapsulation we provide simply evolved organically.

### Multiple ways to hit the same goal

Since Lexical's inception we have avoided offering multiple routes to the same goal. One route is usually better than the others, and keeping the rest around easily leads to inefficiencies. A couple of examples:

- `registerUpdateListener` (an API we wanted to kill) is too general-purpose and not optimized for any particular use case. Others, such as `registerTextContentListener`, are optimized for one specific use case at the update/reconciler level.
- `registerXYZListener` followed by `update`. This is an antipattern that we will now flag via ESLint (#5908); in most cases, a `registerTransform` is preferable.

The proposals below are guided by this.
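To make the second bullet above concrete, here is a minimal sketch of why a listener followed by `update` costs more than a transform. It uses a toy editor, not the Lexical API: `ToyEditor`, `commits` and the state shape are assumptions made up for illustration, and only the method names echo Lexical's.

```typescript
// Toy model, NOT the Lexical API. It only illustrates the cost difference.
type ToyState = { text: string };
type ToyTransform = (draft: ToyState) => void;
type ToyListener = (state: ToyState) => void;

class ToyEditor {
  private state: ToyState = { text: '' };
  private transforms: ToyTransform[] = [];
  private listeners: ToyListener[] = [];
  commits = 0; // how many updates were committed (think: reconciled)

  registerTransform(t: ToyTransform): () => void {
    this.transforms.push(t);
    return () => {
      this.transforms = this.transforms.filter((x) => x !== t);
    };
  }

  registerUpdateListener(l: ToyListener): () => void {
    this.listeners.push(l);
    return () => {
      this.listeners = this.listeners.filter((x) => x !== l);
    };
  }

  update(fn: (draft: ToyState) => void): void {
    const draft = { ...this.state };
    fn(draft);
    // Transforms run inside the same update, before the commit.
    this.transforms.forEach((t) => t(draft));
    this.state = draft;
    this.commits++;
    // Listeners run after the commit; calling update() here starts a second one.
    this.listeners.slice().forEach((l) => l(this.state));
  }

  getText(): string {
    return this.state.text;
  }
}

const upper: ToyTransform = (s) => {
  s.text = s.text.toUpperCase();
};

// Antipattern: a listener that reacts by scheduling another update.
const viaListener = new ToyEditor();
viaListener.registerUpdateListener((state) => {
  if (state.text !== state.text.toUpperCase()) {
    viaListener.update(upper);
  }
});
viaListener.update((d) => {
  d.text = 'hello';
});

// Preferred: the same rule as a transform, applied within the original update.
const viaTransform = new ToyEditor();
viaTransform.registerTransform(upper);
viaTransform.update((d) => {
  d.text = 'hello';
});

console.log(viaListener.commits, viaTransform.commits); // 2 1
```

Both editors end up with the same text, but the listener route commits twice for a single user edit, which is the kind of inefficiency the lint rule is meant to catch.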

### Node

The class-based Nodes are one of the key decisions behind Lexical. They perform well, they are type-friendly and they can be overridden safely.

Unfortunately, Nodes are now overused. Nodes used to represent the data (we intentionally excluded any reference to `editor`) and wrap the simple configuration that either events or the reconciler needed to know. Now they are responsible for so much that you can build a full-fledged plugin inside the Node itself.

There are opportunities for us to split the Node responsibilities, which will not only help devX but also **reduce the bundle size**.

What should Nodes' responsibilities be?

1. Data, getters and setters. These are used by the core to build the EditorState.
2. Configuration that gives the Node its final shape and that the editor needs to know at all times, such as `canInsertBefore` or `insertNewAfter`, which are needed for the basics of text insertion and caret handling (see the Read-only and lazy loaded Nodes section).
3. Renderer. The reconciler needs to know how to render these Nodes, and this can happen as soon as the page is loaded and populated by a third-party source. While I recommend decoupling the editing functions for read-only use (see the Read-only and lazy loaded Nodes section), I don't think there are obvious benefits to optimizing the size of headless mode, which often runs on the server or is followed by a non-headless editor.

What should Nodes yield the responsibility for?

1. Static transforms. We don't want users to build a full-fledged plugin inside the Node itself. It is not feasible to build most plugins this way, for example plugins that have UI attached, like CharacterLimit, or plugins that depend on multiple nodes, like Hashtag. This wildcard transform function inside the Node also makes it unnecessarily hard for developers to find where the code lives. We can improve the UX with a simple tweak (see the Normalizer section).
2. Import and export DOM functions. Only a subset of Lexical use cases needs clipboard HTML, so we don't have to bundle this heavy part into the core (see the Revised EditorConfig section).
3. Utility functions. It is convenient to have all functions listed underneath the class name, but this introduces two risks: 1) it gives users an easy path to misuse them via overrides, and 2) it increases the bundle size unnecessarily, as we don't tree shake Node props. It's also fair to say that we want to mimic the DOM Element API and provide the flexibility and reusability that classes give us. However, some methods provide little value underneath Node. For example, `getCommonAncestor`, which has no usages in the playground, should be moved into @lexical/utils, and `getCordsFromCellNode` should either be reshaped inside TableCell or moved into the @lexical/table module. As a rule of thumb, if a method is only useful in certain environments, it's probably best as a utility.

Grey area:

1. Multi-purpose, heavily used functions such as `getChildren`, `insertAfter` or `remove`. Keeping them inside Node is justified: they mimic the DOM Element API, they are convenient, the reconciler uses them, and isolating them wouldn't yield obvious bundle size wins.
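As a rough illustration of the "utility functions" point, the sketch below keeps the heavily used tree accessors on the class and moves a rarely needed helper into a standalone function that a bundler can drop when nothing imports it. `ToyNode` is a made-up stand-in, not Lexical's `LexicalNode`, and this `getCommonAncestor` only walks strict ancestors, which may differ from what the real method does.

```typescript
// Toy stand-in for a node class, NOT Lexical's LexicalNode.
class ToyNode {
  private parent: ToyNode | null = null;
  private children: ToyNode[] = [];

  constructor(readonly key: string) {}

  // Heavily used accessors stay on the class (the "grey area" above).
  getParent(): ToyNode | null {
    return this.parent;
  }

  getChildren(): ToyNode[] {
    return this.children.slice();
  }

  append(child: ToyNode): this {
    child.parent = this;
    this.children.push(child);
    return this;
  }
}

// Rarely used helper as a free function: it relies only on the public
// getParent(), cannot be overridden by subclasses, and is tree-shakeable.
function getCommonAncestor(a: ToyNode, b: ToyNode): ToyNode | null {
  const ancestorsOfA: ToyNode[] = [];
  for (let n = a.getParent(); n !== null; n = n.getParent()) {
    ancestorsOfA.push(n);
  }
  // The first ancestor of b that is also an ancestor of a is the closest one.
  for (let n = b.getParent(); n !== null; n = n.getParent()) {
    if (ancestorsOfA.indexOf(n) !== -1) {
      return n;
    }
  }
  return null;
}

// Usage: root > p1 > [t1, t2], root > p2 > [t3]
const rootNode = new ToyNode('root');
const p1 = new ToyNode('p1');
const p2 = new ToyNode('p2');
const t1 = new ToyNode('t1');
const t2 = new ToyNode('t2');
const t3 = new ToyNode('t3');
rootNode.append(p1).append(p2);
p1.append(t1).append(t2);
p2.append(t3);

console.log(getCommonAncestor(t1, t2)?.key); // p1
console.log(getCommonAncestor(t1, t3)?.key); // root
```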

### A plugin

Plugins deliver the feature behavior by leveraging Nodes. They are independent units of code that listen to the Lexical lifecycle and commands in order to manipulate the EditorState.

The heavy use of React in this repo already helped us shape plugins as standalone units that are easy to identify:

```
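import {useLexicalComposerContext} from '@lexical/react/LexicalComposerContext';
import {useEffect} from 'react';

function MyPlugin(): JSX.Element {
  const [editor] = useLexicalComposerContext();

  useEffect(() => {
    // The returned function unregisters the transform when the plugin unmounts.
    return editor.registerNodeTransform(...);
  }, [editor]);

  return <SomeUI />
}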
```

The responsibility of plugins is to coordinate the CRUD of these Nodes, and the fact that they work like libraries comes with many advantages:

1. Plugins understand the product lifecycle: they can tell when the UI mounts and unmounts, and they can communicate with the product's state management system.
2. They can render UI beyond the editor itself, for example the character count in the CharacterLimit plugin or the table of contents in the TableOfContents plugin.
3. They can be lazy loaded, as they are not strictly required on editor load.

Given how effective this React encapsulation has proven to be so far, my proposal is to formalize the concept:

```
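function MyPlugin(): JSX.Element {
  const [editor] = useLexicalComposerContext();

  useEffect(() => {
    // Proposed API: plugin identifier, registrations, and (optionally) required Nodes.
    return editor.registerPlugin('MyPlugin', [
      editor.registerNodeTransform(...),
      editor.registerNodeTransform(...),
    ], [TableNode, TableRowNode, TableCellNode]);
  }, [...]);

  return <SomeUI />
}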
```

The idea is that plugin registration makes debugging easier and improves devX: we pass the plugin identifier and, optionally, the Nodes it depends on for a runtime check on load. To optimize for cases where different logical units are required (e.g. different dependency arrays in React), the same plugin can be registered more than once.
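One possible shape for `registerPlugin`, sketched against a toy editor rather than Lexical itself. Everything here is an assumption for illustration: the signature follows the snippet above, while `ToyEditor`, `NodeKlass`, `getActivePlugins` and the decision to throw on a missing Node are invented and not part of any existing API.

```typescript
type Unregister = () => void;
// Toy stand-in for a Node class; real Lexical classes expose a static getType().
interface NodeKlass {
  readonly type: string;
}

class ToyEditor {
  private nodeTypes: string[];
  // Plugin name -> number of live registrations (a name may register more than once).
  private live: Record<string, number> = {};

  constructor(nodes: NodeKlass[]) {
    this.nodeTypes = nodes.map((n) => n.type);
  }

  registerPlugin(
    pluginName: string,
    registrations: Unregister[],
    requiredNodes: NodeKlass[] = [],
  ): Unregister {
    // Runtime check: every Node the plugin depends on must be known to the editor.
    const missing = requiredNodes
      .filter((n) => this.nodeTypes.indexOf(n.type) === -1)
      .map((n) => n.type);
    if (missing.length > 0) {
      // Undo what was already registered so a failed check leaves nothing behind.
      registrations.forEach((u) => u());
      throw new Error(`${pluginName}: missing nodes ${missing.join(', ')}`);
    }
    this.live[pluginName] = (this.live[pluginName] || 0) + 1;

    let disposed = false;
    return () => {
      if (disposed) return; // tolerate double cleanup
      disposed = true;
      // Tear down in reverse order of registration.
      registrations.slice().reverse().forEach((u) => u());
      this.live[pluginName] = (this.live[pluginName] || 0) - 1;
    };
  }

  // Debugging aid: which plugins currently have live registrations?
  getActivePlugins(): string[] {
    return Object.keys(this.live).filter((n) => (this.live[n] || 0) > 0);
  }
}

const TableNode: NodeKlass = { type: 'table' };
const TableRowNode: NodeKlass = { type: 'tablerow' };
const TableCellNode: NodeKlass = { type: 'tablecell' };

const editor = new ToyEditor([TableNode, TableRowNode, TableCellNode]);
const teardownLog: string[] = [];
const unregister = editor.registerPlugin(
  'MyPlugin',
  [
    () => { teardownLog.push('transform A removed'); },
    () => { teardownLog.push('transform B removed'); },
  ],
  [TableNode, TableRowNode, TableCellNode],
);
const activeWhileMounted = editor.getActivePlugins(); // ['MyPlugin']
unregister(); // what the useEffect cleanup would call
```

Returning a single teardown function keeps the call site compatible with `useEffect`, and counting registrations per name is one way to allow the same plugin to register more than once.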

### Read-only and lazy loaded Nodes

A full-fledged editor can be heavy. Lazy loading the plugins is only half of the equation; the other half is the Nodes. A Node like TableNode imports LexicalTableSelectionHelpers, which has 1.7k lines of code.

Nodes have to be loaded before we can render the EditorState but a method like `remove` doesn't play any role in the bootstrap.

My proposal is to decouple the read and edit functionality of the Nodes, where the edit version is a superset of the read-only version. This also ultimately allows us to build a read-only version of the editor, which can be useful for surfaces where we only use the editor to display content.

The editor will expose hooks to supply the complete version of the Nodes, and the first update will be blocked until these Nodes (or at least the relevant ones) are provided.

This is worth its own issue so I'll cut it here.

### Normalizer

By design, Lexical is resilient. Functions like `insertNodes` are very versatile, and even in the worst-case scenario, a crash, the update flow automatically rolls back to the last good version. Only a minority of events cause the editor to throw, such as two consecutive crashes or DecoratorNodes.

This was very well received internally, as misbehavior from our core code or product didn't lead to data loss.

However, versatility can be problematic: an invalid EditorState can cause certain plugins to misbehave indefinitely. This is why we started #3833 to discuss the idea of Schemas, which other libraries like ProseMirror already come with.

To fix this problem, we want to introduce Normalizers (this is not a new idea; @GermanJablo, @acywatson, and others already explored it in #3833). I'm now convinced they are a good fit because they respect the efforts on resiliency while addressing the invalid EditorState problem.

```
createEditor({
  normalizers: [
    createNodeNormalizer('TableRow', () => {
      // Revise children count of TableCellNode
    }),
  ]
})
```

Normalizers enforce document constraints to guarantee EditorState consistency; **they can be seen as runtime rules with a fixer**. (I named them `nodeNormalizer` because the name fits more naturally within the Lexical API.) They sit in between Nodes and Plugins: they do not need a plugin to run, but at the same time they do not necessarily map 1:1 to a Node (a TextNode, for instance, is likely to be involved in many of them). Normalizers for custom product rules are reasonable, for example preventing the use of HashtagNode within HeaderNode.

Normalizers are `$` functions, similar to transforms, that can look at any part of the tree and make the appropriate changes. They do, however, come with special rules:

1. They warn if they make any changes to the EditorState. While we know that clipboard input may be out of our control, this can be useful to catch bad plugins.
2. You choose when to run them. Complex transforms can be expensive if run too often, so by default Normalizers will run on the immediate update and after transforms. We understand that most of these inconsistencies originate from external sources: clipboard and collab.
3. Normalizers can run before any transforms and as often as after every transform. This option is provided to ensure consistency in the transforms logic, the lack of **which can lead to crashes**, but it may not be efficient when we consider that transforms already loop based on dirty Nodes.
4. Normalizers never run on the bootstrap update nor in non-editable mode.

Normalizers replace the Node `static transform`, are optional, and can be lazy loaded in a similar fashion as described above.
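The "runtime rule with a fixer" idea, including the warning from rule 1, can be sketched as follows. This is a toy model under stated assumptions: `ToyNode`, `runNormalizers` and the boolean "did I change something" return value are invented for illustration, and the real `createNodeNormalizer` would operate on Lexical Nodes inside an update. The rule itself is the product rule mentioned above, no hashtag inside a heading.

```typescript
// Toy tree, NOT Lexical nodes.
interface ToyNode {
  type: string;
  text?: string;
  children: ToyNode[];
}

interface ToyNormalizer {
  type: string;
  // Returns true when it had to change the node to make it valid.
  normalize: (node: ToyNode) => boolean;
}

function createNodeNormalizer(
  type: string,
  normalize: (node: ToyNode) => boolean,
): ToyNormalizer {
  return { type, normalize };
}

// Visits the whole tree; a real implementation would likely visit dirty nodes only.
function runNormalizers(
  root: ToyNode,
  normalizers: ToyNormalizer[],
  warn: (msg: string) => void,
): void {
  const visit = (node: ToyNode): void => {
    normalizers.forEach((n) => {
      if (n.type === node.type && n.normalize(node)) {
        // Rule 1: a fix means some earlier step produced an invalid state.
        warn(`normalizer for "${n.type}" had to fix the EditorState`);
      }
    });
    node.children.forEach((child) => visit(child));
  };
  visit(root);
}

// Product rule: a heading must not contain hashtags; demote them to plain text.
const noHashtagInHeading = createNodeNormalizer('heading', (heading) => {
  let changed = false;
  heading.children = heading.children.map((child) => {
    if (child.type !== 'hashtag') return child;
    changed = true;
    return { ...child, type: 'text' };
  });
  return changed;
});

const toyRoot: ToyNode = {
  type: 'root',
  children: [
    {
      type: 'heading',
      children: [
        { type: 'text', text: 'Release ', children: [] },
        { type: 'hashtag', text: '#rfc', children: [] },
      ],
    },
    { type: 'paragraph', children: [{ type: 'hashtag', text: '#ok', children: [] }] },
  ],
};

const warnings: string[] = [];
runNormalizers(toyRoot, [noHashtagInHeading], (msg) => warnings.push(msg));
// A second pass finds nothing to fix, so it stays silent.
runNormalizers(toyRoot, [noHashtagInHeading], (msg) => warnings.push(msg));
```

The hashtag in the paragraph is left alone, and the normalizer is idempotent: once the state is valid, running it again neither changes anything nor warns.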

### Revised EditorConfig

Going back to "Nodes should yield the responsibility for" in the Node section, the only point we didn't cover yet is import/exportDOM.

These will be imported on `createEditor`, just like normalizers. This reinforces the modularity of Lexical and makes it possible to have a much lighter version of a plain text editor (#6020).

@lexical/rich-text is a decent fit, especially when plain text does not do HTML, but 1) it would introduce a dependency on a very clean module that works very closely with the core and 2) I'm convinced that sooner or later @lexical/plain-text will also understand some HTML (which would unblock the copy-paste of mentions internally).

```
createEditor({
  nodes: [
    TableNode,
    TableRowNode,
    TableCellNode,
  ],
  html: [
    ...tableHtml,
  ],
  normalizers: [
    ...tableNormalizers,
  ]
})
```

Or via the builder:

```
createEditorBuilder({
  onError: ...,
  ...
})
  .addDependencies(TablePluginDependencies)
  .build();
```

Note that there is no 1:1 mapping between HTML and Nodes; rather, an HTML element can map to different Nodes, and the first example reflects this. The second is merely a convenience so that package owners can bundle all their logic together.
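A minimal sketch of how the builder could fold package-level dependency bundles into the flat config from the first example. All names besides `createEditorBuilder`, `addDependencies`, `build` and `TablePluginDependencies` are hypothetical, strings stand in for the real Node classes, HTML converters and normalizers, and the `onError` option is left out for brevity.

```typescript
// Strings stand in for Node classes, HTML converters and normalizers.
interface ToyConfig {
  nodes: string[];
  html: string[];
  normalizers: string[];
}
type ToyDependencies = Partial<ToyConfig>;

class ToyEditorBuilder {
  private config: ToyConfig = { nodes: [], html: [], normalizers: [] };

  addDependencies(deps: ToyDependencies): this {
    (['nodes', 'html', 'normalizers'] as const).forEach((key) => {
      (deps[key] || []).forEach((item) => {
        // Dedupe: two packages may depend on the same Node.
        if (this.config[key].indexOf(item) === -1) {
          this.config[key].push(item);
        }
      });
    });
    return this;
  }

  // Returns the flat config that createEditor would receive.
  build(): ToyConfig {
    return {
      nodes: this.config.nodes.slice(),
      html: this.config.html.slice(),
      normalizers: this.config.normalizers.slice(),
    };
  }
}

function createEditorBuilder(): ToyEditorBuilder {
  return new ToyEditorBuilder();
}

// A package owner bundles everything their feature needs in one object.
const TablePluginDependencies: ToyDependencies = {
  nodes: ['TableNode', 'TableRowNode', 'TableCellNode'],
  html: ['tableHtml'],
  normalizers: ['tableNormalizers'],
};
// Hypothetical second package that happens to share a Node with the first.
const SpreadsheetPluginDependencies: ToyDependencies = {
  nodes: ['TableCellNode', 'FormulaNode'],
};

const builtConfig = createEditorBuilder()
  .addDependencies(TablePluginDependencies)
  .addDependencies(SpreadsheetPluginDependencies)
  .build();
```

Because `html` is a separate list rather than a property of each Node, several converters can target the same Node and one converter can produce different Nodes, which matches the "no 1:1 mapping" note above.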

### What about #6025

Yes, #6025 would need to be rewritten.

