Description
The current process for HTML indexing appears to only strip nav and footer elements:
node.remove if %w[footer nav].include?(node.name)
I see no strong argument for restricting removal to these two tags. Judged conservatively on what each tag is generically meant for, we would likely strip only nav. At the very least, it is counterintuitive that footer is stripped but header is not.
Given the current behavior, if HTML content cannot mark an element with a main tag/role, we are limited to an opt-out process in which the only opt-out tools are the tags above. nav is the only reasonable one to use, and it is still not ideal: wrapping a section that is repeated but not strictly navigational in nav has implications beyond indexing.
We should therefore support something like the following:
- Define a custom opt-out HTML attribute such as data-no-sg-index (data-noindex is less verbose, but on the off chance that downstream markup already uses that attribute, we can be circumspect).
- Preferably, add header to the list of tags not indexed. It's not ideal to wrap everything in a nav simply because we don't want it indexed; that has deeper implications in other domains, such as screen-reading software.
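As a rough illustration of the two proposals above, here is a minimal sketch of how the removal check might be extended. It is not the project's actual code: it uses REXML from Ruby's standard distribution as a stand-in for whatever HTML parser the indexer uses (so the sample markup has to be well-formed), and the names STRIPPED_TAGS, OPT_OUT_ATTRIBUTE, excluded_from_index? and indexable_text are hypothetical.

```ruby
require "rexml/document"

# Hypothetical names; only the footer/nav check exists in the current code.
STRIPPED_TAGS = %w[footer nav header].freeze
OPT_OUT_ATTRIBUTE = "data-no-sg-index"

# True when an element should be dropped before indexing, either because
# of its tag name or because it carries the opt-out attribute.
def excluded_from_index?(element)
  STRIPPED_TAGS.include?(element.name) ||
    !element.attributes[OPT_OUT_ATTRIBUTE].nil?
end

# Removes excluded elements, then returns the remaining text with
# whitespace collapsed. REXML is an XML parser, used here only as a
# stand-in, so the input must be well-formed.
def indexable_text(html)
  doc = REXML::Document.new(html)
  # XPath.match returns an array, so removing nodes while iterating is safe.
  REXML::XPath.match(doc, "//*").each do |element|
    element.remove if excluded_from_index?(element)
  end
  REXML::XPath.match(doc, "//text()").map(&:value).join(" ").split.join(" ")
end

sample = <<~HTML
  <body>
    <header>Site banner</header>
    <nav>Menu</nav>
    <div data-no-sg-index="">Related links</div>
    <p>Article text</p>
    <footer>Copyright</footer>
  </body>
HTML

puts indexable_text(sample) # prints "Article text"
```

The attribute check means a repeated, non-navigational block can be excluded without being wrapped in nav, so its semantics for screen readers stay intact.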
Additionally, the document indexing process should be laid out concretely and consistently across the documentation. Although the main indexing doc gives some specifics on tag sanitization, this part of the documentation leaves the reader in doubt:
"We recommend adding <main> and other semantic elements such as <header>, <nav>, and <footer> to demarcate these sections and facilitate clean indexing."
Listing header here implies that it belongs to the same category of stripped content as nav and footer, yet it does not. The pseudocode examples below it reinforce that impression:
<body>
Redundant header code and navigation elements, sidebars, etc.
The changes that close this issue should make the docs unambiguous and provide at least some granular opt-out that affects only the indexing process.