Canonical URL generation logic can lead to incorrect mirror results #6112

@tfoote

The logic is here:

ros2_documentation/conf.py

Lines 215 to 272 in 2e05f75

redirect_html_fragment = """
    <link rel="canonical" href="{base_url}/{url}" />
    <meta http-equiv="refresh" content="0; url={url}" />
    <script>
        window.location.href = '{url}';
    </script>
"""
redirections = {
    os.path.splitext(os.path.relpath(
        document_path, app.srcdir
    ))[0]: redirect_urls
    for document_path, redirect_urls in cls.redirections.items()
}
redirection_conflict = next((
    (canon_1, canon_2, redirs_1.intersection(redirs_2))
    for (canon_1, redirs_1), (canon_2, redirs_2)
    in itertools.combinations(redirections.items(), 2)
    if redirs_1.intersection(redirs_2)
), None)
if redirection_conflict:
    canonical_url_1, canonical_url_2 = redirection_conflict[:2]
    conflicting_redirect_urls = redirection_conflict[-1]
    raise RuntimeError(
        'Documents {} and {} define conflicting redirects: {}'.format(
            canonical_url_1, canonical_url_2, conflicting_redirect_urls
        )
    )
all_canonical_urls = set(redirections.keys())
all_redirect_urls = {
    redirect_url
    for redirect_urls in redirections.values()
    for redirect_url in redirect_urls
}
conflicting_urls = all_canonical_urls.intersection(all_redirect_urls)
if conflicting_urls:
    raise RuntimeError(
        'Some redirects conflict with existing documents: {}'.format(
            conflicting_urls
        )
    )
for canonical_url, redirect_urls in redirections.items():
    for redirect_url in redirect_urls:
        context = {
            'canonical_url': os.path.relpath(
                canonical_url, redirect_url
            ),
            # Skip entry into sitemap.xml with reason 'redirect'.
            'skip_sitemap': 'redirect',
            'title': os.path.basename(redirect_url),
            'metatags': redirect_html_fragment.format(
                base_url=app.config.html_baseurl,
                url=app.builder.get_relative_uri(
                    redirect_url, canonical_url
                )
            )
        }
        yield (redirect_url, context, cls.template_name)
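
For readers who want to poke at the two validation steps in the snippet above without a Sphinx build, here is a minimal standalone sketch. The function names and the toy `redirections` mappings are hypothetical and not part of conf.py; only the `itertools.combinations` pairing and the set intersections mirror the quoted logic.

```python
import itertools


def find_redirect_conflict(redirections):
    """Return (doc_a, doc_b, shared_urls) for the first pair of documents
    that declare the same redirect URL, or None if no pair does."""
    return next((
        (doc_a, doc_b, urls_a & urls_b)
        for (doc_a, urls_a), (doc_b, urls_b)
        in itertools.combinations(redirections.items(), 2)
        if urls_a & urls_b
    ), None)


def find_shadowed_documents(redirections):
    """Return redirect URLs that collide with the name of a real document."""
    all_redirect_urls = set().union(*redirections.values())
    return set(redirections) & all_redirect_urls


# Toy data (hypothetical): two documents both claim the redirect "Old".
overlapping = {
    "Guides/New": {"Guides/Old", "Old"},
    "Tutorials/Intro": {"Old"},
}
# Toy data (hypothetical): document "B" is also listed as a redirect of "A".
shadowing = {"A": {"B"}, "B": {"C"}}

print(find_redirect_conflict(overlapping))
print(find_shadowed_documents(shadowing))
```

Neither check looks at `html_baseurl`, so both pass regardless of whether the base URL carries the distro segment.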

While searching, I found that https://ros.ncnynl.com/en/humble/How-To-Guides/Using-Custom-Rosdistro.html ranks above our official docs. The page carries the generated canonical link, but the link is missing the rosdistro prefix:
<link rel="canonical" href="https://docs.ros.org/en/How-To-Guides/Using-Custom-Rosdistro.html" />

This 404s because the correct link is https://docs.ros.org/en/humble/How-To-Guides/Using-Custom-Rosdistro.html

The humble segment is missing from the path. The content at https://ros.ncnynl.com/en/humble/How-To-Guides/Using-Custom-Rosdistro.html is in the humble namespace, but they have clearly configured the build differently than we have, likely without the multi-version build, which in turn generates the wrong canonical URL.
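
The symptom can be reproduced with a small sketch. It assumes the canonical href is formed by joining a base URL and a page path in the same `{base_url}/{url}` shape as the fragment in conf.py; the helper name and the two base URL values are illustrative guesses, not the mirror's actual configuration.

```python
def canonical_href(base_url: str, page_path: str) -> str:
    """Join a base URL and a page path as '{base_url}/{url}' would,
    normalising a stray slash on either side of the join."""
    return "{}/{}".format(base_url.rstrip("/"), page_path.lstrip("/"))


page = "How-To-Guides/Using-Custom-Rosdistro.html"

# Assumed single-version build: the base URL has no distro segment.
without_distro = canonical_href("https://docs.ros.org/en", page)

# Assumed build where the distro is already part of the base URL.
with_distro = canonical_href("https://docs.ros.org/en/humble", page)

print(without_distro)
print(with_distro)
```

The first value matches the broken link seen on the mirror; the second matches the page that actually exists on docs.ros.org.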

I'm going to reach out to them if I can, but it would be good to improve our logic so that the canonical link is also generated correctly for their approach.
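
One possible direction, sketched under assumptions: make the base URL carry the distro segment regardless of how the build is invoked. The helper below is hypothetical, and so is the idea that the distro name is available as a plain string at build time (for example from a config value); it is not existing conf.py code.

```python
def with_distro_prefix(base_url: str, distro: str) -> str:
    """Return base_url with exactly one trailing distro segment.

    If the last path segment already equals the distro, the URL is only
    stripped of trailing slashes; otherwise the distro is appended.
    """
    base = base_url.rstrip("/")
    if base.rsplit("/", 1)[-1] == distro:
        return base
    return base + "/" + distro


# Single-version style base URL (assumed): the distro gets appended.
print(with_distro_prefix("https://docs.ros.org/en", "humble"))

# Base URL that already ends with the distro: left as it is.
print(with_distro_prefix("https://docs.ros.org/en/humble/", "humble"))
```

Because the function is idempotent, it could be applied to the base URL in both the multi-version and single-version paths without producing a doubled segment.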

Reference search results:

[Image: screenshot of the search results]

Labels: enhancement (New feature or request), help wanted (Extra attention is needed)