Use indexables for retrieving posts for the llms.txt file #22327

Open
wants to merge 10 commits into
base: release/25.4
from

Conversation

leonidasmi
Contributor

@leonidasmi leonidasmi commented Jun 3, 2025

Context

Summary

This PR can be summarized in the following changelog entry:

  • Adds support for preventing specifically noindexed posts from getting into the llms.txt file.
  • [wordpress-seo other] Improves the internal engine that creates the post lists in the llms.txt file, making generation lighter and more efficient.
  • [wordpress-seo non-user-facing] Makes the indexables table the source of truth for populating the llms.txt file with posts/pages/CPTs. This affects websites with disabled indexables or where the SEO optimization has not been completed.

Relevant technical choices:

Test instructions

Test instructions for the acceptance test before the PR gets merged

This PR can be acceptance tested by following these steps:

Test preventing specifically noindexed posts
Test generating the llms.txt file where your most recently modified post is:

  • public, published, not password protected and its Allow search engines to show this content in search results setting is set to yes (or the default)
    • the post should be at the top of the ## Posts section
  • password protected or private
    • the post should not be in the llms.txt file
  • draft or pending review or scheduled
    • the post should not be in the llms.txt file
  • its Allow search engines to show this content in search results setting is set to no
    • the post should not be in the llms.txt file
  • Repeat the test, but instead of editing your most recently modified post, edit your one cornerstone post and the results should be the same.
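
The expected outcomes above can be summarized as a single eligibility rule. The following is a minimal sketch of that rule, not the plugin's implementation: the function name `is_eligible_for_llms_txt` and the array keys (`status`, `password`, `noindex`) are hypothetical and chosen only for illustration.

```php
<?php
/**
 * Hypothetical model of the eligibility rules described in the test steps.
 * The field names are illustrative and do not come from the Yoast SEO codebase.
 *
 * @param array $post A post described as [ 'status' => string, 'password' => string, 'noindex' => bool|null ].
 */
function is_eligible_for_llms_txt( array $post ): bool {
	// Only published posts qualify; private, draft, pending and scheduled posts do not.
	$is_published = ( $post['status'] ?? '' ) === 'publish';

	// A non-empty password means the post is password protected.
	$is_protected = ( $post['password'] ?? '' ) !== '';

	// null stands for "use the default", which is treated here as indexable.
	$is_noindexed = ( $post['noindex'] ?? null ) === true;

	return $is_published && ! $is_protected && ! $is_noindexed;
}
```

Under these assumptions, a post is listed only when all three conditions hold, which is why each of the failing scenarios above changes exactly one property of the post.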

Test impact of the PR making indexables the source of truth:

  • Reset indexables in your site and disable them (using the add_filter( 'Yoast\WP\SEO\should_index_indexables', '__return_false' ); filter)
  • Your site should have no cornerstone content or noindex posts
  • Generate the llms.txt file and confirm that the post list is what you expect, meaning it contains the most recently modified posts
  • Remove the snippet that disables indexables, but don't do anything else
  • Load a post in the frontend
  • Regenerate the llms.txt file and confirm that the only post you see in the llms.txt file is the post you loaded in the frontend
  • Now run the SEO optimization and re-generate the llms.txt file
  • Confirm that the file is the same as the one you generated when indexables were disabled.
  • Now repeat the above set of tests, but:
    • have a post that's noindexed. Make sure that post is the most recently modified one.
    • have a post that's cornerstone. Make sure that post is NOT in the 5 most recently modified ones.
    • The difference between the llms.txt file when indexables are disabled and not is:
      • the noindexed post exists in the former but not in the latter
      • the cornerstone post exists in the latter but not in the former
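
The difference described in the last scenario can be modeled with a small simulation. This is a sketch under stated assumptions, not the plugin's query logic: `pick_posts` is a hypothetical function, the limit of 5 is taken from the wording of the test steps, and listing cornerstone posts ahead of the others is an assumption made to reproduce the expected result.

```php
<?php
/**
 * Hypothetical model of the post selection, not the plugin's code.
 * Each post is an array with 'id', 'modified' (Y-m-d), 'noindex' (bool) and 'cornerstone' (bool).
 *
 * @return array The IDs of the selected posts, in list order.
 */
function pick_posts( array $posts, bool $indexables_enabled, int $limit = 5 ): array {
	// Most recently modified first (Y-m-d strings sort chronologically).
	usort( $posts, static fn ( array $a, array $b ): int => strcmp( $b['modified'], $a['modified'] ) );

	if ( $indexables_enabled ) {
		// With indexables available, the noindex flag is known, so noindexed posts are dropped.
		$posts = array_filter( $posts, static fn ( array $p ): bool => ! $p['noindex'] );

		// Assumption: cornerstone posts are listed before the remaining posts.
		$cornerstone = array_filter( $posts, static fn ( array $p ): bool => $p['cornerstone'] );
		$regular     = array_filter( $posts, static fn ( array $p ): bool => ! $p['cornerstone'] );
		$posts       = array_merge( $cornerstone, $regular );
	}

	return array_column( array_slice( $posts, 0, $limit ), 'id' );
}
```

Calling the function twice on the same data set, once with `$indexables_enabled` set to `false` and once with `true`, gives the two lists to compare: in this model the noindexed post appears only in the first list and the older cornerstone post only in the second.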

Relevant test scenarios

  • Changes should be tested with the browser console open
  • Changes should be tested on different posts/pages/taxonomies/custom post types/custom taxonomies
  • Changes should be tested on different editors (Default Block/Gutenberg/Classic/Elementor/other)
  • Changes should be tested on different browsers
  • Changes should be tested on multisite

Test instructions for QA when the code is in the RC

  • QA should use the same steps as above.

Impact check

This PR affects the following parts of the plugin, which may require extra testing:

UI changes

  • This PR changes the UI in the plugin. I have added the 'UI change' label to this PR.

Other environments

  • This PR also affects Shopify. I have added a changelog entry starting with [shopify-seo], added test instructions for Shopify and attached the Shopify label to this PR.

Documentation

  • I have written documentation for this change. For example, comments in the Relevant technical choices, comments in the code, documentation on Confluence / shared Google Drive / Yoast developer portal, or other.

Quality assurance

  • I have tested this code to the best of my abilities.
  • During testing, I had activated all plugins that Yoast SEO provides integrations for.
  • I have added unit tests to verify the code works as intended.
  • If any part of the code is behind a feature flag, my test instructions also cover cases where the feature flag is switched off.
  • I have written this PR in accordance with my team's definition of done.
  • I have checked that the base branch is correctly set.

Innovation

  • No innovation project is applicable for this PR.
  • This PR falls under an innovation project. I have attached the innovation label.
  • I have added my hours to the WBSO document.

Fixes https://github.com/Yoast/reserved-tasks/issues/613

@leonidasmi leonidasmi added the changelog: non-user-facing Needs to be included in the 'Non-userfacing' category in the changelog label Jun 3, 2025
@leonidasmi leonidasmi added this to the 25.3 milestone Jun 3, 2025
@leonidasmi leonidasmi removed this from the 25.3 milestone Jun 4, 2025
@leonidasmi leonidasmi changed the base branch from release/25.3 to trunk June 5, 2025 12:53
@leonidasmi leonidasmi added changelog: enhancement Needs to be included in the 'Enhancements' category in the changelog and removed changelog: non-user-facing Needs to be included in the 'Non-userfacing' category in the changelog labels Jun 5, 2025
@leonidasmi leonidasmi force-pushed the 623-ignore-posts-that-are-noindexed-even-if-the-whole-cpt-is-index branch 5 times, most recently from 78e89b5 to b185889 Compare June 6, 2025 08:54
@leonidasmi leonidasmi force-pushed the 623-ignore-posts-that-are-noindexed-even-if-the-whole-cpt-is-index branch from b185889 to ca5f49f Compare June 6, 2025 09:04
@coveralls

coveralls commented Jun 6, 2025

Pull Request Test Coverage Report for Build 73357a4236d45956f83bcf9af3a3df0566838ede

Details

  • 26 of 29 (89.66%) changed or added relevant lines in 2 files are covered.
  • 14 unchanged lines in 1 file lost coverage.
  • Overall coverage decreased (-0.5%) to 53.155%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| src/llms-txt/infrastructure/markdown-services/content-types-collector.php | 14 | 17 | 82.35% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| src/llms-txt/infrastructure/markdown-services/content-types-collector.php | 14 | 75.71% |

| Totals | Coverage Status |
|---|---|
| Change from base Build e037f43264d21cfcab31e2f9e49ec6eafd073ddc | -0.5% |
| Covered Lines | 29822 |
| Relevant Lines | 57034 |

💛 - Coveralls

@leonidasmi leonidasmi changed the base branch from trunk to release/25.4 June 10, 2025 12:24
@leonidasmi leonidasmi marked this pull request as ready for review June 10, 2025 12:25
@leonidasmi leonidasmi added this to the 25.4 milestone Jun 10, 2025
Contributor

@thijsoo thijsoo left a comment

CR 1 suggestion 1 comment.

$exclude_old = false;

if ( $post_type === 'post' ) {
	$exclude_old = true;
Contributor

Maybe rename this to exclude_older_than_one_year or something, since that is what this does. I was very confused about what old meant in this case.

Contributor Author

haha fair enough, added here.
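
For readers following the thread, the suggested rename could look roughly like the sketch below. It is only an illustration of the naming: the helper `is_excluded_from_list`, its parameters, and the use of the last-modified date for the one-year cutoff are assumptions and are not taken from the actual change.

```php
<?php
/**
 * Hypothetical helper illustrating the reviewer's suggested variable name.
 * Comparing against the last-modified date is an assumption.
 *
 * @param string $post_type The post type, e.g. 'post' or 'page'.
 * @param string $modified  The post's last-modified date (Y-m-d).
 * @param string $now       The reference date (Y-m-d).
 */
function is_excluded_from_list( string $post_type, string $modified, string $now ): bool {
	// The name now states the rule: only regular posts get the one-year cutoff.
	$exclude_older_than_one_year = ( $post_type === 'post' );

	if ( ! $exclude_older_than_one_year ) {
		return false;
	}

	// Anything last modified before this date counts as older than one year.
	$cutoff = ( new DateTimeImmutable( $now ) )->modify( '-1 year' );

	return new DateTimeImmutable( $modified ) < $cutoff;
}
```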

Labels
changelog: enhancement Needs to be included in the 'Enhancements' category in the changelog

3 participants