fix: ensure navigation sidebar serves fresh data after course publish #38785

Open
wgu-taylor-payne wants to merge 1 commit into openedx:master from WGU-Open-edX:fix/stale-navigation-sidebar

Conversation

@wgu-taylor-payne

@wgu-taylor-payne wgu-taylor-payne commented Jun 19, 2026

Contributor

Summary

After a course publish, there is a ~30-second window where the navigation sidebar endpoint serves stale block structure data. Worse, the stale response is cached for 1 hour, extending the staleness far beyond the initial window.

This PR adds a synchronous staleness check before reading the block structure on a navigation cache miss.

Problem

This issue came to light while testing an internal Open edX instance: a unit was deleted in Studio, but refreshing the course in the learning MFE still showed the unit in the course outline.

In a Verawood sandbox, I deleted a unit in Studio, waited 30+ seconds, and then refreshed the course page; the unit was gone from the outline in the learning MFE. I then deleted another unit and refreshed within a few seconds, and the deleted unit still showed. It still showed on every refresh made before an hour had passed. After an hour, it was no longer present in the outline.

Three components interact to create a race condition:

1. Block structure rebuild is delayed 30s after publish

The course_published signal handler queues the rebuild task with a 30s countdown:

update_course_in_cache_v2.apply_async(
    kwargs=dict(course_id=str(course_key)),
    countdown=settings.BLOCK_STRUCTURES_SETTINGS['COURSE_PUBLISH_TASK_DELAY'],
)

2. Navigation cache key includes course version — causing a miss on publish

The cache key uses course_version, so after a publish the version changes and the previous cached response no longer matches. This is a cache miss.
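
A minimal sketch of why a publish forces a miss, assuming the key is built from the course key and the course version. The function name and key format here are hypothetical and only illustrate the idea; they are not taken from the view.

```python
def navigation_cache_key(course_key: str, course_version: str) -> str:
    """Hypothetical key format: any change in course_version yields a new key."""
    return f"navigation_{course_key}_{course_version}"


cache = {}
course_key = "course-v1:Org+Course+Run"  # made-up course key

# Before the publish, a response is cached under the old version.
key_before = navigation_cache_key(course_key, "version-a")
cache[key_before] = {"units": ["unit-1", "unit-2"]}

# The publish changes the version, so the next lookup uses a different key.
key_after = navigation_cache_key(course_key, "version-b")
hit_after_publish = key_after in cache  # the old entry no longer matches
```

The old entry is never invalidated explicitly; it simply stops being addressed once the version changes.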

3. On cache miss, stale block structure data is read and cached for 1 hour

The if not course_blocks branch calls get_course_outline_block_tree(), which reads from the (still stale) block structure cache, then stores the result for 1 hour.

Timeline:

  1. T+0s: Course published — new version, block structure rebuild queued with 30s delay
  2. T+1s: Learner hits navigation → cache miss (new version) → reads stale block structure → caches stale data for 1 hour
  3. T+30s: Rebuild task runs, block structure updated (but navigation cache already poisoned)
  4. T+1s → T+1h: All requests served stale data from navigation cache
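
The timeline can be reproduced with a toy model. This is a sketch under stated assumptions: `ToyPlatform` and everything in it are hypothetical stand-ins for the modulestore, the block structure cache, and the navigation cache, driven by a logical clock instead of Celery.

```python
REBUILD_DELAY = 30    # seconds, stands in for COURSE_PUBLISH_TASK_DELAY
NAV_CACHE_TTL = 3600  # seconds, the 1-hour navigation cache lifetime


class ToyPlatform:
    """Hypothetical model of the three interacting components."""

    def __init__(self):
        self.now = 0
        self.published = ("v1", ["unit-1", "unit-2"])  # modulestore: (version, units)
        self.block_structure = self.published          # block structure cache
        self.nav_cache = {}                            # version -> (expires_at, units)
        self.rebuild_at = None                         # when the delayed task fires

    def publish(self, version, units):
        self.published = (version, units)
        self.rebuild_at = self.now + REBUILD_DELAY

    def advance(self, seconds):
        self.now += seconds
        if self.rebuild_at is not None and self.now >= self.rebuild_at:
            self.block_structure = self.published      # delayed rebuild runs
            self.rebuild_at = None

    def get_navigation(self):
        version = self.published[0]
        entry = self.nav_cache.get(version)
        if entry and entry[0] > self.now:
            return entry[1]                            # cache hit
        units = self.block_structure[1]                # may still be stale
        self.nav_cache[version] = (self.now + NAV_CACHE_TTL, units)
        return units


platform = ToyPlatform()
platform.publish("v2", ["unit-1"])        # T+0s: unit-2 deleted and published
platform.advance(1)
at_1s = platform.get_navigation()         # miss: reads stale structure and caches it
platform.advance(59)
at_60s = platform.get_navigation()        # rebuild has run, but the cached entry wins
platform.advance(NAV_CACHE_TTL)
after_expiry = platform.get_navigation()  # entry expired: fresh data is read
```

In this model, `at_1s` and `at_60s` both still contain `unit-2`, and only `after_expiry` reflects the deletion.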

Fix

Before reading the block structure on a cache miss, call update_collected_if_needed(). This compares the cached block structure version against the modulestore version and synchronously rebuilds only if stale.

Performance impact

  • Cache hit (common case): No change.
  • Cache miss, block structure current: One lightweight is_up_to_date check (DB read + version comparison).
  • Cache miss, block structure stale (the bug): Synchronous rebuild — same work the delayed Celery task would do, just done eagerly instead of serving stale data.
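
The three cases above can be sketched as follows. This is an illustration, not the code in the PR: `ToyBlockStructureManager` and `get_navigation` are hypothetical stand-ins, only the method names `update_collected_if_needed` and `is_up_to_date` come from the description, and cache expiry is left out for brevity.

```python
class ToyBlockStructureManager:
    """Hypothetical stand-in for the block structure manager."""

    def __init__(self, modulestore):
        self.modulestore = modulestore      # dict with "version" and "units"
        self.collected = dict(modulestore)  # cached block structure
        self.rebuilds = 0                   # counts synchronous rebuilds

    def is_up_to_date(self):
        return self.collected["version"] == self.modulestore["version"]

    def update_collected_if_needed(self):
        if not self.is_up_to_date():
            self.collected = dict(self.modulestore)  # synchronous rebuild
            self.rebuilds += 1


def get_navigation(manager, nav_cache):
    key = manager.modulestore["version"]
    if key not in nav_cache:                  # only on a cache miss
        manager.update_collected_if_needed()  # the added staleness check
        nav_cache[key] = list(manager.collected["units"])
    return nav_cache[key]


modulestore = {"version": "v1", "units": ["unit-1", "unit-2"]}
manager = ToyBlockStructureManager(modulestore)
nav_cache = {}

first = get_navigation(manager, nav_cache)  # miss, structure current: no rebuild
rebuilds_after_first = manager.rebuilds

# Publish: the version changes, and the delayed rebuild has not run yet.
modulestore.update(version="v2", units=["unit-1"])
fresh = get_navigation(manager, nav_cache)  # miss, structure stale: one rebuild
again = get_navigation(manager, nav_cache)  # hit: the check is skipped
```

Because the check sits inside the miss branch, the rebuild counter in this sketch only moves when a miss coincides with a stale block structure.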

Testing

To run the automated test:

tutor dev run lms pytest --ds=lms.envs.test \
  lms/djangoapps/course_home_api/outline/tests/test_view.py::SidebarBlocksTestViews::test_navigation_serves_fresh_data_after_publish

Manual steps:

  • Use a production-like environment (i.e. one where Celery processes async tasks)
  • Delete a unit in Studio
  • Refresh the course in the learning MFE within 30 seconds
  • Verify that the deleted unit is no longer listed in the course outline

AI Usage

Used Kiro with model set to auto to aid in the discovery of the root cause of the error, talk through potential fixes, come up with a test that addresses the issue being fixed, and write this PR summary.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Jun 19, 2026
@openedx-webhooks

Thanks for the pull request, @wgu-taylor-payne!

This repository is currently maintained by @openedx/wg-maintenance-openedx-platform.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.

Details
Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@openedx-webhooks openedx-webhooks added the core contributor PR author is a Core Contributor (who may or may not have write access to this repo). label Jun 19, 2026
@github-project-automation github-project-automation Bot moved this to Needs Triage in Contributions Jun 19, 2026
After a course publish in Studio, the CourseNavigationBlocksView can
cache stale block structure data for up to 1 hour. This happens because
the block structure rebuild task runs with a 30-second delay, but the
navigation view may be hit during that window, read the old block
structure from its cache, and store the stale result under the new
course_version key.

The fix adds an update_collected_if_needed() call on cache miss,
ensuring the block structure is fresh before we build and cache the
navigation tree. This only runs on cache misses and adds negligible
overhead for the common case (block structure already up-to-date).
@wgu-taylor-payne wgu-taylor-payne force-pushed the fix/stale-navigation-sidebar branch from 5103ff8 to 4374146 Compare June 19, 2026 03:04

Labels

core contributor PR author is a Core Contributor (who may or may not have write access to this repo). open-source-contribution PR author is not from Axim or 2U

Projects

Status: Needs Triage

2 participants