
fix: migration overwrites stale profile cookies with fresh login data #247

Open
LittleBitPlanet wants to merge 1 commit into teng-lin:main from LittleBitPlanet:fix/migration-stale-overwrite

Conversation

@LittleBitPlanet LittleBitPlanet commented Apr 5, 2026

Summary

  • Migration from legacy flat layout to profiles/default/ skipped copying storage_state.json when the destination already existed — regardless of whether the source was newer
  • This caused a recurring auth failure cycle: login writes fresh cookies to the legacy root path, migration deletes the fresh file but keeps the stale profile copy, auth fails
  • Fix: compare st_mtime before skipping — if the legacy root file is newer, overwrite the profile copy

Root Cause

_legacy_fallback() in paths.py resolves get_storage_path() to the root ~/.notebooklm/storage_state.json when it exists (for backwards compat). So login writes there. But ensure_profiles_dir() runs on every CLI invocation and triggers migrate_to_profiles() whenever legacy files exist at root. The migration copied root → profile on first run, but on subsequent runs it saw the profile copy already existed and skipped the copy — then deleted the (newer) root file anyway.

The original comment even said "skip if destination already exists and is newer", but the "and is newer" part was never implemented in the condition.
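The change described above can be sketched roughly as follows. This is a minimal illustration, not the actual code in `migration.py`: `should_copy` and `migrate_file` are hypothetical helper names, and the real function iterates over a list of legacy files rather than a single one.

```python
import os
import shutil
import tempfile
from pathlib import Path


def should_copy(src: Path, dst: Path) -> bool:
    """Copy unless dst exists and is at least as new as src."""
    if not dst.exists():
        return True
    return src.stat().st_mtime > dst.stat().st_mtime


def migrate_file(src: Path, dst_dir: Path) -> bool:
    """Hypothetical helper: copy src into dst_dir if needed, then remove src."""
    dst = dst_dir / src.name
    copied = should_copy(src, dst)
    if copied:
        shutil.copy2(src, dst)  # copy2 also carries over the modification time
    src.unlink()  # the legacy file is removed either way
    return copied


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    profile_dir = root / "profiles" / "default"
    profile_dir.mkdir(parents=True)

    # Stale profile copy left behind by an earlier migration.
    stale = profile_dir / "storage_state.json"
    stale.write_text('{"cookies": ["old"]}')
    os.utime(stale, (1_000_000, 1_000_000))

    # Newer file at the legacy root.
    fresh = root / "storage_state.json"
    fresh.write_text('{"cookies": ["fresh"]}')
    os.utime(fresh, (2_000_000, 2_000_000))

    overwrote = migrate_file(fresh, profile_dir)
    result = stale.read_text()
```

With the pre-fix condition (skip whenever `dst.exists()`), the same setup would keep the old cookies and still delete the root file.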

Test plan

  • Verify ruff format, ruff check, mypy pass (confirmed locally)
  • Scenario: fresh login → next CLI command → auth still works (fresh cookies preserved in profile)
  • Scenario: profile copy is already up-to-date → migration skips correctly (no unnecessary overwrites)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Improved profile migration to compare file modification times and properly update files by copying newer versions over older ones.

The migration from legacy flat layout to profiles/ skipped copying when
the destination file already existed, regardless of timestamps. This
caused a recurring auth failure cycle:

1. `notebooklm login` writes fresh cookies to the legacy root path
   (because _legacy_fallback resolves there when the root file exists)
2. Next CLI command triggers ensure_profiles_dir() → migrate_to_profiles()
3. Migration sees profiles/default/storage_state.json already exists,
   skips the copy, then deletes the fresh root file
4. The stale profile copy (from a prior migration) is now the only auth
   source → auth fails

Fix: compare st_mtime before skipping. If the legacy root file is newer
than the profile copy, overwrite it. This matches the original comment
intent ("skip if destination already exists and is newer") which was
never implemented in the condition.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

coderabbitai bot commented Apr 5, 2026

Walkthrough

The migration logic in src/notebooklm/migration.py was modified to compare modification timestamps when deciding whether to overwrite legacy files. Previously, the code skipped copying if a destination file already existed; now it copies files only if the source is newer than the destination.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Migration file copy behavior: src/notebooklm/migration.py | Modified legacy file copy logic during profile migration to use modification time comparison (st_mtime) instead of checking file existence, enabling overwrite of older destination files with newer source files. Updated corresponding log message. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 Files migrate with time in mind,
No longer blocked by what we find.
Newer trumps the old each way,
Migration logic saves the day! ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly and clearly describes the main fix: modifying migration logic to overwrite stale cookies with fresh login data, which matches the core change in the changeset. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request updates the migration logic in src/notebooklm/migration.py to overwrite destination files if the source file is newer, ensuring that fresh data written to legacy paths is correctly migrated. I have no feedback to provide.

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
src/notebooklm/migration.py (1)

82-87: Please add explicit tests for both mtime branches.

The new condition introduces two critical paths (dst >= src skip, src > dst overwrite) that are not directly asserted in current migration tests. Add focused cases to prevent regressions in this auth-critical flow.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/notebooklm/migration.py` around lines 82 - 87, Add two focused unit tests
that exercise the file-copy mtime branches in migration.py's loop over
legacy_files: (1) create a legacy source file and a destination file where
dst.stat().st_mtime >= src.stat().st_mtime, run the migration routine that
iterates legacy_files, and assert the destination file was not overwritten and
the logger emitted the "Skipping %s (profile copy is same age or newer)"
message; (2) create a legacy source file and a destination file where
src.stat().st_mtime > dst.stat().st_mtime, run the same migration routine, and
assert the destination was overwritten (content changed) by the copy. Use
tmp_path (or tempfile) and os.utime to set mtimes deterministically, locate the
files used by the migration via the same path logic that computes dst =
default_dir / src.name, and verify behavior for the legacy_files -> dst copy
branch governed by the dst.exists() and mtime comparison (dst.stat().st_mtime >=
src.stat().st_mtime).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: dfdd0822-04f3-48a2-bc81-3cb3288a81f2

📥 Commits

Reviewing files that changed from the base of the PR and between abeae92 and 72f8fb2.

📒 Files selected for processing (1)
  • src/notebooklm/migration.py

Owner

@teng-lin teng-lin left a comment

Thanks for this fix, @LittleBitPlanet! The root cause analysis in the PR description is excellent — the comment/code mismatch where "skip if destination already exists and is newer" was never actually implemented is a great catch.

The fix itself is correct and minimal: >= with shutil.copy2's mtime preservation gives you idempotency for free. All 17 CI checks pass and the code reads cleanly.

Multi-model review summary

I ran this through several review passes (including error-handling, test-coverage, and type-design analysis). Here's the consensus:

✅ Strengths

  • Fix implements what the original comment always intended — clean, minimal 4-line diff
  • >= comparison correctly handles idempotency via shutil.copy2's mtime preservation
  • Excellent PR description with clear root cause analysis
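The idempotency point can be checked with a short sketch. It assumes a filesystem on which `shutil.copy2` can carry the modification time over exactly; the file names are made up for the demonstration.

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "storage_state.json"
    dst = Path(tmp) / "profile_copy.json"
    src.write_text("{}")
    os.utime(src, (1_500_000, 1_500_000))

    # copy2 copies the content and then the file metadata, including mtime.
    shutil.copy2(src, dst)

    src_mtime = src.stat().st_mtime
    dst_mtime = dst.stat().st_mtime

    # A second migration pass over an unchanged source sees a tie,
    # and ">=" treats a tie as "skip".
    would_skip = dst_mtime >= src_mtime
```

Had the condition used a strict `>`, a tie would trigger a redundant copy on every run.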

📝 Recommendation: add mtime tests before merge

The PR changes a branch condition but doesn't add a test that would fail if the change were reverted. Two focused tests using os.utime would lock in the fix:

def test_overwrites_when_source_is_newer(self, tmp_path):
    """Source file newer than profile copy triggers overwrite (the bug fix)."""
    default_dir = tmp_path / "profiles" / "default"
    default_dir.mkdir(parents=True)
    dst = default_dir / "storage_state.json"
    dst.write_text('{"cookies": ["old"]}')
    os.utime(dst, (1_000_000, 1_000_000))

    src = tmp_path / "storage_state.json"
    src.write_text('{"cookies": ["fresh"]}')
    os.utime(src, (2_000_000, 2_000_000))

    with patch.dict(os.environ, {"NOTEBOOKLM_HOME": str(tmp_path)}, clear=True):
        migrate_to_profiles()
    assert json.loads(dst.read_text()) == {"cookies": ["fresh"]}


def test_skips_when_destination_is_newer(self, tmp_path):
    """Profile copy newer than legacy source is preserved."""
    default_dir = tmp_path / "profiles" / "default"
    default_dir.mkdir(parents=True)
    src = tmp_path / "storage_state.json"
    src.write_text('{"cookies": ["stale"]}')
    os.utime(src, (1_000_000, 1_000_000))

    dst = default_dir / "storage_state.json"
    dst.write_text('{"cookies": ["current"]}')
    os.utime(dst, (2_000_000, 2_000_000))

    with patch.dict(os.environ, {"NOTEBOOKLM_HOME": str(tmp_path)}, clear=True):
        migrate_to_profiles()
    assert json.loads(dst.read_text()) == {"cookies": ["current"]}

💡 Minor observations (not blocking)

  1. Directory migration inconsistency — Lines 94-101 still use the old if dst.exists(): skip pattern for browser_profile/. If browser profile data can be regenerated at the legacy root after a previous migration, the same stale-overwrite issue could apply. Might be worth a follow-up issue.

  2. FAT32 mtime granularity — On FAT32/exFAT, mtime has 2-second precision, so a login within the same 2-second window as a previous copy could produce a false tie. Extremely unlikely in practice but a brief code comment would be nice.
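The false tie in observation 2 can be illustrated without a FAT32 volume by rounding timestamps to a 2-second grid. This is a simplified model: `truncate_to_granularity` is a hypothetical helper, and real filesystems may round differently.

```python
def truncate_to_granularity(mtime: float, granularity: float = 2.0) -> float:
    """Round a timestamp down to an assumed filesystem resolution."""
    return mtime // granularity * granularity


# Profile copy written at t = 1_000_000.2 s, fresh login 1.5 s later.
profile_copy_mtime = truncate_to_granularity(1_000_000.2)
fresh_login_mtime = truncate_to_granularity(1_000_001.7)

# Both land in the same 2-second slot, so the ">=" check would skip the copy.
false_tie = profile_copy_mtime >= fresh_login_mtime
```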

Overall this is a solid, well-motivated fix. Just the two tests to add and it's good to go. 🎉

🤖 Generated with Claude Code

Owner

teng-lin commented Apr 5, 2026

Follow-up: deeper investigation

After a more thorough investigation of the code flow, I want to flag some concerns for @teng-lin's consideration.

The described scenario may be unreachable

The PR describes a "recurring auth failure cycle" where login writes fresh cookies to the legacy root path via _legacy_fallback(), then migration deletes the fresh file. However, tracing the actual code paths:

  1. _legacy_fallback() (paths.py:232) returns the root path only when not profile_path.exists() and resolved_profile == "default" and the root path exists
  2. After a successful migration, the root file is deleted and the profile file exists
  3. get_storage_path() → _legacy_fallback() → profile path exists → returns profile path
  4. login (session.py:210, 330) writes to get_storage_path() → writes to profile path, not root
  5. I checked all write paths (context.storage_state() in Playwright login, storage_path.write_text() in --browser-cookies): all go through get_storage_path() → _legacy_fallback() → profile path when it exists

There is no code path in this codebase that recreates storage_state.json at the root after a successful migration. For the described scenario to occur, both root and profile copies would need to exist simultaneously with root being newer — but login always writes to wherever get_storage_path() points, which is the profile path once it exists.
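The resolution rule traced in the steps above can be sketched as below. `resolve_storage_path` is a hypothetical stand-in that follows the condition quoted in step 1; it is not the project's `get_storage_path()` or `_legacy_fallback()`, whose signatures are not shown in this thread.

```python
import tempfile
from pathlib import Path


def resolve_storage_path(home: Path, profile: str = "default") -> Path:
    """Hypothetical stand-in for get_storage_path() plus the legacy fallback."""
    profile_path = home / "profiles" / profile / "storage_state.json"
    root_path = home / "storage_state.json"
    # Fall back to the legacy root only when the profile copy is missing.
    if not profile_path.exists() and profile == "default" and root_path.exists():
        return root_path
    return profile_path


with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp)

    # Legacy layout only: resolution points at the root file.
    root_file = home / "storage_state.json"
    root_file.write_text("{}")
    before_migration = resolve_storage_path(home) == root_file

    # Once a profile copy exists, it wins even if the root file is still there.
    profile_file = home / "profiles" / "default" / "storage_state.json"
    profile_file.parent.mkdir(parents=True)
    profile_file.write_text("{}")
    after_migration = resolve_storage_path(home) == profile_file
```

Under this rule, a login that writes to the resolved path cannot land on the root file once the profile copy exists, which is the basis of the reachability concern.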

The only ways I can see this triggering are:

  • Manual/external creation of ~/.notebooklm/storage_state.json by a user or external tool
  • A crash during migration (but then both files have identical content from shutil.copy2)

The fix itself is correct but the narrative may be overstated

The code change is harmless and technically sound — implementing the mtime check that the original comment always described. The >= comparison with shutil.copy2's mtime preservation is correct for idempotency. It makes the migration more robust against edge cases involving external file manipulation.

However, the "recurring auth failure cycle" framing suggests a critical production bug, when the actual impact appears limited to scenarios involving external file creation outside this tool's control.

Contributor investigation

The contributor account (created 2026-01-21) has no prior activity. The fork → branch → PR was completed in ~61 seconds. The commit is co-authored with "Claude Opus 4.6 (1M context)". This appears to be an AI-generated contribution.

Recommendation

The fix is correct and harmless — I'd still accept it (with the mtime tests I suggested earlier), but wanted to flag these findings for transparency. @teng-lin, you're the best judge of whether there's a scenario I'm missing.

🤖 Generated with Claude Code

@teng-lin teng-lin added the bot-generated Likely AI/bot-generated contribution label Apr 5, 2026

Labels

bot-generated Likely AI/bot-generated contribution

2 participants