Skip to content

fix: parse YouTube source URLs from notebook metadata#266

Open
voidborne-d wants to merge 4 commits intoteng-lin:mainfrom
voidborne-d:fix/youtube-source-url-parsing
Open

fix: parse YouTube source URLs from notebook metadata#266
voidborne-d wants to merge 4 commits intoteng-lin:mainfrom
voidborne-d:fix/youtube-source-url-parsing

Conversation

@voidborne-d
Copy link
Copy Markdown
Contributor

@voidborne-d voidborne-d commented Apr 11, 2026

Summary

  • add a shared helper for source URL extraction across source parsing paths
  • read YouTube URLs from metadata[5][0], while keeping the existing web/PDF and flat fallbacks
  • add a regression test covering SourcesAPI.list() for YouTube metadata

Testing

  • PYTHONPATH=src python3 -m pytest tests/unit/test_source_status.py -k youtube_metadata_slot

Closes #265

Summary by CodeRabbit

  • New Features

    • Mind-map generation supports custom language selection (default: English) and optional instructions/description; CLI adds a positional description and a --language option.
  • Bug Fixes

    • Improved source URL extraction across varied metadata shapes for more reliable source parsing.
  • Tests

    • Added unit tests verifying language/instructions are passed to mind-map generation and robust YouTube/source parsing.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 508ff8bd-1baa-42e8-8acb-5d05df8ab8d2

📥 Commits

Reviewing files that changed from the base of the PR and between 1607672 and 29e49df.

📒 Files selected for processing (2)
  • src/notebooklm/types.py
  • tests/unit/test_source_status.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/test_source_status.py

📝 Walkthrough

Walkthrough

This PR adds language and instruction parameters to mind-map generation (exposed in the CLI) and centralizes source URL extraction to correctly handle YouTube metadata, with tests covering both changes.

Changes

Cohort / File(s) Summary
Mind-map generation
src/notebooklm/_artifacts.py, src/notebooklm/cli/generate.py
Added language (default "en") and instructions (optional) to ArtifactsAPI.generate_mind_map(); CLI generate mind-map gains a positional description and --language option and now passes resolved language and instructions into the RPC payload.
URL extraction refactor
src/notebooklm/types.py, src/notebooklm/_sources.py
Introduced _extract_source_url() to centralize metadata parsing (checks index 7, index 5 for YouTube, then index 0 fallback) and updated SourcesAPI.list() and Source.from_api_response() to use it instead of inline index-based extraction.
Tests
tests/unit/cli/test_generate.py, tests/unit/test_source_selection.py, tests/unit/test_source_status.py
Added unit tests: CLI passes --language to generate_mind_map; ArtifactsAPI.generate_mind_map includes language and instructions in RPC payload; tests covering YouTube source URL extraction and SourcesAPI.list() parsing.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI
    participant Client as Client (sync/async)
    participant Artifacts as ArtifactsAPI
    participant Core as Core RPC
    CLI->>Client: invoke generate mind-map (description, --language)
    Client->>Artifacts: artifacts.generate_mind_map(notebook_id, source_ids, language, instructions)
    Artifacts->>Core: rpc_call("GENERATE_MINDMAP", ... , config_with_language_and_context)
    Core->>Backend: process mind-map request
    Backend-->>Core: mind-map result
    Core-->>Artifacts: rpc response
    Artifacts-->>Client: returns result
    Client-->>CLI: output result
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Poem

🐰 I hop through code with a joyful cheer,
Languages set and instructions clear.
YouTube links found where they hide,
Mind-maps bloom with context wide. ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning The PR includes changes to mind-map generation (language/instructions parameters in CLI and API) that are unrelated to the YouTube URL parsing objective from issue #265. Remove the mind-map generation enhancements (changes to generate_mind_map in _artifacts.py, CLI, and related tests) as they are outside the scope of fixing YouTube source URL parsing.
Docstring Coverage ⚠️ Warning Docstring coverage is 76.47% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'fix: parse YouTube source URLs from notebook metadata' accurately summarizes the primary change—adding YouTube URL extraction logic to the source parsing paths.
Linked Issues check ✅ Passed The PR successfully addresses issue #265 by implementing YouTube URL extraction at src[2][5] in all required parsing paths (_sources.py list() and types.py from_api_response), with tests validating the fix.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/test_source_status.py`:
- Around line 235-265: Add a focused unit test that directly exercises
Source.from_api_response to ensure YouTube URL extraction from metadata[5][0] is
correct: construct a minimal API response object matching the shape used in
tests (with metadata slot index 5 containing ["https://youtube.com/watch?v=abc",
"abc", "Channel"]), call Source.from_api_response(...) (or the
classmethod/constructor used) and assert the returned Source instance has url ==
"https://youtube.com/watch?v=abc" and kind == "youtube"; reference
Source.from_api_response and the Source class to locate the code to test and
reuse the same SourceStatus/structure as other tests for consistency.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 120d247a-573a-4bf4-968f-4441b978774d

📥 Commits

Reviewing files that changed from the base of the PR and between a997718 and 1607672.

📒 Files selected for processing (7)
  • src/notebooklm/_artifacts.py
  • src/notebooklm/_sources.py
  • src/notebooklm/cli/generate.py
  • src/notebooklm/types.py
  • tests/unit/cli/test_generate.py
  • tests/unit/test_source_selection.py
  • tests/unit/test_source_status.py

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the mind map generation functionality by adding support for custom instructions and language selection across the API and CLI. It also refactors source URL extraction into a centralized helper function, _extract_source_url, to consistently handle different metadata structures for web, PDF, and YouTube sources. Corresponding unit tests have been added to verify the new parameters and the improved URL parsing logic. I have no feedback to provide as there were no review comments to assess.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Source.url is always None for YouTube sources in list() and from_api_response()

2 participants