fix: parse YouTube source URLs from notebook metadata by voidborne-d · Pull Request #266 · teng-lin/notebooklm-py

voidborne-d · 2026-04-11T04:54:38Z

Summary

add a shared helper for source URL extraction across source parsing paths
read YouTube URLs from metadata[5][0], while keeping the existing web/PDF and flat fallbacks
add a regression test covering SourcesAPI.list() for YouTube metadata

Testing

PYTHONPATH=src python3 -m pytest tests/unit/test_source_status.py -k youtube_metadata_slot

Closes #265

Summary by CodeRabbit

New Features
- Mind-map generation supports custom language selection (default: English) and optional instructions/description; CLI adds a positional description and a --language option.
Bug Fixes
- Improved source URL extraction across varied metadata shapes for more reliable source parsing.
Tests
- Added unit tests verifying language/instructions are passed to mind-map generation and robust YouTube/source parsing.

coderabbitai · 2026-04-11T04:54:53Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 508ff8bd-1baa-42e8-8acb-5d05df8ab8d2

📥 Commits

Reviewing files that changed from the base of the PR and between 1607672 and 29e49df.

📒 Files selected for processing (2)

src/notebooklm/types.py
tests/unit/test_source_status.py

🚧 Files skipped from review as they are similar to previous changes (1)

tests/unit/test_source_status.py

📝 Walkthrough

Walkthrough

This PR adds language and instruction parameters to mind-map generation (exposed in the CLI) and centralizes source URL extraction to correctly handle YouTube metadata, with tests covering both changes.

Changes

Cohort / File(s)	Summary
Mind-map generation `src/notebooklm/_artifacts.py`, `src/notebooklm/cli/generate.py`	Added `language` (default `"en"`) and `instructions` (optional) to `ArtifactsAPI.generate_mind_map()`; CLI `generate mind-map` gains a positional `description` and `--language` option and now passes resolved language and instructions into the RPC payload.
URL extraction refactor `src/notebooklm/types.py`, `src/notebooklm/_sources.py`	Introduced `_extract_source_url()` to centralize metadata parsing (checks index 7, index 5 for YouTube, then index 0 fallback) and updated `SourcesAPI.list()` and `Source.from_api_response()` to use it instead of inline index-based extraction.
Tests `tests/unit/cli/test_generate.py`, `tests/unit/test_source_selection.py`, `tests/unit/test_source_status.py`	Added unit tests: CLI passes `--language` to `generate_mind_map`; `ArtifactsAPI.generate_mind_map` includes `language` and `instructions` in RPC payload; tests covering YouTube source URL extraction and `SourcesAPI.list()` parsing.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI
    participant Client as Client (sync/async)
    participant Artifacts as ArtifactsAPI
    participant Core as Core RPC
    CLI->>Client: invoke generate mind-map (description, --language)
    Client->>Artifacts: artifacts.generate_mind_map(notebook_id, source_ids, language, instructions)
    Artifacts->>Core: rpc_call("GENERATE_MINDMAP", ... , config_with_language_and_context)
    Core->>Backend: process mind-map request
    Backend-->>Core: mind-map result
    Core-->>Artifacts: rpc response
    Artifacts-->>Client: returns result
    Client-->>CLI: output result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related issues

Mind map generation ignores language and instructions parameters #250 — Implements adding language and instructions parameters to mind-map generation and CLI support, matching the issue objective.

Poem

🐰 I hop through code with a joyful cheer,
Languages set and instructions clear.
YouTube links found where they hide,
Mind-maps bloom with context wide. ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Out of Scope Changes check	⚠️ Warning	The PR includes changes to mind-map generation (language/instructions parameters in CLI and API) that are unrelated to the YouTube URL parsing objective from issue `#265`.	Remove the mind-map generation enhancements (changes to generate_mind_map in _artifacts.py, CLI, and related tests) as they are outside the scope of fixing YouTube source URL parsing.
Docstring Coverage	⚠️ Warning	Docstring coverage is 76.47% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'fix: parse YouTube source URLs from notebook metadata' accurately summarizes the primary change—adding YouTube URL extraction logic to the source parsing paths.
Linked Issues check	✅ Passed	The PR successfully addresses issue `#265` by implementing YouTube URL extraction at src[2][5] in all required parsing paths (_sources.py list() and types.py from_api_response), with tests validating the fix.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/unit/test_source_status.py`:
- Around line 235-265: Add a focused unit test that directly exercises
Source.from_api_response to ensure YouTube URL extraction from metadata[5][0] is
correct: construct a minimal API response object matching the shape used in
tests (with metadata slot index 5 containing ["https://youtube.com/watch?v=abc",
"abc", "Channel"]), call Source.from_api_response(...) (or the
classmethod/constructor used) and assert the returned Source instance has url ==
"https://youtube.com/watch?v=abc" and kind == "youtube"; reference
Source.from_api_response and the Source class to locate the code to test and
reuse the same SourceStatus/structure as other tests for consistency.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 120d247a-573a-4bf4-968f-4441b978774d

📥 Commits

Reviewing files that changed from the base of the PR and between a997718 and 1607672.

📒 Files selected for processing (7)

src/notebooklm/_artifacts.py
src/notebooklm/_sources.py
src/notebooklm/cli/generate.py
src/notebooklm/types.py
tests/unit/cli/test_generate.py
tests/unit/test_source_selection.py
tests/unit/test_source_status.py

tests/unit/test_source_status.py

gemini-code-assist

Code Review

This pull request enhances the mind map generation functionality by adding support for custom instructions and language selection across the API and CLI. It also refactors source URL extraction into a centralized helper function, _extract_source_url, to consistently handle different metadata structures for web, PDF, and YouTube sources. Corresponding unit tests have been added to verify the new parameters and the improved URL parsing logic. I have no feedback to provide as there were no review comments to assess.

voidborne added 3 commits April 10, 2026 18:54

feat(generate): add language and instructions to mind maps

885b1ec

fix(cli): satisfy formatting and simplify mind-map args

a3f95c5

fix(sources): parse youtube urls from notebook metadata

1607672

coderabbitai bot reviewed Apr 11, 2026

View reviewed changes

tests/unit/test_source_status.py Show resolved Hide resolved

gemini-code-assist bot reviewed Apr 11, 2026

View reviewed changes

test(sources): cover medium-nested youtube metadata parsing

29e49df

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: parse YouTube source URLs from notebook metadata#266

fix: parse YouTube source URLs from notebook metadata#266
voidborne-d wants to merge 4 commits intoteng-lin:mainfrom
voidborne-d:fix/youtube-source-url-parsing

voidborne-d commented Apr 11, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Apr 11, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Poem

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

voidborne-d commented Apr 11, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Apr 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related issues

Poem

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

voidborne-d commented Apr 11, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Apr 11, 2026 •

edited

Loading