fix: extract YouTube source URL from src[2][5] in list() and from_api_response() #267

octo-patch wants to merge 2 commits into teng-lin:main from
Conversation
…_response() (fixes teng-lin#265)
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
🚧 Files skipped from review as they are similar to previous changes (3)

📝 Walkthrough
Extended URL extraction logic in `_sources.py` and `types.py`.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks: ✅ Passed checks (5 passed)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Code Review
This pull request updates the source parsing logic in `_sources.py` and `types.py` to correctly extract YouTube URLs from the API response; for YouTube sources these are located at index 5 rather than index 7. It also introduces a fallback for URLs found at index 0 and includes unit and integration tests. Review feedback suggests hardening the extraction by adding explicit string type checks for all URL indices and keeping the logic consistent across the different parsing paths.
```python
if len(src[2]) > 7:
    url_list = src[2][7]
    if isinstance(url_list, list) and len(url_list) > 0:
        url = url_list[0]
if not url and len(src[2]) > 5:
    yt_data = src[2][5]
    if isinstance(yt_data, list) and len(yt_data) > 0 and isinstance(yt_data[0], str):
        url = yt_data[0]
if not url and len(src[2]) > 0:
    if isinstance(src[2][0], str) and src[2][0].startswith("http"):
        url = src[2][0]
```
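Read end to end, the snippet is a three-step fallback chain over the metadata list `src[2]`: index 7 (web/PDF URLs), then index 5 (YouTube metadata), then index 0 (a bare URL string). A self-contained sketch of that chain, where `extract_source_url` is an invented name for logic the PR keeps inline in `list()`:

```python
def extract_source_url(meta):
    """Return the first URL found in a source metadata list, or None.

    Mirrors the snippet above: check index 7 first, then index 5,
    then fall back to a bare URL string at index 0.
    """
    url = None
    if len(meta) > 7:
        url_list = meta[7]
        if isinstance(url_list, list) and len(url_list) > 0:
            url = url_list[0]
    if not url and len(meta) > 5:
        yt_data = meta[5]
        if isinstance(yt_data, list) and len(yt_data) > 0 and isinstance(yt_data[0], str):
            url = yt_data[0]
    if not url and len(meta) > 0:
        if isinstance(meta[0], str) and meta[0].startswith("http"):
            url = meta[0]
    return url

# YouTube metadata at index 5, nothing usable at index 7
meta = [None, None, None, None, None,
        ["https://www.youtube.com/watch?v=dcWU-qD8ISQ", "dcWU-qD8ISQ", "Channel"],
        None, None]
print(extract_source_url(meta))  # https://www.youtube.com/watch?v=dcWU-qD8ISQ
```

The sample payload shape here is illustrative only; the real API response nests this list inside larger structures.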
The extraction logic for src[2][7] should include a type check for the first element (ensuring it is a string), consistent with the validation added for YouTube URLs at src[2][5]. This prevents potential type errors if the API returns non-string data. Additionally, per repository rules, please add a comment explaining this logic as it addresses varying API response formats.
Suggested change:

```diff
-if len(src[2]) > 7:
-    url_list = src[2][7]
-    if isinstance(url_list, list) and len(url_list) > 0:
-        url = url_list[0]
-if not url and len(src[2]) > 5:
-    yt_data = src[2][5]
-    if isinstance(yt_data, list) and len(yt_data) > 0 and isinstance(yt_data[0], str):
-        url = yt_data[0]
-if not url and len(src[2]) > 0:
-    if isinstance(src[2][0], str) and src[2][0].startswith("http"):
-        url = src[2][0]
+# Handle varying API response formats for URL extraction.
+if len(src[2]) > 7 and isinstance(src[2][7], list) and len(src[2][7]) > 0 and isinstance(src[2][7][0], str):
+    url = src[2][7][0]
+if not url and len(src[2]) > 5 and isinstance(src[2][5], list) and len(src[2][5]) > 0 and isinstance(src[2][5][0], str):
+    url = src[2][5][0]
+if not url and len(src[2]) > 0:
+    if isinstance(src[2][0], str) and src[2][0].startswith("http"):
+        url = src[2][0]
```
References
- Add comments to explain complex logic, such as recursive ID extraction, especially when it addresses varying API response formats.
```python
if len(entry[2]) > 7 and isinstance(entry[2][7], list):
    url = entry[2][7][0] if entry[2][7] else None
if not url and len(entry[2]) > 5:
    yt_data = entry[2][5]
    if isinstance(yt_data, list) and len(yt_data) > 0 and isinstance(yt_data[0], str):
        url = yt_data[0]
```
The medium-nested parsing path is missing the fallback check for entry[2][0] starting with "http", which is present in the deeply-nested path and in _sources.py. Additionally, the extraction of entry[2][7] should be made more robust by verifying that the first element is a string. Please include a comment explaining this logic to handle varying API response formats.
Suggested change:

```diff
-if len(entry[2]) > 7 and isinstance(entry[2][7], list):
-    url = entry[2][7][0] if entry[2][7] else None
-if not url and len(entry[2]) > 5:
-    yt_data = entry[2][5]
-    if isinstance(yt_data, list) and len(yt_data) > 0 and isinstance(yt_data[0], str):
-        url = yt_data[0]
+# Handle varying API response formats for URL extraction.
+if len(entry[2]) > 7 and isinstance(entry[2][7], list) and len(entry[2][7]) > 0 and isinstance(entry[2][7][0], str):
+    url = entry[2][7][0]
+if not url and len(entry[2]) > 5 and isinstance(entry[2][5], list) and len(entry[2][5]) > 0 and isinstance(entry[2][5][0], str):
+    url = entry[2][5][0]
+if not url and len(entry[2]) > 0 and isinstance(entry[2][0], str) and entry[2][0].startswith("http"):
+    url = entry[2][0]
```
References
- Add comments to explain complex logic, such as recursive ID extraction, especially when it addresses varying API response formats.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/unit/test_types.py (1)

164-193: Add a medium-nested index-5 regression case

This test only exercises the deeply nested branch. Consider adding one medium-nested payload case so both `Source.from_api_response()` URL-extraction paths are locked.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_types.py` around lines 164 - 193, add a second test that covers the medium-nested extraction path for YouTube URLs so both branches in Source.from_api_response() are exercised: create a new test function (e.g., test_from_api_response_youtube_url_medium_nesting) that constructs a payload with the medium nesting variant of the metadata array containing the YouTube entry, call Source.from_api_response(data), and assert the resulting Source.id, Source.url (exact YouTube URL), and Source.kind == SourceType.YOUTUBE to lock the medium-nesting branch.

tests/integration/test_sources.py (1)

220-274: Optional: add an explicit `sources.get()` regression for index-5 YouTube URLs

This new test validates `list()` well. Since the issue scope includes `get()`, a direct `get()` case would guard against future refactors that decouple `get()` from `list()`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/integration/test_sources.py` around lines 220 - 274, add a complementary regression test that calls client.sources.get(...) to verify YouTube URLs stored at src[2][5] are parsed the same as list(); create a new async test (e.g., test_get_source_youtube_url_at_index_5) that builds the same RPC GET_NOTEBOOK response used in test_list_sources_youtube_url_at_index_5, mocks it via httpx_mock, opens NotebookLMClient(auth_tokens), calls await client.sources.get("nb_123", "src_yt") (or the appropriate single-get signature), and asserts the returned Source has id "src_yt", kind "youtube", and url "https://www.youtube.com/watch?v=dcWU-qD8ISQ" to ensure get() handles index-5 YouTube metadata.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/notebooklm/_sources.py`:
- Around line 108-111: The code assigns url = url_list[0] without verifying its
type; update the block where src is inspected (the branch that checks
len(src[2]) > 7 and sets url_list = src[2][7]) to validate that url_list is a
non-empty list and that its first element is a string (and optionally non-empty
after strip) before assigning to url; if the element is not a string, skip it
and continue the fallback chain (or leave url unset) so malformed payloads
cannot inject non-string values.
---
Nitpick comments:
In `@tests/integration/test_sources.py`:
- Around line 220-274: Add a complementary regression test that calls
client.sources.get(...) to verify YouTube URLs stored at src[2][5] are parsed
the same as list(); create a new async test (e.g.,
test_get_source_youtube_url_at_index_5) that builds the same RPC GET_NOTEBOOK
response used in test_list_sources_youtube_url_at_index_5, mocks it via
httpx_mock, opens NotebookLMClient(auth_tokens), calls await
client.sources.get("nb_123", "src_yt") (or the appropriate single-get
signature), and asserts the returned Source has id "src_yt", kind "youtube", and
url "https://www.youtube.com/watch?v=dcWU-qD8ISQ" to ensure get() handles
index-5 YouTube metadata.
In `@tests/unit/test_types.py`:
- Around line 164-193: Add a second test that covers the medium-nested
extraction path for YouTube URLs so both branches in Source.from_api_response()
are exercised: create a new test function (e.g.,
test_from_api_response_youtube_url_medium_nesting) that constructs a payload
with the medium nesting variant of the metadata array containing the YouTube
entry, call Source.from_api_response(data), and assert the resulting Source.id,
Source.url (exact YouTube URL), and Source.kind == SourceType.YOUTUBE to lock
the medium-nesting branch.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d42de255-271d-4422-89ce-fb4ec0dbfac6
📒 Files selected for processing (4)
- src/notebooklm/_sources.py
- src/notebooklm/types.py
- tests/integration/test_sources.py
- tests/unit/test_types.py
```python
if len(src[2]) > 7:
    url_list = src[2][7]
    if isinstance(url_list, list) and len(url_list) > 0:
        url = url_list[0]
```
Validate index-7 URL element type before assignment
At Line 110, url_list[0] is accepted without a string check. A malformed payload can set a non-string URL and bypass the fallback chain.
Proposed patch

```diff
-if len(src[2]) > 7:
-    url_list = src[2][7]
-    if isinstance(url_list, list) and len(url_list) > 0:
-        url = url_list[0]
+if len(src[2]) > 7:
+    url_list = src[2][7]
+    if (
+        isinstance(url_list, list)
+        and len(url_list) > 0
+        and isinstance(url_list[0], str)
+    ):
+        url = url_list[0]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/notebooklm/_sources.py` around lines 108 - 111, The code assigns url =
url_list[0] without verifying its type; update the block where src is inspected
(the branch that checks len(src[2]) > 7 and sets url_list = src[2][7]) to
validate that url_list is a non-empty list and that its first element is a
string (and optionally non-empty after strip) before assigning to url; if the
element is not a string, skip it and continue the fallback chain (or leave url
unset) so malformed payloads cannot inject non-string values.
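The fall-through behavior this prompt asks for can be exercised in isolation: a non-string element at index 7 is skipped rather than assigned, so the chain continues to index 5. A stand-in sketch (not the library code; `extract_url_hardened` is an invented name) with string checks at every index:

```python
def extract_url_hardened(meta):
    """Fallback chain with a string check at every URL index."""
    url = None
    if (len(meta) > 7 and isinstance(meta[7], list)
            and len(meta[7]) > 0 and isinstance(meta[7][0], str)):
        url = meta[7][0]
    if not url and (len(meta) > 5 and isinstance(meta[5], list)
                    and len(meta[5]) > 0 and isinstance(meta[5][0], str)):
        url = meta[5][0]
    if not url and len(meta) > 0 and isinstance(meta[0], str) and meta[0].startswith("http"):
        url = meta[0]
    return url

# Malformed payload: non-string at index 7 is skipped, index 5 wins
meta = [None] * 8
meta[7] = [12345]  # not a string, so this entry cannot become the URL
meta[5] = ["https://www.youtube.com/watch?v=dcWU-qD8ISQ", "dcWU-qD8ISQ", "Channel"]
print(extract_url_hardened(meta))  # https://www.youtube.com/watch?v=dcWU-qD8ISQ
```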
Fixes #265

Problem

`Source.url` is always `None` for YouTube sources when retrieved via `client.sources.list()` or `client.sources.get()`. Web page and uploaded file sources return their URLs correctly.

The NotebookLM API stores YouTube URLs at `src[2][5]` as `[url, video_id, channel_name]`, while web/PDF source URLs are stored at `src[2][7]`. The existing URL extraction logic only checked `src[2][7]`, so YouTube source URLs were never extracted.

Solution

Added a check for `src[2][5]` between the existing `[7]` and `[0]` checks in all URL extraction paths:

- `_sources.py`, `list()` method: added a fallback to `src[2][5]` when `src[2][7]` is `None`
- `types.py`, `Source.from_api_response()`: added the same fallback in both the medium-nested and deeply-nested parsing paths

The fix validates that `src[2][5]` is a list with at least one element that is a string before using it as a URL.

Testing

- `tests/unit/test_types.py`: `test_from_api_response_youtube_url_at_index_5` verifies that `Source.from_api_response()` correctly extracts YouTube URLs stored at `entry[2][5]`
- `tests/integration/test_sources.py`: `test_list_sources_youtube_url_at_index_5` verifies that `sources.list()` correctly extracts YouTube URLs from the `src[2][5]` position

Summary by CodeRabbit

Bug Fixes

Tests
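The `[url, video_id, channel_name]` layout described for `src[2][5]` reduces to a small guard before use. A minimal sketch (the helper name is hypothetical, not the library's API; only the first field matters for `Source.url`):

```python
def youtube_url_from_meta(yt_meta):
    # src[2][5] is described as [url, video_id, channel_name]; use only
    # the first element, and only if it is actually a string.
    if isinstance(yt_meta, list) and len(yt_meta) > 0 and isinstance(yt_meta[0], str):
        return yt_meta[0]
    return None

print(youtube_url_from_meta(
    ["https://www.youtube.com/watch?v=dcWU-qD8ISQ", "dcWU-qD8ISQ", "Some Channel"]
))  # https://www.youtube.com/watch?v=dcWU-qD8ISQ
print(youtube_url_from_meta(None))  # None
```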