feat: add markdown output format for source fulltext by jaytxng · Pull Request #223 · teng-lin/notebooklm-py

jaytxng · 2026-03-25T22:45:42Z

Adds a format parameter to get_fulltext() that lets users retrieve source
content as structured Markdown instead of flattened plaintext.

What

- New format kwarg on client.sources.get_fulltext(): "text" (default) or "markdown"
- New -f/--format CLI option on source fulltext
- Markdown format preserves headings, tables, links, and emphasis from the source
- Uses the API's HTML rendition (params [3],[3]) converted via markdownify

Why

The existing plaintext output strips all structure: headings, tables, links,
and emphasis are lost. For downstream use cases (LLM context, note-taking, research
pipelines), preserving that structure matters. Resolves #222.
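
To make the difference concrete, here is a toy comparison of flattening HTML to plaintext versus keeping its structure as Markdown. It is only a sketch: `MiniMarkdown`, `TextOnly`, `to_markdown`, and `to_plaintext` are hypothetical names built on the standard library's `html.parser`, standing in for markdownify, and they handle just headings, emphasis, and links.

```python
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Toy HTML-to-Markdown converter (hypothetical stand-in, not markdownify)."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.out.append("#" * int(tag[1]) + " ")  # h2 -> "## "
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag == "a":
            self.href = dict(attrs).get("href", "")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "p"):
            self.out.append("\n\n")  # blank line after block elements
        elif tag in ("em", "i"):
            self.out.append("*")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag == "a":
            self.out.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        self.out.append(data)


class TextOnly(HTMLParser):
    """Keeps character data only, dropping every tag (the 'flattened' view)."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_data(self, data):
        self.out.append(data)


def _run(parser, html):
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


def to_markdown(html: str) -> str:
    return _run(MiniMarkdown(), html)


def to_plaintext(html: str) -> str:
    return _run(TextOnly(), html)


sample = '<h2>Results</h2><p>See <a href="https://example.com">the <em>full</em> report</a>.</p>'
print(to_plaintext(sample))
print(to_markdown(sample))
```

The plaintext view loses the heading level, the link target, and the emphasis, while the Markdown view keeps all three.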

Details

- Default behavior unchanged: "text" is the default for both CLI and Python API
- markdownify is an optional dependency (pip install notebooklm-py[markdown])
- Added markdown extra to pyproject.toml
- Updated cli-reference, python-api docs, and changelog

Summary by CodeRabbit

New Features
- Fulltext output now supports markdown in addition to plain text via a new format option.
- CLI command adds a -f/--format flag to choose text or markdown output.
- Optional markdown support available via an extra dependency.
Documentation
- CLI reference and Python API docs updated to document the new format option and example usage.


gemini-code-assist · 2026-03-25T22:46:00Z

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the notebooklm-py library and CLI by introducing the capability to retrieve source content in a structured Markdown format. Previously, only flattened plaintext was available, which stripped away valuable formatting. The new functionality allows users to access richer content, preserving elements like headings, tables, and links, which is crucial for downstream applications like LLM context generation, note-taking, and research pipelines. This change provides greater flexibility and utility for interacting with source data.

Highlights

Structured Markdown Output: Introduced a new format parameter to client.sources.get_fulltext() in the Python API, allowing users to retrieve source content as structured Markdown instead of flattened plaintext.
CLI Integration: Added a new -f/--format CLI option to the source fulltext command, enabling command-line users to specify the desired output format.
Content Preservation: The new Markdown format preserves structural elements such as headings, tables, links, and emphasis, which are lost in the default plaintext output.
Optional Dependency: Integrated markdownify as an optional dependency, requiring pip install notebooklm-py[markdown] for Markdown functionality.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2026-03-25T22:46:02Z

📝 Walkthrough

Walkthrough

Adds a format parameter to SourcesAPI.get_fulltext() (defaults to "text", supports "markdown") and a --format/-f CLI option for source fulltext. Markdown output converts HTML to Markdown via an optional markdownify extra (markdownify>=0.14.1).

Changes

Cohort / File(s)	Summary
Documentation `CHANGELOG.md`, `docs/cli-reference.md`, `docs/python-api.md`	Documented new `format` parameter for `get_fulltext()` with values `"text"` or `"markdown"`, added CLI `-f` option and updated examples.
Dependencies `pyproject.toml`	Added optional extra `markdown` that installs `markdownify>=0.14.1`.
API Implementation `src/notebooklm/_sources.py`	Added `SourcesAPI.get_fulltext(..., *, format: str = "text")`. Validates `format`; for `"text"` extracts and joins plaintext blocks, for `"markdown"` requests HTML and converts it to Markdown using `markdownify` (raises ImportError with install hint if missing).
CLI Integration `src/notebooklm/cli/source.py`	`source fulltext` command gains `--format`/`-f` option (`text`

Sequence Diagram(s)

sequenceDiagram
participant CLI as "CLI (user)"
participant Client as "NotebookLM Client\nSourcesAPI.get_fulltext"
participant RPC as "Backend RPC\nGET_SOURCE"
participant Markdownify as "markdownify (optional)"

CLI->>Client: request fulltext(source_id, format)
Client->>RPC: GET_SOURCE(params depending on format)
RPC-->>Client: response (plaintext blocks OR HTML)
alt format == "text"
Client->>Client: extract and join text blocks
else format == "markdown"
Client->>Markdownify: convert HTML -> Markdown
Markdownify-->>Client: markdown content
end
Client-->>CLI: SourceFulltext { content, char_count, ... }

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 I hopped through tags and lines today,
Turned HTML crumbs into Markdown play,
Text or markdown, pick your way,
A tiny extra in the fray,
I nibble bytes and bound away ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: adding markdown output format support for the source fulltext feature.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

gemini-code-assist

Code Review

This pull request introduces the capability to retrieve source fulltext content in Markdown format, in addition to the existing plain text format. This feature is exposed via a new format parameter in the get_fulltext method of the Python API and a corresponding -f/--format option in the CLI. The Markdown conversion leverages the markdownify library, which is now an optional dependency. The review comments suggest improving error handling for the markdownify dependency, clarifying installation instructions in the docstrings, and adding input validation for the format parameter to ensure robust behavior.

src/notebooklm/_sources.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/notebooklm/_sources.py (1)

669-698: ⚠️ Potential issue | 🟡 Minor

Reject unsupported format values instead of silently returning text.

Line 698 currently treats every value other than "markdown" as "text", so typos like "md" quietly produce the wrong output. This is an edge-case correctness gap worth failing fast on.

🩹 Suggested fix

+        if format not in {"text", "markdown"}:
+            raise ValidationError("format must be 'text' or 'markdown'")
         # [3],[3] returns HTML at result[4][1]; [2],[2] returns plaintext at result[3][0]
         params = [[source_id], [3], [3]] if format == "markdown" else [[source_id], [2], [2]]

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/notebooklm/_sources.py` around lines 669 - 698, The get_fulltext method
currently treats any format other than "markdown" as "text"; update get_fulltext
to validate the format parameter (accept only "text" and "markdown") and fail
fast by raising a clear exception (e.g., ValueError) when an unsupported value
is passed. Locate the format parameter check in get_fulltext (where params is
set) and add a guard before computing params that verifies format is one of the
allowed strings, raising an error with a message referencing the invalid format.
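
The guard described above can be sketched as a small lookup that both validates the value and selects the RPC parameters. The `[2],[2]` and `[3],[3]` shapes come from the code comment in the suggested fix; the names `build_get_source_params` and `_SELECTOR_BY_FORMAT` are hypothetical, and `ValueError` is used here because the import path of the project's `ValidationError` is not shown in this thread.

```python
# Selector values are taken from the review comment; the names are illustrative.
_SELECTOR_BY_FORMAT = {
    "text": 2,      # plaintext, read from result[3][0]
    "markdown": 3,  # HTML, read from result[4][1]
}


def build_get_source_params(source_id: str, format: str = "text") -> list:
    """Return the parameter list for the requested format, rejecting unknown values."""
    try:
        selector = _SELECTOR_BY_FORMAT[format]
    except KeyError:
        # Fail fast so a typo such as "md" never falls through to plaintext.
        raise ValueError(
            f"format must be 'text' or 'markdown', got {format!r}"
        ) from None
    return [[source_id], [selector], [selector]]
```

Using a mapping rather than an `if`/`else` means an unsupported value has no default branch to fall into.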

src/notebooklm/cli/source.py (1)

628-661: ⚠️ Potential issue | 🟠 Major

Disable markup parsing for user-generated content in markdown output.

When using --format markdown, the fulltext.content passed to console.print() on lines 683 and 688 contains markdown syntax with brackets (links, tables, etc.). Since Rich's Console defaults to markup=True, it parses [] as markup tags, corrupting the markdown output. Set markup=False on both calls to display the content faithfully.

Fix

             # Show first 2000 chars with truncation notice
             if len(fulltext.content) > 2000:
-                console.print(fulltext.content[:2000])
+                console.print(fulltext.content[:2000], markup=False)
                 console.print(
                     f"\n[dim]... ({fulltext.char_count - 2000:,} more chars, use -o to save full content)[/dim]"
                 )
             else:
-                console.print(fulltext.content)
+                console.print(fulltext.content, markup=False)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/notebooklm/cli/source.py` around lines 628 - 661, The markdown output is
being corrupted because Rich's Console defaults to markup=True when printing
user-generated markdown; in source_fulltext, when content_format == "markdown"
ensure the two console.print(...) calls that print fulltext.content set
markup=False so Rich does not interpret brackets as markup (locate the prints
that use fulltext.content in the source_fulltext function and add markup=False
to both calls).
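
For readers unfamiliar with the problem, the sketch below shows what escaping tag-like brackets looks like. It is a simplified, standard-library-only stand-in written for illustration: `escape_brackets` is a hypothetical helper, it assumes Rich reads `[` followed by a lowercase letter, `#`, `/`, or `@` as the start of a tag and treats `\[` as a literal bracket, and it ignores backslashes already present in the text. In the CLI itself, passing `markup=False` as suggested above avoids the need for any escaping.

```python
import re

# Bracketed runs that a markup parser would try to read as a tag,
# e.g. [bold], [/dim], or the [docs] in a Markdown link.
_TAG_LIKE = re.compile(r"\[[a-z#/@][^\[]*?\]")


def escape_brackets(text: str) -> str:
    """Prefix tag-like brackets with a backslash so they print literally."""
    return _TAG_LIKE.sub(lambda match: "\\" + match.group(0), text)


print(escape_brackets("See [docs](https://example.com)"))
print(escape_brackets("pip install 'notebooklm-py[markdown]'"))
```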

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/cli-reference.md`:
- Line 100: The CLI docs row for "fulltext <id>" shows the example using `-f
markdown` but doesn't mention the optional dependency; update the table entry or
add a short parenthetical note stating that the `-f markdown` formatter requires
the optional "markdown" extra (from pyproject.toml) to be installed, e.g.,
instruct users to install the package with the `markdown` extra before using `-f
markdown` to avoid ModuleNotFoundError; reference the `fulltext <id>` command
and the `-f markdown` flag so readers can easily find the doc line to edit.

In `@docs/python-api.md`:
- Line 236: Update the docs row for get_fulltext to indicate the third parameter
is keyword-only: change the parameters column that currently reads "str, str,
str" to reflect "str, str, *, format: str" (or similar keyword-only notation) so
it matches the function signature and implementation of
get_fulltext(notebook_id, source_id, *, format="text") and prevents callers from
passing format positionally.

In `@pyproject.toml`:
- Line 37: The extras configuration is missing the new "markdown" extra from the
aggregate "all" extra; update the [project.optional-dependencies] section so
that the "all" extra includes "markdown" (which provides the markdownify>=0.14.1
requirement) along with the existing "browser" and "dev" entries, ensuring pip
install notebooklm-py[all] pulls in markdownify and the new markdown fulltext
path works at runtime.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f75e0166-2504-4fbe-8582-7791787dcc93

📥 Commits

Reviewing files that changed from the base of the PR and between ddb4dee and 6753bc2.

📒 Files selected for processing (6)

CHANGELOG.md
docs/cli-reference.md
docs/python-api.md
pyproject.toml
src/notebooklm/_sources.py
src/notebooklm/cli/source.py

docs/cli-reference.md

docs/python-api.md

pyproject.toml

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/notebooklm/_sources.py`:
- Around line 669-671: Rename the get_fulltext parameter named format to avoid
shadowing the built-in (e.g., output_format) and update its type to a Literal
union ("text" | "markdown") for stricter typing; inside get_fulltext add an
early validation that raises ValueError if output_format is not "text" or
"markdown" so invalid values (e.g., "json") don't silently return empty content.
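
A minimal sketch of the combination suggested here: a `Literal` type for static checkers plus a runtime guard, on a keyword-only parameter. `get_fulltext_stub` and `OutputFormat` are hypothetical names; the stub only echoes the validated value and is not the library's implementation.

```python
from typing import Literal, get_args

OutputFormat = Literal["text", "markdown"]


def get_fulltext_stub(
    notebook_id: str,
    source_id: str,
    *,  # everything after this marker must be passed by keyword
    output_format: OutputFormat = "text",
) -> str:
    """Validate output_format at runtime; type checkers enforce it statically."""
    allowed = get_args(OutputFormat)  # ("text", "markdown")
    if output_format not in allowed:
        raise ValueError(
            f"output_format must be one of {allowed}, got {output_format!r}"
        )
    return output_format
```

Deriving the allowed values from the `Literal` with `get_args` keeps the type hint and the runtime check from drifting apart.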

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e5b722a6-7117-43e2-8963-fda63d2c9f18

📥 Commits

Reviewing files that changed from the base of the PR and between 6753bc2 and 2f2c5c8.

📒 Files selected for processing (1)

src/notebooklm/_sources.py

src/notebooklm/_sources.py

teng-lin · 2026-03-29T17:27:57Z

Thanks for this PR, @jaytxng! The markdown format support is a great addition — clean API design, proper optional dependency handling, and thorough documentation updates. Nice work.

I ran a multi-model review (Claude, Gemini, Codex) and here are the consolidated findings:

Recommendations

1. Add tests (all 3 reviewers flagged this)

The new RPC/parse branch has no coverage. At minimum:

- Verify [[source_id], [3], [3]] params are sent for output_format="markdown"
- Verify HTML at result[4][1] is parsed and converted
- Verify ImportError with helpful message when markdownify is missing
- Verify ValueError on invalid output_format
- Verify CLI -f markdown passes through correctly

2. Rich markup eats `[brackets]` in terminal output

console.print(fulltext.content) passes raw markdown through Rich, which interprets [text] as markup tags — brackets silently disappear. Consider using console.print(Text(fulltext.content)) or rich.markup.escape() for the content display path. File output (-o) is unaffected.

The same issue affects the ImportError message — pip install 'notebooklm-py[markdown]' will render as pip install 'notebooklm-py' when Rich prints it, losing the crucial [markdown] extra name.

3. Fail fast on missing `markdownify`

Currently the markdownify import check happens after the (potentially slow) RPC call. Moving it before the network request gives users immediate feedback:

if output_format == "markdown":
    try:
        from markdownify import markdownify as md
    except ImportError:
        raise ImportError(...) from None

# then proceed with RPC call...
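
A fuller, self-contained version of that ordering might look like the sketch below. `fetch_markdown` and its `rpc_call` argument are hypothetical stand-ins for the real method and network layer; the point is only that the optional import is resolved before any request is made.

```python
import importlib


def fetch_markdown(source_id: str, rpc_call) -> str:
    """Convert a source's HTML to Markdown, failing before the RPC if the extra is missing."""
    # Resolve the optional dependency first so a missing extra costs no network round trip.
    try:
        markdownify = importlib.import_module("markdownify")
    except ImportError:
        raise ImportError(
            "Markdown output needs the optional extra: "
            "pip install 'notebooklm-py[markdown]'"
        ) from None

    html = rpc_call(source_id)  # only reached when markdownify is importable
    return markdownify.markdownify(html)
```

In a test, setting `sys.modules["markdownify"] = None` makes the import fail, which lets you assert that `rpc_call` was never invoked.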

4. CHANGELOG.md parameter name mismatch

Line 12 says format parameter but the actual kwarg is output_format. Copy-pasting the changelog example would produce a TypeError:

-- New `format` parameter on `client.sources.get_fulltext()`
++ New `output_format` parameter on `client.sources.get_fulltext()`

5. (Minor) More specific warning when markdown HTML is missing

When result[4][1] isn't present in markdown mode, a targeted log message like "Markdown HTML not found at result[4][1] — API response structure may have changed" would help debug future breakage (this is an undocumented RPC after all).
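
One way such a targeted warning could be written, as a sketch: `extract_markdown_html` and the logger name are hypothetical, and the `result[4][1]` location is taken from the review comments in this thread rather than from any documented API.

```python
import logging

logger = logging.getLogger("notebooklm.sources")  # hypothetical logger name


def extract_markdown_html(result):
    """Return the HTML string at result[4][1], or None with a targeted warning."""
    try:
        html = result[4][1]
    except (IndexError, TypeError):
        # Covers a short list, a None entry, or a non-indexable response.
        html = None

    if not isinstance(html, str) or not html:
        logger.warning(
            "Markdown HTML not found at result[4][1]; "
            "API response structure may have changed"
        )
        return None
    return html
```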

None of these are blocking on their own, but #1 (tests) and #2 (Rich markup) are the most impactful to address before merge. Everything else about the PR looks solid — backward-compatible default, proper Literal typing, clean CLI integration. 👍

jaytxng added 2 commits March 25, 2026 15:30

feat: add markdown output format for source fulltext

eac484d

docs: update cli-reference, python-api, and changelog for fulltext markdown format

6753bc2

gemini-code-assist bot reviewed Mar 25, 2026

Update src/notebooklm/_sources.py

a2c9ac8

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

coderabbitai bot reviewed Mar 25, 2026

Update src/notebooklm/_sources.py

2f2c5c8

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

coderabbitai bot reviewed Mar 25, 2026

jaytxng added 5 commits March 25, 2026 16:07

fix: validate format parameter in get_fulltext

2b2068f

fix: include markdown extra in all dependencies

063c0f9

docs: fix get_fulltext params to show keyword-only format arg

215bd0c

docs: note markdown extra requirement for fulltext -f markdown

6f10b26

refactor: rename format to output_format, add Literal type hint

271f27f

jaytxng wants to merge 9 commits into teng-lin:main from jaytxng:feature/fulltext-markdown-format
