notion-to-md v4 update #112
-
This is awesome, the overall architecture is great. I remember that when I was using the library (about one year ago) the only bottleneck was the retrieval of child blocks: for instance, if a page had a multi-layered list, the items from the first layer would be fetched all at once, but all the child blocks were fetched sequentially after that, which of course harmed performance. So I'd say that the
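The sequential child-block fetching described in the comment above can be avoided by requesting sibling subtrees concurrently. The following is a minimal sketch, not the library's actual fetcher: `listChildren(blockId)` is a hypothetical helper that resolves to a block's direct children and stands in for a (paginated) Notion API call.

```javascript
// Recursively fetch a block tree, requesting all sibling subtrees at once.
// `listChildren` is a hypothetical stand-in for a Notion API call; it must
// return a Promise of an array of { id, has_children } objects.
async function fetchTree(blockId, listChildren) {
  const children = await listChildren(blockId);
  // Promise.all starts every child request before awaiting any of them,
  // so siblings are fetched concurrently instead of one after another.
  return Promise.all(
    children.map(async (child) => ({
      ...child,
      children: child.has_children
        ? await fetchTree(child.id, listChildren)
        : [],
    }))
  );
}
```

Notion's API is rate limited, so a real implementation would probably cap the number of in-flight requests rather than use an unbounded `Promise.all`.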
-
notion-to-md Caching System
Architecture
Usage Pattern
    const n2m = new NotionToMarkdown({
      notionClient,
      config: {
        cache: {
          enabled: true,
          directory: ".notion-cache",
          ttl: 3600000, // 1 hour in milliseconds
        }
      }
    });
    // Cache is automatically handled
    const markdown = await n2m.pageToMarkdown("page-id");
    // Manual cache control when needed
    await n2m.clearCache();
    await n2m.clearPageCache("page-id");
Cache Structure
The cache maintains a simple, flat structure focusing on essential data:
Page-Level vs Block-Level Caching
We chose page-level caching for several compelling reasons:
The Notion API returns all blocks in a page with a single
Pages in Notion are atomic units - when someone edits a page, they might:
Page-level caching ensures we always have a consistent snapshot of the entire page structure.
Open to suggestions :) @lucas-almeida026
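To illustrate page-level caching with a TTL, here is a minimal sketch of a freshness check. The cache entry shape (`cachedAt`, `lastEditedTime`, `blocks`) is an assumption made for this example, not the proposal's actual cache structure; `last_edited_time` is the timestamp field on Notion page objects.

```javascript
// Decide whether a cached page snapshot can be reused.
// Assumed (hypothetical) entry shape: { cachedAt, lastEditedTime, blocks }.
function isCacheEntryFresh(entry, page, ttlMs, now = Date.now()) {
  if (!entry) return false; // nothing cached for this page
  if (now - entry.cachedAt > ttlMs) return false; // older than the TTL
  // The page was edited after the snapshot was taken: the whole page is stale.
  return entry.lastEditedTime === page.last_edited_time;
}
```

Because the check works on the page as a whole, a single cheap page lookup is enough to validate or invalidate every block of that page at once.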
-
notion-to-md v4: Flexibility to Static Site Generation and Media Management
Introduction
When users integrate Notion content into their static site workflows, they face significant challenges in managing media assets and adapting to different static site generators. Version 4 of notion-to-md aims to provide robust solutions through a flexible, intuitive builder pattern configuration system.
Core Challenges
Users face two primary challenges when working with Notion content:
Proposed Solution
Core Architecture
The focus of this proposal is the MediaHandler stage - the media management layer that sits between content fetching and rendering. It implements various media handling strategies based on user configuration and maintains media state tracking through manifests. To understand the context: Notion either stores files on their servers with temporary URLs or allows embedding of externally hosted media. This creates two common scenarios:
Media Handling Strategies
notion-to-md v4 offers three distinct strategies for handling media through an intuitive builder pattern:
1. Download Strategy
Perfect for static site generators and Git-based workflows, this strategy downloads media files locally:
    const n2m = new NotionToMarkdownBuilder(notionClient)
      .setContentPath('./content/posts')
      .downloadMediaTo({
        outputPath: './content/images',
        preserveExternalUrls: false,
        transformPath: (path) => `/images/${path.basename}`
      })
      .build();
    // This will:
    // 1. Download media to ./content/images
    // 2. Update markdown with relative paths
    // 3. Track media in manifest
    await n2m.pageToMarkdown('notion-page-id', 'my-post.md');
2. Upload Strategy
Ideal for users who prefer serving media from CDNs or cloud storage:
    const n2m = new NotionToMarkdownBuilder(notionClient)
      .setContentPath('./content/posts')
      .uploadMediaUsing({
        uploadHandler: async (media) => {
          const url = await uploadToS3({
            bucket: 'my-blog-assets',
            key: `images/${media.filename}`,
            body: media.data,
            contentType: media.mimeType
          });
          return url;
        },
        cleanupHandler: async (mediaInfo) => {
          await deleteFromS3({
            bucket: 'my-blog-assets',
            key: `images/${mediaInfo.filename}`
          });
        },
        preserveExternalUrls: true,
        transformPath: (url) => url.replace(
          'my-blog-assets.s3.amazonaws.com',
          'cdn.myblog.com'
        )
      })
      .build();
3. Direct Strategy
Uses Notion's URLs directly - perfect for previews and temporary content:
    const n2m = new NotionToMarkdownBuilder(notionClient)
      .setContentPath('./content/posts')
      .useDefaultNotionUrls()
      .build();
Understanding Path Transformation
The path transformation system helps adapt file paths to your static site's requirements. Here's a practical example:
    const n2m = new NotionToMarkdownBuilder(notionClient)
      .setContentPath('./content/posts')
      .downloadMediaTo({
        outputPath: './static/images',
        transformPath: (path) => {
          // path.fullPath: ./static/images/sunset.jpg
          // path.basename: sunset.jpg
          // path.directory: ./static/images
          return `/images/${path.basename}`;
        }
      })
      .build();
    // Transforms media paths for static site compatibility:
    // Physical path: ./static/images/sunset.jpg
    // Markdown reference: /images/sunset.jpg
Smart Media Management
As highlighted by @chrissy0, managing media changes (deletions, replacements, moves) can lead to orphaned files and re-downloading. One solution is to maintain a JSON file for each page that holds a block-level mapping:
    {
      "block_456": {
        "lastEdited": "2024-01-02T15:00:00Z",
        "mediaPath": "/static/images/page-123/image2.jpg",
        "mediaType": "notion",
        "strategy": "download"
      }
    }
The cleanup process follows these steps:
Feedback
We value your input on this proposal:
Let us know in the comments below!
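To make the per-page media manifest idea more concrete, here is a minimal sketch of how such a manifest could drive cleanup. The function name `planMediaCleanup` and the `{ id, lastEdited }` block shape are assumptions for illustration only; the manifest entries follow the JSON shape shown in the proposal.

```javascript
// Compare a page's media manifest with the blocks currently on the page.
// Returns media files that no longer have a block (orphaned) and blocks
// whose media changed since it was last processed (stale).
function planMediaCleanup(manifest, currentBlocks) {
  const currentById = new Map(currentBlocks.map((block) => [block.id, block]));
  const orphaned = [];
  const stale = [];
  for (const [blockId, entry] of Object.entries(manifest)) {
    const block = currentById.get(blockId);
    if (!block) {
      orphaned.push(entry.mediaPath); // block deleted: file can be removed
    } else if (block.lastEdited !== entry.lastEdited) {
      stale.push(blockId); // block edited: media must be fetched again
    }
  }
  return { orphaned, stale };
}
```

Blocks that appear in neither list are unchanged, so their media does not need to be downloaded or uploaded again.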
-
Looking ahead, although the package is primarily built for MD (Markdown) however
As we move towards the final proposal, we are left with the last few things to figure out:
I'll be sharing the updated plugin architecture to support these types, and we are open to actual use cases as well, so feel free to share.
-
Hi, thanks for your proposal @souvikinator :) The logical decomposition looks great, and reminds me very much of the winston plugin system (transports, formatters).
This will enable different cache/media strategies in a unified configuration that looks easier to use.
Hope it helps with your v4 definition! N.B. I use this library for:
So a custom renderer is really needed in my use case, and cache is not a needed feature at all (as it is done through manual pipelines). Thanks for your awesome work!
-
Page Reference Handling in notion-to-md v4 (Internal links)
When building content systems with Notion as a CMS, one of the most powerful features is the ability to link between pages. However, this presents an interesting challenge for notion-to-md: how do we handle these references when converting Notion content to different formats?
The Challenge: Linking Pages in Notion
Content creators have two ways to handle links in Notion:
Direct Website Links (Ideal)
Content creators can link directly to their website URLs:
Notion Page Links (Common)
More naturally, people link to other Notion pages while writing. This creates a challenge because these Notion URLs need to point to your website pages instead. For example, transforming
Understanding the Complexity
Initially, this might seem like a simple URL transformation problem. However, several factors make it more complex:
Why URL Generation Isn't the Answer
notion-to-md converts one page at a time. When we encounter a Notion page link, we can't automatically determine its final website URL because:
Our Proposed Solution
After careful consideration, we're implementing a manifest-based reference handling system. The manifest maps each page ID to the URL the page will be available at, and the library keeps updating it as it processes pages. Here's how it works:
    const n2m = new NotionToMarkdownBuilder(notionClient)
      .handlePageReferences({
        urlProperty: 'siteUrl', // Notion property containing the final URL
        manifest: './page-manifest.json' // Optional: custom manifest location
      })
      .build();
The system has two key components:
    {
      "pages": {
        "notion-page-id-1": {
          "notionUrl": "https://notion.so/mypage-123",
          "siteUrl": "/docs/getting-started"
        }
      }
    }
If a page has no mapping, the link falls back to the page's default Notion URL; alternatively, one can link directly to the hosted page URL on their site.
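As a rough illustration of manifest-based link resolution with this fallback, the sketch below rewrites Notion links in already-converted Markdown. The function `resolvePageLinks` is hypothetical, and matching on `notionUrl` rather than on page ID is a simplification made for this example; the manifest follows the JSON shape shown above.

```javascript
// Rewrite Markdown links that point at Notion pages, using the manifest.
// Links without a manifest entry are left untouched (the Notion URL fallback).
function resolvePageLinks(markdown, manifest) {
  const siteUrlByNotionUrl = new Map(
    Object.values(manifest.pages).map((page) => [page.notionUrl, page.siteUrl])
  );
  // Matches the "](https://notion.so/...)" part of a Markdown link.
  const notionLink = /\]\((https:\/\/(?:www\.)?notion\.so\/[^)\s]+)\)/g;
  return markdown.replace(notionLink, (match, url) => {
    const siteUrl = siteUrlByNotionUrl.get(url);
    return siteUrl ? `](${siteUrl})` : match;
  });
}
```

Since unresolved links are kept as they are, a page can be converted before every page it links to has been processed, and converted again once the manifest is complete.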
Setting up URL mappings manually for all these pages in a file would be tedious and error-prone. Instead, one can use the utility provided by the library, which creates a mapping for all the pages. It has a few prerequisites:
    // initialize notion-to-md
    await n2m.generatePageManifest({
      rootPageId: 'your-root-page', // all your pages should exist under this page, and the integration should have access to it
      urlProperty: 'siteUrl' // property in the Notion page; its value will be used in the mapping
    });
This helps in:
Why This Approach?
Would love to hear your thoughts on this approach:
Your feedback will help us refine this feature for notion-to-md v4.
-
Hello, thanks for sharing all this documentation, truly impressive architecture that looks solid to me. I use Astro to generate my static website (mainly a blog). This architecture would allow me to get rid of
Last but not least, the image management is super interesting. I'd love to be able to store images in my public folder and use the Astro image optimizer to get responsive image sizes (for now I just convert them to inline data, which is not ideal but works). Thanks again for all your work there!
-
Hey guys! Just an update on the v4 development. I haven't been able to dedicate much time due to my job, but I have managed to complete most of the modules and have tested them end to end. They are working as expected and I'm excited to finish up v4 and make the first alpha release 🤩 Here is the list of modules that are implemented and functional:
Thanks for all the feedback and support. Contributions of any sort are welcome.
-
Update time, folks! Here’s the list of implemented and fully functional modules:
With all the modules complete, the next step is to use this system to create a default MD/MDX renderer plugin that can be registered during use. A partially working version is available, but I’m a bit behind on the basic documentation needed for everyone to get started, and there are also a few bugs to fix. Another reason for this step is that working on a renderer plugin myself is a great way to uncover friction points or potential issues. This weekend, I plan to wrap up some basic docs (btw, we have Hugo set up under
Once that’s done, we should be good for a v4 alpha release! Thanks a ton for sticking around, it means a lot. I’ll be more frequent with these updates. 🚀
-
Oof! Finally, the work is done.
-
Hey folks, Bad news, I’m back in the job-hunting game 😭 (yep, lost my job, long story). Good news? That means I finally had time to wrap up the docs, and the v4 alpha release is live! 🥳 The docs are decent (shoutout to Hugo <3), so getting started should be a breeze. Now, we all need to go wild with it. Break things, push it to the limits, and dig up every bug possible so we can ship a rock-solid stable release soon. Also, if you know of any job opportunities, I’d really appreciate a lead. Let’s chat! 😊 (I could use some help!)
-
I'll be writing a few blog posts to generate some attention for the alpha release. Oh, and of course, posting on Reddit too! I've also added a catalog to the site, showcasing community-driven plugins for all sorts of use cases. To start, we'll need to build a few ourselves. The challenge now is figuring out how to attract more eyes to it and foster an engaged community where people come and share their use cases, plugins, and case studies. Any ideas or suggestions? I'll also be sharing more of my thoughts and the strategies I'm experimenting with.
-
notion-to-md v4: Technical Proposal
Focus: performance, extensibility, bug fixes
While the package is primarily built for MD (Markdown), the goal is to broaden its scope to include other content types such as MDX, JSX, HTML, and XML through a plugin-based renderer system.
Architecture Overview
notion-to-md v4 introduces a modular, plugin-based architecture that separates concerns into three main components that work together in a pipeline. Each module is designed to be independent, making the system both maintainable and extensible.
Data Flow
Module Breakdown
Each module has its own responsibilities and configuration options:
Block Fetcher [Discussion] Notion To MD v4: Block Fetcher Technical Details #114
Media Handler [Discussion] Notion To MD v4: Media Handler Technical Details #115
Page Reference Handler (Internal Link Handler)
Renderer Plugin System [Discussion] Notion To MD v4: Renderer Plugin System Technical Details #116
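As a rough illustration of what a plugin-based renderer could look like, the sketch below maps block types to transformer functions, so that supporting a new output format only means supplying a different set of transformers. The `createRenderer` function and the simplified `{ type, text }` block shape are assumptions for this example, not the v4 plugin API.

```javascript
// Build a renderer from a table of per-block-type transformer functions.
// Blocks whose type has no transformer go through `fallback` (default: skip).
function createRenderer(transformers, fallback = () => '') {
  const byType = new Map(Object.entries(transformers));
  return function render(blocks) {
    return blocks
      .map((block) => (byType.get(block.type) || fallback)(block))
      .filter((chunk) => chunk !== '')
      .join('\n\n');
  };
}

// Two "plugins" for the same (simplified, hypothetical) block shape.
const renderMarkdown = createRenderer({
  heading_1: (block) => `# ${block.text}`,
  paragraph: (block) => block.text,
});

// Note: no HTML escaping here; a real HTML renderer would need it.
const renderHtml = createRenderer({
  heading_1: (block) => `<h1>${block.text}</h1>`,
  paragraph: (block) => `<p>${block.text}</p>`,
});
```

Keeping the block traversal separate from the per-type transformers means a format-specific plugin never has to touch fetching or media handling.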
Usage Examples
Basic Usage with Builder Pattern
Media Handling Examples
Direct Strategy (Using Notion URLs)
Download Strategy (Local Storage)
Upload Strategy (Cloud Storage)
Custom Renderer Examples
JSX Renderer Example
LaTeX Renderer Example
Next Steps
Share your thoughts in the comments!
For updates, watch this discussion or the main repository. We'll share alpha builds soon for early testing.