The Zettelkasten Method is a holistic method for working with knowledge. It is defined as a personal tool for thinking and writing whose hypertextual features enable the creation of a web of thoughts.
The method emphasizes connection, not collection. It is considered highly effective and acts as an amplifier of endeavors in knowledge work. With consistent effort and practice, it can produce "gems of knowledge". The method helps streamline workflow, decreases friction, and makes writing easier and more coherent by keeping thoughts alive over long periods.
The method is based on the work of the highly productive social scientist, Niklas Luhmann, who published 50 books and over 600 articles. Luhmann himself attributed his output to working in a partnership with his Zettelkasten.
The Zettelkasten Method is founded on three primary traits:
The Zettelkasten is a type of hypertext where notes refer to, explain, expand upon, and use each other's information. This structure is organic and non-linear.
- Connectivity: The emphasis on forming relationships between pieces of knowledge makes new insights possible, as insights result from making unexpected connections.
- Link Context: When connecting notes, it is essential to explicitly state why the connection was made, which is called the link context. This ensures the meaning of the link is captured and creates new knowledge.
The Principle of Atomicity dictates that knowledge is composed of discrete building blocks. The guiding compass for note-taking is to capture one knowledge building block, or precisely one thought, per note.
- Contrast with Other Systems: This contrasts with systems like books or Wikipedia articles, where references (chapters, pages) serve only as coordinates and do not directly correspond to a single, referenceable thought.
The Zettelkasten is intended to be a personal thinking tool; therefore, the guideline is one Zettelkasten per person. Writing in the Zettelkasten is done for oneself, not for the public.
An individual note, or *Zettel*, consists of the following components:
- A Unique Identifier (ID): This is the unambiguous address of the note, mandatory for creating the hypertext.
  - Luhmann-ID: For paper systems, Luhmann used a clever branching numbering system (e.g., `1`, `2`, `1a`, `1b`, `1a1`) to allow organic growth by interspersing or continuing trains of thought.
  - Digital IDs: Digital systems often use a time-based ID (e.g., `202006110955`) or an arbitrary unique string.
- The Body of the *Zettel*: This contains the piece of knowledge to be captured. The most important aspect is that the content must be written in the note taker's own words to increase understanding and recall.
- References: Located at the bottom of the *Zettel*, this section references the sources of the knowledge.
  - References to external sources (like books or articles) typically use citekeys from reference management software.
  - If the note is based on material already processed, it references other *Zettels* by linking to their ID.
  - If no reference is provided, the content is considered the note taker's own thought by default.
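
The anatomy above (ID, body, references) can be sketched in a few lines of Python. This is only an illustration: the function names and the exact note layout are assumptions made for this example, not part of the method or of any particular tool.

```python
import re
from datetime import datetime


def make_timestamp_id(moment: datetime) -> str:
    """Return a time-based ID: year, month, day, hour, minute."""
    return moment.strftime("%Y%m%d%H%M")


def luhmann_sort_key(note_id: str) -> tuple:
    """Split a branching ID such as '1a1' into number/letter parts for sorting."""
    parts = re.findall(r"\d+|[a-z]+", note_id)
    # Tag each part so numbers are only compared with numbers, letters with letters.
    return tuple((0, int(p)) if p.isdigit() else (1, p) for p in parts)


def render_zettel(note_id: str, title: str, body: str, references: list[str]) -> str:
    """Assemble a Markdown note (hypothetical layout): ID and title, body, references."""
    lines = [f"# {note_id} {title}", "", body, "", "References:"]
    lines += [f"- {ref}" for ref in references]
    return "\n".join(lines) + "\n"


note_id = make_timestamp_id(datetime(2020, 6, 11, 9, 55))
print(render_zettel(note_id, "Atomicity", "One thought per note.", ["[#lastnameYEAR]"]))
print(sorted(["1", "2", "1a", "1b", "1a1"], key=luhmann_sort_key))
```

Sorting by the branching key places `1a` and `1a1` directly after `1`, which mirrors how a continuation is filed behind the note it extends.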
While the method emphasizes bottom-up creation through individual *Zettels* and connections, hierarchical organization is also useful via Structure Notes.
- Structure Notes (Meta-Notes): These are *Zettels* that list many other *Zettels* and their relationships on a specific topic, serving as "hub notes" or tables of contents.
- Entry Points: Luhmann utilized a register that listed keywords with only one or a few note IDs, serving solely as entry points to the most central clusters of notes in his paper-based hypertext.
- Semilattice Structure: The overlaps in how different Structure Notes reference the same *Zettels* create a semilattice structure that captures complex relationships better than a simple tree structure.
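
To make the difference between a tree and overlapping Structure Notes concrete, here is a minimal sketch. The hub names and note IDs are made up for illustration. In a strict tree every note has exactly one parent, so any note listed by two hubs is evidence of the overlapping structure described above.

```python
# Hypothetical Structure Notes, each listing the IDs of the notes it references.
structure_notes = {
    "Evaluation": {"202401010900", "202401010905"},
    "Retrieval": {"202401010905", "202401010910"},
}


def parents_by_note(hubs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert the hub listing: for each note, which Structure Notes reference it?"""
    parents: dict[str, set[str]] = {}
    for hub, members in hubs.items():
        for note_id in members:
            parents.setdefault(note_id, set()).add(hub)
    return parents


def shared_notes(hubs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Notes referenced by more than one hub, which a simple tree cannot represent."""
    return {n: h for n, h in parents_by_note(hubs).items() if len(h) > 1}


print(shared_notes(structure_notes))
```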
The process is sometimes referred to as creating a "second brain" because the core methods involved, particularly the Zettelkasten Method and modern AI-driven tools, function as an external amplifier for human thinking, knowledge storage, and connectivity.
The Zettelkasten Method is explicitly defined as a personal tool for thinking and writing. It is designed to act as an amplifier of endeavors in knowledge work.
The term "second brain" relates to how the Zettelkasten handles functions typically associated with the human mind:
- Holding Thoughts Alive (Persistence): One of the main problems in writing and thinking is the human brain's limited capacity to follow a single line of thought for an extended period, such as weeks or months. The Zettelkasten addresses this by acting as an external memory system that keeps your thoughts alive and helps you hold onto them.
- Facilitating Complex Reasoning: The method allows individuals working on complex problems to concentrate on a small part of the problem and then step back to view it with a panorama vision, suggesting it manages the complexity that the limited human mind struggles to juggle.
- Generating Insights via Connectivity: Insights often arise from making new (and unexpected) connections. The Zettelkasten focuses on connection, not collection, creating a web of thoughts via its hypertextual structure. This structure allows the user's mind to outsource the maintenance of relationships between pieces of knowledge, which significantly improves recall and trains the mind to see patterns.
- Personalized Knowledge: The rule that there should be one Zettelkasten per person reinforces the concept of a personal, external organ for thought, as the content is written for oneself and not for the public, avoiding the distortion that comes from filtering one's thoughts for others.
The concept extends to modern tools like NotebookLM, which similarly manage and synthesize information to aid deeper understanding:
- Research Assistant Functionality: NotebookLM is designed to be a virtual note-taking and research assistant. Steven Johnson, who worked on the NotebookLM team, explained that it is a tool for understanding things, taking information, digesting it, and analyzing it so the user can glean more from it.
- Centralizing Scattered Information: The tool addresses the problem where information needed for a project is scattered across desktop folders, tabs, and other places. By consolidating up to 50 sources (PDFs, Google Docs, websites, videos, etc.) and up to 25 million words into individual notebooks, the platform takes on the burden of managing, connecting, and synthesizing information.
- Personalized AI: By keeping related sources in a single notebook (such as an "everything notebook" or a project-based notebook), the user effectively gains a personalized AI that is almost like having another member of the team. This external resource can then be instructed to synthesize information, draft outlines, or connect dots between materials.
Obsidian is fundamentally a note-making application (or "note app") that users download and install on their computer. It allows users to create notes within a designated folder called a vault.
Key characteristics and functions of Obsidian include:
- Focus on Plain Text: Obsidian uses Markdown (`.md`) files for its notes. Markdown is a plain text format, which is highly versatile and durable, meaning the notes are "future-proof" and readable by any computer.
- Connecting Ideas (Hypertext): Obsidian's core purpose is to help users move from consuming content to connecting ideas. This is done primarily through links, which are created by enclosing a note's title in double brackets (e.g., `[[Note Title]]`). These links turn the collection of notes into an idea verse, a well-connected "internet for the inner world".
- The Second Brain/Thinking Tool: By linking thoughts together, Obsidian facilitates linking your thinking, which helps grow and cultivate thoughts over time. It functions as a personal knowledge management library, designed for writing and thinking.
- Viewing Relationships: The application features a graph view where users can visualize all the connections they have made between their notes.
- Note Integrity: Obsidian has a crucial setting that automatically updates internal links if a note's name is changed, ensuring that the relationships between notes remain intact.
- Structure and Tags: Users can structure their notes using headers (Markdown `#` symbols) and apply tags (using `#` symbols) to create weak relationship builders for categorization.
- Data Source Integration: The Markdown files created in Obsidian can be used as a data source for LLM applications that utilize Retrieval-Augmented Generation (RAG).
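
Because the notes are plain Markdown, links and tags can be read with ordinary text processing. The sketch below uses simplified regular expressions chosen for this example; they are assumptions and do not cover every form Obsidian accepts. It pulls `[[wikilinks]]` and `#tags` out of a note while ignoring `# Header` lines.

```python
import re

# [[Title]], [[Title|alias]] and [[Title#heading]] all resolve to "Title".
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
# A tag is '#' immediately followed by a name; '# Header' has a space, so it is skipped.
# The lookbehind also skips '#' inside words, '##' runs, and '[#citekey]' references.
TAG = re.compile(r"(?<![\w#\[])#([A-Za-z_][\w/-]*)")


def extract_links(text: str) -> list[str]:
    """Return the target titles of all wikilinks, in order of appearance."""
    return [target.strip() for target in WIKILINK.findall(text)]


def extract_tags(text: str) -> list[str]:
    """Return all tag names without the leading '#'."""
    return TAG.findall(text)


note = (
    "# Evaluation\n"
    "See [[Hallucination Rate]] and [[RAG|retrieval]].\n"
    "#evaluation_metrics #python_code\n"
)
print(extract_links(note))
print(extract_tags(note))
```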
The Zettelkasten Method, implemented within the Markdown-based environment of Obsidian, provides a powerful structure for tracking the iterative progress, technical methodologies, and core findings of a data science project. By emphasizing atomicity and connection, this process transforms raw notes into a dynamic "idea verse".
Here is a process for getting started with Obsidian to track the elements of a data science project:
The initial setup focuses on ensuring that the knowledge base is durable, flexible, and ready for growth.
- Create an Obsidian Vault: Start by creating a new vault, which is simply a folder that Obsidian monitors.
- Adopt Plain Text Philosophy: Ensure all documentation is written in Markdown (`.md`) files. This plain text approach is considered the most versatile and durable file format, making the information "future-proof" and easily manageable by other literate programming tools like Git.
- Prioritize Linking over Folders: When first starting, focus on connecting ideas rather than rigidly organizing notes into hierarchical folders. Complex organization tends to make the system fragile.
The core principle applied here is the Principle of Atomicity, where each note (or *Zettel*) captures precisely one thought:

| Component Type | Action | Example Content |
|---|---|---|
| Tools & Libraries | Create a note for every tool, library, or platform used (e.g., Ollama, LangChain, PubMed Meta Analyzer). Capture its purpose in your own words. | `[[PubMed Meta Analyzer]]`: A Python tool designed to automate literature reviews by extracting metadata from PubMed via the Entrez API. |
| Methods & Concepts | Create atomic notes for technical concepts or modeling approaches, such as Retrieval-Augmented Generation (RAG), Transfer Learning, or Vector Similarity. | `[[Hashing-Based Similarity Search]]`: A technique used to enhance search capabilities in evidence retrieval, like topology-preserving hashing. |
| Findings & Claims | Create specific notes capturing key project results, crucial data insights, or notable limitations (e.g., observations of Hallucination Rate or Reference Accuracy). | `[[LLM Context Length Limitations]]`: RAG models struggle with context length in extended queries and have difficulty maintaining context for precise vector similarity searches. |
| Source Tracking | Always include References at the bottom of the *Zettel*. | `References: [#Brown_et_al_2020]` |
The goal of the Zettelkasten is to build an organic web of thoughts that improves recall and fosters new insights.
- Use Internal Hyperlinks for Strong Connections:
  - Whenever one idea informs another, create a link using the double bracket syntax: `[[Note ID or Title]]`.
  - Capture the Link Context: When creating a link, explicitly state why the connection was made, as this is how new knowledge is created. Use the note-making prompt: "This reminds me of...".
  - Example: In a note on `Data Annotation`, use the prompt "This reminds me of..." and link it to `[[Active Learning]]` because active learning improves biomedical abstract screening efficiency.
- Use Tags for Categorization and Metadata:
  - Use hashtags (`#`) to create non-hierarchical groups that describe the state, component, or category of the information.
  - Examples: `#data_cleaning`, `#model_training`, `#evaluation_metrics`, `#python_code`. You can easily check all notes containing a tag using the enabled Tag Pane feature in Obsidian.
- Manage Progress and Iterations with Git:
  - Obsidian is not a version control system, but because the notes are Markdown files, the entire vault can be stored in a Git repository. This ensures persistence and provides full version history for every iteration and insight captured.
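
One way to check whether a vault really favors connection over collection is to build a backlink index and list notes that have no links in either direction. The following is a minimal sketch under stated assumptions: the vault is a folder of `.md` files, each file name is the note title, and links use the double bracket syntax. The function names are hypothetical, and this is not how Obsidian computes its own backlinks.

```python
import re
from pathlib import Path

# Capture the target title of [[Title]], [[Title|alias]] or [[Title#heading]].
LINK = re.compile(r"\[\[([^\[\]|#]+)")


def build_backlinks(vault: Path) -> dict[str, set[str]]:
    """Map each note title to the set of note titles that link to it."""
    backlinks: dict[str, set[str]] = {p.stem: set() for p in vault.rglob("*.md")}
    for path in vault.rglob("*.md"):
        for target in LINK.findall(path.read_text(encoding="utf-8")):
            backlinks.setdefault(target.strip(), set()).add(path.stem)
    return backlinks


def orphans(vault: Path) -> list[str]:
    """Notes with neither incoming nor outgoing links."""
    backlinks = build_backlinks(vault)
    has_outgoing = {
        p.stem for p in vault.rglob("*.md")
        if LINK.search(p.read_text(encoding="utf-8"))
    }
    return sorted(
        name for name, sources in backlinks.items()
        if not sources and name not in has_outgoing
    )
```

Calling `orphans(Path("path/to/vault"))` on a real vault returns candidates for the "This reminds me of..." prompt: notes that have been collected but not yet connected.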
To manage complexity and facilitate project writing, organize related atomic notes into "hub notes".
- Create Project Structure Notes: Make new notes that act as tables of contents for major project phases or domains. Use a Markdown list structure to link related *Zettels*:
  - Example Structure Note: `[[Evaluation Metrics for LLMs]]`
    - Metrics for quality: `[[Overall Quality Score (OQS)]]`
    - Metrics for retrieval: `[[BM25 and TF-IDF]]`
    - Metrics for accuracy: `[[Hallucination Rate]]`
- Use Structure Notes for Reasoning Chains: For tracking methodology or logical arguments (like systematic review steps), create a sequential Structure Note to capture the argument flow (e.g., $a \rightarrow b \rightarrow c$) and link to the *Zettel* that supports each step. This provides panorama vision over complex problems.
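
A Structure Note is just a Markdown list of links, so it can be generated from a small mapping. The helper names below are hypothetical and the layout is one possible convention, not a prescribed format.

```python
def render_structure_note(title: str, sections: dict[str, list[str]]) -> str:
    """Render a hub note: one bullet per section, each linking to its Zettels."""
    lines = [f"# {title}", ""]
    for label, targets in sections.items():
        links = ", ".join(f"[[{target}]]" for target in targets)
        lines.append(f"- {label}: {links}")
    return "\n".join(lines) + "\n"


def render_reasoning_chain(steps: list[str]) -> str:
    """Render a sequential argument as linked steps joined by arrows."""
    return " -> ".join(f"[[{step}]]" for step in steps)


print(render_structure_note(
    "Evaluation Metrics for LLMs",
    {
        "Metrics for quality": ["Overall Quality Score (OQS)"],
        "Metrics for retrieval": ["BM25 and TF-IDF"],
        "Metrics for accuracy": ["Hallucination Rate"],
    },
))
print(render_reasoning_chain(["a", "b", "c"]))
```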
Ensure the platform supports the fluid workflow of a data science project:
- Ensure Link Integrity: In Obsidian's settings, enable the most important setting: "Automatically update internal links". This ensures that if you rename a note (like changing `Note Star` to `Note Star 2`), all links referring to it automatically update, maintaining the functionality of the hypertext.
- Integrate with External Tools (Implicitly): Use the plain text notes as context for advanced LLM analysis. Obsidian notes can be used as a source for tools like LlamaIndex when building Agentic Workflows using Retrieval-Augmented Generation (RAG). These agents can then synthesize drafts or outlines using the connected thoughts recorded in the Zettelkasten.
- Focus on Creation: When working, remember the final goal is not organizational perfection but continuous thinking, writing, and connecting ideas.
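
To show what keeping links intact after a rename involves, here is a rough sketch of the text rewrite. It is an illustration only and makes no claim about how Obsidian implements the setting; the function name is hypothetical, and the rewrite only handles the double bracket forms shown.

```python
import re


def retarget_links(text: str, old_title: str, new_title: str) -> str:
    """Point [[old]], [[old|alias]] and [[old#heading]] at the new title."""
    # The lookahead requires the title to end right there, so
    # [[Note Starfish]] is left alone when renaming "Note Star".
    pattern = re.compile(r"\[\[" + re.escape(old_title) + r"(?=[\]|#])")
    # A function replacement avoids any backslash handling in new_title.
    return pattern.sub(lambda _match: "[[" + new_title, text)


before = "See [[Note Star]] and [[Note Star|the star]]; not [[Note Starfish]]."
print(retarget_links(before, "Note Star", "Note Star 2"))
```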
The Zettelkasten Method, applied alongside literate programming tools like Markdown and Git, creates a synergistic research workflow that enhances knowledge creation, ensures data portability and persistence, and allows for the seamless integration of modern AI-driven analysis.
Here is how the Zettelkasten Method can be applied with these tools to improve research products, drawing upon the principles of connection, atomicity, and persistent, plain-text formats:
The core philosophy of the Zettelkasten is based on the Principle of Atomicity, which guides the note taker to capture precisely one thought or one knowledge building block per note. Markdown is the ideal format for capturing these atomic notes and connecting them effectively within a literate programming environment:
- Plain Text and Future-Proofing: Writing Zettel notes as simple Markdown (`.md`) files adheres to the plain text approach and the concept of future-proofing. Plain text is considered the most versatile and durable file format. As long as computers exist, they will be able to read plain text.
- Unique Identifiers (IDs) and Titles: Each individual note, or *Zettel* (German for "paper slip"), must have a unique identifier (ID), which acts as its unambiguous address. The content itself is contained in the body of the *Zettel* and should be written in the researcher's own words to increase understanding and recall.
  - Markdown supports using the note's unique title as a header (using the `#` symbol), while digital Zettelkasten software often manages time-based IDs (e.g., `202006110955`).
- Hypertext Linking (The Key to Connections): The Zettelkasten aims to emphasize connection, not collection, creating a web of thoughts. In Markdown-based tools like Obsidian, links are created using double brackets (`[[link]]`). This link refers to the unique ID or title of another *Zettel*.
  - Researchers must state explicitly why a connection was made, providing the link context. For instance, a useful note-making prompt is "This reminds me of...".
- References and Citations: Notes need References located at the bottom of the *Zettel*. Using extended MultiMarkdown syntax allows the use of citekeys (like `[#lastnameYEAR]`) generated by reference management software (such as BibDesk or JabRef). This facilitates the rigorous referencing needed for research products.
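
Since citekeys are plain text, a short script can check that every key used in a note is known to the reference manager. This sketch assumes the `[#key]` form shown above and a set of known keys exported separately; the function names and the bibliography set are hypothetical.

```python
import re

# Matches citekeys written as [#lastnameYEAR].
CITEKEY = re.compile(r"\[#([^\]\s]+)\]")


def cited_keys(note_text: str) -> set[str]:
    """Return every citekey that appears in the note."""
    return set(CITEKEY.findall(note_text))


def unknown_keys(note_text: str, bibliography: set[str]) -> set[str]:
    """Return citekeys used in the note that the bibliography does not contain."""
    return cited_keys(note_text) - bibliography


note = (
    "Large language models can be adapted with few examples [#Brown_et_al_2020].\n\n"
    "References: [#Brown_et_al_2020] [#lastnameYEAR]\n"
)
print(unknown_keys(note, {"Brown_et_al_2020"}))
```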
Git, a version control system common in literate programming, is invaluable for managing the iterative nature of research and the organic growth of the Zettelkasten structure:
- Non-Linear Structure and Organic Growth: The Zettelkasten features an organic and non-linear structure. By storing the entire collection of Markdown notes (the "vault") in a Git repository, researchers can track every change, connection, or modification made to individual *Zettels* over time.
- Persistence and Auditing: Git ensures persistence of the knowledge base by tracking the history of all files. This acts as a reliable long-term memory for the researcher's thoughts and external research connections. If an insight is lost or misremembered, the history can be audited.
- Modularity and Collaboration: Although the Zettelkasten is primarily intended to be a personal thinking tool, Git allows for simple sharing and collaboration, particularly in multi-agent application architectures where persistence and complex control flow are managed.
The combined methodology directly supports the systematic processes required for high-quality research products, such as systematic reviews:
- Structure Notes (Tables of Contents): Researchers can create Structure Notes (or Meta-Notes) using Markdown lists and links. These notes list other *Zettels* and their relationships on a specific topic, acting as hub notes or tables of contents. This hierarchical structure is highly useful for managing complex problems and organizing an argument sequence, like the stages of a manuscript.
- Integration with AI/LLMs via Retrieval-Augmented Generation (RAG): The structured, plain-text format of a Zettelkasten makes it easily ingestible as a knowledge base for AI tools and frameworks. The Zettelkasten becomes a local knowledge base that LLMs can access.
  - RAG is a key method for allowing LLMs to retrieve and incorporate new, domain-specific, or updated information. Tools like LlamaIndex are designed specifically to use local knowledge bases (like an Obsidian vault, which stores Markdown notes) as data sources for RAG and Agentic Workflows.
  - The LLM agent can be tasked to use its `search_documents` capability (RAG) to query the notes, effectively synthesizing ideas, structuring introductions, or drafting outlines based on the connected thoughts in the Zettelkasten.
- Fact-Checking and Verification: Since LLM tools utilizing RAG can provide citations that link back to the most relevant original passages in the user's sources, integrating the Zettelkasten (which records external references via citekeys) allows the researcher to leverage AI synthesis while maintaining verifiability of the synthesized research products.
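
The retrieval step of RAG can be illustrated without any framework. The sketch below is a deliberately simple stand-in: it ranks notes by word-count cosine similarity rather than by learned embeddings, it does not use the LlamaIndex API, and `search_notes` is a hypothetical name chosen for this example. It returns note titles, so whatever a model later synthesizes can be traced back to specific *Zettels*.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def search_notes(query: str, notes: dict[str, str], top_k: int = 2) -> list[str]:
    """Return the titles of the notes most similar to the query."""
    query_vec = Counter(tokenize(query))
    scored = [(cosine(query_vec, Counter(tokenize(body))), title) for title, body in notes.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by title
    return [title for score, title in scored[:top_k] if score > 0]


# Hypothetical note bodies, keyed by note title.
notes = {
    "Hallucination Rate": "Share of generated statements that are not supported by the retrieved sources.",
    "BM25 and TF-IDF": "Lexical ranking functions used for retrieval.",
    "Active Learning": "Selecting informative samples for annotation.",
}
print(search_notes("lexical ranking for retrieval", notes, top_k=1))
```

In a real pipeline the retrieved note bodies, together with their citekeys, would be passed to the model as context, which is what keeps the synthesized text verifiable.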