src/app/blog/tantivy-interview/index.mdx
Lines changed: 18 additions & 15 deletions
Original file line number
Diff line number
Diff line change
@@ -20,23 +20,26 @@ That developer is Paul Masurel. After creating Tantivy, Paul co-founded Quickwit
20
20
21
21
<Question>You've spent much of your career rethinking how search infrastructure is built. How did you first get interested in search systems?</Question>
22
22
<Answer>
23
-
At the beginning of my career, I joined a French enterprise search startup called Exalead. I was a frontend engineer at the time. I like to think my frustration grew from not being part of the core team. Long-fermented frustration is a great driver.
23
+
My history with search engines started at a small French enterprise search company called Exalead. I was a front-end engineer at the time. I like to think my frustration grew from not being part of the core team. Long-fermented frustration is an underrated driver in my opinion.
24
24
</Answer>
25
25
26
26
## Building Tantivy
27
27
28
-
That frustration simmered for years, through a move to Japan, a career shift into search quality at Indeed, and a growing proficiency in C++. The catalyst came on a long-haul flight in 2016. The first version was "a bit silly" and only took a couple of months of spare time, but it was enough to prove the idea had legs.
28
+
Paul’s frustration soon led to a job as a backend engineer for a search product at Indeed in Japan, where he worked on the core search engine itself. But the idea of building something from scratch kept pulling at him. The catalyst came on a long-haul flight in 2016. The first version was “a bit silly” and only took a couple of months of spare time, but it was enough to prove the idea had legs.
29
29
30
30
<Question>What made you decide to build a search engine from scratch, and did Rust shape its architecture?</Question>
31
31
<Answer>
32
-
I read the Rust book during a flight from Tokyo to Paris in 2016. Having worked a lot with C++, all of the ideas were very enticing to me. I quickly worked through the [exercism.io](https://exercism.io/) Rust track and then wanted to test the language on a real-life project — something with IO, error management, multithreading. What project should I pick?
32
+
I read the Rust book during a flight from Tokyo to Paris in 2016. Being very familiar with C++, all of the ideas were very enticing to me. I quickly worked through the [exercism.io](https://exercism.io/) Rust track and then wanted to test the language on a real-life project — something with IO, error management, multithreading. What project should I have picked?
33
33
34
-
At the time, I was working at Indeed on the search quality team. Our search engine was based on Lucene 2.4. Building a search engine was a way for me to test my understanding of search engines and try out Rust on a real-life project.
34
+
At the time, I was working at Indeed in the search quality team. Our search engine was based on Lucene 2.4. Building a search engine was a perfect way for me to test my understanding of search engines and try out Rust on a real-life project.
35
35
36
36
Rust did impact the way the code is organized, but I wouldn't say it shaped the architecture. Lucene is really the inspiration there.
37
+
37
38
</Answer>
38
39
39
-
Paul didn't set out to build a Lucene replacement. He set out to build something small enough that developers could actually own it. He has described becoming more productive in Rust within two weeks than he had been after five years of C++, and experiencing "a degree of confidence that my code was not buggy, that I had never experienced in any other language." That confidence shows in how he scoped the project: like Lucene, Tantivy is a library, not a server. It handles indexing, compression, and search, but leaves distribution and orchestration to whatever system wraps it.
40
+
Paul didn't set out to build a Lucene replacement. He set out to build something small enough that developers could actually own it. He has described becoming more productive in Rust within two weeks than he had been after five years of C++, and experiencing "a degree of confidence that my code was not buggy, that I had never experienced in any other language".
41
+
42
+
Architecturally, Tantivy follows Lucene’s model: it is a library, not a server. It handles indexing, compression, and search, but leaves distribution and orchestration to whatever system embeds it.
40
43
41
44
<Question>Were there particular trade-offs or priorities you focused on?</Question>
42
45
<Answer>
@@ -51,11 +54,11 @@ Today, Tantivy users come in all sizes and shapes. We still sometimes refuse PRs
51
54
52
55
## The Benchmark Game
53
56
54
-
That "small but modular" philosophy paid off in an unexpected way: performance. Under the hood, Tantivy uses finite state transducers for its term dictionary, SIMD-accelerated compression for its inverted index, and a memory-mapped I/O layer that keeps resident memory remarkably low. The result is a library that can handle indexes larger than available RAM without breaking a sweat. And performance, in the search world, gets noticed.
57
+
It quickly became apparent that Tantivy was not only competitive with other search engines — in many cases it outperformed them. Under the hood, Tantivy uses finite state transducers for its term dictionary, SIMD-accelerated compression for its inverted index, and a memory-mapped I/O layer that keeps resident memory remarkably low. The result is a library that can handle indexes larger than available RAM without breaking a sweat.
55
58
56
-
Early on, Paul published [Search Benchmark, the Game](https://github.com/quickwit-oss/search-benchmark-game), a reproducible benchmark suite showing Tantivy was often 2x faster than Lucene. What happened next is genuinely rare in open source. Adrien Grand, a Lucene committer, responded not with defensiveness but with curiosity, publishing "[Why is Tantivy Faster than Lucene?](https://jpountz.github.io/2025/04/12/why-is-Tantivy-faster-than-Lucene.html)" and a [follow-up analysis](https://jpountz.github.io/2025/05/12/analysis-of-Search-Benchmark-the-Game.html) of the benchmarking suite itself. Since then, Lucene has landed patches that move the needle back in their favor across [most areas](https://tantivy-search.github.io/bench/) and even opened issues inviting Tantivy to [replicate these optimizations](https://github.com/quickwit-oss/tantivy/issues?q=is%3Aissue%20state%3Aopen%20author%3Ajpountz).
59
+
Jason Wolfe, Paul's manager at Indeed, created [Search Benchmark, the Game](https://github.com/quickwit-oss/search-benchmark-game), a reproducible benchmark suite that showed Tantivy was often 2x faster than Lucene. Over the years the benchmark sparked a genuine back-and-forth between the two projects — Adrien Grand, a Lucene committer, published "[Why is Tantivy Faster than Lucene?](https://jpountz.github.io/2025/04/12/why-is-Tantivy-faster-than-Lucene.html)" and a [follow-up analysis](https://jpountz.github.io/2025/05/12/analysis-of-Search-Benchmark-the-Game.html), and Lucene has since landed patches that close the gap across [most areas](https://tantivy-search.github.io/bench/).
57
60
58
-
<Question>Can you see this game of cat and mouse continuing?</Question>
61
+
<Question>Can you see this game of cat and mouse between Tantivy and Lucene continuing?</Question>
59
62
<Answer>
60
63
Let me first talk a little bit about that benchmark. We wanted it honest and reproducible, and designed it to help us find out where Tantivy's performance was lacking.
61
64
@@ -89,7 +92,7 @@ Despite that tension, the ecosystem around Tantivy has grown into something Paul
89
92
90
93
<Question>Tantivy now serves as the foundation for Quickwit, LNX, ParadeDB, Turso, and others. Have you seen contributions or design ideas flow back upstream?</Question>
91
94
<Answer>
92
-
Someone contributed support for [geo search](https://github.com/quickwit-oss/tantivy/pull/2729) seemingly out of nowhere. This is something we had wanted to add for a while. The PR is large but its quality is very impressive. I'm currently on PTO, so I'm hoping to find time to get it merged.
95
+
Someone contributed support for [geo search](https://github.com/quickwit-oss/tantivy/pull/2729) seemingly out of nowhere. This is something we had wanted to add for a while. The PR is large but its quality is very impressive. I hope I will eventually find time to get it merged.
93
96
94
97
ParadeDB has also been actively contributing great PRs and deep ideas lately.
95
98
</Answer>
@@ -107,9 +110,9 @@ A library growing beyond what its creator can fully track is usually the point w
107
110
108
111
<Question>After building Tantivy, you co-founded Quickwit in 2020. What was the original vision, and what gap did you see in the market?</Question>
109
112
<Answer>
110
-
The original vision was actually very different. There was a real-time, large-scale, search-based analytics tool we were using at Indeed that was incredibly powerful and has no equivalent.
113
+
The original vision was actually very different. There was a real-time, large-scale, search-based analytics tool we were using at Indeed that was incredibly powerful. We wanted to replicate a similar experience.
111
114
112
-
I still think there's room for such a product. As we started building it, we noticed that traditional search engines were in principle perfectly suited to run off S3, so we opportunistically pivoted to building a log search engine on S3.
115
+
As we started building it, we noticed that traditional search engines were perfectly suited to run off S3, so we opportunistically pivoted to building a log search engine on S3.
113
116
</Answer>
114
117
115
118
The pivot is a detail that's easy to gloss over, but it says something important about how Paul works: follow the architecture, not the roadmap. If the underlying technology points somewhere interesting, go there.
@@ -141,9 +144,9 @@ To be honest, this was a very difficult decision. One benefit we expected was th
141
144
142
145
<Question>Was it difficult to secure a commitment to keep Tantivy and Quickwit open source?</Question>
143
146
<Answer>
144
-
We didn't ask — Datadog offered to relicense Quickwit under Apache. This allowed four (and probably more) companies to build their products around Quickwit.
147
+
Datadog offered to relicense Quickwit under Apache. This allowed four (and probably more) companies to build their products around Quickwit.
145
148
146
-
That said, the product we're building at Datadog is not open source. We push all improvements to Quickwit and maintain a private fork with the Datadog-specific code. We can't afford spending much time dealing with support or contributions that aren't aligned with our product's agenda.
149
+
That said, the product we're building at Datadog is not open source. We push all improvements to Quickwit and maintain a private fork with the Datadog-specific code. Apart from that, we cannot afford to spend much time dealing with support or contributions that aren't aligned with our product's agenda.
147
150
</Answer>
148
151
149
152
It's a pragmatic arrangement, and a generous one by acquisition standards. The open-source projects stay open, the proprietary product stays proprietary, and the line between them is clean. What's more interesting is how working at Datadog's scale has reshaped Paul's thinking about search itself.
@@ -157,13 +160,13 @@ We've pushed several massive optimizations into Tantivy, and improvements to Qui
157
160
158
161
## Advice for Developers
159
162
160
-
Paul's arc, from frustrated frontend engineer to the creator of search infrastructure used by companies worldwide, is in many ways the story of someone who looked at a "solved" problem and decided it wasn't. We closed by asking what he'd tell developers who want to do the same.
163
+
Paul's arc, from frustrated front-end engineer to the creator of search infrastructure used by companies worldwide, is in many ways the story of someone who looked at a "solved" problem and decided it wasn't. We closed by asking what he'd tell developers who want to do the same.
161
164
162
165
<Question>If I had looked at the lexical search and BM25 space in 2016, I would have said it was solved, and that catching up would be nearly impossible. You proved otherwise. What advice would you give to developers who are eyeing "solved" problem spaces with fresh eyes?</Question>
163
166
<Answer>
164
167
Keep an eye on the academic world. The new ideas often come from there, and they won't make it into the industry without our help — and our sweat.
165
168
166
169
Keep refining your mental models about how systems work. Software is a collection of abstraction matryoshka dolls. Identify these abstractions and study them. It will make you a better developer: you'll start building beautiful abstractions yourself. But you'll also notice that there's a lot of value to be delivered where abstractions leak.
167
170
168
-
And keep a critical view of why the industry converged on a given solution. A lot of textbooks are wrong. Maybe your problem is singular enough to not match the vanilla solution. Or maybe the industry made choices in the past when hardware and software looked very different from what we have today.
171
+
And keep a critical view of why the industry converged on a given solution. Maybe your problem is singular enough to not match the vanilla solution. Or maybe the industry made choices in the past when hardware and software looked very different from what we have today.