Hi,
We have used CouchDB as our backend storage since 2014. Back in 2018, some of our clients started to complain about delays in various database operations. The cause was that we create and modify a number of design documents at runtime, which in turn triggers re-indexing of the documents. When a database has hundreds of thousands of documents or more, this can take some time, and unfortunately customers are not that patient :)
We started to think about how to solve this, especially for our specific case, where such design document changes affect only a small subset of documents whose _id has a specific prefix, e.g. "List-" or "App-". Our thought was: why parse the whole document and then discard it in the map function, when it could be discarded earlier? We don't really know Erlang, but after a quick pass through some tutorials we were able to implement this and build a custom CouchDB with the feature enabled. Similar changes were introduced to Dreyfus, to speed up full-text indexing.
To accomplish this, we introduced a new field, _idfilter (holding a regular expression), in the design document. If this field is present and not null, the LoadDoc function (couch_index_updater.erl) runs a regex test on the document _id. The document is processed further only if it matches.
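To illustrate the idea, here is a minimal sketch in Python. It is not the actual Erlang patch: the function names `make_id_prefilter` and `docs_to_index`, the `load_doc` callback, and the choice of "match anywhere" regex semantics are all assumptions made for the example. It only shows the principle that the _id is tested before the document body is touched.

```python
import re


def make_id_prefilter(design_doc):
    """Build a predicate deciding whether a document _id should be indexed.

    Hypothetical helper: mirrors the described behaviour of the _idfilter
    field, not the real couch_index_updater.erl code.
    """
    pattern = design_doc.get("_idfilter")
    if pattern is None:
        # Field absent or null: every document is indexed, as without the patch.
        return lambda doc_id: True
    regex = re.compile(pattern)
    # Assumption: the pattern may match anywhere in the _id, so prefixes
    # need an explicit ^ anchor.
    return lambda doc_id: regex.search(doc_id) is not None


def docs_to_index(design_doc, doc_ids, load_doc):
    """Yield only the documents whose _id passes the filter.

    load_doc is a hypothetical callback standing in for the expensive step
    (reading and parsing the document body); it is never called for an _id
    that fails the regex test.
    """
    keep = make_id_prefilter(design_doc)
    for doc_id in doc_ids:
        if keep(doc_id):
            yield load_doc(doc_id)


if __name__ == "__main__":
    ddoc = {"_id": "_design/lists", "_idfilter": "^List-"}
    ids = ["List-1", "App-7", "List-2", "User-3"]
    for doc in docs_to_index(ddoc, ids, lambda i: {"_id": i}):
        print(doc["_id"])
```

With a design document carrying `"_idfilter": "^List-"`, only the documents whose _id starts with "List-" reach the map step; the others are skipped without being loaded.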
This has been running in production ever since (almost 3 years) and has significantly increased indexing speed. We had nearly forgotten about this change, but we are about to make the 2.x -> 3.x transition, and the subject came up again because we have to port the changes. This time we thought that maybe someone else could benefit from such an improvement.
I'm curious whether the described feature would be of any use to anyone else. If so, and if you want me to share all the details, please let me know.
Best regards,
Arek