How to match all word pairs (with overlap/look-behind)? #2189
-
I have a list of words and I would like to capture all word pairs (including those between the matches). I've tried the following:
however it doesn't capture all of them (I expect 3 lines in total, not 2). I've tried to use |
Replies: 2 comments
-
It seems to me like if you want all word pairs, then you necessarily must also have the ability for matches to overlap. PCRE almost seems like it could give you that with look-around, but look-around doesn't actually change the bounds of the match itself. Overlapping matches are typically not supported by most general purpose regex engines.
Despite my heavy involvement in implementing regex engines, I am not particularly creative in their use and haven't spent a lot of time with PCRE features. So I think either someone else will need to answer this (if there's even a solution) or this question would probably be better asked somewhere where there are more PCRE folks.
I am working on adding support for overlapping matches to ripgrep's underlying regex engine. (Albeit, with no specific intent on exposing that in ripgrep proper.) So I decided to see what an answer to this would look like.
The main problem with overlapping regexes is that they can be really weird. So for example, just applying your regex in overlapping mode doesn't quite give what you would expect:
That's because
Yup, that looks right! (The |
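For readers who want to see the look-around point concretely, here is a minimal sketch in Python's `re` module (not ripgrep or PCRE). It assumes that a "word" is a run of `\w` characters and that words are separated by whitespace; the function name and sample text are made up for illustration. Because a lookahead is zero-width, the match itself is empty, but a capture group placed inside the lookahead can still record the pair, so consecutive pairs may overlap.

```python
import re

def overlapping_word_pairs(text):
    """Return every pair of adjacent words, including overlapping ones.

    Assumption: words are runs of \\w characters separated by whitespace.
    """
    # \b anchors the attempt at a word boundary so pairs never start mid-word.
    # The lookahead consumes nothing, so the scan resumes right after the
    # boundary and the next pair can reuse the second word of this one.
    pattern = re.compile(r"\b(?=(\w+\s+\w+))")
    return [m.group(1) for m in pattern.finditer(text)]

if __name__ == "__main__":
    # Hypothetical sample input, not the one from the original question.
    for pair in overlapping_word_pairs("foo bar baz qux"):
        print(pair)
```

The matched span is always empty here, which is why this only helps in tools that can print capture groups rather than the match itself.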
-
I've found the following workaround, which prints the missing pairs manually (for the use case where they go together in 4 pairs):
but with more words, it'll keep skipping the pairs along the edges (after
This is with further improvements:
but it's not perfect either: it skips the last pair and gives one duplicate. Still, it's better than nothing, and there is room for improvement.
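If the pairing does not have to happen inside a single regex, one way to avoid both the skipped edge pairs and the duplicates is to extract the words first and pair neighbours afterwards. The following is only a sketch in Python under the assumption that a word is a run of `\w` characters; the function name and sample text are hypothetical and it is not a ripgrep feature.

```python
import re

def adjacent_word_pairs(text):
    """Pair each word with the word that follows it.

    Assumption: a word is any run of \\w characters.
    """
    words = re.findall(r"\w+", text)
    # zip stops at the shorter sequence, so the last word is only ever
    # used as the second element of a pair and nothing is duplicated.
    return list(zip(words, words[1:]))

if __name__ == "__main__":
    # Hypothetical sample input with more than four words.
    for first, second in adjacent_word_pairs("foo bar baz qux quux"):
        print(first, second)
```

For n words this always yields n - 1 pairs, regardless of how many words there are.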