-
Notifications
You must be signed in to change notification settings - Fork 1.2k
enhance search API #3658
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
enhance search API #3658
Conversation
8b80291
to
92b6fba
Compare
I've rebased the PR onto master and added |
92b6fba
to
72fcf50
Compare
The latest commit fixes a subtle bug related to the padding of the search region: In the presence of combining characters, one could end up with an infinite loop. (Try searching backwards for This bug is also present in #3575, hence in master. If you want, I can backport 88f3cf5 to master. This would require some modification of the commit, so let me know if that's necessary. |
0d14eae
to
88f3cf5
Compare
88f3cf5
to
1c1a35a
Compare
I've force-pushed a polished version and updated the list of functions at the top of this page. It still fixes the bug mentioned above. Also, locations returned for matches and submatches are now guaranteed to include the runes that matched. The underlying Go regexp functions match runes, which may be part of combining characters like This PR introduces many new functions. Maybe we don't need all of them. For example, do we need a submatch version on top of each non-submatch search or replace function? If we keep only the submatch version, we wouldn't lose any functionality because the additional elements in the slice for each match can be ignored. I nevertheless added all functions to this PR to show what's possible. |
@dmaluka Any chance to move forward with this PR? Apart from a detailed review, a quick feedback would be helpful:
|
Haven't had much time to look into it yet, sorry. |
To see some of the new search functions in action, check out my LaTeX plugin. The way the new |
1c1a35a
to
8683a1a
Compare
In the latest version, the regexp for all new search and replace functions can be specified either as a |
I'm worried that we are exposing such implementation details as padded regexps as a part of the API. I think we should try and make it a bit more abstract and future-proof, e.g. something like:
And IMHO this implicit polymorphism is messy. I'm thinking of something like:
which internally use:
Seems reasonable. The caller can ignore the returned submatches if it doesn't need them.
Seems reasonable. The intuitively expected behavior would be: if And I'm not even sure why exactly we currently swap them. From looking at the code it seems like we already make sure to always pass |
Exactly. Another option would be to combine |
Yes, it would be too smart. |
@@ -113,25 +113,24 @@ func (eh *EventHandler) DoTextEvent(t *TextEvent, useUndo bool) { | |||
} | |||
|
|||
// ExecuteTextEvent runs a text event | |||
// The deltas are processed in reverse order and afterwards reversed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add an explanation why it is needed?
Also, this is rather an implementation detail, so maybe this comment should be inside the function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't say that it's an implementation detail. When you create a TextEvent
t
, you have to know if which order the elements of t.Deltas
are processed because that changes the meaning of the locations. To keep it less technical, we could say that the locations of the various Delta
s have to be (non-overlapping and) in increasing order. The old comment could then indeed move inside the function.
@@ -59,6 +59,20 @@ func init() { | |||
Stdout = new(bytes.Buffer) | |||
} | |||
|
|||
// RangeMap returns the slice obtained from applying the given function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not SliceMap
?
And I'm not sure we even need this helper. I see there is exactly one usage of it, and it doesn't look very convincing. And I'm not sure we should create a precedent of using generics unless we really find it useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I called it RangeMap
instead of SliceMap
because the function f
does not only receive the slice element, but also the position, as in a range
.
You are right, this helper function is not used elsewhere at present. Maybe the reason is that it needs type parameters, and before the recent bump from Go 1.17 to 1.19 we didn't have them. I myself am new to Go, and I find it annoying that such basic functionality is not included directly in Go. I'm sure that if we looked through the code for micro, we would find places where RangeMap
would be useful. I would be optimistic that there are other uses in the future. I'm using it in another PR that I haven't submitted yet because it depends on the present one. But it's up to you. If you want me to delete it, I'll do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I actually regret we bumped it from 1.17 to 1.19. We'd have an easy compelling answer to questions "why not use generics", "why not use any
", "why not use another shiny new feature X".
internal/buffer/search.go
Outdated
// ReplaceAllLiteral replaces all matches of the regexp `s` with `repl` in | ||
// the given area. The function returns the number of replacements made, the | ||
// new end position and any error that occured during regexp compilation | ||
func (b *Buffer) ReplaceAllLiteral(s string, start, end Loc, repl []byte) (int, Loc, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not pass literal as a boolean argument to ReplaceAll()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can change that. The reason I had chosen ReplaceAllLiteral
was to imitate Go's regexp
API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, didn't realize that. Anyway, we don't need to replicate Go's API precisely, we can define whatever API is more convenient for us to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I like about ReplaceAllLiteral
is that I don't have to remember whether a second argument true
to ReplaceAll
means "use as regexp template" or "use literally". (I believe that in almost all practical purposes, this argument would be a constant for each invocation of ReplaceAll
, not some variable whose value is not known in advance.)
func (l Loc) IsVoid() bool { | ||
return l == LocVoid() | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WTF is this, sorry.
I was about to suggest something like:
const InvalidLoc = Loc{-1, -1}
but then I recalled that Go doesn't support constant structs.
So I think we should just keep using directly Loc{-1, -1}
(and explicit checks like loc == Loc{-1, -1}
, without helpers), there's nothing terrible about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also thought first about constant structs.
The reason that I made this change in this PR is that the "internal" value Loc{-1, -1}
is now exposed to Lua scripts: If a submatch is not filled in a match, then we need a way to indicate that. An example would be searching for "a([xy])|b([uv])" in "ax". The first submatch would be "x" and the second one would be void. (In Go, the indices of a void submatch are -1
.) I thought that something like loc:IsVoid()
looks cleaner in in a Lua script.
Does this convince you, or do you still want me to remove LocVoid
and IsVoid
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First, the use of the word "void" here is very confusing, isn't it? Why "void"? (Now I understand it refers to this specific use case of "a submatch is not filled in a match", but how is a casual person supposed to guess that, and why limit the API to this narrow use case?)
What about just:
func (l Loc) IsValid() bool {
return l.X >= 0 && l.Y >= 0
}
?
The struct is a good idea. Are you attached to the name
I wonder how convenient the arguments
Such a function might cover most uses of |
That is exactly the kind of details that I'd prefer to hide, not expose. |
Fair enough. What about |
This PR is based on #3575
and therefore a draft at present. It has the following components:RegexpGroup
that combines a regexp with its padded versions as used in match beginning and end of line correctly #3575,Deltas
inExecuteTextEvent
in reverse order. This makesreplaceall
easier to implement,LocVoid()
andLoc.IsVoid()
to deal with unused submatches.The new types and functions are as follows (UPDATED):
The method
FindNext
is kept.ReplaceRegex
is removed in favor ofReplaceAll
. The latter is easier to use in Lua scripts.Currently the simple search functions (
FindDown
etc.) take aRegexpGroup
as argument to avoid recompiling the regexps. In contrast,FindAll
,ReplaceAll
and friends take a string argument. Many other variants would be possible. Also, the new search functions ignore theignorecase
setting of the buffer and don't wrap around when they hit the end of the search region. I think they are more useful this way in Lua scripts.You will see that many new internal functions use callback functions. This avoids code duplication. (One has to somehow switch between
(*regexp.Regexp).FindIndex()
and(*regexp.Regexp).FindSubmatchIndex()
in the innermost function that searches each line of the buffer.)As said before, many details could be modified, but overall I think these functions are very useful for writing scripts. Please let me know what you think.