Device search#146
Conversation
|
This addresses issue #146 |
|
@vkhoroz - no big rush here. I let Claude take a stab at this today and I was pretty pleased with its results. |
82d89d4 to
280ad9b
Compare
Poking around projects I like, they tend to give you a list of search operands to avoid the pitfall of exposing SQL implementation details to the API. In order to do search, we have to move away from using a prepared statement. The performance difference should be negligible. Assisted-by: GitHub Copilot:claude-4-opus Signed-off-by: Andy Doan <andy@foundries.io>
Allows you to search with query like: ``` GET /v1/devices?filter=name.eq.device-1 ``` Assisted-by: GitHub Copilot:claude-4-opus Signed-off-by: Andy Doan <andy@foundries.io>
Assisted-by: GitHub Copilot:claude-4-opus Signed-off-by: Andy Doan <andy@foundries.io>
This is the bare-minimum. We'll add some UX improvements to assist in creating the filter in follow-up commits Assisted-by: GitHub Copilot:claude-4-opus Signed-off-by: Andy Doan <andy@foundries.io>
Assisted-by: GitHub Copilot:claude-4-opus Signed-off-by: Andy Doan <andy@foundries.io>
|
@vkhoroz - i've rebased on your latest changes. take a peak and play with this. |
| for _, l := range labels { | ||
| s.WriteString("&label=" + template.URLQueryEscaper(l)) | ||
| } | ||
| return template.URL(s.String()) |
vkhoroz
left a comment
There was a problem hiding this comment.
It looks quite interesting.
I'm almost approving this; but imho some things need shuffling around to decrease future maintenance costs.
| uuid, created_at, last_seen, target_name, tag, is_prod, json(labels) | ||
| FROM devices | ||
| WHERE deleted=false | ||
| ORDER BY %s LIMIT ? OFFSET ?`, orderBy), |
There was a problem hiding this comment.
I think we should still use prepared statements for the basic (non-filtered) use case, which will be used in 80-95% of cases.
Fortunately, an implementation with prepared statements could be easily extended.
For example, changing the last two lines of the above statement gives you exactly what you need:
WHERE deleted=false %s
ORDER BY %s LIMIT ? OFFSET ?`, filterBy, orderBy),
Then, basic statements (where filterBy is empty) will be initialized just as before, while filtered queries will call Init+run ad-hoc.
Again, this already improves the non-filtered (most used) queries. But not only that....
That slightly more structured implementation allows to easily add a caching layer for prepared filtered statements in the future. Alas, there is a finite number of "filtering" combinations, so small that I bet their lazy initialization can easily fit into as small map as hundreds (or even dozens).
(If you wonder about a caching key - it could be e.g. a sha1(filterBy+orderBy).
| // buildLabelFilterSQL produces SQL WHERE clause fragments and args for label filters. | ||
| // Each filter uses the JSONB extract operator (labels ->> '$.<key>') with parameterized values. | ||
| // For "name" and "group" labels, it uses the indexed virtual columns directly. | ||
| func buildLabelFilterSQL(filters []LabelFilter) (string, []any, error) { |
There was a problem hiding this comment.
I think we need to move LabelFilter and buildLabelFilterSQL into a separate module like e.g. db_filter.go.
They are generic enough that eventually we'd like to extend them to non-label and/or non-device queries.
| LabelCmpEqual LabelComparison = "eq" | ||
| LabelCmpNotEqual LabelComparison = "ne" | ||
| LabelCmpContains LabelComparison = "contains" | ||
| LabelCmpNotContains LabelComparison = "ncontains" |
There was a problem hiding this comment.
Just IMHO, from a lexical perspective we can use any character sequences for these comparisons.
From a usability perspective, operands like name = abcd, group != foo, group ~= f*, name ~!= a?cd are more intuitive than, and require as much backend coding as name.eq.abcd, group.ne.foo, group.contains.f*, name.ncontains.a?cd.
We can actually use words at the API level (iiuc easier to URL-escape), but I'd much prefer to use more intuitive querying at the UX level.
Another option is to wrap entire filter into e.g. base32 encoding before calling API, in which case anything can be inside it.
fwiw your contains really seems to work as like i.e. allows for a true pattern matching, in line with the widespread ~= use and unlike a limited contains meaning.
Also, I'm a bit skeptical about the ncontains thing. Is this really what we want to support (i.e. are there real use cases for this)? If it is just to satisfy a rule of 4 - I'd remove it, as that kind of search is often associated with performance implications.
There was a problem hiding this comment.
we can change the filtering approach. I used EC2 GUI as something to look at. They include the same 4 operands and manage servers on the order of most people will have for devices. I'm testing with 10k devices and the time of the searches are imperceptible. I didn't see github or ec2 doing something like name = a?cd. They seem to lean on "contains" and "not contains" rather than taking on this complexity.
EC2's API is a little more like you refer to. eg ;tag:Name=!:og is a search for VMs who's tag "Name" does not contain "og". Searching a not equal felt ugly tag:Name=!%5C=og.
Pick a query language you want and I'll do it. The current one just seemed the easiest to write. That said, I'm just going to take your input and tell Claude to sort it out.
Some other options(%3d is urlencode of =) that are basically querying on "name equal" and "group not equal":
Messy to parse: filter=name%3Dfoo,group!%3Dbar
Close to AWS: tag:name=foo&tag:group=!%3Dbar
There was a problem hiding this comment.
I think I made a wrong emphasize.
I don't quite care about how the API or URL is structured, but more about how the user input is.
I was proposing simple =, !=, ~= because that's how GCP console does search (e.g. for logging at al), and that's the most naive syntax any engineer understands, without a need to remember fancy keywords.
As for the URL itself, I'd simply make it opaque, e.g. using base32 for the entire filter query. I see no value in sending a filter verbatim in the URL. Often, APIs wrap the entire form data into a single &state=abbracadabbraunreadablebase32pieceofjunk kind of a thing, and tbh I like it so much more than all those fancy notations which make a life of a programmer harder by trying to be pseudo-"compliant".
There was a problem hiding this comment.
I'm not against its idea and nomenclature, but GCP log search is quite a different beast than device search; at least if you compare to things like EC2 and GCP compute. GCP compute and github issues don't even really support search - its basically exact label matching. It makes me worry we are verging on overkill. Except, GCP compute and github issue search infuriates me.
I think the middle ground here would be to have the search be something entered into a text box (and also have a query helper like I've done) display something like name = foo group != bar location ~= blah location !~ to cover equal, not equal, contains, and not contains. We can take that string and urlencode it to something like: ?filter=name%3Dfoo+group%21%3Dbar+location~%3Dblah
There was a problem hiding this comment.
On our last comments we seem to be stating almost the same thing about a "potential middle ground syntax".
If you wish to url-encode instead of base32, I'm just fine.
Three things I would bother specifically:
- I don't see a use case for "not contains"; even "not equal" looks rather synthetic, but at least intuitive and marginally valuable.
- We do allow a dot inside label values (good, not need to change) - that's something to take in account when we talk about the syntax.
- Visual comfort of the user, i.e. seeing familiar things without clobbering.
I bet nothing is more familiar than a school-level equality/inequality operators; and also school level (albeit less common) equivalence/approximation (~). These are also widely used by JIRA/GCP/Databases, while GitHub/AWS took their unique paths.
There was a problem hiding this comment.
fwiw if we follow the AWS style, we might get into the licensing trouble. I'm not sure if their specific syntax is a common knowledge or patent-protected. Just something to keep in mind.
There was a problem hiding this comment.
note: I'm not following AWS style - I found it confusing. I'm suggesting to support the same search operations as them. I've found them handy with my large fleet and surely they invested in small fortune in researching what would satisfy their users.
| } | ||
|
|
||
| // validLabelKey ensures label keys only contain safe characters for JSON path embedding. | ||
| var validLabelKey = regexp.MustCompile(`^[a-zA-Z0-9_\-]+$`) |
There was a problem hiding this comment.
I'd really wish to allow *? characters for the match operators, where we use LIKE internally, but still require the first character to be non-glob.
There was a problem hiding this comment.
its interesting. I just tried to get something on par with EC2. Let's get the simple thing going and see where that takes us.
There was a problem hiding this comment.
This makes sense.
I was just thinking: why got for EC2 way if FoundriesFactory way was already reacher? Why not GCP way? Or why not sqlite way?
That said, feel free to ignore. Contains is already powerful enough to solve 99% of use cases.
| case "group": | ||
| col = "group_name" | ||
| default: | ||
| col = fmt.Sprintf("labels ->> '$.%s'", f.Label) |
There was a problem hiding this comment.
One other thing I'd like to support is uuid. It is not a label, but I often used it when debugging customer problems.
I do believe a tag field is yet another good example for filtering.
There was a problem hiding this comment.
you don't need to search for a UUID though?
There was a problem hiding this comment.
Customer tickets usually contained a uuid or a set of uuids.
But, sometimes there were typos, and in cases like that I was searching by the uuid start.
| parts := strings.SplitN(p, ".", 3) | ||
| if len(parts) != 3 { | ||
| return nil, fmt.Errorf("invalid label filter %q: expected format key.comparison.value", p) | ||
| } | ||
| f := storage.LabelFilter{ | ||
| Label: parts[0], | ||
| Comparison: storage.LabelComparison(parts[1]), | ||
| Value: parts[2], | ||
| } | ||
| if err := f.Validate(); err != nil { | ||
| return nil, err | ||
| } |
There was a problem hiding this comment.
I think it would be beneficial to use some lexer, e.g. the participle library which allows defining the LabelFilter as below.
TBH. I'm not sure how much benefit any lexer would give us. A simple split (as you did) seems sufficient to me, and in case if we switch to =|~=|!= kind of operators, a simple split would work well too.
var filterLexer = lexer.MustSimple([]lexer.SimpleRule{
{Name: "Label", Pattern: `^[\w_]+`},
{Name: "Operator", Pattern: `.eq.|.ne.|.contains.|.ncontains.`},
{Name: "Value", Pattern: `[\w\.\-_]+$`},
})
type LabelFilter struct {
Label string `parser:"@Label"`
Comparison LabelComparison `parser:"@Operator"`
Value string `parser:"@Value"`
}
// global initialization somewhere
parser, err := participle.Build[LabelFilter](participle.Lexer(filterLexer))
if err != nil { panic(err) }
// usage
input := "name.contains.abcd"
filter, err := parser.ParseString("", input)
| e.preventDefault(); | ||
| var labels = labelsInput.value.trim(); | ||
| var params = new URLSearchParams(window.location.search); | ||
| [...params.keys()].forEach(k => { if (k == "label") params.delete(k); }); |
There was a problem hiding this comment.
So, IIUC this deletes filter params from the existing window URL, right?
This deserves a comment I think.
| }); | ||
| } | ||
| // Keep other params (sort, page, etc) | ||
| window.location.search = params.toString(); |
There was a problem hiding this comment.
Hmm.... this is going to spam our API on each keydown.
IIRC there is a simple way to place a small delay to the event handler, so that the actual API call is only made after there was no subsequent keydown event for the next e.g. 300 or 500 milliseconds.
That's still responsive enough, but does not constantly refresh the page (both performance killer and UX flaw).
| <th class="sortable"> | ||
| {{ if eq .Sort "uuid-asc" }} | ||
| <a href="/devices?sort=uuid-desc" title="Sorted ascending, click for descending">UUID <span class="sort-active">▲</span></a> | ||
| <a href="/devices?sort=uuid-desc{{labelParams .LabelFilters}}" title="Sorted ascending, click for descending">UUID <span class="sort-active">▲</span></a> |
There was a problem hiding this comment.
IIRC there is a way to hijack the page request and inject {{labelParams .LabelFilters}} on the fly, so that we don't need to remember copying it in so many places.
Smth along these lines:
document.addEventListener('click', function(e) {
const link = e.target.closest('a');
if (link && link.href && link.href.startsWith('/devices?')) {
e.preventDefault();
window.location.href = link.href+{{labelParams .LabelFilters}};
}
});
It goes with the trade-off (truncated link in browser footer), but it saves quite some hurdle.
| }); | ||
| if (labelsArr.length > 0) { | ||
| labelsInput.value = labelsArr.join(' '); | ||
| (function() { |
There was a problem hiding this comment.
This auto-completion is quite a thing, but I am scared by the maintenance cost.
Do you think this is what we really need at this stage?
Maybe, a simple help message above the filter input would do a better work?
There was a problem hiding this comment.
Claude can do the maintenance. This is too useful. I can't remember how to filter each time I use this. Depending how we do the query implementation, this might become even more needed.


Add the ability to filter/search for devices by their labels when listing.
It builds on the idea of filters that look like
<label>.<compartor>.<value>. So to filter for devices who's name contains "foo", you sayname.contains.foo. These can be ANDed together for a query like:name.contains.19 name.contains.2which could return devices with names like192,219,1932, etc.The auto-completion logic in the UI makes it a little easier to use for a first timer.