Skip to content

Device search#146

Open
doanac wants to merge 5 commits into
mainfrom
device-search
Open

Device search#146
doanac wants to merge 5 commits into
mainfrom
device-search

Conversation

@doanac
Copy link
Copy Markdown
Member

@doanac doanac commented Apr 23, 2026

Add the ability to filter/search for devices by their labels when listing.

It builds on the idea of filters that look like <label>.<compartor>.<value>. So to filter for devices who's name contains "foo", you say name.contains.foo. These can be ANDed together for a query like: name.contains.19 name.contains.2 which could return devices with names like 192, 219, 1932, etc.

The auto-completion logic in the UI makes it a little easier to use for a first timer.

@doanac
Copy link
Copy Markdown
Member Author

doanac commented Apr 23, 2026

This addresses issue #146

@doanac doanac requested a review from vkhoroz April 23, 2026 18:37
@doanac
Copy link
Copy Markdown
Member Author

doanac commented Apr 23, 2026

@vkhoroz - no big rush here. I let Claude take a stab at this today and I was pretty pleased with its results.

@doanac
Copy link
Copy Markdown
Member Author

doanac commented Apr 23, 2026

Screenshots:

no filter applied

image

two filters applied

image

i can't catch a screenshot of autocomplete

Comment thread storage/api/api_storage.go Outdated
@doanac doanac force-pushed the device-search branch 2 times, most recently from 82d89d4 to 280ad9b Compare April 30, 2026 15:06
doanac added 5 commits May 14, 2026 13:26
Poking around projects I like, they tend to give you a list of search
operands to avoid the pitfall of exposing SQL implementation details to
the API.

In order to do search, we have to move away from using a prepared
statement. The performance difference should be negligible.

Assisted-by: GitHub Copilot:claude-4-opus
Signed-off-by: Andy Doan <andy@foundries.io>
Allows you to search with query like:
```
 GET /v1/devices?filter=name.eq.device-1
```

Assisted-by: GitHub Copilot:claude-4-opus
Signed-off-by: Andy Doan <andy@foundries.io>
Assisted-by: GitHub Copilot:claude-4-opus
Signed-off-by: Andy Doan <andy@foundries.io>
This is the bare-minimum. We'll add some UX improvements to assist in
creating the filter in follow-up commits

Assisted-by: GitHub Copilot:claude-4-opus
Signed-off-by: Andy Doan <andy@foundries.io>
Assisted-by: GitHub Copilot:claude-4-opus
Signed-off-by: Andy Doan <andy@foundries.io>
@doanac
Copy link
Copy Markdown
Member Author

doanac commented May 14, 2026

@vkhoroz - i've rebased on your latest changes. take a peak and play with this.

for _, l := range labels {
s.WriteString("&label=" + template.URLQueryEscaper(l))
}
return template.URL(s.String())
Comment thread storage/api/api_storage.go Dismissed
Copy link
Copy Markdown
Member

@vkhoroz vkhoroz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks quite interesting.

I'm almost approving this; but imho some things need shuffling around to decrease future maintenance costs.

uuid, created_at, last_seen, target_name, tag, is_prod, json(labels)
FROM devices
WHERE deleted=false
ORDER BY %s LIMIT ? OFFSET ?`, orderBy),
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should still use prepared statements for the basic (non-filtered) use case, which will be used in 80-95% of cases.
Fortunately, an implementation with prepared statements could be easily extended.

For example, changing the last two lines of the above statement gives you exactly what you need:

		WHERE deleted=false %s
		ORDER BY %s LIMIT ? OFFSET ?`, filterBy, orderBy),

Then, basic statements (where filterBy is empty) will be initialized just as before, while filtered queries will call Init+run ad-hoc.

Again, this already improves the non-filtered (most used) queries. But not only that....
That slightly more structured implementation allows to easily add a caching layer for prepared filtered statements in the future. Alas, there is a finite number of "filtering" combinations, so small that I bet their lazy initialization can easily fit into as small map as hundreds (or even dozens).
(If you wonder about a caching key - it could be e.g. a sha1(filterBy+orderBy).

// buildLabelFilterSQL produces SQL WHERE clause fragments and args for label filters.
// Each filter uses the JSONB extract operator (labels ->> '$.<key>') with parameterized values.
// For "name" and "group" labels, it uses the indexed virtual columns directly.
func buildLabelFilterSQL(filters []LabelFilter) (string, []any, error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to move LabelFilter and buildLabelFilterSQL into a separate module like e.g. db_filter.go.
They are generic enough that eventually we'd like to extend them to non-label and/or non-device queries.

LabelCmpEqual LabelComparison = "eq"
LabelCmpNotEqual LabelComparison = "ne"
LabelCmpContains LabelComparison = "contains"
LabelCmpNotContains LabelComparison = "ncontains"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just IMHO, from a lexical perspective we can use any character sequences for these comparisons.
From a usability perspective, operands like name = abcd, group != foo, group ~= f*, name ~!= a?cd are more intuitive than, and require as much backend coding as name.eq.abcd, group.ne.foo, group.contains.f*, name.ncontains.a?cd.

We can actually use words at the API level (iiuc easier to URL-escape), but I'd much prefer to use more intuitive querying at the UX level.
Another option is to wrap entire filter into e.g. base32 encoding before calling API, in which case anything can be inside it.

fwiw your contains really seems to work as like i.e. allows for a true pattern matching, in line with the widespread ~= use and unlike a limited contains meaning.

Also, I'm a bit skeptical about the ncontains thing. Is this really what we want to support (i.e. are there real use cases for this)? If it is just to satisfy a rule of 4 - I'd remove it, as that kind of search is often associated with performance implications.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can change the filtering approach. I used EC2 GUI as something to look at. They include the same 4 operands and manage servers on the order of most people will have for devices. I'm testing with 10k devices and the time of the searches are imperceptible. I didn't see github or ec2 doing something like name = a?cd. They seem to lean on "contains" and "not contains" rather than taking on this complexity.

EC2's API is a little more like you refer to. eg ;tag:Name=!:og is a search for VMs who's tag "Name" does not contain "og". Searching a not equal felt ugly tag:Name=!%5C=og.

Pick a query language you want and I'll do it. The current one just seemed the easiest to write. That said, I'm just going to take your input and tell Claude to sort it out.

Some other options(%3d is urlencode of =) that are basically querying on "name equal" and "group not equal":

Messy to parse: filter=name%3Dfoo,group!%3Dbar
Close to AWS: tag:name=foo&tag:group=!%3Dbar

Copy link
Copy Markdown
Member

@vkhoroz vkhoroz May 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I made a wrong emphasize.

I don't quite care about how the API or URL is structured, but more about how the user input is.

I was proposing simple =, !=, ~= because that's how GCP console does search (e.g. for logging at al), and that's the most naive syntax any engineer understands, without a need to remember fancy keywords.

As for the URL itself, I'd simply make it opaque, e.g. using base32 for the entire filter query. I see no value in sending a filter verbatim in the URL. Often, APIs wrap the entire form data into a single &state=abbracadabbraunreadablebase32pieceofjunk kind of a thing, and tbh I like it so much more than all those fancy notations which make a life of a programmer harder by trying to be pseudo-"compliant".

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not against its idea and nomenclature, but GCP log search is quite a different beast than device search; at least if you compare to things like EC2 and GCP compute. GCP compute and github issues don't even really support search - its basically exact label matching. It makes me worry we are verging on overkill. Except, GCP compute and github issue search infuriates me.

I think the middle ground here would be to have the search be something entered into a text box (and also have a query helper like I've done) display something like name = foo group != bar location ~= blah location !~ to cover equal, not equal, contains, and not contains. We can take that string and urlencode it to something like: ?filter=name%3Dfoo+group%21%3Dbar+location~%3Dblah

Copy link
Copy Markdown
Member

@vkhoroz vkhoroz May 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On our last comments we seem to be stating almost the same thing about a "potential middle ground syntax".
If you wish to url-encode instead of base32, I'm just fine.

Three things I would bother specifically:

  • I don't see a use case for "not contains"; even "not equal" looks rather synthetic, but at least intuitive and marginally valuable.
  • We do allow a dot inside label values (good, not need to change) - that's something to take in account when we talk about the syntax.
  • Visual comfort of the user, i.e. seeing familiar things without clobbering.
    I bet nothing is more familiar than a school-level equality/inequality operators; and also school level (albeit less common) equivalence/approximation (~). These are also widely used by JIRA/GCP/Databases, while GitHub/AWS took their unique paths.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw if we follow the AWS style, we might get into the licensing trouble. I'm not sure if their specific syntax is a common knowledge or patent-protected. Just something to keep in mind.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I'm not following AWS style - I found it confusing. I'm suggesting to support the same search operations as them. I've found them handy with my large fleet and surely they invested in small fortune in researching what would satisfy their users.

}

// validLabelKey ensures label keys only contain safe characters for JSON path embedding.
var validLabelKey = regexp.MustCompile(`^[a-zA-Z0-9_\-]+$`)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really wish to allow *? characters for the match operators, where we use LIKE internally, but still require the first character to be non-glob.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its interesting. I just tried to get something on par with EC2. Let's get the simple thing going and see where that takes us.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense.

I was just thinking: why got for EC2 way if FoundriesFactory way was already reacher? Why not GCP way? Or why not sqlite way?

That said, feel free to ignore. Contains is already powerful enough to solve 99% of use cases.

case "group":
col = "group_name"
default:
col = fmt.Sprintf("labels ->> '$.%s'", f.Label)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other thing I'd like to support is uuid. It is not a label, but I often used it when debugging customer problems.
I do believe a tag field is yet another good example for filtering.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to search for a UUID though?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Customer tickets usually contained a uuid or a set of uuids.
But, sometimes there were typos, and in cases like that I was searching by the uuid start.

Comment on lines +80 to +91
parts := strings.SplitN(p, ".", 3)
if len(parts) != 3 {
return nil, fmt.Errorf("invalid label filter %q: expected format key.comparison.value", p)
}
f := storage.LabelFilter{
Label: parts[0],
Comparison: storage.LabelComparison(parts[1]),
Value: parts[2],
}
if err := f.Validate(); err != nil {
return nil, err
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be beneficial to use some lexer, e.g. the participle library which allows defining the LabelFilter as below.

TBH. I'm not sure how much benefit any lexer would give us. A simple split (as you did) seems sufficient to me, and in case if we switch to =|~=|!= kind of operators, a simple split would work well too.

var filterLexer = lexer.MustSimple([]lexer.SimpleRule{
	{Name: "Label", Pattern: `^[\w_]+`},
	{Name: "Operator", Pattern: `.eq.|.ne.|.contains.|.ncontains.`}, 
	{Name: "Value", Pattern: `[\w\.\-_]+$`},
})

type LabelFilter struct {
	Label      string          `parser:"@Label"`
	Comparison LabelComparison `parser:"@Operator"`
	Value      string          `parser:"@Value"`
}

// global initialization somewhere
parser, err := participle.Build[LabelFilter](participle.Lexer(filterLexer))
if err != nil { panic(err) }

// usage
input := "name.contains.abcd"
filter, err := parser.ParseString("", input)

e.preventDefault();
var labels = labelsInput.value.trim();
var params = new URLSearchParams(window.location.search);
[...params.keys()].forEach(k => { if (k == "label") params.delete(k); });
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, IIUC this deletes filter params from the existing window URL, right?
This deserves a comment I think.

});
}
// Keep other params (sort, page, etc)
window.location.search = params.toString();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.... this is going to spam our API on each keydown.

IIRC there is a simple way to place a small delay to the event handler, so that the actual API call is only made after there was no subsequent keydown event for the next e.g. 300 or 500 milliseconds.
That's still responsive enough, but does not constantly refresh the page (both performance killer and UX flaw).

<th class="sortable">
{{ if eq .Sort "uuid-asc" }}
<a href="/devices?sort=uuid-desc" title="Sorted ascending, click for descending">UUID <span class="sort-active">▲</span></a>
<a href="/devices?sort=uuid-desc{{labelParams .LabelFilters}}" title="Sorted ascending, click for descending">UUID <span class="sort-active">▲</span></a>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC there is a way to hijack the page request and inject {{labelParams .LabelFilters}} on the fly, so that we don't need to remember copying it in so many places.

Smth along these lines:

document.addEventListener('click', function(e) {
  const link = e.target.closest('a');

  if (link && link.href && link.href.startsWith('/devices?')) {
    e.preventDefault();
    window.location.href = link.href+{{labelParams .LabelFilters}};
  }
});

It goes with the trade-off (truncated link in browser footer), but it saves quite some hurdle.

});
if (labelsArr.length > 0) {
labelsInput.value = labelsArr.join(' ');
(function() {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This auto-completion is quite a thing, but I am scared by the maintenance cost.
Do you think this is what we really need at this stage?
Maybe, a simple help message above the filter input would do a better work?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude can do the maintenance. This is too useful. I can't remember how to filter each time I use this. Depending how we do the query implementation, this might become even more needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants