-
Notifications
You must be signed in to change notification settings - Fork 12
Add neco-server-exporter to expose custom node-local metrices #1753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
4574776 to
65f5f30
Compare
4e7564b to
57d1bc0
Compare
Signed-off-by: Daichi Sakaue <[email protected]>
57d1bc0 to
573b19e
Compare
|
NOTE: merge this PR after preparing ghcr.io |
aca806e to
4f2916e
Compare
Signed-off-by: Daichi Sakaue <[email protected]>
Signed-off-by: Daichi Sakaue <[email protected]>
Signed-off-by: Daichi Sakaue <[email protected]>
4f2916e to
4c96d64
Compare
| return nil | ||
| } | ||
|
|
||
| func (e *Exporter) Run(ctx context.Context) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that Run() never return an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to leave it as is, because it states the Run function may return error in future development.
Without it, someone may confuse how to treat errors happen in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. How about writing it in a comment to left what you replied?
For example...
// Run starts the metric collection loop.
// The error return is reserved for future fatal error conditions.
// Currently always returns nil except when context is cancelled.
func (e *Exporter) Run(ctx context.Context) error {There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I will add it.
| } | ||
| } | ||
|
|
||
| func (e *Exporter) AddCollector(c Collector) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: How about making collectors a constructor parameter instead of using AddCollector(). This would remove running and make the API simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool idea, thank you!
| if err := http.ListenAndServe("0.0.0.0:8080", nil); err != nil { | ||
| e.log.Error("metrics server stopped", slog.Any("error", err)) | ||
| return err | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This HTTP server doesn't stop when ctx is cancelled. How about using http.Server.Shutdown() when ctx is done for graceful shutdown?
| "github.com/cilium/cilium/pkg/client" | ||
| "github.com/cilium/ebpf" | ||
| "github.com/cybozu/neco-containers/neco-server-exporter/pkg/exporter" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| "github.com/cilium/cilium/pkg/client" | |
| "github.com/cilium/ebpf" | |
| "github.com/cybozu/neco-containers/neco-server-exporter/pkg/exporter" | |
| "github.com/cilium/cilium/pkg/client" | |
| "github.com/cilium/ebpf" | |
| "github.com/cybozu/neco-containers/neco-server-exporter/pkg/exporter" |
Missing blank line between external and internal imports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This point is new to me, because the proposed style is not currently adopted by our team.
- Codes dividing external and internal imports
- Codes not doing so
My feeling is neutral for this and it's ok to accept it. However, it's better to automate.
I'm going to install goimports in tools.mk and check the style in make check-generate.
neco-server-exporter/main.go
Outdated
| "github.com/cybozu/neco-containers/neco-server-exporter/pkg/collector/bpf" | ||
| "github.com/cybozu/neco-containers/neco-server-exporter/pkg/exporter" | ||
| "github.com/spf13/cobra" | ||
| ctrl "sigs.k8s.io/controller-runtime" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
neco-server-exporter/e2e/Makefile
Outdated
| uninstall: | ||
| $(KUBECTL) delete -f testdata/namespace.yaml | ||
| -docker image rm $(IMAGE_TAG) | ||
| -docker image prune -f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't it enough docker image rm $(IMAGE_TAG)?
image prune seems little dangerous to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's consistent with other scripts.
Do you want all of them to be fixed?
$ git grep -F -- '-docker image prune -f'
cep-checker/e2e/Makefile:43: -docker image prune -f
neco-server-exporter/e2e/Makefile:57: -docker image prune -f
squid-exporter/e2e/Makefile:34: -docker image prune -f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think other ones could be fixed as well, but for now, I think only this one is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please let me clarify my implication, I think it's very important to choose one from the following options.
- Fix all-at-once.
- Leave it as negligible.
If we fix the problem partially, the remaining ones will be copy-pasted sometime and we need to point out the same problem again and again. That's not productive.
I would like to ask that the problem is worth fixing for paying the all-at-once cost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. What do you think about image prune?
If you also think it's a problem, let's fix them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To my opinion it looks a little dangerous too.
It's not intuitive that make uninstall prunes all the unrelated images.
My preference is to remove automatic image prune from the entire repository.
|
|
||
| // Please uncomment when needed | ||
|
|
||
| // func kubectl(input []byte, args ...string) ([]byte, []byte, error) { | ||
| // return runCommand(kubectlPath, input, args...) | ||
| // } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think we can remove these lines (I think we can add it when it's needed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to leave it here, because the function will very likely be needed.
Otherwise, the one who wants kubectl will need to grep neco repositories all-around to get an appropriate version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, why don't you call kubectl() from kubectlSafe()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a tiny optimization to save call-stack memory that's not-so-effective (~10ns/32Bytes).
I'll use kubectl() from kubectlSafe(), that's more rigid and readable.
| ret := make(map[ebpf.ProgramID]TCXMetadata) | ||
| for it.Next() { | ||
| li := it.Take() | ||
| defer li.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be close end of for or extract it as a function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not a major problem, but will fix the implementation to close the handle per-program to avoid consuming thousands of file descriptors caused by a large number of BPF programs.
|
|
||
| type Collector interface { | ||
| // Metrics names will be "neco_server_<SectionName>_<MetricsName>{MetricsLabels}". | ||
| SectionName() string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think just Name() would be fine too, since it's clear that it's the collector's(section?) name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to use SectionName, because it flavors like something special is there.
Name looks too concise to find out it's used as a part of metrics names.
Someone may think it's ok to set arbitrary text here and write as "BPF Performance Exporter".
BTW how about MetricsGroupName()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MetricsGroupName() is fine. Also MetricsPrefix is clear to me in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I'm going to use MetricsPrefix.
Thanks for the suggestion 🙇
| @@ -0,0 +1,120 @@ | |||
| module github.com/cybozu/neco-containers/neco-server-exporter | |||
|
|
|||
| go 1.24.5 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: 1.24.5 has some vulnerabilities. I think the latest would be good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The version should exactly match with the workflow:
https://github.com/cybozu/neco-containers/blob/main/.github/workflows/main.yaml#L281
https://github.com/cybozu/neco-containers/actions/runs/19063539929/job/54448614430
Do you think it's ok to postpone to regular update, or better to update all for now?
$ git grep -F 1.24.5
.github/workflows/main.yaml:285: go-version: "1.24.5"
admission/go.mod:3:go 1.24.5
bmc-log-collector/go.mod:3:go 1.24.5
cep-checker/go.mod:3:go 1.24.5
envoy/go.mod:3:go 1.24.5
neco-server-exporter/go.mod:3:go 1.24.5
squid-exporter/go.mod:3:go 1.24.5
tcp-keepalive/go.mod:3:go 1.24.5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, I think it's enough to just update this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's impossible to fix neco-server-exporter only, due to repository structure.
We must choose all-at-once or regular update for this problem.
427e2d0 to
8e2eb90
Compare
Signed-off-by: Daichi Sakaue <[email protected]>
8e2eb90 to
bde9ccf
Compare
Signed-off-by: Daichi Sakaue [email protected]