Skip to content

feat: /heath rpc endpoint #928

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

distractedm1nd
Copy link
Contributor

Would close #739 ?

If I've understood correctly, the /health endpoint is just for checking if the rpc is running - not sure if this is still wanted/necessary though, feel free to delete.

Should more health endpoints be created for any other services? For example, with the new DASState, the /daser endpoint seems like a great "health check" to me

@distractedm1nd distractedm1nd changed the title feat: /heath rpc endpoint feat: /heath rpc endpoint Jul 21, 2022
@vgonkivs vgonkivs added area:rpc kind:feat Attached to feature PRs labels Jul 21, 2022
Copy link
Member

@renaynay renaynay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@distractedm1nd looks great so far - maybe we can extend the /health response to also have a map of services -> running/not running

@distractedm1nd
Copy link
Contributor Author

distractedm1nd commented Jul 21, 2022

type HealthResponse struct {
	StateService  bool             `json:"state_service"`
	ShareService  bool             `json:"share_service"`
	DASStatus     DasStateResponse `json:"das_status"`
	HeaderSyncing bool             `json:"header_syncing"`
}

@renaynay were you thinking something like this?

Thoughts:

  • should it really contain the das state? Or should i also just add a IsStopped like the other services?
  • took out the "status": "ok", not sure if that is good or bad yet
  • used header syncing, since the header service looks to be "always on"?

@codecov-commenter
Copy link

Codecov Report

Merging #928 (8871162) into main (8d56e55) will decrease coverage by 0.12%.
The diff coverage is 17.85%.

@@            Coverage Diff             @@
##             main     #928      +/-   ##
==========================================
- Coverage   58.74%   58.61%   -0.13%     
==========================================
  Files         128      129       +1     
  Lines        7623     7650      +27     
==========================================
+ Hits         4478     4484       +6     
- Misses       2676     2703      +27     
+ Partials      469      463       -6     
Impacted Files Coverage Δ
service/rpc/health.go 0.00% <0.00%> (ø)
service/share/share.go 81.48% <66.66%> (+0.46%) ⬆️
service/rpc/endpoints.go 100.00% <100.00%> (ø)
fraud/pb/proof.pb.go 31.36% <0.00%> (-2.15%) ⬇️
ipld/retriever.go 92.30% <0.00%> (+1.92%) ⬆️
ipld/nmt_adder.go 92.00% <0.00%> (+24.00%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us.

@renaynay
Copy link
Member

renaynay commented Jul 22, 2022

type HealthResponse struct {
	StateService  bool             `json:"state_service"`
	ShareService  bool             `json:"share_service"`
	DASStatus     DasStateResponse `json:"das_status"`
	HeaderSyncing bool             `json:"header_syncing"`
}

@renaynay were you thinking something like this?

Thoughts:

  • should it really contain the das state? Or should i also just add a IsStopped like the other services?
  • took out the "status": "ok", not sure if that is good or bad yet
  • used header syncing, since the header service looks to be "always on"?

@distractedm1nd lets do something like (excuse formatting):

type HealthResponse struct {
         RPC string `json:"rpc_service"`
	DASService    string `json:"das_service"`
	HeaderSyncing string             `json:"sync_service"`
}

So basically call all of these things services (we also don't need share service bc das covers that, and share service is kind of an under the hood thing). We also don't need state service "health" bc those endpoints are accessible via RPC always as long as the core connection has been provided. It's really only a proxy rather than a service. The only time endpoints are limited is if a fraud proof has been received, in which case state update endpoints are blocked, but state reads are fine.

Bool would be fine but I think it's better to have running || stopped strings bc asking for /health and getting true || false is not descriptive.

Wdyt about this?

@renaynay
Copy link
Member

It would also be nice to have a list of available endpoints on the node the way that tendermint does: https://rpc-mamaki.pops.one/

@distractedm1nd
Copy link
Contributor Author

Yes, makes sense - two things:

  1. Should we maybe put this in a /status endpoint, and keep /health as a {"status": "ok"}? I noticed that the rpc at your link does this
  2. RPC would always return running for now, correct? Since it has to be running to respond to the endpoint

@renaynay
Copy link
Member

@distractedm1nd yes to both 1 and 2

Copy link
Member

@Wondertan Wondertan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am good with this, however:

  • Share and State services running or not give node operator zero info. Starting and stopping happen only at the start and stop of the node. Besides State service which can be stopped in runtime with fraud proofs case, but more on this later.
    • Also, why only those services, and why not Fraud?
  • Call this status endpoint rather than health (extending on the point above)
    • status is more about data being returned and implies that the node operator himself understands if the node is running healthy based on the returned data,
    • A health check implies some additional "smart" logic that evaluates node health itself and only returns a simple true and false. In our case, this logic could be a combination of:
      • Checking that header are constantly synced
      • Checking that header are constantly sampled
      • Checking if any FraudProof exist/received
    • health might be extracted as a separate issue

Note that this will need to be reworked with newer PublicAPI over RPC, so I am fine with not making this super clean and changes are only requested for the comments below.

Thank you!

StateService bool `json:"state_service"`
ShareService bool `json:"share_service"`
DASStatus DasStateResponse `json:"das_status"`
HeaderSyncing bool `json:"header_syncing"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should return the Syncer state here as well

Comment on lines +27 to +30
dasState := new(DasStateResponse)
dasState.SampleRoutine = h.das.SampleRoutineState()
dasState.CatchUpRoutine = h.das.CatchUpRoutineState()
availResp.DASStatus = *dasState
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to allocate response and then copy it again

Suggested change
dasState := new(DasStateResponse)
dasState.SampleRoutine = h.das.SampleRoutineState()
dasState.CatchUpRoutine = h.das.CatchUpRoutineState()
availResp.DASStatus = *dasState
availResp.DASStatus = &DasStateResponse{
SampleRoutine: h.das.SampleRoutineState(),
CatchUpRoutine: h.das.CatchUpRoutineState(),
}

@adlerjohn adlerjohn marked this pull request as draft July 25, 2022 02:43
@distractedm1nd
Copy link
Contributor Author

Not worth pursuing until the refactorings, so removing from the roadmap for now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:rpc kind:feat Attached to feature PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature Request]: /health RPC endpoint
5 participants