Commit 24729f9

WIP Smart worker check for open window + persistent error mgmt (#66)
## Purpose of Changes and their Description

Smart window detection, reducing the number of queries needed to find open nonces

* Refactored the mechanism so that workers and reputers share it via an interface.
* Fair offset calculation within the window
* Configurable and generic `QueryDataWithRetry`, to further reduce the chance of missing an epoch
* Added parameters to control the mechanism, explained in the README

Other fixes

* Account sequence and fees error management is now robust: errors are persisted.
* Log levels: added Trace.
* Use of retries in all queries for robustness.
* More robust registration and staking, confirming registration and stake before going forward (otherwise the actor will just get its tx continuously rejected)
* Fix panic when an account on the keyring suddenly vanishes (e.g. key removal or keyring rebuild)
* Added GetTopic query and others

## Link(s) to Ticket(s) or Issue(s) resolved by this PR

## Are these changes tested and documented?

- [X] If tested, please describe how. If not, why tests are not needed. -- Tested against testnet, also added further unit tests
- [X] If documented, please describe where. If not, describe why docs are not needed. -- On README. Added explanation and text-based visualization.
- [x] Added to `Unreleased` section of `CHANGELOG.md`?
2 parents 21e4c19 + f0c7116 commit 24729f9

21 files changed, +982 -146 lines changed

CHANGELOG.md

Lines changed: 3 additions & 2 deletions
@@ -41,17 +41,18 @@ All notable changes to this project will be documented in this file.
4141
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
4242
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html) for all versions `v1.0.0` and beyond (still considered experimental prior to v1.0.0).
4343

44-
## [Unreleased]
44+
## v0.6.0
4545

4646
### Added
47+
* [#66](https://github.com/allora-network/allora-offchain-node/pull/66) Smart worker detection of submission windows + persistent error management + query retrials + reg/stake robustness + improved logging
48+
* [#81](https://github.com/allora-network/allora-offchain-node/pull/81) Timeout height handling on tx submission
4749

4850
### Removed
4951

5052
### Fixed
5153

5254
### Security
5355

54-
5556
## v0.5.1
5657

5758
### Added

README.md

Lines changed: 66 additions & 1 deletion
@@ -25,7 +25,7 @@ from the root directory. This will:
2525
5. Run `docker compose up --build`. This will:
2626
- Run both the offchain node and the source services, communicating through endpoints attached to the internal DNS
2727

28-
Please note that the environment variable will be created as bumdle of your config.json and allora account secrets, please make sure to remove every sectrets before commiting to remote git repository
28+
Please note that the environment variable will be created as a bundle of your config.json and allora account secrets; please make sure to remove all secrets before committing to a remote git repository
2929

3030

3131
## How to run without docker
@@ -69,6 +69,64 @@ Some metrics has been provided for in the node. You can access them with port `:
6969

7070
> Please note that we will keep updating the list as more metrics are being added
7171
72+
## Cycle Example
73+
74+
Visualization of timeline and submission windows for an actor, in this case, a worker.
75+
Reputers have submission windows too, but they are fixed to be 1 full topic's epoch length.
76+
77+
Example configuration:
78+
* Topic epoch length is 100 blocks
79+
* Topic submission window is 10 blocks
80+
* Near zone is 2 submission windows (20 blocks)
81+
82+
```
83+
Epoch N Epoch N+1
84+
|---------|----------------------------------|-----------------------------------|--------→
85+
Block: 1000 1100 1200
86+
↑ ↑ ↑
87+
Epoch Start Epoch End Next Epoch End
88+
(& Submission & Next Epoch Start
89+
Window Start) (& Next Submission
90+
Window Start)
91+
92+
Detailed View of Zones (assuming epoch length = 100 blocks):
93+
94+
Block 1000 Block 1010 Block 1080 Block 1100
95+
|-----------|--------------------------|--------------|
96+
|← Far Zone →|← Near Zone →|
97+
|← SW →|
98+
(SW = Submission Window)
99+
100+
Zone Breakdown (example numbers):
101+
• Epoch Length: 100 blocks
102+
• Submission Window: 10 blocks (opens at epoch start)
103+
• Near Zone: Last 20 blocks (NUM_SUBMISSION_WINDOWS_FOR_SUBMISSION_NEARNESS * WorkerSubmissionWindow)
104+
105+
-------------------------------
106+
- Full cycle transition points
107+
- Block 1000: Epoch N starts & Submission Window opens
108+
- Block 1010: Submission Window closes. Waiting for next window, typically from far zone.
109+
- Block 1080: Enters Near Zone (more frequent checks)
110+
- Block 1100: Epoch N ends & Epoch N+1 starts
111+
112+
```
113+
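The zone boundaries in the diagram above can be expressed as a small classification function. This is a minimal sketch, not the node's actual implementation: `classifyZone`, the `Zone` type and all parameter names are hypothetical, `nearnessWindows` stands in for `NUM_SUBMISSION_WINDOWS_FOR_SUBMISSION_NEARNESS`, and it assumes `block >= epochStart`.

```go
package main

import "fmt"

// Zone describes where a block sits relative to the submission window.
type Zone string

const (
	ZoneSubmissionWindow Zone = "submission-window"
	ZoneFar              Zone = "far"
	ZoneNear             Zone = "near"
)

// classifyZone is a hypothetical helper that mirrors the README example.
// It assumes block >= epochStart and that all lengths are positive.
func classifyZone(block, epochStart, epochLength, submissionWindow, nearnessWindows int64) Zone {
	// Position of the block inside its epoch.
	offset := (block - epochStart) % epochLength
	// The near zone covers the last nearnessWindows*submissionWindow blocks of the epoch.
	nearZoneStart := epochLength - nearnessWindows*submissionWindow
	switch {
	case offset < submissionWindow:
		return ZoneSubmissionWindow
	case offset >= nearZoneStart:
		return ZoneNear
	default:
		return ZoneFar
	}
}

func main() {
	// Example configuration: epoch 100 blocks, window 10 blocks, near zone 2 windows.
	for _, b := range []int64{1000, 1010, 1080, 1100} {
		fmt.Printf("block %d: %s\n", b, classifyZone(b, 1000, 100, 10, 2))
	}
}
```

With the example configuration, blocks 1000 and 1100 fall in a submission window, 1010 in the far zone, and 1080 in the near zone, matching the transition points listed above.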
114+
### Notes
115+
116+
- Submissions
117+
- Submissions are accepted within the submission window
118+
- Submission window opens at epoch start
119+
- Waiting Zone Behavior
120+
- The behaviour of the node when waiting for the submission window depends on its nearness to the submission window to reduce likelihood of missing a window.
121+
- Far Zone: Longer intervals between checks, optimized for efficiency
122+
- This is controlled by `blockDurationEstimated` and `windowCorrectionFactor`
123+
- Near Zone: More frequent checks with randomization for fair participation
124+
- Submissions are separated - they must happen within the submission window
125+
126+
### Random offset
127+
128+
The node introduces a random offset to the submission time to avoid the thundering herd problem, which alleviates mempool congestion.
129+
72130
## How to configure
73131

74132
There are several ways to configure the node. In order of preference, you can do any of these:
@@ -106,6 +164,13 @@ Note: when an account sequence mismatch is detected, the node will attempt to se
106164
- `retryDelay`: For all other errors that need retry delays.
107165

108166

167+
### Smart Window Detection
168+
169+
The node will automatically detect the submission window length for each topic on each actor type.
170+
This can be configured by the following settings in the config.json:
171+
* `blockDurationEstimated`: Estimated network block time in seconds. Minimum is 1.
172+
* `windowCorrectionFactor`: Correction factor to fine-tune the submission window length. Higher values optimize the number of calls for window checking. Minimum is 0.5.
173+
109174
## Configuration examples
110175

111176
A complete example is provided in `config.example.json`.

config.cdk.json.template

Lines changed: 7 additions & 4 deletions
@@ -5,16 +5,20 @@
55
"alloraHomeDir": "_ALLORA_WALLET_HOME_DIR_",
66
"gas": "_ALLORA_WALLET_GAS_",
77
"gasAdjustment": _ALLORA_WALLET_GAS_ADJUSTMENT_,
8+
"gasPrices": _ALLORA_WALLET_GAS_PRICES_,
9+
"maxFees": _ALLORA_WALLET_MAX_FEES_,
810
"nodeRpc": "_ALLORA_WALLET_NODE_RPC_",
911
"maxRetries": _ALLORA_WALLET_MAX_RETRIES_,
10-
"delay": _ALLORA_WALLET_DELAY_,
11-
"submitTx": _ALLORA_WALLET_SUBMIT_TX_
12+
"retryDelay": _ALLORA_WALLET_RETRY_DELAY_,
13+
"accountSequenceRetryDelay": _ALLORA_WALLET_ACCOUNT_SEQUENCE_RETRY_DELAY_,
14+
"submitTx": _ALLORA_WALLET_SUBMIT_TX_,
15+
"blockDurationEstimated": _ALLORA_WALLET_BLOCK_DURATION_ESTIMATED_,
16+
"windowCorrectionFactor": _ALLORA_WALLET_WINDOW_CORRECTION_FACTOR_
1217
},
1318
"worker": [
1419
{
1520
"topicId": _ALLORA_WORKER_TOPIC_ID_,
1621
"inferenceEntrypointName": "_ALLORA_WORKER_INFERENCE_ENTRYPOINT_NAME_",
17-
"loopSeconds": _ALLORA_WORKER_LOOP_SECONDS_,
1822
"parameters": {
1923
"InferenceEndpoint": "_ALLORA_WORKER_INFERENCE_ENDPOINT_",
2024
"Token": "_ALLORA_WORKER_TOKEN_"
@@ -26,7 +30,6 @@
2630
"topicId": _ALLORA_REPUTER_TOPIC_ID_,
2731
"groundTruthEntrypointName": "_ALLORA_REPUTER_ENTRYPOINT_NAME_",
2832
"lossFunctionEntrypointName": "_ALLORA_REPUTER_ENTRYPOINT_NAME_",
29-
"loopSeconds": _ALLORA_REPUTER_LOOP_SECONDS_,
3033
"minStake": _ALLORA_REPUTER_MIN_STAKE_,
3134
"groundTruthParameters": {
3235
"GroundTruthEndpoint": "_ALLORA_REPUTER_SOURCE_OF_TRUTH_ENDPOINT_",

config.example.json

Lines changed: 3 additions & 3 deletions
@@ -11,13 +11,14 @@
1111
"maxRetries": 5,
1212
"retryDelay": 3,
1313
"accountSequenceRetryDelay": 5,
14-
"submitTx": true
14+
"submitTx": true,
15+
"blockDurationEstimated": 10,
16+
"windowCorrectionFactor": 0.8
1517
},
1618
"worker": [
1719
{
1820
"topicId": 1,
1921
"inferenceEntrypointName": "api-worker-reputer",
20-
"loopSeconds": 10,
2122
"parameters": {
2223
"InferenceEndpoint": "http://source:8000/inference/{Token}",
2324
"Token": "ETH"
@@ -29,7 +30,6 @@
2930
"topicId": 1,
3031
"groundTruthEntrypointName": "api-worker-reputer",
3132
"lossFunctionEntrypointName": "api-worker-reputer",
32-
"loopSeconds": 30,
3333
"minStake": 100000,
3434
"groundTruthParameters": {
3535
"GroundTruthEndpoint": "http://localhost:8888/gt/{Token}/{BlockHeight}",

lib/domain_config.go

Lines changed: 76 additions & 13 deletions
@@ -1,11 +1,20 @@
11
package lib
22

33
import (
4+
"errors"
5+
"fmt"
6+
47
emissions "github.com/allora-network/allora-chain/x/emissions/types"
58
bank "github.com/cosmos/cosmos-sdk/x/bank/types"
69
"github.com/ignite/cli/v28/ignite/pkg/cosmosaccount"
710
"github.com/ignite/cli/v28/ignite/pkg/cosmosclient"
8-
"github.com/rs/zerolog/log"
11+
)
12+
13+
const (
14+
WindowCorrectionFactorSuggestedMin = 0.5
15+
BlockDurationEstimatedMin = 1.0
16+
RetryDelayMin = 1
17+
AccountSequenceRetryDelayMin = 1
918
)
1019

1120
// Properties manually provided by the user as part of UserConfig
@@ -23,6 +32,8 @@ type WalletConfig struct {
2332
RetryDelay int64 // number of seconds to wait between retries (general case)
2433
AccountSequenceRetryDelay int64 // number of seconds to wait between retries in case of account sequence error
2534
SubmitTx bool // useful for dev/testing. set to false to run in dry-run processes without committing to the chain
35+
BlockDurationEstimated float64 // estimated average block duration in seconds
36+
WindowCorrectionFactor float64 // correction factor for the time estimation, suggested range 0.7-0.9.
2637
}
2738

2839
// Properties auto-generated based on what the user has provided in WalletConfig fields of UserConfig
@@ -36,16 +47,24 @@ type ChainConfig struct {
3647
AddressPrefix string // prefix for the allora addresses
3748
}
3849

50+
type TopicActor interface {
51+
GetTopicId() emissions.TopicId
52+
}
53+
3954
type WorkerConfig struct {
4055
TopicId emissions.TopicId
4156
InferenceEntrypointName string
4257
InferenceEntrypoint AlloraAdapter
4358
ForecastEntrypointName string
44-
ForecastEntrypoint AlloraAdapter
45-
LoopSeconds int64 // seconds to wait between attempts to get next worker nonce
59+
ForecastEntrypoint AlloraAdapter
4660
Parameters map[string]string // Map for variable configuration values
4761
}
4862

63+
// Implement TopicActor interface for WorkerConfig
64+
func (w WorkerConfig) GetTopicId() emissions.TopicId {
65+
return w.TopicId
66+
}
67+
4968
type ReputerConfig struct {
5069
TopicId emissions.TopicId
5170
GroundTruthEntrypointName string
@@ -57,11 +76,15 @@ type ReputerConfig struct {
5776
// This is idempotent in that it will not add more stake than specified here.
5877
// Set to 0 to effectively disable this feature and use whatever stake has already been added.
5978
MinStake int64
60-
LoopSeconds int64 // seconds to wait between attempts to get next reptuer nonces
6179
GroundTruthParameters map[string]string // Map for variable configuration values
6280
LossFunctionParameters LossFunctionParameters // Map for variable configuration values
6381
}
6482

83+
// Implement TopicActor interface for ReputerConfig
84+
func (r ReputerConfig) GetTopicId() emissions.TopicId {
85+
return r.TopicId
86+
}
87+
6588
type LossFunctionParameters struct {
6689
LossFunctionService string
6790
LossMethodOptions map[string]string
@@ -105,19 +128,59 @@ type ValueBundle struct {
105128

106129
// Check that each assigned entrypoint in the user config actually can be used
107130
// for the intended purpose, else throw error
108-
func (c *UserConfig) ValidateConfigAdapters() {
131+
func (c *UserConfig) ValidateConfigAdapters() error {
132+
// Validate wallet config
133+
err := c.ValidateWalletConfig()
134+
if err != nil {
135+
return err
136+
}
137+
// Validate worker configs
109138
for _, workerConfig := range c.Worker {
110-
if workerConfig.InferenceEntrypoint != nil && !workerConfig.InferenceEntrypoint.CanInfer() {
111-
log.Fatal().Interface("entrypoint", workerConfig.InferenceEntrypoint).Msg("Invalid inference entrypoint")
112-
}
113-
if workerConfig.ForecastEntrypoint != nil && !workerConfig.ForecastEntrypoint.CanForecast() {
114-
log.Fatal().Interface("entrypoint", workerConfig.ForecastEntrypoint).Msg("Invalid forecast entrypoint")
139+
err := workerConfig.ValidateWorkerConfig()
140+
if err != nil {
141+
return err
115142
}
116143
}
117-
144+
// Validate reputer configs
118145
for _, reputerConfig := range c.Reputer {
119-
if reputerConfig.GroundTruthEntrypoint != nil && !reputerConfig.GroundTruthEntrypoint.CanSourceGroundTruthAndComputeLoss() {
120-
log.Fatal().Interface("entrypoint", reputerConfig.GroundTruthEntrypoint).Msg("Invalid loss entrypoint")
146+
err := reputerConfig.ValidateReputerConfig()
147+
if err != nil {
148+
return err
121149
}
122150
}
151+
return nil
152+
}
153+
154+
func (c *UserConfig) ValidateWalletConfig() error {
155+
if c.Wallet.WindowCorrectionFactor < WindowCorrectionFactorSuggestedMin {
156+
return fmt.Errorf("window correction factor lower than suggested minimum: %f < %f", c.Wallet.WindowCorrectionFactor, WindowCorrectionFactorSuggestedMin)
157+
}
158+
if c.Wallet.BlockDurationEstimated < BlockDurationEstimatedMin {
159+
return fmt.Errorf("block duration estimated lower than the minimum: %f < %f", c.Wallet.BlockDurationEstimated, BlockDurationEstimatedMin)
160+
}
161+
if c.Wallet.RetryDelay < RetryDelayMin {
162+
return fmt.Errorf("retry delay lower than the minimum: %d < %d", c.Wallet.RetryDelay, RetryDelayMin)
163+
}
164+
if c.Wallet.AccountSequenceRetryDelay < AccountSequenceRetryDelayMin {
165+
return fmt.Errorf("account sequence retry delay lower than the minimum: %d < %d", c.Wallet.AccountSequenceRetryDelay, AccountSequenceRetryDelayMin)
166+
}
167+
168+
return nil
169+
}
170+
171+
func (reputerConfig *ReputerConfig) ValidateReputerConfig() error {
172+
if reputerConfig.GroundTruthEntrypoint != nil && !reputerConfig.GroundTruthEntrypoint.CanSourceGroundTruthAndComputeLoss() {
173+
return errors.New("invalid loss entrypoint")
174+
}
175+
return nil
176+
}
177+
178+
func (workerConfig *WorkerConfig) ValidateWorkerConfig() error {
179+
if workerConfig.InferenceEntrypoint != nil && !workerConfig.InferenceEntrypoint.CanInfer() {
180+
return errors.New("invalid inference entrypoint")
181+
}
182+
if workerConfig.ForecastEntrypoint != nil && !workerConfig.ForecastEntrypoint.CanForecast() {
183+
return errors.New("invalid forecast entrypoint")
184+
}
185+
return nil
123186
}
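
Because `ValidateConfigAdapters` now returns an error instead of calling `log.Fatal`, the caller decides how to react to an invalid config. The following is a minimal sketch of that pattern. It uses a simplified, hypothetical `walletConfig` struct and `validateWallet` function that mirror the wallet checks in this diff; it does not use the real `UserConfig` type.

```go
package main

import (
	"fmt"
	"os"
)

// Minimums copied from the constants introduced in this diff.
const (
	windowCorrectionFactorSuggestedMin = 0.5
	blockDurationEstimatedMin          = 1.0
	retryDelayMin                      = 1
	accountSequenceRetryDelayMin       = 1
)

// walletConfig is a hypothetical, trimmed-down stand-in for WalletConfig.
type walletConfig struct {
	BlockDurationEstimated    float64
	WindowCorrectionFactor    float64
	RetryDelay                int64
	AccountSequenceRetryDelay int64
}

// validateWallet returns the first violated minimum as an error.
func validateWallet(w walletConfig) error {
	if w.WindowCorrectionFactor < windowCorrectionFactorSuggestedMin {
		return fmt.Errorf("window correction factor lower than suggested minimum: %f < %f",
			w.WindowCorrectionFactor, windowCorrectionFactorSuggestedMin)
	}
	if w.BlockDurationEstimated < blockDurationEstimatedMin {
		return fmt.Errorf("block duration estimated lower than the minimum: %f < %f",
			w.BlockDurationEstimated, blockDurationEstimatedMin)
	}
	if w.RetryDelay < retryDelayMin {
		return fmt.Errorf("retry delay lower than the minimum: %d < %d", w.RetryDelay, retryDelayMin)
	}
	if w.AccountSequenceRetryDelay < accountSequenceRetryDelayMin {
		return fmt.Errorf("account sequence retry delay lower than the minimum: %d < %d",
			w.AccountSequenceRetryDelay, accountSequenceRetryDelayMin)
	}
	return nil
}

func main() {
	// Values taken from config.example.json in this commit.
	cfg := walletConfig{
		BlockDurationEstimated:    10,
		WindowCorrectionFactor:    0.8,
		RetryDelay:                3,
		AccountSequenceRetryDelay: 5,
	}
	if err := validateWallet(cfg); err != nil {
		fmt.Fprintln(os.Stderr, "invalid config:", err)
		os.Exit(1)
	}
	fmt.Println("config ok")
}
```

Returning the error keeps the validation functions testable, since a unit test can assert on the returned value without the process exiting.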

lib/repo_query_balance.go

Lines changed: 17 additions & 4 deletions
@@ -2,19 +2,32 @@ package lib
22

33
import (
44
"context"
5+
"time"
56

67
cosmossdk_io_math "cosmossdk.io/math"
8+
"github.com/cosmos/cosmos-sdk/types/query"
79
banktypes "github.com/cosmos/cosmos-sdk/x/bank/types"
810
)
911

1012
func (node *NodeConfig) GetBalance() (cosmossdk_io_math.Int, error) {
1113
ctx := context.Background()
12-
resp, err := node.Chain.BankQueryClient.Balance(ctx, &banktypes.QueryBalanceRequest{
13-
Address: node.Chain.Address,
14-
Denom: node.Chain.DefaultBondDenom,
15-
})
14+
15+
resp, err := QueryDataWithRetry(
16+
ctx,
17+
node.Wallet.MaxRetries,
18+
time.Duration(node.Wallet.RetryDelay)*time.Second,
19+
func(ctx context.Context, req query.PageRequest) (*banktypes.QueryBalanceResponse, error) {
20+
return node.Chain.BankQueryClient.Balance(ctx, &banktypes.QueryBalanceRequest{
21+
Address: node.Chain.Address,
22+
Denom: node.Chain.DefaultBondDenom,
23+
})
24+
},
25+
query.PageRequest{},
26+
"get balance",
27+
)
1628
if err != nil {
1729
return cosmossdk_io_math.Int{}, err
1830
}
31+
1932
return resp.Balance.Amount, nil
2033
}

lib/repo_query_block.go

Lines changed: 16 additions & 15 deletions
@@ -2,30 +2,31 @@ package lib
22

33
import (
44
"context"
5-
"encoding/json"
5+
"time"
66

77
emissionstypes "github.com/allora-network/allora-chain/x/emissions/types"
8-
"github.com/rs/zerolog/log"
8+
"github.com/cosmos/cosmos-sdk/types/query"
99
)
1010

1111
func (node *NodeConfig) GetReputerValuesAtBlock(topicId emissionstypes.TopicId, nonce BlockHeight) (*emissionstypes.ValueBundle, error) {
1212
ctx := context.Background()
1313

14-
req := &emissionstypes.GetNetworkInferencesAtBlockRequest{
15-
TopicId: topicId,
16-
BlockHeightLastInference: nonce,
17-
}
18-
reqJSON, err := json.Marshal(req)
19-
if err != nil {
20-
log.Error().Err(err).Msg("Error marshaling GetNetworkInferencesAtBlockRequest to print Msg as JSON")
21-
} else {
22-
log.Info().Str("req", string(reqJSON)).Msg("Getting GetNetworkInferencesAtBlockRequest from chain")
23-
}
24-
25-
res, err := node.Chain.EmissionsQueryClient.GetNetworkInferencesAtBlock(ctx, req)
14+
resp, err := QueryDataWithRetry(
15+
ctx,
16+
node.Wallet.MaxRetries,
17+
time.Duration(node.Wallet.RetryDelay)*time.Second,
18+
func(ctx context.Context, req query.PageRequest) (*emissionstypes.GetNetworkInferencesAtBlockResponse, error) {
19+
return node.Chain.EmissionsQueryClient.GetNetworkInferencesAtBlock(ctx, &emissionstypes.GetNetworkInferencesAtBlockRequest{
20+
TopicId: topicId,
21+
BlockHeightLastInference: nonce,
22+
})
23+
},
24+
query.PageRequest{},
25+
"get reputer values at block",
26+
)
2627
if err != nil {
2728
return &emissionstypes.ValueBundle{}, err
2829
}
2930

30-
return res.NetworkInferences, nil
31+
return resp.NetworkInferences, nil
3132
}
