Skip to content

Improve retryable monitoring: Notion sync + Slack alerts#49

Closed
mahsamoosavi wants to merge 28 commits intomainfrom
improve-retryable-monitor
Closed

Improve retryable monitoring: Notion sync + Slack alerts#49
mahsamoosavi wants to merge 28 commits intomainfrom
improve-retryable-monitor

Conversation

@mahsamoosavi
Copy link
Contributor

@mahsamoosavi mahsamoosavi commented Apr 9, 2025

This PR introduces a new Notion + Slack flow for improved retryable ticket monitoring, triage preservation, and alerting (closes INT-94)

1. Notion Sync with Status Preservation (--writeToNotion)

  • Each detected retryable ticket is synced to a Notion database.

  • If the ticket does not exist, a new row is created with:

       - Status set to "Untriaged"
    
       - All relevant metadata (gas info, tokens deposited, callvalue, etc.)
    
  • If the ticket already exists:

        - Metadata is refreshed
    
        - Status is preserved unless still "Untriaged" or blank
         → This protects manual triage like "Investigating", "Resolved", etc.
    

2. Slack Alerts for Expiring Retryables (--enableAlerting)

  • Every 24h, the monitor sweeps the Notion DB.

  • For each ticket:

        - If it is within 2 days of expiry
    
        - And its Status is "Untriaged" or "Investigating"
         → A Slack alert is triggered
    

Tickets marked "Resolved" or "Expired" are ignored.

3. Automatic Expiry Marking

  • During the same 24h sweep, tickets that are:

       - Older than 7 days
    
       - Not already marked "Resolved" or "Expired"
       → Are automatically marked "Expired"
    

This helps keep the Notion database clean and avoids stale tickets lingering.

config.json Outdated
"orbitRpcUrl": "https://sepolia-rollup.arbitrum.io/rpc",
"explorerUrl": "https://sepolia.arbiscan.io",
"parentExplorerUrl": "https://sepolia.etherscan.io",
"parentRpcUrl": "https://mainnet.infura.io/v3/ccf7fbdf6b99420ab8e3b072f6c4ed1e",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to commit this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NO :((
Will reset the key :'(

} catch (error) {
console.error(
`Error getting the latest block: ${(error as Error).message}`
const currentBlock = await parentChainProvider.getBlockNumber()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we be removing the error handling?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I put the try/catch back around getBlockNumber() to match what's currently in main. Should help avoid any unexpected crashes if the RPC call fails.

}
const status = await retryableMessage.status()
const notionStatus = 'Untriaged'

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we sure that we dont want to check status before doing all of this work and writing it to the database?

do we want to write every retryable to the db?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We dont! I had left it that way while testing and just forgot to add the check. Will update it so it only writes unredeemed tickets to the db.

// Continuously check retryables with periodic sweeps

// Function to continuously check retryable transactions
const checkRetryablesContinuous = async (
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't need to change this function.

we run the monitoring as individual processes in action runners. it isn't a long running process.

I might suggest that it would be better to just remove this entirely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm I’d lean toward keeping it since part of the goal was to let RaaSes or other teams run this as a monitor outside GH Actions too, it might be useful for longer-running setups. What do you think?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's simple enough to wrap the process in a cron system of their choice.

I don't have a super strong opinion either way. It is nice to maintain less code. 😄

parentRpcUrl: childChain.parentRpcUrl,
}))
)
const checkNotionForExpiringRetryables = async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could move this into our new notion folder

Copy link
Contributor

@TucksonDev TucksonDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm halfway through the index file, but I prefer to stop the review here to avoid leaving a big amount of comments 😅 .

Besides a few nitpicks and potential issues, in general I feel like this PR does a huge amount of refactoring that could be done in a different PR, since the refactoring is not related to the changes that we want to add to this PR.

Also, many comments have been removed that seem useful for future contributors. I'd commented on some of the removed comments but haven't reviewed all of them, so my feedback is that no comments should be removed unless the logic that they are describing changes.

In general, this PR seems to be adding/removing unrelated comments and refactoring code unrelated to this new functionality. I'd prefer to include only changes that are relevant to the new functionality to focus only on that, and have the refactoring done in a different PR 🙏

Happy to discuss further on Slack if needed!

erc20.decimals(),
])

const nativeTokenAmount = ethers.utils.formatUnits(ticket.deposit, decimals)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The console.log below suggests that this is the l2callvalue defined in the ticket, but you're using the deposit here (maxSubmission + gasLimit*gasPriceBid)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took this from the original monitor but fully agreed the label’s a bit misleading. I’ll update it to say total deposit instead of callvalue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'll cc @dewanshparashar here to follow up on the original monitor (which I guess has the same text)

Copy link
Contributor

@TucksonDev TucksonDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mahsamoosavi ! I've reviewed only the core logic of the changes related to Notion, I'll leave the rest of the code to the FS team.

Requesting a few changes here and there. Happy to discuss any of those!

Copy link
Contributor

@TucksonDev TucksonDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mahsamoosavi !

Resolved all old and solved comments, and left follow ups on 2 of them. I'll approve once we solved those 💪

@mahsamoosavi
Copy link
Contributor Author

Thanks @mahsamoosavi !

Resolved all old and solved comments, and left follow ups on 2 of them. I'll approve once we solved those 💪

Thanks @TucksonDev! Addresses them above, please let me know if the adjustments look good

Copy link
Contributor

@TucksonDev TucksonDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @mahsamoosavi !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants