F3 passive testing #12287
Replies: 22 comments 1 reply
-
Note that the F3 implementation team will not deploy anything that initiates F3 in the mainnet environment until the vast majority of the network has stabilized from the network upgrade. The current plan is to bootstrap the initial F3 instance among a small number of nodes on Aug 7th/8th (>24 hours after the upgrade). We will monitor the network, adjust the deployment plan accordingly, and keep you updated on what to expect in this discussion!
-
## 🚀 F3 Passive Testing Kick-off — Aug 7th, 2024

We're excited to announce the soft launch of Fast Finality for Filecoin (F3) as part of FIP-0086! 🏎️ If you've been following our progress, you might already know that this has been in the works for some time. Fast Finality aims to dramatically reduce the finality time from 900 epochs to just around 2, a roughly 450× speed-up, ensuring that "finality" truly means finality. This means transactions will be completed reliably and quickly.

With the mandatory release of Lotus v1.28.1 and above, we're now able to conduct passive testing on the mainnet. This testing is a critical step toward fully implementing Fast Finality in the upcoming network upgrade (nv24) later this year. After the full rollout, we expect significant UX improvements for both token holders and dApp users. Additionally, F3 will enable the creation of trustless clients and bridges that can send messages over the network and verify finality without needing to run a full node, making it both efficient and cost-effective. To learn more about Fast Finality and its implications for bridging to other networks, check out this talk.

We're thrilled to take this step forward and look forward to your feedback as we move closer to full implementation.

### F3 Passive Testing

F3 has undergone extensive testing in both code simulation and on the Butterfly testnet. However, to ensure it's ready for mainnet production, we need to perform large-scale testing that mirrors the number of nodes on the mainnet. This will help verify the performance of F3, GPBFT, and the Lotus integration. Our goal is to execute this testing as passively as possible, minimizing disruptions. The F3 engineering team plans to start with a small group of F3 nodes to confirm everything works as expected. We will gradually increase the number of F3 participants, aiming to reach a scale close to mainnet production.
### Why Passive Testing?

The transition to F3 marks a significant advancement for Filecoin and its ecosystem, offering faster transaction speeds and enabling trustless bridges. However, this evolution introduces complexities in testing, verification, and ensuring confidence in the system's implementation across the network. Passive testing is essential for several reasons:
Passive testing allows the F3 engineering team to test a full-scale rollout on a real network without affecting consensus. This approach ensures that the testing does not disrupt existing network operations.

### Testing Plan Summary

A more thorough testing plan is laid out in this issue; in a nutshell:
It's important to note that none of the above affects Expected Consensus (EC) in nv23. F3 runs in the background and finalizes tipsets independently from EC for testing purposes, hence the term "passive."

### Observable Metrics for Monitoring

The F3 team has implemented several metrics to monitor node and consensus performance:
We will share updates and results with the community at least bi-weekly, if not more frequently. You can also monitor F3 metrics via the Prometheus debug metrics endpoint.

### Initial F3 Deployment

The F3 team is excited to announce the initial deployment of Fast Finality (F3) starting on August 7th. We will begin with 50 randomly selected participants, including some of our dedicated alpha testers from the community 💙. Once the deployment begins, you will start seeing F3 logs similar to the following:
This log indicates that F3 has successfully finalized a tipset.

### Adjusting Log Levels

If you wish to adjust the log level for F3 in Lotus or Lotus Miner, you can do so by running the following commands:
### Monitoring and Reporting

The F3 engineering team will be closely monitoring all participating nodes. If you notice any irregular behavior on your nodes, please report it in the #fil-fast-finality channel.

### Mainnet Verifications

Below is a list of verifications that will be performed on the mainnet during the initial F3 deployment:
We look forward to this exciting phase and appreciate your support and feedback as we work towards fully implementing F3 across the network.

### 🚢 Launch Time!

Fast Finality has been one of the most anticipated features for Filecoin participants since the network's inception. We are thrilled to collaborate with YOU to finally bring this feature to the Filecoin mainnet by the end of the year! We want to extend our gratitude to all community members participating in this testing phase. Special thanks go to our early testers (Slack handles: TippyFlits, Reiers, stuberman, beck) and @marco-storswift for their invaluable support. Stay tuned for updates in the #fil-fast-finality channel. Happy finalizing fast! 🏎️💨🏁
-
## Testing update - Aug 14th

In the past week…
### 🐛 Bandwidth Usage Spike

Lotus node operators reported an irregular traffic spike starting Wednesday morning, and we have confirmed it was caused by the F3 implementation. The excessive F3 bandwidth was caused by a routing loop in pubsub. The "manifest" server (used to facilitate testing) broadcasts a small message every 20 seconds, which shouldn't have introduced excessive load. Unfortunately, pubsub's routing-loop prevention mechanism appears to have been ineffective in this case, so each message cycled around the network over and over. We've fixed this issue by:
Note that even though bandwidth usage on nodes increased, we believe it did not impact node synchronization. However, we would also like to avoid any unexpected node performance degradation as much as possible. Therefore, we will release a Lotus patch during NA working hours (the fix is already available in #12390; we will merge it after more testing).

### Next round…

Next week, we aim to scale our testing to 500-1000 nodes. We will gradually increase the number of nodes participating in F3 and continue monitoring their performance throughout the process. One piece of good news is that we haven't encountered any issues so far that impact node synchronization or block production, and we will remain conservative with our testing to avoid any potential problems. If you notice anything irregular, please don't hesitate to reach out to us in #fil-fast-finality!
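For background on the routing-loop issue above: gossipsub normally suppresses loops by refusing to re-forward a message ID it has already seen. The following is a toy sketch of that seen-cache idea, not the libp2p implementation (which, among other things, expires entries after a TTL):

```go
package main

import "fmt"

// seenCache is a minimal sketch of the message de-duplication pubsub
// relies on to break routing loops: a node forwards a message only the
// first time it observes its ID. The real libp2p seen-cache also expires
// entries over time; that is omitted here for brevity.
type seenCache struct {
	seen map[string]bool
}

func newSeenCache() *seenCache {
	return &seenCache{seen: make(map[string]bool)}
}

// shouldForward records the message ID and reports whether this is the
// first time it has been observed.
func (c *seenCache) shouldForward(msgID string) bool {
	if c.seen[msgID] {
		return false // already seen once; drop to prevent a loop
	}
	c.seen[msgID] = true
	return true
}

func main() {
	c := newSeenCache()
	fmt.Println(c.shouldForward("manifest-001")) // first sighting: forward
	fmt.Println(c.shouldForward("manifest-001")) // duplicate: drop
}
```

When this suppression fails or a cache entry expires while copies are still circulating, a small 20-second broadcast can keep cycling around the network, which matches the symptom described above.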
-
## F3 (Fast Finality) passive testing update - 2024-09-05

### 🗂 F3 Readiness Review and Timeline Adjustment

With the proposed nv24 upgrade date rapidly approaching, we conducted a thorough review of the tasks necessary to complete our "Hardening and Mainnet Deployment Readiness" milestone. Our assessment revealed a time deficit of approximately three weeks. In light of this finding, we've taken two important steps:
This time extension is important for completing our "Hardening and Mainnet Deployment Readiness" milestone, ensuring we address all critical items in our backlog. The additional time will help us maintain quality and minimize risks as we approach this significant network upgrade.

### ⏮️ Since the last update: Progress over the past weeks

Let's dive into the specifics of the recent hardening, fixes, and testing we have done in the past weeks:

**Hardening and Fixes:**
**Testing Efforts:**
### ⏭️ Upcoming Week's Focus

**Hardening and Fixes:**
**Testing Efforts:**
Stay tuned for more updates in the #fil-fast-finality channel as we continue to harden F3 and move closer to the nv24 rollout! 💪
-
## F3 (Fast Finality) passive testing update - 2024-09-15

Hey all! 👋 Another week has passed, so here is another weekly update on the F3 passive testing and hardening efforts in preparation for the mainnet launch. As mentioned in last week's update, the team requested a time adjustment for the nv24 timeline which would allow us to complete all tasks in our "Hardening and Mainnet Deployment Readiness" milestone and complete critical items in our backlog. This timeline adjustment has now been accepted, and the nv24 timeline and F3 rollout now look like this:
### ⏮️ Since the last update: Progress over the past week

**Hardening and Fixes:**
**Testing Efforts:**
### ⏭️ Upcoming Week's Focus

**Hardening and Fixes:**
**Testing Efforts:**
Next week the F3 team will be co-locating. This focused sprint aims to complete the remaining tasks in Milestone 2: "Hardening and Mainnet Deployment Readiness". By working side-by-side, we expect to accelerate our progress and ensure we're fully prepared for the upcoming mainnet deployment 🚀. Additionally, we are preparing an operators' guide to F3 to help get the community ready for the F3 launch. This guide will cover key topics such as setup, configuration, and best practices. We aim to have it published by the end of next week. Stay tuned for more updates in the #fil-fast-finality channel!
-
Hey everyone! 👋 Here are some quick updates on the passive testing efforts:

### ⏮️ On Friday

The F3 team resumed work on the passive testing tools after the nv24 upgrade was completed. All node operators have now upgraded to the latest go-f3 version (v0.7.2), which includes the latest bug fixes and enhancements. After addressing some infrastructure tasks, the team deployed F3 passive testing to around 5 MinerIDs. The testing ran smoothly over the weekend without any issues.

### ⏮️ On Monday

We began increasing the number of participants in the passive testing, starting with around 100 MinerIDs, which bootstrapped successfully, and we let it run for about 1 hour. We then increased to 200 MinerIDs, which also bootstrapped successfully, and let it run for a couple of hours. During the 200-MinerID testing round, we observed a CPU spike on our observer node. The CPU profile dump indicated that the spike was caused by the SplitStore running compaction. Testing was paused around 20:00 UTC to allow the team to rest.

Additionally, we created redirects for our public F3 Grafana dashboards to make them easier to find. These redirects use a 308 status code, allowing us to change the backend URL without breaking previously announced links:

- Calibnet: https://grafana.f3.eng.filoz.org/public/calibnet

### ⏭️ Today's plan

We plan to investigate data from yesterday's testing rounds, focusing on fluctuations between different senders during various phases and rounds on our observer node. After analyzing yesterday's data, we aim to run a test with 600 MinerIDs (approximately 30% of the network).

### 📣 Other noteworthy callouts:
-
Hey everyone! 👋 Quick EOD update:

### ⏮️ What happened today (2024-11-26)

### 🟡 Known open issues (as of EOD 2024-11-26)

### ⏭️ Plan for tomorrow (2024-11-27)
-
Hey everyone! 👋 Quick update:

### ⏮️ What happened today (2024-11-27)

### 🟡 Known open issues (as of EOD 2024-11-27)

### ⏭️ Plan for tomorrow (2024-11-28)
-
Hey everyone! 👋 Update from today's passive testing round:

### ⏮️ What happened today (2024-11-28)

### 🟡 Known open issues (as of EOD 2024-11-28)

### ⏭️ Plan for tomorrow (2024-11-29)
-
Hey everyone! 👋 Update from Friday's passive testing round:

### ⏮️ What happened today (2024-11-29)

To close out today's (and the week's) update, here are some significant achievements of the week:

### 🟡 Known open issues (as of EOD 2024-11-29)
-
Hey all! 👋 Here is an update from today's passive testing round:

### ⏮️ What happened today (2024-12-02)

Some of the wins from today's passive testing:

### 🟡 Known open issues (as of EOD 2024-12-02)

### ⏭️ Plan for tomorrow (2024-12-03)
-
Hey all! 👋 Here is an update from today's passive testing round:

### ⏮️ What happened today (2024-12-03)

### ⏭️ Plan for tomorrow (2024-12-04)
-
Hey! 👋 Here is the update from today's passive testing rounds:

### ⏮️ What happened today (2024-12-05)

### 🟡 Known open issues (as of EOD 2024-12-05)

### ⏭️ Plan for tomorrow (2024-12-06)
-
Hey! 👋 Here is the update from Friday's (2024-12-06) passive testing rounds:

### ⏮️ What happened today (2024-12-06)

### 🟡 Known open issues (as of EOD 2024-12-06)

### ⏭️ Plan for Monday (2024-12-09)
-
Hey everyone! 👋 Here is an update from today's (2024-12-09) passive testing rounds:

### ⏮️ What happened today (2024-12-09)

### 🟡 Known open issues (as of EOD 2024-12-09)

### ⏭️ Plan for tomorrow (2024-12-10)
-
Hey everyone! 👋 Here is an update from today's (2024-12-10) passive testing rounds:

### ⏮️ What happened today (2024-12-10)

### 🟡 Known open issues (as of EOD 2024-12-10)

### ⏭️ Plan for Wednesday (2024-12-11)
-
Hey everyone! 👋 The first rounds of passive testing after the nv25 upgrade kicked off yesterday (2025-04-15). Here is a quick summary of the plans going forward and the results from the day:

### ⏮️ What happened on 2025-04-15

We started with round 49 at 20% scale (about 310 nodes), with the default F3 configuration and no power override. We quickly observed that the network was consistently deciding on base without progressing to the quality phase. Looking at the metrics, we noticed that the 99th-percentile committee fetch time was quite steep - around 5s on ArchiOz. After discussing, we identified that our configuration needed adjustment, specifically around power table handling. For round 50, we enabled

With these changes, we saw:
Given how well round 50 was performing at 20% scale, we went directly to round 51 at 50% scale (773 nodes), keeping all other parameters the same. However, at this scale we hit another roadblock - back to repeated base decisions. 😕 For round 52, we increased the quality timeout multiplier from 2 to 3, but still observed repeated base decisions. Monitoring the metrics more closely, we could see:
This suggested messages were getting dropped during the quality phase! We suspected the chain-exchange timestamp age might be a factor, so for round 53 we doubled it to 16s. Unfortunately, we still saw base decisions and started observing "queue full" errors from PubSub. Our hypothesis by the end of the day: we're dropping the initial burst of quality messages because the PubSub buffer size (currently 128) is insufficient at 50% network scale.

### 🟡 Known open issues (as of EOD 2025-04-15)
### ⏭️ Plan for 2025-04-16

⏰ Note: Testing will continue on 2025-04-16 @ 08:30 UTC
-
Hey everyone! 👋 Here is an update from the 2025-04-16 passive testing rounds:

### ⏮️ What happened on 2025-04-16

This network showed immediate improvement! 🎉 We observed:
Encouraged by these results, we moved to round 57, scaling up to 80% of the network (1236 participants). We kept the same config as in round 56 but shuffled participants by changing the seed in the explicit power selection. In this network, we observed base decisions dropping and the quorum of senders in the QUALITY phase decreasing. The pattern suggested our queue was too small relative to the processing velocity of GPBFT. For passive testing round 58, we doubled the validated non-partial message channel size to 512. These changes brought some improvements:
The big moment came with passive testing rounds 59 and 60, where we pushed to 100% network scale (1548 participants)! For round 60, we:
And the results were excellent: we had an almost perfect bootstrap phase, catching up to the head of the chain in just 10 instances. In the steady state we were consistently 5-6 epochs behind the head, and on top of that we only used about 1 MiB/s during both bootstrap and steady state. This is a huge improvement over our previous round of full-scale testing, where we consumed about 10× more bandwidth and had much longer catch-up times. Based on that, we decided to leave the network running overnight to gather more data on the steady state at 100% scale.

### 🟡 Known open issues (as of EOD 2025-04-16)
### ⏭️ Plan for tomorrow (2025-04-17)
The key achievement today: we successfully ran F3 at 100% network scale with excellent performance! 🎯
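To build intuition for the queue-sizing effect described above (the "queue full" drops at capacity 128 and the doubling of the validated message channel to 512), here is a toy Go model of a burst of messages hitting a bounded channel while the consumer is momentarily stalled. The burst size and capacities are illustrative, not taken from go-f3's internals:

```go
package main

import "fmt"

// deliverBurst simulates n messages arriving in a burst on a bounded
// channel of the given capacity while the consumer is stalled. Sends are
// non-blocking, so anything beyond the buffer is dropped -- the behaviour
// behind "queue full" errors.
func deliverBurst(n, capacity int) (delivered, dropped int) {
	ch := make(chan int, capacity)
	for i := 0; i < n; i++ {
		select {
		case ch <- i: // buffer has room
			delivered++
		default: // buffer full: message is dropped
			dropped++
		}
	}
	return delivered, dropped
}

func main() {
	// Illustrative numbers: a burst of 773 QUALITY messages (one per
	// participant at 50% scale) against the old and new buffer sizes.
	for _, capSize := range []int{128, 512} {
		d, x := deliverBurst(773, capSize)
		fmt.Printf("capacity=%d delivered=%d dropped=%d\n", capSize, d, x)
	}
}
```

With capacity 128, 645 of the 773 messages in the burst are lost; quadrupling the buffer to 512 cuts that to 261, and in the real system the consumer also drains the queue concurrently, so a larger buffer only needs to absorb the transient peak.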
-
Hey everyone! 👋 Here is an update from our weekend passive testing at full network scale 🎯:

Over the weekend we left testing round 62 running continuously. This round was at full scale, with evolving power from EC, meaning it was as close to a "real world" scenario as we can get on mainnet. The data looks promising, and the largest distance from head during the 3-day period was only 9 epochs.

We're starting the week by analyzing the data from this long-running network more deeply, to determine whether there are even better parameters we can use for F3, while keeping stability as our highest-priority requirement prior to activation. If all goes according to plan, we hope to begin discussions with the implementation teams about the activation date for F3.

### 🟡 Plan for the day, and known issues to investigate:
Thanks to everyone who has been involved in the F3 passive testing round during the weekend! 🙏
-
Hey everyone! 👋 Here is an update from our passive testing over the past few days, where we've been fine-tuning parameters for F3 at full network scale! 🚀

### ⏮️ What happened since 2025-04-21

After our successful 100% scale testing in round 62, we've continued to refine the parameters to make F3 as stable and performant as possible before activation. We've been methodically testing various configuration tweaks across multiple networks. In
This reduced the bootstrap bandwidth by nearly half, to ~600 KiB/s (vs ~1 MiB/s). However, we observed that in steady state, bandwidth usage returned to ~1 MiB/s because instances progress faster and chain changes trigger immediate broadcasts. In
Unfortunately, this network showed significantly worse performance. We observed many multi-round instances and eventually a strange "converge loop" at instance 38. Participation was still high, but we weren't making forward progress. With
The performance improved from the previous round 64, but it was still slower than our best networks. Catch-up took 1h45m (vs ~1h10m in our best configurations), and we think this test confirmed that both the EC delay multiplier and catch-up alignment contribute to network stability. Finally, in round 66, we returned to our most stable configuration and increased buffer sizes:
This network has been running very well! 💪 Catch-up completed in just 1h10m, and it has been in a steady state since then. The network has also been handling EC null blocks well, with F3 smoothly recovering back to -5 distance.

The key learning: both the EC delay multiplier and catch-up alignment help maintain network synchrony by effectively "slowing down" F3 slightly, giving nodes time to fetch committees and proposals. Our experiments with disabling or reducing these parameters showed they're important for stable operation at scale.

### 🟡 Known open issues
### ⏭️ Plan for 2025-04-24
Special thanks to everyone who's been helping monitor and analyze these passive testing networks! 🙏 We are down to single-digit days before F3 is activated on Mainnet.
-
Hey everyone! 👋 Here is the final update from our passive testing as we approach F3 activation on Mainnet! 🚀

### ⏮️ What happened since 2025-04-24

Key developments:
The bootstrap epoch for F3 has been set to 4920480, which corresponds to 2025-04-29T10:00:00Z - just a few days away! 📅 This will be our last passive testing update. The current passive testing round 66 will continue running until activation to provide additional observability data, and the switchover to the actual activation manifest will be automatic for node operators.

### 🟡 Known open issues

### ⏭️ Plan for activation
Special thanks to everyone who has contributed to making F3 a reality! 🙏 After years of development, testing, and refinement, we're now just days away from bringing F3's improvements to the Filecoin Mainnet.
-
filecoin-project/go-f3#213