pie/restorer: fix nanosecond overflow when restoring POSIX timers#2982
pie/restorer: fix nanosecond overflow when restoring POSIX timers#2982Utkal059 wants to merge 1 commit intocheckpoint-restore:criu-devfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a restore-time correctness issue in CRIU’s PIE restorer where adding the current time to an absolute timerfd expiration could overflow tv_nsec, producing an invalid itimerspec after restore.
Changes:
- Add nanosecond carry propagation when adjusting
it_valueforTFD_TIMER_ABSTIMEtimerfds during restore.
| t->val.it_value.tv_nsec += ts.tv_nsec; | ||
| if (t->val.it_value.tv_nsec >= NSEC_PER_SEC) { | ||
| t->val.it_value.tv_nsec -= NSEC_PER_SEC; | ||
| t->val.it_value.tv_sec++; | ||
| } |
There was a problem hiding this comment.
The block above still says nanoseconds aren’t adjusted and mentions an overflow FIXME, but this change now adjusts and normalizes tv_nsec. Please update/remove that comment/FIXME so it matches the new behavior and doesn’t mislead future readers.
| t->val.it_value.tv_nsec += ts.tv_nsec; | ||
| if (t->val.it_value.tv_nsec >= NSEC_PER_SEC) { | ||
| t->val.it_value.tv_nsec -= NSEC_PER_SEC; | ||
| t->val.it_value.tv_sec++; | ||
| } |
There was a problem hiding this comment.
This fixes a restore-time edge case for TFD_TIMER_ABSTIME, but the existing ZDTM timerfd test only covers relative timers. Consider adding/adjusting a test case that arms a timerfd with TFD_TIMER_ABSTIME and a near-boundary tv_nsec value so this normalization path is exercised and prevents regressions.
7abd384 to
8587321
Compare
|
CI failures appear to be due to Cirrus CI monthly compute limit being exceeded, not related to this patch. The local CircleCI checks (clang + gcc) passed successfully |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## criu-dev #2982 +/- ##
============================================
- Coverage 57.76% 57.31% -0.46%
============================================
Files 142 157 +15
Lines 37664 40283 +2619
Branches 0 8831 +8831
============================================
+ Hits 21758 23087 +1329
- Misses 15906 16989 +1083
- Partials 0 207 +207 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
When accumulating ts.tv_nsec into t->val.it_value.tv_nsec, the result can exceed NSEC_PER_SEC (999999999). Without normalization, the restored timer carries an invalid tv_nsec value, causing incorrect timer behavior after restore. Add a single carry-propagation step: if tv_nsec >= NSEC_PER_SEC, subtract one full second and increment tv_sec. A single if-check suffices because ts.tv_nsec is always < NSEC_PER_SEC on entry, so overflow is at most 1s. Signed-off-by: Utkal Singh <[email protected]>
8587321 to
ee30e3b
Compare
|
The Cirrus CI failures are due to the monthly compute limit being exhausted (failing after 1s), not related to this patch. CircleCI local-clang and local-gcc are currently running. The previous GitHub Actions run passed all jobs except compat-test (GCC) which has been fixed in this force-push by replacing NSEC_PER_SEC with 1000000000L since the PIE build environment has no access to standard headers. Could a maintainer please approve the workflows? |
| * overflow the limit NSEC_PER_SEC FIXME | ||
| * and restore procedure takes some time itself. | ||
| */ | ||
| if (sys_clock_gettime(t->clockid, &ts)) { |
There was a problem hiding this comment.
could you explaine what is actually going on here? If TFD_TIMER_ABSTIME is set, we probably have to know the exact time when this timer should be fired, don't we? Why do we need to adjust it here?
When accumulating ts.tv_nsec into t->val.it_value.tv_nsec, the result
can exceed NSEC_PER_SEC (999999999). Without normalization, the restored
timer carries an invalid tv_nsec value, causing incorrect timer behavior
after restore.
Add a single carry-propagation step: if tv_nsec >= NSEC_PER_SEC, subtract
one full second and increment tv_sec. A single if-check suffices because
ts.tv_nsec is always < NSEC_PER_SEC on entry, so overflow is at most 1s.