Address nasa#1401 for Linux OS_QueueGet finite timeouts. #1514

Open
francdoc wants to merge 3 commits into nasa:main from francdoc:fix-1401-message-receive-timeout-if-sysclock-changes

Conversation

@francdoc

@francdoc francdoc commented Sep 6, 2025

Update

This description reflects the current branch after addressing earlier review feedback. The earlier implementation shape discussed in older comments has been superseded.

Checklist

Describe the contribution

Addresses #1401 for the Linux path of the POSIX queue implementation.

Non-Linux POSIX behavior remains unchanged, so this is a scoped Linux repair rather than a complete POSIX-wide redesign.

Finite OS_QueueGet() timeouts now track an internal CLOCK_MONOTONIC deadline on Linux, so the wait is neither lengthened when CLOCK_REALTIME jumps backward nor cut short when it jumps forward.

This change is intentionally scoped to Linux inside the POSIX backend. It relies on the behavior, documented for Linux, that a POSIX message queue descriptor is a file descriptor and can therefore be monitored with poll(). This is not presented as portable POSIX behavior.

The current implementation:

  • Adds Linux-only static helpers in src/os/posix/src/os-impl-queues.c.
  • Uses CLOCK_MONOTONIC to compute and track the finite timeout deadline.
  • Uses poll() on the Linux mqd_t descriptor to wait for queue readiness up to the remaining monotonic time.
  • Uses mq_timedreceive() with an already-expired timeout for the receive step so it cannot block if another reader consumes the message first.
  • Leaves non-Linux POSIX behavior on the existing mq_timedreceive() path.
  • Leaves public API unchanged.
  • Preserves OS_PEND and OS_CHECK semantics.
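
The steps listed above can be pictured with a minimal sketch. It is not the code in this PR: the names remaining_ms and recv_until_monotonic_deadline are hypothetical, error handling is reduced to the essentials, and the cast of mqd_t to int assumes Linux, where a message queue descriptor is a file descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <mqueue.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: milliseconds from `now` until `deadline`,
 * rounded up and clamped to the range [0, INT_MAX]. */
static int remaining_ms(const struct timespec *deadline, const struct timespec *now)
{
    long long ns = (long long)(deadline->tv_sec - now->tv_sec) * 1000000000LL
                 + (deadline->tv_nsec - now->tv_nsec);
    if (ns <= 0)
        return 0;
    long long ms = (ns + 999999LL) / 1000000LL;
    return (ms > INT_MAX) ? INT_MAX : (int)ms;
}

/* Hypothetical helper (Linux only): returns the message length, or -1 with
 * errno set (ETIMEDOUT when the monotonic deadline passes). */
static ssize_t recv_until_monotonic_deadline(mqd_t mq, char *buf, size_t len, int timeout_ms)
{
    /* The CLOCK_REALTIME epoch is always in the past, so a receive with
     * this timeout returns immediately instead of blocking. */
    const struct timespec expired = {0, 0};
    struct timespec deadline, now;

    assert(timeout_ms >= 0);
    if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0)
        return -1;
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    for (;;) {
        /* Receive step: cannot block even if another reader wins the race. */
        ssize_t n = mq_timedreceive(mq, buf, len, NULL, &expired);
        if (n >= 0)
            return n;
        if (errno != ETIMEDOUT && errno != EINTR)
            return -1;

        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;
        int wait_ms = remaining_ms(&deadline, &now);
        if (wait_ms == 0) {
            errno = ETIMEDOUT;
            return -1;
        }

        /* Wait step: sleep until the queue is readable or the time is up. */
        struct pollfd pfd = { .fd = (int)mq, .events = POLLIN };
        if (poll(&pfd, 1, wait_ms) < 0 && errno != EINTR)
            return -1;
    }
}
```

In this sketch a caller would map the ETIMEDOUT result to OS_QUEUE_TIMEOUT; the loop re-reads CLOCK_MONOTONIC after every wakeup, so a spurious or stolen wakeup only shortens the next poll() rather than restarting the full timeout.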

Testing performed

Local generic-linux permissive unit-test build:

cmake -DENABLE_UNIT_TESTS=true -DOSAL_SYSTEM_BSPTYPE=generic-linux -DOSAL_CONFIG_DEBUG_PERMISSIVE_MODE=TRUE ..
make
ctest --output-on-failure

Result:

100% tests passed, 0 tests failed out of 85

Privileged Linux queue regression test:

sudo ./queue-test

Result:

TOTAL::62 PASS::62 FAIL::0
QueueTimeoutTimeJumpTest TOTAL::21 PASS::21 FAIL::0
Timejump task applied
clock_settime CLOCK_REALTIME restore Rc=0

Expected behavior changes

  • API Change: No public API changes.
  • Behavior Change: On Linux, finite OS_QueueGet() waits in the POSIX backend are bounded by monotonic elapsed time instead of wall-clock time.
  • OS_PEND behavior is unchanged.
  • OS_CHECK behavior is unchanged.
  • Non-Linux POSIX behavior is unchanged.
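
To illustrate the difference between the two bounds, here is a small sketch, assuming a millisecond timeout. The helper names abs_deadline and deadline_on_clock are hypothetical and are not taken from the OSAL sources. mq_timedreceive() interprets its absolute timeout against CLOCK_REALTIME, so a deadline built from that clock moves relative to real elapsed time whenever the wall clock is set; the same arithmetic on CLOCK_MONOTONIC does not.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical helper: absolute deadline = now + timeout_ms,
 * with tv_nsec kept in the range [0, 1e9). */
static struct timespec abs_deadline(struct timespec now, int timeout_ms)
{
    assert(timeout_ms >= 0);
    now.tv_sec += timeout_ms / 1000;
    now.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (now.tv_nsec >= 1000000000L) {
        now.tv_sec += 1;
        now.tv_nsec -= 1000000000L;
    }
    return now;
}

/* Hypothetical helper: sample `clock_id` and compute the deadline on it.
 * With CLOCK_REALTIME the result suits mq_timedreceive() but shifts with
 * wall-clock jumps; with CLOCK_MONOTONIC it tracks elapsed time only. */
static int deadline_on_clock(clockid_t clock_id, int timeout_ms, struct timespec *out)
{
    struct timespec now;
    if (clock_gettime(clock_id, &now) != 0)
        return -1;
    *out = abs_deadline(now, timeout_ms);
    return 0;
}
```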

System(s) tested on

  • Hardware: Laptop, x86_64
  • OS: Ubuntu Linux
  • BSP: generic-linux

Additional context

The regression test is Linux-only and privilege-gated because it uses clock_settime(CLOCK_REALTIME, ...).

The test applies a bounded CLOCK_REALTIME jump, verifies finite queue timeout behavior, and restores CLOCK_REALTIME during teardown using the saved realtime value plus elapsed monotonic time.
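
The restore step can be sketched roughly as follows. The struct and function names (clock_snapshot, restore_target, restore_realtime) are hypothetical and the actual test code may be organized differently; the idea is only that the wall clock is set back to the saved realtime value advanced by the monotonic time that elapsed since the snapshot.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Hypothetical snapshot taken before the test jumps CLOCK_REALTIME. */
struct clock_snapshot {
    struct timespec realtime;
    struct timespec monotonic;
};

/* Saved realtime plus the monotonic time elapsed since the snapshot. */
static struct timespec restore_target(const struct clock_snapshot *snap,
                                      const struct timespec *mono_now)
{
    struct timespec t;
    t.tv_sec  = snap->realtime.tv_sec + (mono_now->tv_sec - snap->monotonic.tv_sec);
    t.tv_nsec = snap->realtime.tv_nsec + (mono_now->tv_nsec - snap->monotonic.tv_nsec);
    /* tv_nsec is now within (-1e9, 2e9), so one correction is enough. */
    if (t.tv_nsec < 0) {
        t.tv_sec -= 1;
        t.tv_nsec += NSEC_PER_SEC;
    } else if (t.tv_nsec >= NSEC_PER_SEC) {
        t.tv_sec += 1;
        t.tv_nsec -= NSEC_PER_SEC;
    }
    return t;
}

/* Record both clocks back to back; call this before applying the jump. */
static int snapshot_clocks(struct clock_snapshot *snap)
{
    if (clock_gettime(CLOCK_REALTIME, &snap->realtime) != 0)
        return -1;
    return clock_gettime(CLOCK_MONOTONIC, &snap->monotonic);
}

/* Teardown: needs privilege (CAP_SYS_TIME on Linux) to set the clock. */
static int restore_realtime(const struct clock_snapshot *snap)
{
    struct timespec mono_now;
    if (clock_gettime(CLOCK_MONOTONIC, &mono_now) != 0)
        return -1;
    struct timespec target = restore_target(snap, &mono_now);
    return clock_settime(CLOCK_REALTIME, &target);
}
```

Restoring to the saved value alone would leave the wall clock behind by however long the test ran, which is why the elapsed monotonic time is added back.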

Third party code

None.

Contributor Info - All information REQUIRED for consideration of pull request

Franco Chiesa Docampo - Personal.

@joelsherrill

mq_timedreceive_monotonic() is not in the POSIX standard. Per what I found on the Net, it is specific to QNX and not supported on Linux, FreeBSD, or RTEMS. Per the POSIX Issue 8 definition of mqueue.h, message queues do not include a function similar to pthread_mutex_clocklock() which lets you specify the clock.

poll() is similar to select() in that it should not be assumed to work on anything other than sockets when dealing with an RTOS TCP/IP stack.

@francdoc

francdoc commented Sep 8, 2025

Author

Hi @joelsherrill I apologize. Yes, mq_timedreceive_monotonic() is a method I implemented to try to propose a solution. It does not exist in the POSIX standard. I can edit my original post to be more clear and specific to avoid any misunderstanding.

I will review your feedback and make proper changes to the proposed solution to check if it remains viable. Thank you for your response.

I'm changing this PR to draft.

@francdoc francdoc marked this pull request as draft September 8, 2025 16:36
@joelsherrill

Since the name looks like a POSIX function, you need to at least add the suffix "_np" for non-portable. But since it is provided outside of the OS and libraries, I'd recommend using another name entirely.

What does the implementation of that function depend on?

@francdoc

francdoc commented Sep 17, 2025

Author

Hi Joel. The function is now purely internal and no longer exposed in os-posix.h.

It depends on:

    • clock_gettime(CLOCK_MONOTONIC, …)
    • poll() on the mqd_t file descriptor
    • mq_receive()

The helper is now a static function in the .c file. I renamed it to OS_Posix_MqReceiveUntilMonotonicDeadline, with no public prototype.

I’ll continue testing this PR for further improvements.

@francdoc francdoc marked this pull request as ready for review September 17, 2025 16:58
@francdoc francdoc marked this pull request as draft September 17, 2025 16:59
@francdoc francdoc marked this pull request as ready for review January 15, 2026 16:10
Keep the monotonic timeout repair scoped to the Linux path in the POSIX
queue backend. Preserve OS_PEND, OS_CHECK and the non-Linux
mq_timedreceive() path, while tightening helper naming, poll()/receive
behavior and return-value checking.

Also make the queue time-jump regression test Linux-only,
privilege-gated and bounded, and restore CLOCK_REALTIME during teardown.
@francdoc francdoc force-pushed the fix-1401-message-receive-timeout-if-sysclock-changes branch from bb2865d to d6c08fe Compare March 22, 2026 03:33
…ceive-timeout-if-sysclock-changes

# Conflicts:
#	.gitignore
@francdoc francdoc force-pushed the fix-1401-message-receive-timeout-if-sysclock-changes branch from a01346f to 23bc133 Compare March 22, 2026 20:00
@francdoc francdoc changed the title from "Fix nasa#1401, make OS_QueueGet timeouts monotonic." to "Fix nasa#1401, make Linux OS_QueueGet finite timeouts monotonic." Jun 9, 2026
@francdoc

francdoc commented Jun 9, 2026

Author

Updated this PR after earlier feedback.

The current diff is now limited to:

  • src/os/posix/src/os-impl-queues.c
  • src/tests/queue-test/queue-test.c

The implementation is intentionally Linux-only inside the POSIX backend. It relies on the Linux-documented ability to poll message queue descriptors and does not present that behavior as portable POSIX.

The old standard-looking helper/public-prototype direction has been removed. Non-Linux POSIX behavior remains on the existing mq_timedreceive() path. Public API, OS_PEND, and OS_CHECK semantics are unchanged.

Local verification:

  • generic-linux permissive unit-test build
  • ctest: 85/85 tests passed
  • sudo ./queue-test: TOTAL::62 PASS::62 FAIL::0
  • QueueTimeoutTimeJumpTest: PASS::21 FAIL::0
  • CLOCK_REALTIME restored successfully during teardown

The regression test is Linux-only, privilege-gated, bounded, and restores CLOCK_REALTIME during teardown.

@francdoc francdoc changed the title from "Fix nasa#1401, make Linux OS_QueueGet finite timeouts monotonic." to "Address nasa#1401 for Linux OS_QueueGet finite timeouts." Jun 9, 2026
@francdoc

francdoc commented Jun 10, 2026

Author

One design-scope question:

Is a Linux-scoped repair acceptable for #1401, or should this PR aim for a POSIX-general solution?

The current patch relies on Linux-specific poll() support for mqd_t, and keeps non-Linux POSIX behavior unchanged.

I also prototyped a local no-poll, mqueue-preserving alternative that avoids treating mqd_t as a file descriptor:

until CLOCK_MONOTONIC deadline:
    call mq_timedreceive() with an already-expired timeout
    if a message is received, return success
    if the queue is empty, sleep briefly against CLOCK_MONOTONIC
return OS_QUEUE_TIMEOUT

That approach avoids poll(mqd_t), but trades exact readiness wakeup for bounded latency and periodic wakeups.
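
For discussion, the outline above might look roughly like this in C. It is only a sketch of the alternative, not code from the branch: the names ns_until, next_sleep and recv_by_bounded_polling are hypothetical, and the 1 ms POLL_SLICE_NS is an assumed value that sets the worst-case added latency.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <mqueue.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

#define NSEC_PER_SEC  1000000000LL
#define POLL_SLICE_NS 1000000LL /* assumed 1 ms sleep between attempts */

/* Nanoseconds from `now` until `deadline`; negative once it has passed. */
static long long ns_until(const struct timespec *deadline, const struct timespec *now)
{
    return (long long)(deadline->tv_sec - now->tv_sec) * NSEC_PER_SEC
         + (deadline->tv_nsec - now->tv_nsec);
}

/* Sleep for one slice, but never past the deadline. */
static struct timespec next_sleep(long long remaining_ns)
{
    long long ns = (remaining_ns < POLL_SLICE_NS) ? remaining_ns : POLL_SLICE_NS;
    if (ns < 0)
        ns = 0;
    struct timespec ts = { (time_t)(ns / NSEC_PER_SEC), (long)(ns % NSEC_PER_SEC) };
    return ts;
}

/* Returns the message length, or -1 with errno set (ETIMEDOUT when the
 * CLOCK_MONOTONIC `deadline` passes with the queue still empty). */
static ssize_t recv_by_bounded_polling(mqd_t mq, char *buf, size_t len,
                                       const struct timespec *deadline)
{
    const struct timespec expired = {0, 0}; /* already in the past */

    for (;;) {
        /* Non-blocking attempt: returns at once if the queue is empty. */
        ssize_t n = mq_timedreceive(mq, buf, len, NULL, &expired);
        if (n >= 0)
            return n;
        if (errno != ETIMEDOUT && errno != EINTR)
            return -1;

        struct timespec now;
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return -1;
        long long left = ns_until(deadline, &now);
        if (left <= 0) {
            errno = ETIMEDOUT;
            return -1;
        }

        /* Relative sleep measured on CLOCK_MONOTONIC; an early return
         * (for example on a signal) just triggers another attempt. */
        struct timespec nap = next_sleep(left);
        clock_nanosleep(CLOCK_MONOTONIC, 0, &nap, NULL);
    }
}
```

A shorter slice lowers receive latency at the cost of more wakeups per second while the queue is empty, which is the trade-off described above.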

I would prefer maintainer guidance before changing the implementation to that direction if needed.


Development

Successfully merging this pull request may close these issues.

Message Receive timeout if system clock changes?

2 participants