V1.2.2 rv32 submit v3 #1

AndrewD · 2021-10-08T02:05:00Z

Rebased onto v1.2.2. Minor merge conflict resolved.

prior to commit 685e40b, x86_64 was correctly passing O_LARGEFILE to SYS_open; it was removed (defined to 0 in the public header, and changed to use the public definition) as part of that change, probably out of a mistaken belief that it's not needed. however, on a mixed system with 32-bit and 64-bit binaries, it's important that all files be opened with O_LARGEFILE, even if the opening process is 64-bit, in case a descriptor is passed to a 32-bit process. otherwise, attempts to access past 2GB in the 32-bit process could produce EOVERFLOW. most 64-bit archs added later got this right already, except for mips64. x32 was also affected. these are now fixed.

the fcntl file locking command macro values in the existing generic bits/fcntl.h were the "64" variants, requiring 64-bit archs that use the "plain" variants to have their own bits/fcntl.h, even if they otherwise use the common definitions for everything. since commit 7cc79d1 exposed __LONG_MAX to all bits headers, we can now make the generic one common between 32- and 64-bit archs.

these were only using a custom version because they needed the "non-64" variants of the file locking command macros.

see linux commit 480274787d7e3458bc5a7cfbbbe07033984ad711 tcp: add TCP_INFO status for failed client TFO

also added clone3 on sh and m68k, on sh it's still missing (not yet wired up), but reserved so safe to add. see linux commit fddb5d430ad9fa91b49b1d34d0202ffe2fa0e179 open: introduce openat2(2) syscall linux commit 9a2cef09c801de54feecd912303ace5c27237f12 arch: wire up pidfd_getfd syscall linux commit 8649c322f75c96e7ced2fec201e123b2b073bf09 pid: Implement pidfd_getfd syscall linux commit e8bb2a2a1d51511e6b3f7e08125d52ec73c11139 m68k: Wire up clone3() syscall

add IPPROTO_ETHERNET and IPPROTO_MPTCP, see linux commit 2677625387056136e256c743e3285b4fe3da87bb seg6: fix SRv6 L2 tunnels to use IANA-assigned protocol number linux commit faf391c3826cd29feae02078ca2022d2f912f7cc tcp: Define IPPROTO_MPTCP

TCP_NLA_TIMEOUT_REHASH queries timeout-triggered rehash attempts, tcpm_ifindex limits the scope of TCP_MD5SIG* sockopt to a device. see linux commit 32efcc06d2a15fa87585614d12d6c2308cc2d3f3 tcp: export count for rehash attempts linux commit 6b102db50cdde3ba2f78631ed21222edf3a5fb51 net: Add device index to tcp_md5sig

The use of TCP_ in udp.h is not known, fortunately udp.h is not specified by posix so there are no strict namespace rules, added in linux commit e27cca96cd68fa2c6814c90f9a1cfd36bb68c593 xfrm: add espintcp (RFC 8229)

needed for storage drivers with userspace component that may run in the IO path, see linux commit 8d19f1c8e1937baf74e1962aae9f90fa3aeab463 prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim

added in linux commit 75551dbf112c992bc6c99a972990b3f272247e23 random: add GRND_INSECURE to return best-effort non-cryptographic bytes

reuses a bit from CSIGNAL so it can only be used with unshare and clone3, added in linux commit 769071ac9f20b6a447410c7eaa55d1a5233ef40c ns: Introduce Time Namespace

these were missed before, added in linux commit 1201937491822b61641c1878ebcd16a93aed4540 arm64: Expose ARMv8.5 CondM capability to userspace linux commit ca9503fc9e9812aa6258e55d44edb03eb30fc46f arm64: Expose FRINT capabilities to userspace

added in linux commit 1a50ec0b3b2e9a83f1b1245ea37a853aac2f741c arm64: Implement archrandom.h for ARMv8.5-RNG linux commit d4209d8b717311d114b5d47ba7f8249fd44e97c2 arm64: cpufeature: Export matrix and other features to userspace

see linux commit 9e2ba2c34f1922ca1e0c7d31b30ace5842c2e7d1 fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name linux commit 44d705b0370b1d581f46ff23e5d33e8b5ff8ec58 fanotify: report name info for FAN_DIR_MODIFY event

it remaps anon mappings without unmapping the original. chromeos plans to use it with userfaultfd, see: linux commit e346b3813067d4b17383f975f197a9aa28a3b077 mm/mremap: add MREMAP_DONTUNMAP to mremap()

add TCP_NLA_BYTES_NOTSENT and new tcp_zerocopy_receive fields, see linux commit c8856c051454909e5059df4e81c77b9c366c5515 tcp-zerocopy: Return inq along with tcp receive zerocopy. linux commit 33946518d493cdf10aedb4a483f1aa41948a3dab tcp-zerocopy: Return sk_err (if set) along with tcp receive zerocopy. linux commit e08ab0b377a1489760533424437c5f4be7f484a4 tcp: add bytes not sent to SCM_TIMESTAMPING_OPT_STATS

the linux faccessat syscall lacks a flag argument that is necessary to implement the posix api, see linux commit c8ffd8bcdd28296a198f237cc595148a8d4adfbe vfs: add faccessat2 syscall

On x86 and aarch64 GNU properties may be used to mark ELF objects.

Ethernet protocol number for media redundancy protocol, see linux commit 4714d13791f831d253852c8b5d657270becb8b2a bridge: uapi: mrp: Add mrp attributes.

commit 0a05eac implemented AT_EACCESS for faccessat with a horrible hack, creating a child process to switch uid/gid and perform the access probe without making potentially irreversible changes to the caller's credentials. this was due to the syscall lacking a flags argument. linux 5.8 introduced a new syscall, SYS_faccessat2, fixing this deficiency. use it if any flags are passed, and fall back to the old strategy on ENOSYS. continue using the old syscall when there are no flags.
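The dispatch described above could look roughly like the following. This is a minimal sketch, not the code from the commit: `access_at` and `legacy_checker` are hypothetical names, the fallback simply defers to the host libc's `faccessat` instead of the child-process strategy, and the fallback definition of `SYS_faccessat2` as 439 is an assumption that holds for most architectures' syscall tables.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_faccessat2
#define SYS_faccessat2 439 /* assumption: number used by most archs since linux 5.8 */
#endif

/* Hypothetical stand-in for the old strategy (the child-process probe).
 * Here it just defers to the host libc. */
static int legacy_checker(int fd, const char *path, int mode, int flags)
{
	return faccessat(fd, path, mode, flags);
}

/* Hypothetical wrapper showing the order of attempts. */
static int access_at(int fd, const char *path, int mode, int flags)
{
	if (flags) {
		/* Flags present: prefer the new syscall, which accepts them. */
		long r = syscall(SYS_faccessat2, fd, path, mode, flags);
		if (r == 0 || errno != ENOSYS)
			return (int)r;
		/* Kernel predates faccessat2: use the old strategy. */
		return legacy_checker(fd, path, mode, flags);
	}
	/* No flags: the original syscall is sufficient. */
	return (int)syscall(SYS_faccessat, fd, path, mode);
}
```

Only ENOSYS triggers the fallback in this sketch; any other error from the new syscall is reported to the caller unchanged.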

taking the deprecated/dropped vfork spec strictly, doing pretty much anything but execve in the child is wrong and undefined. however, these are commonly needed operations to setup the child state before exec, and historical implementations tolerated them. for single-threaded parents, these operations already worked as expected in the vforked child. however, due to the need for __synccall to synchronize id/resource limit changes among all threads, calling these functions in the vforked child of a multithreaded parent caused a misdirected broadcast signaling of all threads in the parent. these signals could kill the parent entirely if the synccall signal handler had never been installed in the parent, or could be ignored if it had, or could signal/kill one or more utterly wrong processes if the parent already terminated (due to vfork semantics, only possible via fatal signal) and the parent tids were recycled. in any case, the expected number of semaphore posts would never happen, so the child would permanently hang (with all signals blocked) waiting for them. to mitigate this, and also make the normal usage case work as intended, treat the condition where the caller's actual tid does not match the tid in its thread structure as single-threaded, and bypass the entire synccall broadcast operation.
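The detection condition at the end of this message can be illustrated in isolation. The sketch below is an assumption-laden model, not the libc internals: `struct thread_sketch` and `tid_mismatch` are hypothetical names standing in for the per-thread structure and the check. A vforked child shares the parent's memory, so the recorded tid is still the parent's while the kernel reports a different one.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the tid field of the libc thread structure. */
struct thread_sketch {
	pid_t tid;
};

/* Returns nonzero when the caller's actual tid differs from the recorded
 * one. In that case the caller would be treated as single-threaded and
 * the broadcast to other threads would be skipped. */
static int tid_mismatch(const struct thread_sketch *self)
{
	return (pid_t)syscall(SYS_gettid) != self->tid;
}
```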

previously, if a file descriptor had aio operations pending in the parent before fork, attempting to close it in the child would attempt to cancel a thread belonging to the parent. this could deadlock, fail, or crash the whole process if the cancellation signal handler was not yet installed in the parent. in addition, further use of aio from the child could malfunction or deadlock. POSIX specifies that async io operations are not inherited by the child on fork, so clear the entire aio fd map in the child, and take the aio map lock (with signals blocked) across the fork so that the lock is kept in a consistent state.

the dummy definition of __abort_lock in sigaction.c was performing exactly the same role that putting the lock in its own source file could and should have been used to achieve. while we're moving it, give it a proper declaration.

if the multithreaded parent forked while another thread was calling sigaction for SIGABRT or calling abort, the child could inherit a lock state in which future calls to abort will deadlock, or in which the disposition for SIGABRT has already been reset to SIG_DFL. this is nonconforming since abort is AS-safe and permitted to be called concurrently with fork or in the MT-forked child.

this makes the code slightly smaller and eliminates these functions from relevance to possible future changes to multithreaded fork. the barrier of a_store isn't technically needed here, but a_store is used anyway for internal consistency of the memory model.

queue_ctors should not be called with the init_fini_lock held, since it may longjmp out on allocation failure. this introduces a minor TOCTOU race with p->constructed, but one already exists further down anyway, and by design it's okay to run through the queue more than once anyway. the only reason we bother to check p->constructed at all is to avoid spurious failure of dlopen when the library is already fully loaded and constructed.

commit 188759b documented the intent to allow recursive dlopen based on tracking ctor_visitor, but used a kernel tid rather than the pthread_t to identify the caller. as a result, it would not behave as intended under fork by a ctor, where the child tid would not match.

this is in preparation for implementing _Fork from POSIX-future, factored as a separate commit to improve readability of history.

the _Fork interface is defined for future issue of POSIX as the outcome of Austin Group issue 62, which drops the AS-safety requirement for fork, and provides an AS-safe replacement that does not run the registered atfork handlers.

float_t should represent the type that is used to evaluate float expressions internally. On s390x, float_t is currently set to double. In contrast, the isa supports single-precision float operations and compilers by default evaluate float in single precision, which violates the C standard (sections 5.2.4.2.2 and 7.12 in C11/C17, to be precise). With -fexcess-precision=standard, gcc evaluates float in double precision, which aligns with the standard yet at the cost of added conversion instructions. gcc-11 will drop the special case to retrofit double precision behavior for -fexcess-precision=standard so that __FLT_EVAL_METHOD__ will be 0 on s390x in any scenario. To improve standards compliance and compatibility with future compiler direction, this patch changes the definition of float_t to be derived from the compiler's __FLT_EVAL_METHOD__.
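Deriving the type from the compiler's macro might look like the sketch below. The typedef name `sketch_float_t` is hypothetical, chosen so the fragment does not clash with the real `float_t` from `<math.h>`; the mapping of evaluation methods 0, 1 and 2 to `float`, `double` and `long double` follows C11 7.12. The actual patch may only distinguish the cases that occur on s390x.

```c
/* Select the evaluation type for float expressions from the compiler's
 * reported evaluation method. An undefined macro evaluates as 0 in #if,
 * which lands in the final branch. */
#if __FLT_EVAL_METHOD__ == 1
typedef double sketch_float_t;       /* float evaluated as double */
#elif __FLT_EVAL_METHOD__ == 2
typedef long double sketch_float_t;  /* float evaluated as long double */
#else
typedef float sketch_float_t;        /* float evaluated in its own type */
#endif
```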

both __clone and __syscall_cp_asm failed to restore the original value of r6 after using it as a syscall argument register. the extent of breakage is not known, and in some cases may be mitigated by the only callers being internal to libc; if they used r6 but no longer needed its value after the call, they may not have noticed the problem. however at least posix_spawn (which uses __clone) was observed returning to the application with the wrong value in r6, leading to crash. since the call frame ABI already provides a place to spill registers, fixing this is just a matter of using it. in __clone, we also spuriously restore r6 in the child, since the parent branch directly returns to the caller. this takes the value from an uninitialized slot of the child's stack, but is harmless since there is no caller to return to in the child.

ucontext.h depends on the internal struct tag name for namespacing reasons, and the intent was always for it to be consistent across archs anyway.

this change should have been made when priority inheritance mutex support was added. if priority protection is also added at some point the implementation will need to change and will probably no longer be a simple bit shuffling.

pthread_once is not compatible with MT-fork constraints (commit 167390f) and is not needed here anyway; we already have a lock suitable for initialization. while changing this, fix a corner case where AT_MINSIGSTKSZ gives a value that's more than MINSIGSTKSZ but by a margin of less than 2048, thereby causing the size to be reduced. it shouldn't matter but the intent was to be the larger of a 2048-byte margin over the legacy fixed minimum stack requirement or a 512-byte margin over the minimum the kernel reports at runtime.
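The intended size rule can be written as a small pure function. This is a sketch under assumptions: `sigstack_size_sketch` is a hypothetical name, and the two inputs are passed in explicitly rather than read from `MINSIGSTKSZ` and the `AT_MINSIGSTKSZ` auxv entry.

```c
#include <stddef.h>

/* Returns the larger of:
 *   - the legacy fixed minimum plus a 2048-byte margin, or
 *   - the kernel-reported minimum plus a 512-byte margin. */
static size_t sigstack_size_sketch(size_t legacy_min, size_t kernel_min)
{
	size_t from_legacy = legacy_min + 2048;
	size_t from_kernel = kernel_min + 512;
	return from_legacy > from_kernel ? from_legacy : from_kernel;
}
```

With a legacy minimum of 2048 and a kernel-reported minimum of 2100, which is the corner case described above, the result stays at 4096 instead of shrinking to 2612.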

the intent here is just to scan at least l bytes forward for the end of the haystack and at least some decent minimum to avoid doing it over and over if the needle is short, with no need to be precise. the comment erroneously stated this as an estimate for MIN when it's actually an estimate for MAX.

this allows the lock to be shared with setlocale, eliminates repeated per-category lock/unlock in newlocale, and will allow the use of pthread_once in newlocale to be dropped (to be done separately).

in general, pthread_once is not compatible with MT-fork constraints (commit 167390f). here it actually no longer matters, because it's now called with a lock held, but since the lock is held it's pointless to use pthread_once.

this is necessary for MT-fork correctness now that the code runs under locale lock. it would not be hard to avoid, but __get_locale is already using libc-internal malloc anyway. this can be reconsidered during locale overhaul later if needed.

while the layouts match, the member naming expected by software using mcontext_t omits the sc_ prefix.

commit 2412638 got the size of struct v4l2_buffer wrong and omitted the tv_usec member slot from the offset list, so the ioctl numbers never matched and fallback code path was never taken. this caused the affected ioctls to fail with ENOTTY on kernels not new enough to have the native time64 ioctls.

commit 2412638 got the size of struct v4l2_event wrong and failed to account for the fact that the old struct might be either 120 bytes with time misaligned mod 8, or 128 bytes with time aligned mod 8, due to the contained union having 64-bit members whose alignment is arch-dependent. rather than adding new logic to handle the differences, use an actual stripped-down version of the structure in question to derive the ioctl number, size, and offsets.

riscv32 and future architectures lack the _time32 variants entirely, so don't try to use their numbers.

fix merge conflict in v2 submit

riscv32 and future architectures only provide prlimit64.
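On such targets a getrlimit-style query has to go through prlimit64. The sketch below shows one way to do that on Linux; `getrlimit_via_prlimit64` is a hypothetical name, and the two-element array mirrors the kernel's 64-bit `rlim_cur`/`rlim_max` pair, with all-ones meaning "infinity".

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/resource.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: read a resource limit using only SYS_prlimit64.
 * pid 0 means the calling process; a null new-limit pointer means
 * "query only". */
static int getrlimit_via_prlimit64(int resource, struct rlimit *rlim)
{
	uint64_t k[2]; /* kernel layout: { rlim_cur, rlim_max } */

	if (syscall(SYS_prlimit64, 0, resource, (void *)0, k) != 0)
		return -1;

	/* Map the kernel's 64-bit infinity onto the userspace constant. */
	rlim->rlim_cur = k[0] == UINT64_MAX ? RLIM_INFINITY : (rlim_t)k[0];
	rlim->rlim_max = k[1] == UINT64_MAX ? RLIM_INFINITY : (rlim_t)k[1];
	return 0;
}
```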

riscv64 and future architectures only provide the clock_ functions.

We need to make internal syscalls to SYS_statx when SYS_fstatat is not available without changing the musl API.

riscv32 and future architectures lack it.

riscv32 and future architectures lack wait4. waitpid is required by POSIX to be a cancellation point. pclose is specified as undefined if a cancellation occurs, so it would be permitted for it to call a cancellable wait function; however, as a quality of implementation matter, pclose must close the pipe fd before it can wait (consider popen("yes","r")) and if the wait could be interrupted the pipe FILE would be left in an intermediate state that portable software cannot recover from, so the only useful behavior is for pclose to NOT be a cancellation point. We therefore support both at a small cost in code size. wait4 is historically not a cancellation point in musl; we retain that since we need the non-cancellable version of __wait4 anyway.

Matches glibc behavior and fixes a case where we could fall off the function without returning a value.

not empty because buildroot removes empty files generated by a patch...

These are mostly copied from riscv64. _Addr and _Reg had to become int to avoid errors in libstdc++ when size_t and std::size_t mismatch. There is no kernel stat struct; the userspace stat matches glibc in the sizes and offsets of all fields (including glibc's __dev_t __pad1). The jump buffer is 12 words larger to account for 12 saved double-precision floats; additionally it should be 64-bit aligned to save doubles. The syscall list was significantly revised by deleting all time32 and pre-statx syscalls, and renaming several syscalls that have different names depending on __BITS_PER_LONG, notably mmap2 and _llseek. futex was added as an alias to futex_time64 since it is widely used by software which does not pass time arguments.

These are identical to riscv64.

Identical to riscv64.

Largely copied from riscv64 but required recalculation of offsets.

Identical to riscv64 except for stack offsets in clone.

richfelker and others added 30 commits September 3, 2020 17:30

fix missing newline in herror output

262003a

use generic bits/fcntl.h for x86_64 and riscv64

ffac0c2

these were only using a custom version because they needed the "non-64" variants of the file locking command macros.

netinet/tcp.h: update tcp_info for linux v5.5

d4f2981

see linux commit 480274787d7e3458bc5a7cfbbbe07033984ad711 tcp: add TCP_INFO status for failed client TFO

netinet/udp.h: add TCP_ENCAP_ESPINTCP from linux v5.6

1ab341e

The use of TCP_ in udp.h is not known, fortunately udp.h is not specified by posix so there are no strict namespace rules, added in linux commit e27cca96cd68fa2c6814c90f9a1cfd36bb68c593 xfrm: add espintcp (RFC 8229)

sys/prctl.h: add PR_{SET,GET}_IO_FLUSHER from linux v5.6

8f4aa78

needed for storage drivers with userspace component that may run in the IO path, see linux commit 8d19f1c8e1937baf74e1962aae9f90fa3aeab463 prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaim

sys/random.h: add GRND_INSECURE from linux v5.6

3da18e6

added in linux commit 75551dbf112c992bc6c99a972990b3f272247e23 random: add GRND_INSECURE to return best-effort non-cryptographic bytes

sched.h: add CLONE_NEWTIME from linux v5.6

43b640c

reuses a bit from CSIGNAL so it can only be used with unshare and clone3, added in linux commit 769071ac9f20b6a447410c7eaa55d1a5233ef40c ns: Introduce Time Namespace

aarch64: add HWCAP2_ macros from linux v5.3

0296baf

these were missed before, added in linux commit 1201937491822b61641c1878ebcd16a93aed4540 arm64: Expose ARMv8.5 CondM capability to userspace linux commit ca9503fc9e9812aa6258e55d44edb03eb30fc46f arm64: Expose FRINT capabilities to userspace

aarch64: add new HWCAP2_ macros from linux v5.6

94ab68c

added in linux commit 1a50ec0b3b2e9a83f1b1245ea37a853aac2f741c arm64: Implement archrandom.h for ARMv8.5-RNG linux commit d4209d8b717311d114b5d47ba7f8249fd44e97c2 arm64: cpufeature: Export matrix and other features to userspace

sys/fanotify.h: update to linux v5.7

8adf42f

see linux commit 9e2ba2c34f1922ca1e0c7d31b30ace5842c2e7d1 fanotify: send FAN_DIR_MODIFY event flavor with dir inode and name linux commit 44d705b0370b1d581f46ff23e5d33e8b5ff8ec58 fanotify: report name info for FAN_DIR_MODIFY event

sys/mman.h: add MREMAP_DONTUNMAP from linux v5.7

a6c302b

it remaps anon mappings without unmapping the original. chromeos plans to use it with userfaultfd, see: linux commit e346b3813067d4b17383f975f197a9aa28a3b077 mm/mremap: add MREMAP_DONTUNMAP to mremap()

bits/syscall.h: add __NR_faccessat2 from linux v5.8

9b7ed97

the linux faccessat syscall lacks a flag argument that is necessary to implement the posix api, see linux commit c8ffd8bcdd28296a198f237cc595148a8d4adfbe vfs: add faccessat2 syscall

elf.h: add .note.gnu.property related definitions

6b1741a

On x86 and aarch64 GNU properties may be used to mark ELF objects.

netinet/if_ether.h: add ETH_P_MRP from linux v5.8

f035c7b

Ethernet protocol number for media redundancy protocol, see linux commit 4714d13791f831d253852c8b5d657270becb8b2a bridge: uapi: mrp: Add mrp attributes.

rename fork source file

e1e98d8

this is in preparation for implementing _Fork from POSIX-future, factored as a separate commit to improve readability of history.

implement _Fork and refactor fork using it

bd15342

the _Fork interface is defined for future issue of POSIX as the outcome of Austin Group issue 62, which drops the AS-safety requirement for fork, and provides an AS-safe replacement that does not run the registered atfork handlers.

mhillenbrand and others added 30 commits December 3, 2020 19:07

riscv64: fix inconsistent ucontext_t struct tag

56f0631

ucontext.h depends on the internal struct tag name for namespacing reasons, and the intent was always for it to be consistent across archs anyway.

fix omission of non-stub pthread_mutexattr_getprotocol

90ff016

this change should have been made when priority inheritance mutex support was added. if priority protection is also added at some point the implementation will need to change and will probably no longer be a simple bit shuffling.

lift locale lock out of internal __get_locale

37fcc13

this allows the lock to be shared with setlocale, eliminates repeated per-category lock/unlock in newlocale, and will allow the use of pthread_once in newlocale to be dropped (to be done separately).

drop use of pthread_once in newlocale

36246b3

in general, pthread_once is not compatible with MT-fork constraints (commit 167390f). here it actually no longer matters, because it's now called with a lock held, but since the lock is held it's pointless to use pthread_once.

sh: fix incorrect mcontext_t member naming

db981ff

while the layouts match, the member naming expected by software using mcontext_t omits the sc_ prefix.

release 1.2.2

85e0e35

Remove ARMSUBARCH relic from configure

6f1536e

time64: Don't make aliases to nonexistent syscalls

b0023cf

riscv32 and future architectures lack the _time32 variants entirely, so don't try to use their numbers.

time64: Only setrlimit if it is implemented

2e0d719

fix merge conflict in v2 submit

time64: Only getrlimit/setrlimit if they exist

f19f235

riscv32 and future architectures only provide prlimit64.

time64: Only gettimeofday/settimeofday if exist

416050d

riscv64 and future architectures only provide the clock_ functions.

Add src/internal/statx.h

5c62d58

We need to make internal syscalls to SYS_statx when SYS_fstatat is not available without changing the musl API.

Only call fstatat if defined

be5ba55

riscv32 and future architectures lack it.

riscv: Fall back to syscall __riscv_flush_icache

8765b64

Matches glibc behavior and fixes a case where we could fall off the function without returning a value.

riscv32: Target and subtarget detection

16b0f6d

riscv32: add arch kstat.h header

30de566

not empty because buildroot removes empty files generated by a patch...

riscv32: Add fenv and math

5e76d97

These are identical to riscv64.

riscv32: Add dlsym

f97a81b

Identical to riscv64.

riscv32: Add jmp_buf and sigreturn

1a79cc2

Largely copied from riscv64 but required recalculation of offsets.

riscv32: Add thread support

ceda0cc

Identical to riscv64 except for stack offsets in clone.

Only call faccessat2 if defined

a98701c
