Skip to content

Conversation

@MisterDA
Copy link
Contributor

@MisterDA MisterDA commented Nov 19, 2025

I did a review of ocamlmerlin.c. I found minor and relatively major issues:

  1. Erroneous size computation of the struct sockaddr_un. On some BSD systems the struct sockaddr family of types have a leading sxx_len field. It was not taken into account. Use offsetof, as described by POSIX, to compute the socklen.
    https://man.freebsd.org/cgi/man.cgi?unix(4)
    https://man7.org/linux/man-pages/man7/unix.7.html
    https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sys_un.h.html

  2. Functions from WinAPI don't set errno, and their failure code can be retrieved with GetLastError and the error string with FormatMessage. Functions from the CRT do set errno, and perror can be used to display the error string.

  3. Structures for passing file descriptor using CMSG and Unix domain sockets were not properly aligned, potentially leading to inefficient or buggy behavior.

There's also a few patches that improve the code quality, in my opinion. The code now assumes a C99 compiler, and makes use of C23 and C2y features or extensions, if available. For reference, the OCaml runtime requires at least C11.

@xvw
Copy link
Collaborator

xvw commented Nov 19, 2025

Oh wow! I think it deserve a Changelog entry !

@voodoos
Copy link
Collaborator

voodoos commented Nov 19, 2025

Thanks a lot @MisterDA ! Do you know if this could be linked to #1993 ? Or if it's another set of (your?) patches we didn't backport to 413 ?

@MisterDA
Copy link
Contributor Author

MisterDA commented Nov 19, 2025

Thanks a lot @MisterDA ! Do you know if this could be linked to #1993 ? Or if it's another set of (your?) patches we didn't backport to 413 ?

Ah yes, I think this patch will fix the warning encountered by the user; but the build failure is unrelated. This patch series doesn't apply cleanly to v4.7-413, unfortunately. The opam files for these old merlin version are missing constraints on menhir.

# syntax=docker/dockerfile:1
FROM ocaml/opam:ubuntu-25.04-ocaml-4.13
RUN sudo ln -sf /usr/bin/opam-2.4 /usr/bin/opam && opam init --reinit -ni
ADD --keep-git-dir --link --chown=1000:1000 https://github.com/ocaml/merlin.git#v4.7-413 /home/opam/merlin
RUN git config --global --add safe.directory ~/merlin
WORKDIR /home/opam/merlin
RUN --mount=type=cache,target=/home/opam/.opam/download-cache,sharing=locked,uid=1000,gid=1000 \
    opam install -y --deps-only --with-dev --formula='"menhir" {<= "20220210"}' merlin.4.7-413
RUN patch -p1 <<'EOF'
diff --git a/src/frontend/ocamlmerlin/dune b/src/frontend/ocamlmerlin/dune
index faa01b93..cc549032 100644
--- a/src/frontend/ocamlmerlin/dune
+++ b/src/frontend/ocamlmerlin/dune
@@ -14,7 +14,7 @@
  (modules (:standard \ gen_ccflags))
  (libraries config yojson merlin_analysis merlin_kernel
             merlin_utils os_ipc ocaml_parsing query_protocol query_commands
-            ocaml_typing ocaml_utils seq))
+            ocaml_typing ocaml_utils))

 (executable
  (name      gen_ccflags)
EOF
ADD --chown=1000 --link https://github.com/ocaml/merlin/pull/1998.diff 1998.diff
RUN patch -p1 < 1998.diff
RUN opam exec -- make -j1

@MisterDA
Copy link
Contributor Author

Oh wow! I think it deserve a Changelog entry !

Could you do that for me? 🙏

@MisterDA MisterDA marked this pull request as draft November 25, 2025 11:05
@MisterDA
Copy link
Contributor Author

Marking as draft because the CI hasn't run, and there are errors with my patches on Windows. I'll circle back to this.

@MisterDA MisterDA force-pushed the ocamlmerlin.c branch 5 times, most recently from 18a16e8 to 94dba5b Compare November 25, 2025 23:55
@MisterDA MisterDA changed the title Minor patches for ocamlmerlin.c Minor patches for the C code in Merlin Nov 26, 2025
@MisterDA MisterDA marked this pull request as ready for review November 26, 2025 07:59
@MisterDA
Copy link
Contributor Author

Ready again!

@voodoos voodoos self-requested a review November 27, 2025 14:42
@MisterDA MisterDA marked this pull request as draft December 1, 2025 14:37
@voodoos
Copy link
Collaborator

voodoos commented Dec 1, 2025

Hum, something went wrong with your rebase :-)
Also, if it's ready for review you can undraft it!

@MisterDA
Copy link
Contributor Author

MisterDA commented Dec 1, 2025

Hum, something went wrong with your rebase :-)

Fixed.

Also, if it's ready for review you can undraft it!

There were two issues on Windows, but I confirm that the test suite passes with both MSVC and mingw-w64.

@MisterDA MisterDA marked this pull request as ready for review December 1, 2025 16:24
@MisterDA
Copy link
Contributor Author

MisterDA commented Dec 8, 2025

I've added more explanations to some of the commit messages.

On Windows, use [GetTempPath2][] if available or [GetTempPath][] to
retrieve the temporary directory path.

On other systems, follow the [XDG Base Directory Specification][XDG]
and put the socket file in the XDG runtime directory. It is indicated
by the `XDG_RUNTIME_DIR` environment variable, that is always set on
platforms implementing the spec. Otherwise, fallback to `TMPDIR` or
`/tmp`.

[GetTempPath2]: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-gettemppath2a
[GetTempPath]: https://learn.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-gettemppatha
[XDG]: https://specifications.freedesktop.org/basedir/latest/
In C, arrays passed as arguments to functions decay into pointers, and
the size information is lost. This means all of these are equivalent:

    void f(char *s);
    void f(char s[]);
    void f(char s[3615]);

One way to partially retain this information is to use `static n` with
`n` being the array size: the compiler now knows that the first `n`
elements after the pointed address can be accessed. This C99 feature
is unfortunately not supported by MSVC (but supported by clang-cl).
The compiler uses this info to check that the caller supplies a long
enough buffer to the callee. The size information is not used inside
the callee.

See [Detect if the static keyword is supported in array type
derivations][1] for a rationale behind the `ATLEAST` naming.

[1]: https://lists.gnu.org/archive/html/autoconf/2025-10/msg00004.html
- make the function static;
- don't use K&R-style declaration, compilers now reject it before
  C23. After C23 `f(void)` is similar to `f()`.
Quoting POSIX 2024:

> The size of sun_path is required to be constant, but intentionally
> does not have a specified name for that constant. Historically,
> different implementations used different sizes. For example, 4.3 BSD
> used a size of 108, and 4.4 BSD used a size of 104. Since most
> implementations originate from BSD versions, the size is typically
> in the range 92 to 108. An application can deduce the size by using
> `sizeof(((struct sockaddr_un *)0)->sun_path)`.
`perror` can be used with functions from the CRT that set `errno`,
otherwise `GetLastError` and `FormatMessage` have to be used instead.
_snprintf doesn't null-terminate strings.
MSVCRT doens't provide snprintf, but if __USE_MINGW_ANSI_STDIO is
defined, then mingw-w64 will provide this function.
There are environments where mingw-w64 is configured with MSVCRT, and
others where it is configured with UCRT. It seems safer to use the
compatibility shims.
UCRT provides snprintf, and MSVC defaults to UCRT.
Avoid possible truncation when preparing the invocation by using a
bigger buffer.

> In the Windows API the maximum length for a path is MAX_PATH, which
> is defined as 260 characters.

https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation

> `lpCommandLine`
>
> The command line to be executed.
> The maximum length of this string is 32,767 characters, including
> the Unicode terminating null character. If `lpApplicationName` is
> `NULL`, the module name portion of `lpCommandLine` is limited to
> `MAX_PATH` characters.

https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessw
https://devblogs.microsoft.com/oldnewthing/20031210-00/?p=41553
It works with mingw-w64 too.

    warning: '__p__environ' redeclared without dllimport attribute: previous dllimport ignored [-Wattributes]

https://gcc.gnu.org/onlinedocs/gcc/Microsoft-Windows-Variable-Attributes.html#index-dllimport-variable-attribute
To ensure the ancillary data buffer is suitably aligned, the
historical technique is to use a union with a backing char buffer and
a member having the desired alignment, as the alignment of the union
is the strictest alignment of its members. Another possibility is to
use the C11 alignas specifier. Keep the old technique for
compatibility.

Proper alignment should be ensured when creating the structure and
when reading it back from the kernel.

> `CMSG_DATA()`
>     returns a pointer to the data portion of a cmsghdr. The pointer
>     returned cannot be assumed to be suitably aligned for accessing
>     arbitrary payload data types. Applications should not cast it to
>     a pointer type matching the payload, but should instead use
>     `memcpy(3)` to copy data to or from a suitably declared object.

See also:
- https://man7.org/linux/man-pages/man7/unix.7.html
- https://man7.org/linux/man-pages/man3/cmsg.3.html
- https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/sys_socket.h.html
- https://docs.kernel.org/core-api/unaligned-memory-access.html#why-unaligned-access-is-bad
Copy link
Collaborator

@voodoos voodoos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We reviewed this with Antonin and I am confident that this is a very nice improvement to our C frontend. The main features are the correction of all warnings, and improved Windows compatibility.

Thanks a lot @MisterDA !

@voodoos voodoos merged commit 3a730e4 into ocaml:main Dec 11, 2025
8 of 10 checks passed
@MisterDA MisterDA deleted the ocamlmerlin.c branch December 11, 2025 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants