Commit a3bf07d

Re-vendor and re-patch grisu (#597)
* Let's have a maintenance/ folder, and move the existing maintenance notes
* Pull grisu3.{h,c} from upstream
* Patch the header file
* Patch the c file
* Use #pragma once
* Document this process
* Remove the working copies
1 parent d56888e commit a3bf07d

8 files changed: 479 additions, 5 deletions

.Rbuildignore

Lines changed: 1 addition & 0 deletions
@@ -49,3 +49,4 @@
 ^[.]?air[.]toml$
 ^scratch\.R$
 ^compile_commands\.json$
+^maintenance$
File renamed without changes.

maintenance/grisu3-README.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
# Vendoring grisu3 from MathGeoLib

The `grisu3.c` and `grisu3.h` files in `src/` are vendored from the [MathGeoLib](https://github.com/juj/MathGeoLib) library. We maintain local modifications documented as patch files.

## Files in maintenance/

- **`grisu3-pull-from-upstream.R`**: Script to download fresh grisu3 files from upstream
- **`grisu3-upstream-info.txt`**: Records the upstream commit SHA and date
- **`patches/grisu3-header.patch`**: Modifications to `grisu3.h`
- **`patches/grisu3-source.patch`**: Modifications to `grisu3.c`

## Workflow to re-vendor grisu3

```r
# 1. Pull fresh upstream files to maintenance/
source("maintenance/grisu3-pull-from-upstream.R")
```

This downloads `grisu3.{c,h}` from upstream to the `maintenance/` directory and records the upstream commit in `grisu3-upstream-info.txt`.

```bash
# 2. Apply patches and copy to src/
cd maintenance
patch < patches/grisu3-header.patch
patch < patches/grisu3-source.patch
cp grisu3.h ../src/grisu3.h
cp grisu3.c ../src/grisu3.c
cd ..
```
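
If a patch stops applying after upstream changes, one way to refresh it is to fix the file by hand and regenerate the patch with `diff -u`. The sketch below is an assumption about how that could be done, not part of the documented workflow. `regen_patch` is a hypothetical helper, and the demo runs on throwaway files in a temporary directory rather than on the real `grisu3.h`.

```bash
# Hypothetical helper: write a unified diff that turns $1 (pristine upstream
# copy) into $2 (hand-edited copy) to the file $4. Both sides are labelled
# with the name $3, so `patch` can later be run from the directory that
# contains a file of that name.
regen_patch() {
  diff -u -L "$3" -L "$3" "$1" "$2" > "$4"
  status=$?
  # diff exits with 1 when the files differ, which is the expected case here;
  # anything above 1 signals a real error.
  [ "$status" -le 1 ]
}

# Demo on toy files standing in for the upstream and edited headers.
workdir="$(mktemp -d)"
printf '#pragma once\n#include "../MathBuildConfig.h"\nint dtoa_grisu3(double v, char *dst);\n' > "$workdir/upstream.h"
printf '#pragma once\nint dtoa_grisu3(double v, char *dst);\n' > "$workdir/edited.h"

regen_patch "$workdir/upstream.h" "$workdir/edited.h" grisu3.h "$workdir/demo.patch"
cat "$workdir/demo.patch"
```

Labelling both sides with the bare file name gives the same `--- grisu3.h` / `+++ grisu3.h` header style as the patch files in `patches/`.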

## What we modify

The patches transform the upstream grisu3 code for vroom's needs:

**Header simplifications** (`grisu3-header.patch`):
- Add Jukka Jylänki copyright header
- Remove MathGeoLib-specific includes (`#include "../MathBuildConfig.h"`)
- Remove Emscripten support
- Remove unused function declarations (`f32_to_string`, `u32_to_string`, `i32_to_string`, hex functions)
- Simplify C++ string support (remove conditional compilation)

**Source modifications** (`grisu3-source.patch`):
- Add copyright headers (Jukka Jylänki and mikkelfj)
- Use `snprintf()` instead of `sprintf()` for buffer safety
- Remove unused helper functions (`u32_to_string`, `i32_to_string`, `f32_to_string`, hex functions)
- Add simplified `i_to_str()` function for internal use
- Include mikkelfj modifications for better decimal formatting:
  - Handle whole numbers as integers (< 10^15)
  - Fix zero prefix (.1 => 0.1) for JSON export
  - Prefer unscientific notation for short decimals
  - These modifications have been here ever since grisu3.c first appeared in vroom, so Jim Hester must have found them somewhere.
- Remove `#include "grisu3.h"` (not needed in vroom's structure)
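
One of the source changes swaps `sprintf()` for `snprintf()`. After re-applying the patches, a quick text search can confirm that no plain `sprintf(` call has slipped back in. This is only a sketch: `has_bare_sprintf` is a hypothetical helper, the check is purely textual (it would also flag `sprintf(` inside a comment or string), and the demo uses toy files rather than the real `src/grisu3.c`.

```bash
# Hypothetical helper: succeed (exit 0) if file $1 contains a call to plain
# sprintf( that is not the tail of a longer identifier such as vsprintf.
has_bare_sprintf() {
  grep -Eq '(^|[^A-Za-z0-9_])sprintf[[:space:]]*\(' "$1"
}

# Toy files standing in for an unpatched and a patched source file.
tmpdir="$(mktemp -d)"
printf 'int n = sprintf(dst, "%%d", v);\n' > "$tmpdir/unsafe.c"
printf 'int n = snprintf(dst, 32, "%%d", v);\n' > "$tmpdir/safe.c"

for f in "$tmpdir/unsafe.c" "$tmpdir/safe.c"; do
  if has_bare_sprintf "$f"; then
    echo "$(basename "$f"): plain sprintf() found"
  else
    echo "$(basename "$f"): ok"
  fi
done
```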

## Why vendor grisu3?

vroom uses grisu3 for fast, accurate double-to-string conversion when writing CSV files (see `src/vroom_write.cc:208`). The vendored version is simpler than the full MathGeoLib implementation: we only keep what vroom needs, the `dtoa_grisu3()` function.

## Upstream source

**Repository**: https://github.com/juj/MathGeoLib
**Path**: `src/Math/grisu3.{c,h}`
**Current version**: See `grisu3-upstream-info.txt` for the specific commit SHA

The original grisu3 algorithm is from the research paper:
> "Printing Floating-Point Numbers Quickly And Accurately with Integers"
> by Florian Loitsch
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
# Pull grisu3 files from upstream MathGeoLib repository
# https://github.com/juj/MathGeoLib
#
# After running this script, apply patches from maintenance/patches/
# to get the versions vroom actually uses. See maintenance/grisu3-README.md

library(usethis)
library(gh)

# Upstream: https://github.com/juj/MathGeoLib/tree/master/src/Math
owner <- "juj"
repo <- "MathGeoLib"
repo_spec <- paste0(owner, "/", repo)

# Get current HEAD SHA
gh_result <- gh(
  "/repos/{owner}/{repo}/commits/{ref}",
  owner = owner,
  repo = repo,
  ref = "HEAD"
)
upstream_sha <- gh_result$sha
upstream_date <- as.Date(gh_result$commit$author$date)

message(sprintf(
  "Upstream SHA: %s (%s)",
  substr(upstream_sha, 1, 7),
  upstream_date
))
# Upstream SHA: 55053da (2023-01-21)

# Download grisu3 files to maintenance/ for patching
use_github_file(
  repo_spec = repo_spec,
  path = "src/Math/grisu3.h",
  save_as = "maintenance/grisu3.h",
  ref = upstream_sha
)

use_github_file(
  repo_spec = repo_spec,
  path = "src/Math/grisu3.c",
  save_as = "maintenance/grisu3.c",
  ref = upstream_sha
)

writeLines(
  c(
    sprintf("grisu3 vendored from: https://github.com/%s", repo_spec),
    sprintf("Commit: %s", upstream_sha),
    sprintf("Date: %s", upstream_date),
    sprintf(
      "Permalink: https://github.com/%s/tree/%s/src/Math",
      repo_spec,
      upstream_sha
    )
  ),
  "maintenance/grisu3-upstream-info.txt"
)
maintenance/grisu3-upstream-info.txt

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
grisu3 vendored from: https://github.com/juj/MathGeoLib
Commit: 55053da5e3e55a83043af7324944407b174c3724
Date: 2023-01-21
Permalink: https://github.com/juj/MathGeoLib/tree/55053da5e3e55a83043af7324944407b174c3724/src/Math
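
Because the info file uses plain `Key: value` lines, the recorded commit can be read back with standard tools, for example to compare it against upstream or to build a pinned download URL. The following is a sketch under stated assumptions: `info_field` is a hypothetical helper, the demo reads a temporary copy of the four lines above rather than the real file, and the `raw.githubusercontent.com` URL layout is GitHub's general raw-file scheme, not something this commit itself uses.

```bash
# Hypothetical helper: print the value of the "Key: value" line whose key
# is $1, read from the info file $2.
info_field() {
  sed -n "s/^$1: //p" "$2"
}

# Temporary stand-in for maintenance/grisu3-upstream-info.txt.
info_file="$(mktemp)"
cat > "$info_file" <<'EOF'
grisu3 vendored from: https://github.com/juj/MathGeoLib
Commit: 55053da5e3e55a83043af7324944407b174c3724
Date: 2023-01-21
Permalink: https://github.com/juj/MathGeoLib/tree/55053da5e3e55a83043af7324944407b174c3724/src/Math
EOF

sha="$(info_field Commit "$info_file")"
date_recorded="$(info_field Date "$info_file")"
# Assumed URL scheme for fetching one file at a pinned commit.
raw_url="https://raw.githubusercontent.com/juj/MathGeoLib/$sha/src/Math/grisu3.h"

echo "Vendored commit: $sha ($date_recorded)"
echo "Pinned header URL: $raw_url"
```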
maintenance/patches/grisu3-header.patch

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
--- grisu3.h
+++ grisu3.h
@@ -1,3 +1,14 @@
+/* Copyright Jukka Jylänki
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+ http://www.apache.org/licenses/LICENSE-2.0
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License. */
+
 /* This file is part of an implementation of the "grisu3" double to string
 conversion algorithm described in the research paper

@@ -6,18 +17,9 @@
 http://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf */
 #pragma once

-#include "../MathBuildConfig.h"
+extern "C"
+{

-#include <stdint.h>
-
-#ifdef __cplusplus
-extern "C" {
-#endif
-
-#ifdef __EMSCRIPTEN__
-int js_double_to_string(double d, char *dst);
-#endif
-
 /// Converts the given double-precision floating point number to a string representation.
 /** For most inputs, this string representation is the
 shortest such, which deserialized again, returns the same bit
@@ -30,50 +32,12 @@
 @return the number of characters written to dst, excluding the null terminator (which
 is always written) is returned here. */
 int dtoa_grisu3(double v, char *dst);
-#define f64_to_string dtoa_grisu3

-int f32_to_string(float v, char *dst);
-
-/// Converts an unsigned 32-bit integer to a string. Longest 32-bit unsigned integer is
-/// 4294967295, which is 10 bytes (11 if including \0)
-/** @param val The number to convert.
- @param dst [out] The unsigned number will be written here
- as a null-terminated string. The conversion algorithm will write at most 11 bytes
- to this buffer. (null terminator is included in this count).
- The dst pointer may not be null.
- @return the number of characters written to dst, excluding the null terminator (which
- is always written) is returned here. */
-int u32_to_string(uint32_t val, char *dst);
-
-/// Similar to u32_to_string(), but prints the number in hexadecimal, inluding leading "0x".
-int u32_to_hex_string(uint32_t val, char *str);
-
-/// Converts an signed 32-bit integer to a string. Longest 32-bit signed integer is
-/// -2147483648, which is 11 bytes (12 if including \0)
-/** @param val The number to convert.
- @param dst [out] The unsigned number will be written here
- as a null-terminated string. The conversion algorithm will write at most 12 bytes
- to this buffer. (null terminator is included in this count).
- The dst pointer may not be null.
- @return the number of characters written to dst, excluding the null terminator (which
- is always written) is returned here. */
-int i32_to_string(int i, char *dst);
-
-/// Similar to i32_to_string(), but prints the number in signed hexadecimal, inluding leading "0x".
-int i32_to_hex_string(int i, char *str);
-
-#ifdef __cplusplus
 }
-#endif

 #ifdef __cplusplus

-#if defined(MATH_ENABLE_STL_SUPPORT)
 #include <string>
-#endif
+std::string dtoa_grisu3_string(double v);

-#if defined(MATH_ENABLE_STL_SUPPORT) || defined(MATH_CONTAINERLIB_SUPPORT)
-StringT dtoa_grisu3_string(double v);
 #endif
-
-#endif
