Vectorize CSV parsing by fernya · Pull Request #1056 · axoflow/axosyslog

fernya · 2026-05-04T12:09:26Z

This PR adds vectorization to some CSV parsing cases.

You may opt-out with the usage of -DENABLE_MANUAL_SIMD=Off (CMake) or --enable-manual-simd=no (autotools) flags.
The build system then checks the available and implemented SIMD instructions sets and compiles the appropriate code.

At runtime the SIMD blocks may be disabled if the CPU doesn't support them.
You may test this with the following command: EXTRA_ARGS="--run-under=qemu --qemu-cpu=qemu64 -k panos" make light-check
(qemu-user package needs to be installed on Ubuntu)

Remove have_simd branching from hot path
Move all #ifdefs out from csv-scanner.c

Add csv-scanner-simd.{c,h} as a self-contained library of AVX2 helpers that the CSV scanner will use: * csv_simd_parse() — full record pre-parse for the simple comma-delimited / RFC4180 case, producing an array of (start, end, quoted) field offsets in one pass over the input. * csv_simd_find_either{,3,4}() — first-match search for up to four single-byte needles, with vector-domain test-zero short-circuit on blank chunks and per-needle bitmaps materialized only on hit. * csv_simd_trim_leading_whitespace() / csv_simd_trim_trailing_whitespace() — 32-byte-at-a-time SIMD whitespace trim on offset ranges. The classifier broadcasts are spelled out as _mm256_set1_epi8 literals inside csv_simd_classify_min_64(), which is static inline; with AVX2 and inlining the compiler folds them into RIP-relative loads and hoists them out of the chunk loop. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Szilard Parrag <szilard.parrag@axoflow.com>

… input_end The hot inner-loop scans in _parse_characters_with_quotation and _parse_value_with_whitespace_and_delimiter used strchrnul()-pair + MIN() to find the first interesting byte (escape/quote/delimiter). That walks the haystack one byte at a time per needle, twice or thrice over the same memory. Replace with csv_simd_find_either() / csv_simd_find_either3() — single pass, 64 bytes at a time, vector-domain test-zero short-circuit on blank chunks. Drop-in semantic equivalent for the strchrnul case. Also cache input_end (= input + strlen(input)) on the scanner at init time and use input_end - src instead of strlen(self->src) per call. strlen on a long syslog record is O(N) and we were paying it for every field. Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Szilard Parrag <szilard.parrag@axoflow.com>

Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Szilard Parrag <szilard.parrag@axoflow.com>

…st-path records Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Szilard Parrag <szilard.parrag@axoflow.com>

Signed-off-by: Ferencz, Balint <balint.ferencz@axoflow.com>

- Conditionally compile SIMD types which are available only on selected platforms. The code shall compile on non-SIMD capable toolchains. The conditions are provided externally (ie. CMake) - Implement and guard SIMD-aware implementations of parser functions - Determine SIMD capabilities runtime and gating the fast path Signed-off-by: Ferencz, Balint <balint.ferencz@axoflow.com>

Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Szilard Parrag <szilard.parrag@axoflow.com>

Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ferencz, Balint <balint.ferencz@axoflow.com>

mitzkia · 2026-05-07T12:11:04Z

Thanks for the Light related update. I can approve that.

We may test the fallback path when the code has compiled-in manual SIMD optimization, but at runtime the CPU doesn't support it. Signed-off-by: Ferencz, Balint <balint.ferencz@axoflow.com>

MrAnno

This PR is not completely ready for review, as it contains a huge amount of probably LLM-generated noise, confusion, and dead code. Let's make some cleanup first.

My first review round is about this cleanup, I'll do a separate review round which will be much more meaningful and I'll concentrate on the actual implementation and perf aspects there.

Sorry for the short and dry comments. They may sound harsh, but I just wanted to get through the LLM-generated stuff as fast as possible and concentrate on the important things afterwards (where I'll be much more thorough and polite, of course).

MrAnno · 2026-05-11T09:36:21Z

+#include "csv-scanner-simd.h"
+#include "compat/glib.h"
+
+/* Per-function target attribute lets GCC and clang emit AVX2 code */


The best comment is the one that is not needed. Clean, self-explanatory code can spare a lot of comments; which has the advantage of not requiring constant updates as the code changes (it won't mislead the reader once it goes stale).

"Comments should not duplicate the code" - based on this "rule", I'll mark the comments which I found unnecessary with *c.

MrAnno · 2026-05-11T09:41:38Z

+
+/* Parser-level chunk: two AVX2 ymm loads (32 B each) → one 64-bit hit
+ * bitmap that we iterate with __builtin_ctzll.  Not a SIMD-imposed
+ * constant; the 64 falls out of "two movemasks fit in a uint64_t". */


*c

I don't really understand what this means. Looks like a generated comment, with a unicode → symbol. I'm sure we can explain this more clearly with code or with a simpler comment near to the code that uses this constant.

I'd be less confused without this comment.

MrAnno · 2026-05-11T09:48:42Z

+
+/* Per-chunk hit bitmaps: bit i is set iff chunk[i] matches the class.
+ * `quotes` collapses '"' and '\'' into a single mask; `spaces` does the
+ * same for ' ' and '\t'. */


I think the non-redundant valuable information here would be stating that these are bitmaps for finding the important characters within a given chunk.

MrAnno · 2026-05-11T09:49:51Z

+  __m256i lo = _mm256_loadu_si256((const __m256i *) chunk);
+  __m256i hi = _mm256_loadu_si256((const __m256i *) (chunk + SIMD_PARSER_CHUNK_BYTES / 2));
+
+  /* commas: single character */


*c

Move the declaration of comma next to this call instead.

MrAnno · 2026-05-11T09:50:26Z

+  self->commas = _combine_mm(_mm256_cmpeq_epi8(lo, comma),
+                             _mm256_cmpeq_epi8(hi, comma));
+
+  /* quotes: '"' | '\''  - fuse the OR in vector domain, movemask once per lane */


*c

Move the declaration of (s)quote next to this call instead.

MrAnno · 2026-05-11T10:39:56Z

+
+AVX2_FN
+void
+csv_simd_find_delim_quote_space(const char *start, gsize max_len, gchar delimiter, gchar quote_char,


MrAnno · 2026-05-11T10:40:15Z

+
+AVX2_FN
+gint
+csv_simd_find_delimiter_or_quote(const char *start, gsize max_len, gchar delimiter, gchar quote_char)


MrAnno · 2026-05-11T10:45:38Z

+  if (result.offset >= 0)
+    nexthop = self->src + result.offset;
+  else
+    nexthop = self->src + remaining;  /* no match: scan to end of input */


This breaks compatibility on platforms where SIMD operations are not available.

MrAnno · 2026-05-11T11:05:10Z

+  return -1;
+}
+
+#else


Optional:

I think it's possible to refactor the SIMD code's entry points so that we use only 3-4 short calls that either perform the optimization or signal an error, letting the scanner fall back to the slow path when calling into the SIMD methods from the CSV scanner side. That way, we would no longer need some many NOOP functions, and bugs like the one I found in the previous comment would be easier to catch.

For example:

Extracting the SIMD-related logic from _parse_value_with_whitespace_and_delimiter() into a separate function and moving it to csv-scanner-simd.c.

_scan_next_from_fast_path, _trim_fast_path_fields_whitespace, and _can_use_simd_fast_path should also go into csv-scanner-simd.c.

The only code remaining inside csv-scanner.c would be the check for SIMD availability and short one-line calls into the SIMD-optimized paths. Passing CSVScanner *self to these functions is fine. If we want to separate the generic SIMD code that doesn't operate on CSVScanner, we can add a separate simd.c unit, which can be built conditionally as a whole.

MrAnno · 2026-05-11T11:20:37Z

+
+#ifdef SYSLOG_NG_MANUAL_NEON
+
+#include ???


#error "NEON instructions are currently not supported"

Or even better, we should get rid of all SYSLOG_NG_MANUAL_NEON branches in this PR so that we can compile and use the slow path until we have an actual working implementation.

A TODO comment should be enough.

MrAnno · 2026-05-11T11:39:38Z

+
+  /* SIMD fast path for simple unquoted values with default delimiter */
+  if (csv_is_simd_available() && !self->current_quote && !self->options->string_delimiters && !self->options->delimiters &&
+      self->options->dialect == CSV_SCANNER_ESCAPE_NONE && !(self->options->flags & CSV_SCANNER_STRIP_WHITESPACE))


This can be hidden inside a csv_scanner_can_use_simd_fast_path() or anything similar.

MrAnno · 2026-05-11T11:53:45Z

+          /* Fall through to handle quote */
+          self->current_quote = *self->src;
+          self->src++;
+          goto continuation;


_parse_unquoted_literal_characters() detects escaping on DEFAULT_DELIM_CHAR, which seems to be missing from here.

What happens with foo"bar,baz? (Klaudia found this one)
The old code didn't allow starting a quote in the middle of the value, the new one does, so it can mess things up.

MrAnno · 2026-05-11T11:59:05Z

+      gsize remaining = self->input_end - self->src;
+      CSVSimdFindResult result;
+      /* Find delimiter, double-quote, or single-quote in a single SIMD pass */
+      csv_simd_find_either3(self->src, remaining, DEFAULT_DELIM_CHAR, '"', '\'', &result);


quotes_start and quotes_end are configurable. Shouldn't we add this as a condition to running the fast path?

MrAnno · 2026-05-11T12:03:14Z

+      else if (result.offset == 0)
+        {
+          /* Delimiter at start */
+          self->src++;


We stop at quotes, not just delimiters. I think we shouldn't skip this character unconditionally.

MrAnno · 2026-05-11T12:15:50Z

+_find_n(const char *start, gsize max_len, const gchar *chars, int n,
+        CSVSimdFindResult *result)
+{
+  if (G_LIKELY(cpu_supports_simd))


I think the runtime SIMD detection should be done only once outside of this unit and we should fall back to the original CSV scanner implementation, not to this partial fallback (_scalar_search_impl).

To summarize it:
We shouldn't see multiple cpu_supports_simd and csv_is_simd_available() checks, we should disable the whole thing in the scanner.

MrAnno · 2026-05-11T12:27:45Z

+
+  /* Pre-allocate fast_path_fields for at least 16 columns to avoid early reallocations */
+  scanner->fast_path_fields = g_array_sized_new(FALSE, FALSE, sizeof(CSVFieldOfs), 16);
+  csv_detect_cpu_features();


Optional: You started caching the CPU feature check, which is a good idea.
This means we could move the detection call into a csv_scanner_global_init() function, which you could call from app_startup(). That runs only once, csv_scanner_init() is called whenever we want to scan a new CSV string.

MrAnno · 2026-05-11T12:30:04Z

+
+AVX2_FN
+void
+csv_simd_find_either(const char *start, gsize max_len, gchar c1, gchar c2, CSVSimdFindResult *result)


if max_len is gsize, then all offsets returned should also be gsize.

MrAnno · 2026-05-11T12:31:41Z

+ * offsets; subsequent scan_next calls dequeue from that array.  Bails out
+ * to the generic path on cases the classifier's view of quoting can't
+ * express (unterminated quoted field, literal content after closing quote
+ * like "foo"bar). */


This is not true.

fernya requested review from MrAnno, OverOrion, alltilla and sodomelle May 4, 2026 12:09

fernya force-pushed the simd_aggressive_optimization branch 2 times, most recently from 3bb7da4 to 2b0e405 Compare May 6, 2026 12:53

OverOrion and others added 9 commits May 6, 2026 15:07

csv-scanner: add SIMD fast-path for simple-options records

e265420

Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Szilard Parrag <szilard.parrag@axoflow.com>

filterx: add filterx_list_sized_new for size-hinted list construction

c0d8501

Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Szilard Parrag <szilard.parrag@axoflow.com>

filterx parse_csv: pre-size output list and use slice path on SIMD fa…

0a22c03

…st-path records Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Szilard Parrag <szilard.parrag@axoflow.com>

csv-scanner: Export MANUAL_AVX2 symbol if compiler supports avx2

4c91b03

Signed-off-by: Ferencz, Balint <balint.ferencz@axoflow.com>

csv-scanner: Add AVX compiler flag detection to automake build system

8ace368

Signed-off-by: Ferencz, Balint <balint.ferencz@axoflow.com>

cmake: fix AVX2 detection on macOS

5d38dd3

Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Szilard Parrag <szilard.parrag@axoflow.com>

OverOrion force-pushed the simd_aggressive_optimization branch from 2b0e405 to 5d38dd3 Compare May 6, 2026 13:09

csv-parser: Fix input parsing

e082452

Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Ferencz, Balint <balint.ferencz@axoflow.com>

fernya force-pushed the simd_aggressive_optimization branch 3 times, most recently from ce85180 to e082452 Compare May 6, 2026 19:29

fernya marked this pull request as ready for review May 7, 2026 05:47

fernya requested a review from mitzkia May 7, 2026 10:48

fernya force-pushed the simd_aggressive_optimization branch from a34886a to d5d1dee Compare May 7, 2026 12:01

csv-parser: Run light tests with qemu virtual cpu

b9c8b06

We may test the fallback path when the code has compiled-in manual SIMD optimization, but at runtime the CPU doesn't support it. Signed-off-by: Ferencz, Balint <balint.ferencz@axoflow.com>

fernya force-pushed the simd_aggressive_optimization branch from d5d1dee to b9c8b06 Compare May 7, 2026 12:12

MrAnno reviewed May 11, 2026

View reviewed changes

OverOrion marked this pull request as draft May 12, 2026 08:38


		#ifdef SYSLOG_NG_MANUAL_NEON

		#include ???

Conversation

fernya commented May 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mitzkia commented May 7, 2026

Uh oh!

MrAnno left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

fernya commented May 4, 2026 •

edited

Loading