Releases: zeek/spicy
v1.11.2
Bug fixes
-
GH-1860: Fix parsing for vectors of literals.
This was broken in two ways:
- with the `(LITERAL)[]` syntax, the parser would not recognize literals using type constructors
- with the syntax `LITERAL[]`, we'd try to store the parsed value into a vector
-
GH-1847: Fix resynchronization issue with trimmed input.
When input had been trimmed, `View::advanceToNextData` could end up returning a view starting ahead of the valid area.
-
GH-1852: Fix `skip` with units.
For unit parsing with `skip`, we would create a temporary instance but wouldn't properly initialize it, meaning for example that parameters weren't available. We now generally fully initialize any destination, even if temporary.
v1.11.1
Bug fixes
-
GH-1831: Fix optimizer regression.
We were no longer marking types as used that are referenced through a type name.
-
GH-1823: Don't qualify magic linker symbols with C++ namespace.
We need them at their original values because that's what the runtime library is hard-coded to expect.
-
Fix use of moved-from variable.
Function parameters still shadow members in C++. This is a fixup of c3abbbe.
-
Fix undefined shifts of 32bit integer in `toInt()`.
`1U` is 32bit on a 64bit system and shifting it by more than 31 bits is undefined. Before this fix, the following produced `-4294967296` instead of `-1`: `b"\xff\xff\xff\xff".to_int(spicy::ByteOrder::Big)`
-
Fix `to_uint(ByteOrder)` for empty byte ranges.
`to_uint()` and `to_int()` for empty byte ranges throw when attempting to convert printable decimals to integers. Do the same for the byte order versions. The assumption is that it is really an error when the user calls `to_int()` or `to_uint()` on an empty byte range.
-
GH-1817: Prevent null ptr dereference when looking on nodes without `Scope`.
-
GH-1815: Disallow expanding limited `View`s again with `limit`.
The documented semantics of `View::limit` are that it creates a new view with equal or smaller size. In contrast to that, we still allowed expanding views with further calls to `limit`.
This patch changes the implementation of `View::limit` so it can only ever make a `View` smaller.
We also tweak the implementation of the check for consumed `&size` when used together with `&eod`: if the `&size` was already nested in a limited view, a larger `&size` value could previously extend the view so the `&eod` effectively was ignored. Since we now do not extend the `View` anymore, we only activate the check for consumed `&size` if `&eod` was not specified; with `&eod`, the user communicated that they are fine with consuming less data.
-
GH-1810: Fix nested look-ahead switches.
-
Remember normalized paths when checking for duplicate files in driver.
While we ignore duplicate files it was still possible to erroneously add the same file multiple times to a compilation. Catch this trivial case.
-
GH-1462: Remember files processed by the driver.
We did this previously but stopped doing it with #1462.
-
Remove a few value copies.
-
GH-1813: Fix equality implementation of module `UID`.
We already computed a `uniqueID` value for each module to allow declaring the same `ID` name multiple times; however, we did not consistently use that value in the implementation of `module::UID` equality and hash operators, which this patch addresses.
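The `toInt()` entry above comes down to the width of the constant being shifted. The following is a minimal C++ sketch of that idea, not the runtime's actual code: `to_int_big_endian` is a hypothetical helper that sign-extends a big-endian byte string using 64-bit operands throughout. Throwing on empty input follows the behaviour described for empty byte ranges; the exception type is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string_view>

// Hypothetical helper (not Spicy's runtime API): interpret 1 to 8 big-endian
// bytes as a signed two's-complement integer.
inline std::int64_t to_int_big_endian(std::string_view bytes) {
    if ( bytes.empty() || bytes.size() > 8 )
        throw std::invalid_argument("need between 1 and 8 bytes");

    std::uint64_t value = 0;
    for ( unsigned char c : bytes )
        value = (value << 8) | c;

    const std::size_t bits = bytes.size() * 8;
    if ( bits < 64 ) {
        // The shifted one must be 64 bits wide: with a 32-bit `unsigned int`,
        // `1U << 32` would be undefined.
        const std::uint64_t sign_bit = std::uint64_t{1} << (bits - 1);
        if ( value & sign_bit )
            value |= ~((std::uint64_t{1} << bits) - 1); // sign-extend
    }

    return static_cast<std::int64_t>(value);
}
```

With this sketch, the four bytes `\xff\xff\xff\xff` convert to `-1`, matching the expected result quoted in the entry.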
v1.11.0
New Functionality
-
GH-3779: Add `%sync_advance` hook.
This adds support for a new unit hook: `on %sync_advance(offset: uint64) { ... }`
This hook is called regularly during error recovery when synchronization skips over data or gaps while searching for a valid synchronization point. It can be used to check in on the synchronization to, e.g., abort further processing if it just keeps failing. `offset` is the current position inside the input stream that synchronization just skipped to.
By default, "called regularly" means that it's called every 4KB of input skipped over while searching for a synchronization point. That value can be changed by setting a unit property `%sync-advance-block-size = <number of bytes>`.
As an additional minor tweak, this also changes the name of what used to be the `__gap__` profiler to now be called `__sync_advance` because it's profiling the time spent in skipping data, not just gaps.
-
Add unit method `stream()` to access current input stream, and stream method `statistics()` to retrieve input statistics.
This returns a struct of the following type, reflecting the input seen so far:

    type StreamStatistics = struct {
        num_data_bytes: uint64;     ## number of data bytes processed
        num_data_chunks: uint64;    ## number of data chunks processed, excluding empty chunks
        num_gap_bytes: uint64;      ## number of gap bytes processed
        num_gap_chunks: uint64;     ## number of gap chunks processed, excluding empty chunks
    };

-
GH-1750: Add `to_real` method to `bytes`. This interprets the data as representing an ASCII-encoded floating point number and converts that into a `real`. The data can be in either decimal or hexadecimal format. If it cannot be parsed as either, throws an `InvalidValue` exception.
-
GH-1608: Add `get_optional` method to maps.
This returns an `optional` value either containing the map's element for the given key if that entry exists, or an unset `optional` if it does not.
-
GH-90/GH-1733: Add `result` and `spicy::Error` types to Spicy to facilitate error handling.
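The semantics described for `get_optional` on maps (GH-1608 above) can be illustrated with a C++ analogue. This is only a sketch of the lookup behaviour using standard containers; the free function below is hypothetical and is not how Spicy's runtime implements the method.

```cpp
#include <map>
#include <optional>

// Hypothetical C++ analogue of a map `get_optional`: returns the element stored
// under `key` if that entry exists, or an empty optional if it does not.
template<typename Map>
std::optional<typename Map::mapped_type> get_optional(const Map& m, const typename Map::key_type& key) {
    if ( auto it = m.find(key); it != m.end() )
        return it->second;

    return std::nullopt;
}
```

For `std::map<int, int> m{{1, 10}}`, `get_optional(m, 1)` holds `10`, while `get_optional(m, 2)` is empty, so callers can test for presence without a separate lookup.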
Changed Functionality
- The Spicy compiler has become a bit more strict and is now rejecting some ill-defined code constructs that previous versions ended up letting through. Specifically, the following cases will need updating in existing code:
  - Identifiers from the (internal) `hilti::` namespace are no longer accessible. Usually you can just scope them with `spicy::` instead.
  - Previous versions did not always enforce constness as they should have. In particular, function parameters could end up being mutable even when they weren't declared as `inout`. Now `inout` is required for supporting any mutable operations on a parameter, so make sure to add it where needed.
  - When using unit parameters, the type of any `inout` parameters now must itself be a unit. To pass other types into a unit so that they can be modified by the unit, use a reference instead of `inout`. For example, use `type Foo = unit(s: sink&)` instead of `type Foo = unit(inout s: sink)`. See https://docs.zeek.org/projects/spicy/en/latest/programming/parsing.html#unit-parameters for more.
- The Spicy compiler now uses a more streamlined storage and access scheme to represent source code. This speeds up work up until C++ source translation (e.g., faster time to first error message during development).
- `spicyc` options `-c` and `-l` no longer support compiling multiple Spicy source files to C++ code individually to then build them all together. This was a rarely used feature and actually already broken in some situations. Instead, use `spicyc -x` to produce the C++ code for all needed Spicy source files at once. `-c` and `-l` remain available for debugging purposes.
- The `spicyc` option `-P` now requires a prefix argument that sets the C++ namespace, just like `-x <prefix>` does. This is so that the prototypes match the actual code generated by `-x`. To get the same identifiers as before, use an empty prefix (`-P ""`).
- GH-1763: Restrict initialization of `const` values to literals. This means that e.g., `const` values cannot be initialized from other `const` values or function calls anymore.
- `result` and `network` are now keywords and cannot be used anymore as user-specified identifiers.
- GH-1661: Deprecate usage of `&convert` with `&chunked`.
- GH-1657: Reduce data copying when passing data to the driver.
- GH-1501: Improve some error messages for runtime parse errors.
- GH-1655: Reject joint usage of filters and look-ahead.
- GH-1675: Extend runtime profiling to measure parser input volume.
- GH-1624: Enable optimizations when running `spicy-build`.
Bug fixes
- GH-1759: Fix `if`-condition with `switch` parsing.
- Fix Spicy's support for `network` type.
- GH-1598: Enforce that the argument to `new` is either a type or a ctor.
- GH-1742, GH-1760: Unroll constructors of big containers in generated code. We previously would generate code which would be expensive to compile for some compilers. We now generate more friendly code.
- GH-1745: Fix C++ initialization of global constants through global functions.
- GH-1743: Use a checked cast for `map`'s `in` operator.
- GH-1664: Fix `&convert` typing issue with bit ranges.
- GH-1724: Fix skipping in size-constrained units. We previously could skip too much data if `skip` was used in a unit with a global `&size`.
- Fix incremental skipping. We previously would incorrectly compute the amount of data to skip, which could have potentially led to the parser consuming more data than available.
- GH-1586: Make skip productions behave like the production they are wrapping.
- GH-1711: Fix forwarding of a reference unit parameter to a non-reference parameter.
- GH-1599: Require mutable arguments for integer increment/decrement operators.
- GH-1493: Support/fix public type aliases to units.
Documentation
- Add new section with guidelines and best practices. This focuses on performance for now, but may be extended with other areas later. Much of the content was contributed by Corelight Labs.
- Fix documented type mapping for integers.
- Document generic operators.
v1.10.1
-
Update CI setups.
-
Fix repeated evaluations of `&parse-at` expression.
v1.9.1
-
Drop `;` after `#pragma`.
-
Update CI setups.
-
Fix repeated evaluations of `&parse-at` expression.
-
Fix stray Python escape sequence.
-
Drop freebsd-12 from CI.
-
GH-1617: Fix handling of `%synchronize-*` attributes for units in lists.
We previously would not detect `%synchronize-at` or `%synchronize-from` attributes if the unit was not directly in a field, i.e., we mishandled the common case of synchronizing on a unit in a list.
With this patch we now handle these attributes, regardless of how the unit appears.
v1.8.4
-
Drop `;` after `#pragma`.
-
Update CI setups.
-
Fix repeated evaluations of `&parse-at` expression.
-
Fix stray Python escape sequence.
-
Fix skipping of literal fields with condition.
-
Fix type of generated code for `string::size`.
While we defined `string`'s size operator to return a `uint64` and documented that it returns the length in codepoints, not bytes, we still generated C++ code which worked on the underlying bytes (i.e., it directly invoked `std::string::size` instead of using `hilti::rt::string::size`).
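The difference between byte length and codepoint length in the `string::size` entry above is easy to see with a UTF-8 string. The sketch below is illustrative only: `utf8_codepoints` is a hypothetical helper, not `hilti::rt::string::size`, and it assumes its input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count codepoints in a valid UTF-8 string by counting
// every byte that is not a continuation byte (continuation bytes look like
// 10xxxxxx).
inline std::size_t utf8_codepoints(const std::string& s) {
    std::size_t count = 0;
    for ( unsigned char c : s ) {
        if ( (c & 0xC0) != 0x80 )
            ++count;
    }

    return count;
}
```

For the string `"h\xc3\xa9llo"` (the word "héllo" encoded as UTF-8), `std::string::size` reports 6 bytes while `utf8_codepoints` reports 5 codepoints.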
v1.10.0
Changed Functionality
-
Numerous improvements to improve throughput of generated parsers.
For this release we have revisited the code typically generated for parsers and the runtime libraries they use with the goal of improving throughput of parsers at runtime. Coarsely summarized this work was centered around
- reduction of allocations during parsing
- reduction of data copies during parsing
- use of dedicated, hand-checked implementations for automatically generated code to avoid overhead from safety checks in the runtime libraries
With these changes we see throughput improvements of some parsers in the range of 20-30%. This work consisted of numerous incremental changes, see `CHANGES` for the full list of changes.
-
GH-1667: Always advance input before attempting resynchronization.
When we enter resynchronization after hitting a parse error we previously would have left the input alone, even though we know it fails to parse. We then relied fully on resynchronization to advance the input.
With this patch we always forcibly advance the input to the next non-gap position. This has no effect for synchronization on literals, but allows it to happen earlier for regular expressions.
-
GH-1659: Lift requirement that `bytes` forwarded from filter be mutable.
-
GH-1489: Deprecate &bit-order on bit ranges.
This had no effect and allowing it may be confusing to users. Deprecate it with the idea of eventual removal.
-
Extend location printing to include single-line ranges.
For a location of, e.g., "line 1, column 5 to 10", we now print `1:5-1:10`, whereas we used to print it as only `1:5`, hence dropping information.
-
GH-1500: Add `+=` operator for `string`.
This allows appending to a `string` without having to allocate a new string. This might perform better most of the time.
-
GH-1640: Implement skipping for any field with known size.
This patch adds `skip` support for fields with `&size` attribute or of builtin type with known size. If a unit has a known size and it is specified in a `&size` attribute, this also allows skipping over unit fields.
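The location printing change above can be pictured with a small formatting routine. This is a sketch under assumptions: the `Location` struct, its field names, and the `render` function are hypothetical and only reproduce the `1:5-1:10` output format quoted in the entry.

```cpp
#include <string>

// Hypothetical location record; field names are illustrative only.
struct Location {
    int from_line;
    int from_column;
    int to_line;
    int to_column;
};

// Render "line:col" for a single position and "line:col-line:col" for a range.
inline std::string render(const Location& l) {
    const std::string from = std::to_string(l.from_line) + ":" + std::to_string(l.from_column);
    if ( l.from_line == l.to_line && l.from_column == l.to_column )
        return from;

    return from + "-" + std::to_string(l.to_line) + ":" + std::to_string(l.to_column);
}
```

A range on line 1 from column 5 to 10 renders as `1:5-1:10`, so the end column is no longer dropped.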
Bug fixes
-
GH-1605: Allow for unresolved types for set `in` operator.
-
GH-1617: Fix handling of `%synchronize-*` attributes for units in lists.
We previously would not detect `%synchronize-at` or `%synchronize-from` attributes if the unit was not directly in a field, i.e., we mishandled the common case of synchronizing on a unit in a list.
We now handle these attributes, regardless of how the unit appears.
-
GH-1585: Put closing of unit sinks behind feature guard.
This code gets emitted, regardless of whether a sink was actually connected or not. Put it behind a feature guard so it does not enable the feature on its own.
-
GH-1652: Fix filters consuming too much data.
We would previously assume that a filter would consume all available data. This only holds if the filter is attached to a top-level unit, but in general not if some sub-unit uses a filter. With this patch we explicitly compute how much data is consumed.
-
GH-1668: Fix incorrect data consumption for `&max-size`.
We would previously handle `&size` and `&max-size` almost identically, with the only difference that `&max-size` sets up a slightly larger view to accommodate a sentinel. In particular, we also used identical code to set up the position where parsing should resume after such a field.
This was incorrect as it is in general impossible to tell where parsing continues after a field with `&max-size` since it does not signify a fixed view like `&size`. We now compute the next position for a `&max-size` field by inspecting the limited view to detect how much data was extracted.
-
GH-1522: Drop overzealous validator.
A validator was intended to reject a pattern of incorrect parsing of vectors, but instead ended up rejecting all vector parsing if the vector elements themselves produced vectors. We dropped this validation.
-
GH-1632: Fix regex processing using `{n,m}` repeat syntax being off by one.
-
GH-1648: Provide meaningful unit `__begin` value when parsing starts.
We previously would not provide `__begin` when starting the initial parse. This meant that e.g., `offset()` was not usable if nothing ever got parsed.
We now provide a meaningful value.
-
Fix skipping of literal fields with condition.
-
GH-1645: Fix `&size` check.
The current parsing offset could legitimately end up just beyond the `&size` amount.
-
GH-1634: Fix infinite loop in regular expression parsing.
Documentation
-
Update documentation of `offset()`.
-
Fix docs namespace for symbols from `filter` module.
We previously would document these symbols to be in `spicy` even though they are in `filter`.
-
Add bitfield examples.
v1.8.3
-
GH-1645: Fix `&size` check.
The current parsing offset could legitimately end up just beyond the `&size` amount.
-
GH-1617: Fix handling of `%synchronize-*` attributes for units in lists.
We previously would not detect `%synchronize-at` or `%synchronize-from` attributes if the unit was not directly in a field, i.e., we mishandled the common case of synchronizing on a unit in a list.
With this patch we now handle these attributes, regardless of how the unit appears.
v1.9.0
New Functionality
-
GH-1468: Allow to directly access members of anonymous bitfields.
We now automatically map fields of anonymous bitfields into their containing unit.

    type Foo = unit {
        : bitfield(8) {
            x: 0..3;
            y: 4..7;
        };

        on %done {
            print self.x, self.y;
        }
    };

-
GH-1467: Support bitfield constants in Spicy for parsing.
One can now define bitfield "constants" for parsing by providing integer expressions with fields:

    type Foo = unit {
        x: bitfield(8) {
            a: 0..3 = 2;
            b: 4..7;
            c: 7 = 1;
        };
    };

This will first parse the bitfield as usual and then enforce that the two bit ranges that are coming with expressions (i.e., `a` and `c`) indeed contain the expected values. If they don't, that's a parse error.
We also support using such bitfield constants for look-ahead parsing:

    type Foo = unit {
        x: uint8[];
        y: bitfield(8) {
            a: 0..3 = 4;
            b: 4..7;
        };
    };

This will parse uint8s until a value is discovered that has its bits set as defined by the bitfield constant.
(We use the term "constant" loosely here: only the bits with values are actually enforced to be constant, all others are parsed as usual.)
-
GH-1089, GH-1421: Make `offset()` independent of random access functionality.
We now store the value returned by `offset()` directly in the unit instead of computing it on the fly when requested from `cur - begin`. With that, `offset()` can be used without enabling random access functionality on the unit.
-
Add support for passing arbitrary C++ compiler flags.
This adds a magic environment variable `HILTI_CXX_FLAGS` which, if set, specifies compiler flags which should be passed during C++ compilation after implicit flags. This could be used to e.g., set defines, or set low-level compiler flags.
Even with this flag, for passing include directories one should still use `HILTI_CXX_INCLUDE_DIRS` since they are searched before any implicitly added paths.
-
GH-1435: Add bitwise operators `&`, `|`, and `^` for booleans.
-
GH-1465: Support skipping explicit `%done` in external hooks.
Assuming `Foo::X` is a unit type, these two are now equivalent:

    on Foo::X::%done { }
    on Foo::X { }

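The check that bitfield constants perform (GH-1467 above) can be sketched in C++ as extracting bit ranges and comparing only those that carry an expected value. This is an illustration, not generated parser code: `bits` and `matches_constant` are hypothetical names, and the sketch assumes bit 0 is the least significant bit.

```cpp
#include <cstdint>

// Extract bits lo..hi (inclusive) of an 8-bit value, counting bit 0 as the
// least significant bit. Requires lo <= hi <= 7.
constexpr std::uint8_t bits(std::uint8_t value, unsigned lo, unsigned hi) {
    return static_cast<std::uint8_t>((value >> lo) & ((1U << (hi - lo + 1)) - 1U));
}

// Mirrors `bitfield(8) { a: 0..3 = 2; b: 4..7; c: 7 = 1; }`: only `a` and `c`
// carry expected values, all other bits may hold anything.
constexpr bool matches_constant(std::uint8_t value) {
    return bits(value, 0, 3) == 2 && bits(value, 7, 7) == 1;
}
```

Under these assumptions the byte `0x82` matches (`a` is 2, `c` is 1), while `0x02` would be a parse error because bit 7 is unset.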
Changed Functionality
-
GH-1567: Speed up runtime calls to start profilers.
-
GH-1565: Disable capturing backtraces with HILTI exceptions in non-debug builds.
-
GH-1343: Include condition in `&requires` failure message.
-
GH-1466: Reject uses of `self` in unit `&size` and `&max-size` attribute.
Values in `self` are only available after parsing has started while `&size` and `&max-size` are consumed before that. This means that any use of `self` and its members in these contexts would only ever see unset members, so it should not be the intended use.
-
GH-1485: Add validator rejecting unsupported multiple uses of attributes.
-
GH-1465: Produce better error message when hooks are used on a unit field.
-
GH-1503: Handle anonymous bitfields inside `switch` statements.
We now map items of anonymous bitfields inside `switch` cases into the unit namespace, just like we already do for top-level fields. We also catch if two anonymous bitfields inside those cases carry the same item name, which would make accesses ambiguous.
So the following works now:

    switch (self.n) {
        0 -> : bitfield(8) {
            A: 0..7;
        };
        * -> : bitfield(8) {
            B: 0..7;
        };
    };

Whereas this does not work:

    switch (self.n) {
        0 -> : bitfield(8) {
            A: 0..7;
        };
        * -> : bitfield(8) {
            A: 0..7;
        };
    };

-
GH-1571: Remove trimming inside individual chunks.
Trimming a `Chunk` (always from the left) causes a lot of internal work with only limited benefit since we manage visibility with a `stream::View` on top of a `Chunk` anyway.
Trimming now only removes a `Chunk` from a `Chain`, but does not internally change the individual `Chunk` anymore. This should benefit performance but might lead to slightly increased memory use, but callers usually have that data in memory anyway.
-
Use `find_package(Python)` with version.
Zeek's configure sets `Python_EXECUTABLE` as hint, but Spicy is using `find_package(Python3)` and would only use `Python3_EXECUTABLE` as hint. This results in Spicy finding a different (the default) Python executable when configuring Zeek with `--with-python=/opt/custom/bin/python3`.
Switch Spicy over to use `find_package(Python)` and add the minimum version so it knows to look for `Python3`.
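The trimming change in GH-1571 above can be sketched as dropping whole chunks from the front of a chain and never cutting into one. The types below are simplified, hypothetical stand-ins for the runtime's `Chunk` and `Chain`, which are more involved than this.

```cpp
#include <cstdint>
#include <deque>
#include <string>

// Hypothetical stand-ins for the runtime's chunk and chain types.
struct Chunk {
    std::uint64_t offset; // stream offset of the first byte in `data`
    std::string data;
};

using Chain = std::deque<Chunk>;

// Drop every chunk that lies entirely before `offset`. A chunk that is only
// partially before `offset` stays untouched; a view on top hides its prefix.
inline void trim(Chain& chain, std::uint64_t offset) {
    while ( ! chain.empty() && chain.front().offset + chain.front().data.size() <= offset )
        chain.pop_front();
}
```

Trimming to an offset inside a chunk keeps that chunk whole, which trades a little memory for less work per trim, the same trade-off the entry describes.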
Bug fixes
- GH-1520: Fix handling of `spicy-dump --enable-print`.
- Fix `spicy-build` to correctly infer library directory.
- GH-1446: Initialize generated struct members in constructor body.
- GH-1464: Add special handling for potential `advance` failure in trial mode.
- GH-1275: Add missing lowering of Spicy unit ctor to HILTI struct ctor.
- Fix rendering in validation of `%byte-order` attribute.
- GH-1384: Fix stringification of `DecodeErrorStrategy`.
- Fix handling of `--show-backtraces` flag.
- GH-1032: Allow using bitfields with type declarations.
- GH-1484: Fix using of `&convert` on bitfields.
- GH-1508: Fix returned value for `<unit>.position()`.
- GH-1504: Use user-inaccessible chars for encoding `::` in feature variables.
- GH-1550: Replace recursive deletion with explicit loop to avoid stack overflow.
- GH-1549: Add feature guards to accesses of a unit's `__position`.
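The technique named in GH-1550 above, replacing recursive deletion with an explicit loop, is a general C++ pattern for long chains of owning pointers. The sketch below shows it on a hypothetical singly linked `Node` type; it is not the affected Spicy code, only an instance of the same idea.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical singly linked node that owns its successor.
struct Node {
    int value = 0;
    std::unique_ptr<Node> next;

    ~Node() {
        // Without this loop, destroying the head would recursively destroy
        // `next`, one stack frame per node. Detach successors one at a time so
        // each node is destroyed with an empty `next`.
        auto cur = std::move(next);
        while ( cur ) {
            auto tail = std::move(cur->next);
            cur = std::move(tail); // destroys the previous `cur`
        }
    }
};

// Build a chain of `n` nodes and destroy it again; returns how many were built.
inline std::size_t build_and_destroy(std::size_t n) {
    std::unique_ptr<Node> head;
    std::size_t built = 0;
    for ( ; built < n; ++built ) {
        auto node = std::make_unique<Node>();
        node->next = std::move(head);
        head = std::move(node);
    }

    head.reset(); // runs the iterative destructor
    return built;
}
```

With the loop in place, destruction uses constant stack depth regardless of chain length, so even a chain of a million nodes can be released safely.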
Documentation
- Move Zeek-specific documentation into Zeek documentation.
- Clarify error handling docs.
- Mention unit switch statements in conditional parsing docs.
v1.8.2
-
GH-1571: Remove trimming inside individual chunks.
Trimming `Chunk`s (always from the left) causes a lot of internal work with only limited benefit since we manage visibility with `stream::View`s on top of `Chunk`s anyway.
This patch removes trimming inside `Chunk`s so now any trimming only removes `Chunk`s from `Chain`s, but does not internally change individual `Chunk`s anymore. This might lead to slightly increased memory use, but callers usually have that data in memory anyway.
-
GH-1549: GH-1554: Fix potential infinite loop when trimming data before stream.
Previously we would trigger an infinite loop if one tried to trim before the head chunk of a stream. In practice this seems to have been no issue due to #1549 and us emitting way fewer calls to trim than possible.
This patch adds an explicit check whether we need to trim anything, and exits the low-level function early for such cases.
-
GH-1550: Replace recursive deletion with explicit loop to avoid stack overflow.
-
GH-1549: Add feature guards to accesses of a unit's `__position`.
Access of `__position` triggers random access functionality. In order to distinguish our internal uses from accesses due to user code, most accesses in our generated code should be guarded with a feature constant (`if` or ternary).
This patch adds proper guards for a couple of instances where we did not do that correctly. That mishap caused all units with containers to be random access (even the root unit), which in turn could have led to e.g., unbounded memory growth, or runtime overhead due to generation and execution of unneeded code, or expensive cleanup on very large untrimmed inputs.
-
Artificially limit the number of open files.
This works around a silent failure in reproc where it would refuse to run on systems with huge rlimits for the number of open files. We have seen this hit on huge production boxes.
-
Add begin to parser state.
This patch adds the current begin position to the parser state, and makes the corresponding changes to generated parser functions so it is passed down.
We already modelled the semantic beginning of the input in the unit, but had no reliable way to keep this up-to-date across non-unit contexts like `&parse-from`. This would then for certain setups lead to generated code where `input` and `position` would point to different inputs which in turn caused `offset` (modelled as `position - input`) to be incorrect.
Expand validator error message.
-
Disable a few newer clang-tidy categories.
The options disabled here are triggered in newer versions of clang-tidy.
-
Drop `-noall_load` linker option.
We added this linker option on macOS. This option was already obsolete, e.g., in the `ld` manpage:

    -noall_load This is the default. This option is obsolete.

Newer versions of Xcode do not know this option anymore and instead generate a hard error.
-
Declare Spicy pygments extension as parallel-safe.
We previously would not declare that the Spicy pygments highlighter is safe to execute in parallel (reading or writing of sources). Sphinx then assumed that the extension was not safe to run in parallel and instead ran jobs sequentially.
This patch declares the extension as able to execute in parallel. Since the extension does not manage any external state this is safe.
-
Use `find_package(Python)` with version.
Zeek's configure sets `Python_EXECUTABLE` as hint, but Spicy is using `find_package(Python3)` and would only use `Python3_EXECUTABLE` as hint. This results in Spicy finding a different (the default) Python executable when configuring Zeek with `--with-python=/opt/custom/bin/python3`.
Switch Spicy over to use `find_package(Python)` and add the minimum version so it knows to look for `Python3`.