Releases: zeek/spicy
v1.15.0
New Functionality
- Control-flow-based optimizations enabled and expanded
  We built out the control-flow-based optimization first introduced as an experimental feature in spicy-1.14, and with this release these optimizer passes are enabled by default. For that we made the framework more efficient and cleaned up issues in the implementation.
  We also introduced a new pass performing constant propagation (GH-2137, GH-2150) and reducing nested scopes to simplify data flow analysis (GH-1425).
- GH-2197: Support coercion from empty list to struct
  Spicy `struct` fields can already be declared with a `&default` value to reduce data duplication. We now allow constructing `struct` values from empty lists `[]` which effectively allows default-construction.
Changed Functionality
- GH-2183: Use dedicated C++ types for representing `optional` and `tuple` in the runtime library
- GH-2194: Reimplement `hilti::rt::currentExecutable` without external dependency
- GH-2201: `HILTI_OPTIMIZER_PASSES` has been removed
- GH-2204: Minimum required GCC version bumped to gcc-12
- GH-2222: Properly resolve and validate capture groups
  Previously, incorrect uses of regular expression capture groups (e.g., `$1`, `$2`) were emitted to C++ without any analysis, causing compilation failures or unexpected behavior. Now, we validate capture groups and reject invalid Spicy code that was previously accepted.
- GH-2245: Extend our notion of reserved C++ identifiers
  When generating C++ code from Spicy sources, we transform Spicy identifiers which are not valid in C++. We extended the set of identifiers we transform, which might be visible when directly working with the generated C++ code, e.g., in custom host applications.
Bug fixes
- GH-2144: Function parameter and local variable name clash leads to C++ error
- GH-2154: Coercion from Null gets unhandled internal error
- GH-2162: Fix ASAN false positive on ARM
- GH-2175: Fix tuple assignment when coercing individual elements
- GH-2177: Fix C++ code generation for struct parameters passed as references
- GH-2184: FunctionParamVisitor fails to remove some redundant parameters
- GH-2205: Uglify internal identifiers
- Multiple fixes for issues reported by static analysis tools (`clang-tidy`, Coverity)
- zeek/zeek#5008: Do not normalize ID names inside type information
Documentation
- GH-2218: Add FAQ for code optimizers can remove
v1.14.0
New Functionality
- GH-2028: New interprocedural optimizations.
  We added infrastructure for performing interprocedural optimizations, and as a first user added a pass which removes unused function parameters in GH-2030. While this works on any code, it is mainly intended to simplify generated parser code for better runtime performance.
- GH-1697: Remove some dead statements based on control and data flow.
  We now collect control and data flow information. We use this to detect and remove "dead statements", i.e., statements which are not seen by any other needed computations. Currently we handle two classes of dead statements:
  - assignments which are overridden before being used
  - unreachable code, e.g., due to preceding `return`, `break` or `throw`
  The implementation for this is still not able to cover all possible Spicy language constructs, so it is behind a feature flag and not enabled by default. To enable it one needs to set the environment variable `HILTI_OPTIMIZER_ENABLE_CFG=1` when compiling Spicy code with e.g., `spicyc`.
  We encourage users to test this compilation mode and if possible use the compiled parsers in production. If parsers compiled this way show the intended runtime behavior in tests they should also be fine to use in production.
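For illustration only, the two classes of dead statements named for GH-1697 can be shown in a C++ analogue. The optimizer itself operates on Spicy/HILTI code, not on C++, and both function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical function containing both classes of dead statements.
int with_dead_statements(int input) {
    int x = 1;      // dead: overwritten below before `x` is ever read
    x = input * 2;  // live: this value reaches the `return`
    return x;
    x = 0;          // dead: unreachable after the preceding `return`
}

// The same function once both dead statements are removed.
int without_dead_statements(int input) {
    int x = input * 2;
    return x;
}
```

Both functions compute the same result for every input, which is the property such a pass has to preserve.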
Changed Functionality
- GH-2050: Prefer stdout over stderr for `--help` messages.
  Spicy tools now emit `--help` output to stdout instead of `stderr`.
- GH-2068: Allow disabling building of tests.
  We added a new CMake option `SPICY_ENABLE_TESTS` which if toggled on forces building of test and benchmark binaries; it is `ON` by default. This flag can be used by projects building Spicy to disable building of tests if they are not interested in them. We also provide a configure flag `--disable-tests` which has the effect of turning it off.
- GH-1663: Speed up checking of iterator compatibility.
  We were previously using a control block which held a `weak_ptr` to the protected data. This was pretty inefficient for a number of reasons:
  - access to the controlled data always required a `weak_ptr::lock` which created a temporary `shared_ptr` copy and immediately destroyed it after access
  - to check whether the control block was expired we used `lock` instead of `expired` which introduced the same overhead
  - to check compatibility of iterators we compared `shared_ptr`s to the control data which again required full locks instead of using `owner_before`
  This manifested in e.g., loops often being less performant than possible. We now changed how we hold data to make iterating collections cheaper.
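The overhead described for GH-1663 can be sketched with plain standard library calls. This is a minimal illustration, not the runtime's actual code: `Data` and all four function names are hypothetical, while `lock`, `expired` and `owner_before` are the standard `std::weak_ptr` members.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the data an iterator refers to.
struct Data {
    int value = 0;
};

// Costly: `lock()` creates a temporary `shared_ptr` (touching the reference
// count) just to learn whether the data is still alive.
bool expired_via_lock(const std::weak_ptr<Data>& control) {
    return control.lock() == nullptr;
}

// Cheaper: `expired()` answers the same question without creating a copy.
bool expired_direct(const std::weak_ptr<Data>& control) {
    return control.expired();
}

// Costly: compare the controlled objects by locking both sides.
bool same_owner_via_lock(const std::weak_ptr<Data>& a, const std::weak_ptr<Data>& b) {
    return a.lock() == b.lock();
}

// Cheaper: two pointers share ownership if neither orders before the other.
bool same_owner(const std::weak_ptr<Data>& a, const std::weak_ptr<Data>& b) {
    return ! a.owner_before(b) && ! b.owner_before(a);
}
```

Note that the two comparisons differ once both pointers have expired: the locking variant then compares two null pointers, whereas `owner_before` still compares the control blocks.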
- GH-2086: Fix scope resolution of local variables.
  If usage of a local comes before its declaration, we now no longer resolve that usage to this local. It'll either be resolved to an upper layer ID (if there is one of the same name), or rejected if it's otherwise unknown.
- GH-2066: When C++ compilation fails, ask user for help.
  We do expect C++ code generated by Spicy to be valid, so C++ compiler errors in generated code are likely bugs. We now record the output of the C++ compiler in a dedicated file `hilti-jit-error.log` and ask users to file a ticket in case C++ compilation failed.
- GH-1660: When printing anonymous bitfields inside a struct, lift up the fields.
  This now prints, e.g., `[$fin=1, $rsv=0, $opcode=2, $remaining=255]` instead of `[$<anon>=(1, 0, 2, 255)]`.
  In addition, we also prettify non-anonymous bitfields. They now print as, e.g., `[$y=(a: 4, b: 8)]` instead of `[$y=(4, 8)]`.
- GH-1085: Allow registering a module twice.
  So far, if one compiled the same HILTI module twice, each into its own HLTO, then when loading the two HLTOs, the runtime system would skip the second instance. However, that's not really what we want: a module could intentionally be part of multiple HLTOs, in which case each should get its own copy of that module state (i.e., its globals).
  This change allows the same module to be registered multiple times, with the HLTO linker scope distinguishing between the instances at runtime, as usual. To make that work, we move computation of the scope from compile time to runtime, using the library's absolute path as the scope.
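The idea behind GH-1085 can be sketched as a registry keyed by scope and module name. Everything here is hypothetical and only illustrates the keying scheme: the class, its methods, and the reduction of module state to a single integer are assumptions, not the runtime's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry: one entry, and thus one set of globals, per
// (linker scope, module name) pair.
class ModuleRegistry {
public:
    // `scope` stands in for the absolute path of the HLTO library containing
    // the module. Returns false if that exact pair is already registered.
    bool registerModule(const std::string& scope, const std::string& module) {
        return _globals.emplace(std::make_pair(scope, module), 0).second;
    }

    // Access the per-instance module state (reduced to one integer here).
    int& globals(const std::string& scope, const std::string& module) {
        return _globals.at({scope, module});
    }

    std::size_t size() const { return _globals.size(); }

private:
    std::map<std::pair<std::string, std::string>, int> _globals;
};
```

Because the scope is part of the key, the same module loaded from two different libraries yields two independent entries, so changing the globals of one instance leaves the other untouched.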
- GH-1905: Fix operator precedence in Spicy grammar.
  We fixed the precedence of a number of operators to be closer to what users would expect from other languages like C++ or Python.
  - we reduced the precedence of the `in` operator
  - pre- and postfix operators `++` and `--` now have the same precedence and are right associative
  - unary negate was changed to match the precedence of other unary operators.
- Switch compilation to C++20.
  Like Zeek, Spicy now requires a C++20 compiler. As part of this change we cleaned up the implementation to take advantage of C++20 functionality in a number of places. We also moved from the external libraries `linb::any` to `std::any`, and `ghc::filesystem` to `std::filesystem`.
Update supported platforms.
We dropped support for the following platforms:
- debian-11
- fedora-40
We added support for
- debian-13
- fedora-42
- GH-1660: Render all bitfield instances with included field names.
- GH-2099: Fully implement iterator interface for `set::Iterator`.
- GH-2052: Move calling convention from function to function type.
Bug fixes
- GH-2057: Fix `bytes` iterator dereference operation.
- GH-2065: Error for redefined locals from statement inits.
- GH-2061: Fix cyclic usage of units types inside other types.
- GH-2074: Fix fiber abortion.
- GH-2063: Fix C++ compilation issue with weak->strong refs.
- GH-2064: Ensure generated typeinfos are declared before used.
- GH-2044: Catch if methods are implemented multiple times.
- GH-2078: Fix C++ output for constants of constant type.
- GH-1988: Enforce that block-local declarations must be variables.
- GH-1996: Catch exceptions in `processInput` gracefully.
- GH-2091: Fix strong->value reference coercion in calls.
- GH-2100: Add missing deref operations for struct try-member/has-member operators.
- GH-2119: Fix missing `inline` functions in enum prototypes.
- GH-2142, GH-2134: Complete information exposed for reflection in typeinfo.
- GH-2135: Add `&cxx-any-as-ptr` attribute.
Documentation
- GH-1905: Document operator precedence.
v1.11.6
Bug fixes
- GH-2074: Fix fiber abortion.
  When aborting a fiber, we need to activate it once more, to then leave it for good by raising an `AbortException`. The problem was that this exception ended up being caught by user code because it was derived from `std::exception`. This change removes the base class so that the exception is guaranteed to go back to the managing fiber code, where we just ignore it.
- GH-2073: Prevent throwing naked exception when yielding from aborted fiber.
v1.13.2
Bug fixes
- GH-2119: Fix missing inline functions in enum prototypes.
  Our prototype generation could miss function bodies for `inline` functions.
- GH-2074: Fix fiber abortion.
  When aborting a fiber, we need to activate it once more, to then leave it for good by raising an `AbortException`. The problem was that this exception ended up being caught by user code because it was derived from `std::exception`. This change removes the base class so that the exception is guaranteed to go back to the managing fiber code, where we just ignore it.
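The effect of dropping the base class in GH-2074 can be reproduced in isolation. This is a minimal sketch with hypothetical stand-in types, not the runtime's actual `AbortException` or fiber code: a `catch` clause for `std::exception` intercepts only types derived from it.

```cpp
#include <cassert>
#include <exception>

// Hypothetical stand-ins: one abort signal derived from `std::exception`
// (old behavior) and one without any base class (new behavior).
struct DerivedAbort : std::exception {};
struct PlainAbort {};

enum class CaughtBy { UserCode, FiberManager };

// Simulates "user code" catching `std::exception`, wrapped by "managing
// fiber code" that catches everything that gets past the user handler.
template<typename Abort>
CaughtBy abortFiber() {
    CaughtBy result = CaughtBy::FiberManager;

    try {
        try {
            throw Abort{};
        } catch ( const std::exception& ) {
            result = CaughtBy::UserCode; // user handler swallowed the abort
        }
    } catch ( ... ) {
        result = CaughtBy::FiberManager; // abort reached the managing code
    }

    return result;
}
```

With `DerivedAbort` the inner handler intercepts the abort, while `PlainAbort` passes through it and reaches the outer handler.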
v1.13.1
v1.11.5
v1.13.0
New Functionality
- GH-1788: We now support decoding and encoding to UTF16, in particular the new `UTF16LE` and `UTF16BE` charsets for little and big endian encoding, respectively.
- GH-1961: We now support creating type values in Spicy code. The primary use case for this is to pass type information to host applications, and debugging.
  A type value is typically created from either `typeinfo(TYPE)` or `typeinfo(value)`, or coercion from an existing ID of a custom type like `global T: type = MyStruct;`. The resulting value can be printed, or stored in a variable of type `type`, e.g., `global bool_t: type = typeinfo(bool);`.
- GH-1971: Extend unit `switch` based on look-ahead to support blocks of items.
  In 1.12.0 we added support for grouping related unit fields in blocks; there the primary use case was `if` blocks to group fields with identical dependencies. We now also support such blocks inside unit `switch` constructs with look-ahead so one can write the following code:

      # Parses either `a` followed by another `a`, or `b`.
      type X = unit {
          switch {
              -> {
                  : b"a";
                  : b"a";
              }
              -> : b"b";
          };
      };

- GH-1538: Implement compound statements (`{...}`). This allows introducing local scopes, e.g., to group related code.
- GH-1946: `string`'s `encode` method gained an optional `errors` argument to influence error handling. The parameter defaults to `DecodeErrorStrategy::REPLACE` reproducing the previous implicit behavior.
- GH-2010: `bytes` and `string` gained `ends_with` methods
- GH-1965: Add support for case-insensitive matching to regular expressions.
  By adding an `i` flag to a regular expression pattern, it will now be matched case-insensitively (e.g. `/foobar/i`).
- GH-1962: Add `spicy-dump` option to enable profiling.
Changed Functionality
- GH-1981, GH-1982, GH-1991: We now catch more user errors in defining function overloads. Previously these would likely (hopefully) have failed in C++ compilation down the line, but are now cleanly rejected.
- GH-1977: We now reject function overloads which only differ in their return type.
- GH-1991: We now reject function prototypes without `&cxxname`.
  Since in Spicy global declarations can be in any order there is no need to introduce a function with a prototype if it is declared later. The only valid use case for function prototypes was if the function was implemented in C++ and bound to the Spicy name with `&cxxname`.
- We have cleaned up our implementation for runtime type information, primarily intended for custom host applications.
  - `type_info::Value` instances obtained through runtime type introspection can now be rendered to a user-facing representation with a new `to_string` method.
  - The runtime representation was changed to correctly encode that tuple elements can remain unset. A Spicy-side tuple `tuple<T1, T2, T3>` now gets turned into `std::tuple<std::optional<T1>, std::optional<T2>, std::optional<T3>>` which captures the full semantics.
  - We added type information for types previously not exposed, namely `Null`, `Nothing` and `List`. We also fixed the exposed type information for `result<void>`.
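To illustrate the tuple representation described for this release, the following sketch shows how a host application might inspect such a value using only the standard library. The element types, the alias `Example` and the helper `countSet` are assumptions chosen for illustration and are not part of the runtime API.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <tuple>

// C++ shape of a Spicy-side `tuple<uint64, string, bool>` under the mapping
// described above (element types chosen for illustration).
using Example =
    std::tuple<std::optional<unsigned long long>, std::optional<std::string>, std::optional<bool>>;

// Hypothetical helper: counts how many elements of the tuple are set.
int countSet(const Example& t) {
    return std::apply([](const auto&... e) { return (0 + ... + (e.has_value() ? 1 : 0)); }, t);
}
```

An unset element is then an empty `std::optional`, which a host application can test with `has_value()` before dereferencing.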
- GH-2011: We have optimized allocations for unit fields extracting vectors which should speed up extracting especially small and medium-size vectors.
- GH-2035: We have dropped support for Ubuntu 20.04 (Focal Fossa) since it has reached end of standard support upstream.
- GH-2026: Speed up matching of character classes in regexps
Bug fixes
- GH-1580: Catch when functions aren't called.
- GH-1961: Fix generated C++ prototype header.
- GH-1966: Reject anonymous units in variables and fields.
- GH-1967: Fix inactive stack size check during module initialization.
- GH-1968: Fix coercion of function call arguments.
- GH-1976: Fix unit `&max-size` not returning to proper loc.
- GH-2007: Fix using `&try` with `&max-size`, and potentially other cases.
- GH-2016: Fix `&size` expressions evaluating multiple times.
- GH-2038: Prevent escape of non-HILTI exception in lower-level driver functions.
- GH-2047: Make sure `bytes::to[U]Int` returns runtime integers.
- GH-2049: Add `#include <cstdint>` for fixed-width integers
Documentation
- GH-1155: Document iteration over maps/set/vectors.
- GH-1963: Document `assert-exception`.
- GH-1964: Document use of `$$` inside `&{while,until,until-including}`.
- GH-1973: Remove documentation of unsupported `&nosub`.
- GH-1974: Add documentation on how to interpret stack traces involving fibers.
- GH-1975: Fix possibly-incorrect custom host compile command
- GH-2039: Touchup docs style section.
- GH-1970, GH-2003: Fix minor typos in documentation.
v1.11.4
Bug fixes
- GH-2047: Make sure `bytes::to[U]Int` returns runtime integers.
- GH-2049: Fix building with GCC15.
- GH-1999, GH-2004: Adjust build setup for cmake-4.
- GH-2038: Prevent escape of non-HILTI exception in lower-level driver functions.
- GH-1918: Fix potential segfault with stream iterators.
- GH-1871: Fix `&max-size` on unit containing a `switch`.
v1.12.0
New Functionality
- We now support `if` around a block of unit items:

      type X = unit {
          x: uint8;

          if ( self.x == 1 ) {
              a1: bytes &size=2;
              a2: bytes &size=2;
          };
      };

  One can also add an `else`-block:

      type X = unit {
          x: uint8;

          if ( self.x == 1 ) {
              a1: bytes &size=2;
              a2: bytes &size=2;
          }
          else {
              b1: bytes &size=2;
              b2: bytes &size=2;
          };
      };

- We now support attaching an `%error` handler to an individual field:

      type Test = unit {
          a: b"A";
          b: b"B" %error { print "field B %error", self; }
          c: b"C";
      };

  With input `AxC`, that handler will trigger, whereas with `ABx` it won't. If the unit had a unit-wide `%error` handler as well, that one would trigger in both cases (i.e., for `b`, in addition to its field local handler).
  The handler can also be provided separately from the field:

      on b %error { ... }

  In that separate version, one can receive the error message as well by declaring a corresponding string parameter:

      on b(msg: string) %error { ... }

  This works externally, from outside the unit, as well:

      on Test::b(msg: string) %error { ... }

- GH-1856: We added support for specifying a dedicated error message for `requires` failures.
  This now allows creating custom error messages when a `&requires` condition fails. Example:

      type Foo = unit {
          x: uint8 &requires=($$ == 1 : error"Deep trouble!'");

          # or, shorter:
          y: uint8 &requires=($$ == 1 : "Deep trouble!'");
      };

  This is powered by a new condition test expression `COND : ERROR`.
- We reworked C++ code generation so now many parsers should compile faster. This is accomplished by both improved dependency tracking when emitting C++ code for a module as well as by a couple of new peephole optimization passes which additionally reduced the emitted code.
Changed Functionality
- Add `CMAKE_CXX_FLAGS` to `HILTI_CONFIG_RUNTIME_LD_FLAGS`.
- Speed up compilation of many parsers by streamlining generated C++ code.
- Add `starts_with`, `split`, `split1`, `lower` and `upper` methods to `string`.
- GH-1874: Add new library function `spicy::bytes_to_mac`.
- Optimize `spicy::bytes_to_hexstring` and `spicy::bytes_to_mac`.
- Improve validation of attributes so incompatible or invalid attributes should be rejected more reliably.
- Optimize parsing for `bytes` of fixed size as well as literals.
- Add a couple of peephole optimizations to reduce emitted C++ code.
- GH-1790: Provide proper error message when trying to access an unknown unit field.
- GH-1792: Prioritize error message reporting unknown field.
- GH-1803: Fix namespacing of `hilti` IDs in Spicy-side diagnostic output.
- GH-1895: No longer escape backslashes when printing strings or bytes.
- GH-1857: Support `&requires` for individual vector items.
- GH-1859: Improve error message when a unit parameter is used as a field.
- GH-1898: Disallow attributes on "type aliases".
- GH-1938: Deprecate `&count` attribute.
Bug fixes
- GH-1815: Disallow expanding limited `View`'s again with `limit`.
- Fix `to_uint(ByteOrder)` for empty byte ranges.
- Fix undefined shifts of 32bit integer in `toInt()`.
- GH-1817: Prevent null ptr dereference when looking on nodes without `Scope`.
- Fix use of move'd from variable.
- GH-1823: Don't qualify magic linker symbols with C++ namespace.
- Fix diagnostics seen when compiling with GCC.
- GH-1852: Fix `skip` with units.
- GH-1832: Fail for vectors with bytes but no stop.
- GH-1860: Fix parsing for vectors of literals.
- GH-1847: Fix resynchronization issue with trimmed input.
- GH-1844: Fix nested look-ahead parsing.
- GH-1842: Fix when input redirection becomes visible.
- GH-1846: Fix bug with capture groups.
- GH-1875: Fix potential nullptr dereference when comparing streams.
- GH-1867: Fix infinite loops with recursive types.
- GH-1868: Associate source code locations with current fiber instead of current thread.
- GH-1871: Fix `&max-size` on unit containing a `switch`.
- GH-1791: Fix usage of `&convert` with units requiring parameters.
- GH-1858: Fix the literals parsers not following coercions.
- GH-1893: Encompass child node's location in parent.
- GH-1919: Validate that sets are sortable.
- GH-1918: Fix potential segfault with stream iterators.
- GH-1856: Disallow dereferencing a `result<void>` value.
- Fix issue with type inference for `result` constructor.
Documentation
v1.11.3
Bug fixes
- GH-1846: Fix bug with capture groups.
  When extracting the data matching capture groups we'd take it from the beginning of the stream, not the beginning of the current view, even though the latter is what we are matching against.
- Add missing trim after matching a regular expression.
- GH-1875: Fix potential nullptr dereference when comparing streams.
  Because we are operating on unsafe iterators, we need to catch when one goes out of bounds.
- GH-1842: Fix when input redirection becomes visible.
  With `&parse-at/from` we were updating the internal state on our current position immediately, meaning the updates were visible already when evaluating other attributes on the same field afterwards, which is unexpected.
- GH-1844: Fix nested look-ahead parsing.
  When parsing nested vectors all using look-ahead, we need to return control back to upper level when an inner look-ahead isn't found.
  This may change the error message for "normal" look-ahead parsing (see test baseline), but the new one seems fine and potentially even better.