Support `path/1` and data with lifetimes by 01mf02 · Pull Request #296 · 01mf02/jaq

01mf02 · 2025-06-27T07:40:27Z

This PR makes two large changes to jaq's core:

1. It adds support for the `path(f)` filter. This is a user-facing enhancement which should not impact the execution of previously available filters.
2. It makes it possible to use value types with a lifetime, as well as to pass arbitrary data types to native filters. This is only relevant for you if you use jaq's API.

This PR changes the API of jaq-core in a backwards-incompatible way; therefore, it will be part of jaq 3.0.

`path/1`

The support for `path/1` makes several things possible in jaq; in particular, path-based updates à la jq!
However, the execution of `f |= g` will keep using non-path-based updates due to their greater performance and resistance to iterator invalidation problems.
Still, it is possible to define `def update(f; g): reduce path(f) as $p (.; getpath($p) |= g);` and have `update(f; g)` in jaq do mostly the same thing as `f |= g` in jq. Be aware, however, that this does not attempt to work around iterator invalidation issues the same way as jq does; to avoid these issues, I advise you to use jaq's `f |= g`, which is more robust.
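To illustrate the shape of such a path-based update, here is a minimal Rust sketch on a toy value type. It is not jaq's implementation: `Val`, `Path`, `paths_of_each`, `update_at` and `update_each` are hypothetical names, paths are plain index vectors, and a missing path is simply left untouched. The point is only that all paths are computed from the original input first and the updates are then folded over them, as in `reduce path(f) as $p (.; getpath($p) |= g)`.

```rust
// Toy model (hypothetical, not jaq's types): a value is a number or an array.
#[derive(Clone, Debug, PartialEq)]
pub enum Val {
    Num(i64),
    Arr(Vec<Val>),
}

/// A path is a sequence of array indices (toy stand-in for jq path arrays).
pub type Path = Vec<usize>;

/// Toy counterpart of `path(.[])`: one path per element of a top-level array.
pub fn paths_of_each(v: &Val) -> Vec<Path> {
    match v {
        Val::Arr(a) => (0..a.len()).map(|i| vec![i]).collect(),
        Val::Num(_) => Vec::new(),
    }
}

/// Toy counterpart of `getpath($p) |= g`: apply `g` to the value at `path`.
/// Assumption of this sketch: a path that does not exist leaves `v` unchanged.
pub fn update_at(v: Val, path: &[usize], g: &dyn Fn(Val) -> Val) -> Val {
    match (path.split_first(), v) {
        // End of the path reached: update the value itself.
        (None, v) => g(v),
        // Descend into the array element at index `i`.
        (Some((&i, rest)), Val::Arr(mut a)) if i < a.len() => {
            let child = std::mem::replace(&mut a[i], Val::Num(0));
            a[i] = update_at(child, rest, g);
            Val::Arr(a)
        }
        // Index out of range or not an array: nothing to update.
        (Some(_), v) => v,
    }
}

/// Toy counterpart of `update(.[]; g)`: collect the paths from the original
/// input, then fold the updates over them one by one.
pub fn update_each(v: Val, g: &dyn Fn(Val) -> Val) -> Val {
    let paths = paths_of_each(&v);
    paths.iter().fold(v, |acc, p| update_at(acc, p, g))
}
```

Because the paths are taken from the original input, an update that changes the shape of the value (for example by removing elements) can make later paths point at the wrong place. This is the kind of invalidation issue mentioned above.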

Design

Before implementing this, I ran a small experiment to estimate the performance overhead of changing jaq so that `path(...)` execution and normal execution share the same code.
For that, I modified the `range` function (jaq/jaq-std/src/lib.rs, line 391 in 8c5131b),

Box::new(range(Ok(from), to, by))

to carry with it some kind of "bogus path". The results are:

$ hyperfine "target/release/jaq -n '[range(10000000)] | length'"

Box::new(range(Ok(from), to, by)) (original):
Time (mean ± σ): 418.0 ms ± 1.4 ms [User: 381.0 ms, System: 34.3 ms]
Range (min … max): 415.7 ms … 420.4 ms 10 runs

Box::new(range(Ok(from), to, by).map(|v| (v, None as Option<String>)).map(|(v, s)| v)):
Time (mean ± σ): 469.1 ms ± 4.7 ms [User: 432.0 ms, System: 34.5 ms]
Range (min … max): 460.3 ms … 475.9 ms 10 runs

Box::new(range(Ok(from), to, by).map(|v| (v, Some(String::new()))).map(|(v, s)| v)):
Time (mean ± σ): 472.4 ms ± 3.0 ms [User: 434.6 ms, System: 35.0 ms]
Range (min … max): 469.2 ms … 476.5 ms 10 runs

In a nutshell, calculating paths everywhere, even when you do not need them, costs about 12% in run time (469 ms vs. 418 ms). That is more overhead than I am willing to accept.
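For readers who want to reproduce the idea outside of jaq, the three variants can be approximated by the following standalone sketch. It makes several assumptions: a plain `0..n` range stands in for jaq's `range`, the function names are invented for illustration, and the timings it reports are not comparable to the hyperfine numbers above.

```rust
use std::time::{Duration, Instant};

/// Variant 1: plain values, corresponding to the original `range`.
pub fn plain(n: u64) -> Box<dyn Iterator<Item = u64>> {
    Box::new(0..n)
}

/// Variant 2: attach an absent bogus path to each value, then drop it again.
pub fn with_none(n: u64) -> Box<dyn Iterator<Item = u64>> {
    Box::new((0..n).map(|v| (v, None::<String>)).map(|(v, _path)| v))
}

/// Variant 3: attach an empty bogus path to each value, then drop it again.
pub fn with_some(n: u64) -> Box<dyn Iterator<Item = u64>> {
    Box::new((0..n).map(|v| (v, Some(String::new()))).map(|(v, _path)| v))
}

/// Collect all items (similar in spirit to `[range(n)] | length`) and
/// report the number of items together with the elapsed wall-clock time.
pub fn time_count(it: Box<dyn Iterator<Item = u64>>) -> (usize, Duration) {
    let start = Instant::now();
    let len = it.collect::<Vec<_>>().len();
    (len, start.elapsed())
}
```

Calling `time_count(plain(10_000_000))`, `time_count(with_none(10_000_000))` and `time_count(with_some(10_000_000))` in a release build gives a rough comparison. Depending on how much the optimiser can see through the closures, the differences may be smaller than in jaq itself.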

Therefore, I opted for a design where executing path(f) uses code different from the normal execution code (that would be called when just executing f on its own). That entails some code duplication, but it's not that bad. I tried to share as much code as possible between the path and normal execution code, even if that required giving some helper functions even more complex function signatures. As a side-effect, this makes these functions more difficult to misuse, even if it also makes them harder to understand.
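The following toy interpreter sketches what "separate code for path execution and normal execution, with shared helpers" can look like. It is an illustration under strong simplifications and not jaq's code: `Val`, `Filter`, `children`, `run` and `paths` are hypothetical, results are collected into vectors instead of lazy iterators, and iterating over a number yields nothing instead of an error.

```rust
// Toy value type (hypothetical): a number or an array.
#[derive(Clone, Debug, PartialEq)]
pub enum Val {
    Num(i64),
    Arr(Vec<Val>),
}

/// Toy filter language: `.`, `.[]` and `f | g`.
pub enum Filter {
    Id,
    Each,
    Pipe(Box<Filter>, Box<Filter>),
}

/// Helper shared by both execution modes: children of a value with indices.
fn children(v: Val) -> Vec<(usize, Val)> {
    match v {
        Val::Arr(a) => a.into_iter().enumerate().collect(),
        Val::Num(_) => Vec::new(),
    }
}

/// Normal execution: produces values only, without any path bookkeeping.
pub fn run(f: &Filter, v: Val) -> Vec<Val> {
    match f {
        Filter::Id => vec![v],
        Filter::Each => children(v).into_iter().map(|(_, c)| c).collect(),
        Filter::Pipe(l, r) => run(l, v).into_iter().flat_map(|x| run(r, x)).collect(),
    }
}

/// Path execution: every output carries the path that led to it.
pub fn paths(f: &Filter, (p, v): (Vec<usize>, Val)) -> Vec<(Vec<usize>, Val)> {
    match f {
        Filter::Id => vec![(p, v)],
        Filter::Each => children(v)
            .into_iter()
            .map(|(i, c)| {
                // Extend the current path by the index of the child.
                let mut q = p.clone();
                q.push(i);
                (q, c)
            })
            .collect(),
        Filter::Pipe(l, r) => paths(l, (p, v))
            .into_iter()
            .flat_map(|pv| paths(r, pv))
            .collect(),
    }
}
```

Only `paths` pays for cloning and extending paths; `run` never touches them, which is the property the benchmark above argues for.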

Compatibility

There is currently a small difference between the `path(f)` semantics of jaq and jq: In jaq, an error is thrown as soon as a subexpression of `f` is executed and returns a non-path value, whereas in jq, an error is thrown only if such a non-path value is actually returned from `f`.
To see the difference: jq -n 'path([] | empty)' returns no output, whereas jaq -n 'path([] | empty)' returns an error.
This difference could be eliminated, but it would cost some performance and memory, because instead of returning a path RcList<V> (paths are implemented as linked lists), we would need to return an Option<RcList<V>> everywhere.
That would also make the implementation a bit more awkward and the API more complex.
Since I suppose that few people use such paths, I do not think it is worth the effort. But if you are concerned, feel free to speak up.
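The trade-off can be made concrete with a small model. The sketch below is purely illustrative and does not reflect the internals of jaq or jq: `Step`, `eager` and `deferred` are hypothetical, paths are plain vectors rather than linked lists, and a pipeline consists only of literals (values without a path, such as `[]`) and `empty`.

```rust
/// Two toy pipeline steps (hypothetical): a literal such as `[]`, which
/// produces a value that has no path, and `empty`, which produces no outputs.
#[derive(Clone, Copy)]
pub enum Step {
    Literal,
    Empty,
}

pub type Path = Vec<usize>;

/// Eager style: fail as soon as an executed step yields a value without a
/// path. Every intermediate result is a plain `Path`.
pub fn eager(steps: &[Step]) -> Result<Vec<Path>, String> {
    let mut outs: Vec<Path> = vec![Vec::new()]; // start at the root path
    for step in steps {
        match step {
            // The literal is only executed if there is an input to run it on.
            Step::Literal if !outs.is_empty() => return Err("not a path".into()),
            Step::Literal => {}
            Step::Empty => outs.clear(),
        }
    }
    Ok(outs)
}

/// Deferred style: remember "no path" as `None` and fail only if such an
/// output survives to the end. Every intermediate result is an `Option<Path>`.
pub fn deferred(steps: &[Step]) -> Result<Vec<Path>, String> {
    let mut outs: Vec<Option<Path>> = vec![Some(Vec::new())];
    for step in steps {
        match step {
            Step::Literal => outs.iter_mut().for_each(|p| *p = None),
            Step::Empty => outs.clear(),
        }
    }
    outs.into_iter()
        .map(|p| p.ok_or_else(|| "not a path".to_string()))
        .collect()
}
```

With `[Step::Literal, Step::Empty]`, which models `path([] | empty)`, `eager` returns an error while `deferred` returns no paths. The price of the deferred style is the extra `Option` around every intermediate path.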

Short-lived value types & arbitrary data for native filters

The new `DataT` trait makes it possible to use value types that have a lifetime unknown at the time of filter compilation. For example, if you wanted to use a value type `Val<'a>` where `'a` is the lifetime of data that was loaded after filter compilation, you were out of luck. Now, this is supported.

Furthermore, the DataT trait also allows passing arbitrary data to native filters. Previously, the inputs filter enjoyed some special treatment, because it was the only native filter that could obtain some kind of "global" data. At the same time, this also implied that even if one did not want to provide an implementation of inputs, it was still necessary to pass the data necessary for inputs when executing a filter. This was cumbersome and felt unclean.
The new machinery generalises the mechanism previously available for inputs. This makes the core of jaq completely unaware of side effects and makes it possible to realise variations of jaq that are completely pure! It is also possible to go the other direction, namely integration of more complex side effects than previously possible. For example, this paves the path towards resolving #144.

The DataT trait uses GATs, which are available from Rust 1.65. However, early Rust versions supporting this feature were quite limited in their type inference, as I had to find out the hard way. Therefore, I increased MSRV to 1.69, which is the first version that can compile the code without serious adaptations.
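To give an idea of how a GAT-based trait decouples a filter from the lifetime of its values, here is a simplified sketch. It is not the actual definition of `DataT`: the names `DataLike`, `Value`, `Borrowed`, `Counting`, `NativeFilter` and `CountAndLen` are all made up for this example, and the "global data" is just a call counter.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a value trait.
pub trait Value: Clone {
    fn size(&self) -> usize;
}

/// A value type that borrows from data with lifetime `'a`.
#[derive(Clone, Debug, PartialEq)]
pub struct Borrowed<'a>(pub &'a str);

impl<'a> Value for Borrowed<'a> {
    fn size(&self) -> usize {
        self.0.len()
    }
}

/// Simplified sketch of a `DataT`-like trait (an assumption, not jaq's API):
/// the implementor has no lifetime itself, but names a family of value types
/// and global data types indexed by a lifetime.
pub trait DataLike: 'static {
    type V<'a>: Value;
    type Data<'a>: Clone;
}

/// Marker type selecting `Borrowed` values and a shared counter as data.
pub struct Counting;

impl DataLike for Counting {
    type V<'a> = Borrowed<'a>;
    type Data<'a> = &'a Cell<usize>;
}

/// A native filter receives the global data alongside its input value.
pub trait NativeFilter<D: DataLike> {
    fn run<'a>(&self, data: &D::Data<'a>, v: D::V<'a>) -> usize;
}

/// Example native filter: counts its calls and returns the input's size.
pub struct CountAndLen;

impl NativeFilter<Counting> for CountAndLen {
    fn run<'a>(&self, data: &&'a Cell<usize>, v: Borrowed<'a>) -> usize {
        data.set(data.get() + 1);
        v.size()
    }
}

/// The filter exists before the data it runs on is created.
pub fn demo() -> (usize, usize) {
    let filter = CountAndLen; // "compiled" first
    let text = String::from("hello"); // loaded afterwards
    let calls = Cell::new(0);
    let size = filter.run(&&calls, Borrowed(&text));
    (size, calls.get())
}
```

Here, `CountAndLen` mentions no lifetime at all, yet it can be run on values that borrow from `text`, which is created only after the filter. A pure variant would simply choose `()` as its `Data` type.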

01mf02 · 2025-07-22T08:38:08Z

Whoops, this breaks compilation with Rust 1.65. I now get lots of errors like this:

error: `<D as DataT>::V<'_>` does not live long enough

I can address some of these errors by:

-fn fold_run<'a, D: DataT, T: Clone + 'a>(
+fn fold_run<'a, V: 'a, D: DataT<V<'a> = V>, T: Clone + 'a>(

-fn fold_update<'a, D: DataT>(
+fn fold_update<'a, V: 'a, D: DataT<V<'a> = V>>(

Now there are only a few remaining:

$ cargo +1.65 check
    Checking jaq-core v2.2.1
error: `<D as DataT>::V<'_>` does not live long enough
   --> jaq-core/src/filter.rs:514:44
    |
514 |                 let u = move |x: D::V<'a>| box_once(op.run(x, y.clone()).map_err(Exn::from));
    |                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error: `<D as DataT>::V<'_>` does not live long enough
   --> jaq-core/src/filter.rs:518:44
    |
518 |                 let u = move |x: D::V<'a>| box_once(Ok(if x.as_bool() { x } else { y.clone() }));
    |                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error: `<D as DataT>::V<'_>` does not live long enough
   --> jaq-core/src/filter.rs:522:51
    |
522 |                 path.update(cv, Box::new(move |_| box_once(Ok(y.clone()))))
    |                                                   ^^^^^^^^^^^^^^^^^^^^^^^

error: `<D as DataT>::V<'_>` does not live long enough
   --> jaq-core/src/filter.rs:677:60
    |
677 |                     box_once(paths.try_fold(v, |acc, path| path?.update(acc, &f)))
    |                                                            ^^^^^^^^^^^^^^^^^^^^^

error: `<D as DataT>::V<'_>` does not live long enough
   --> jaq-core/src/filter.rs:677:21
    |
677 |                     box_once(paths.try_fold(v, |acc, path| path?.update(acc, &f)))
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error: could not compile `jaq-core` due to 5 previous errors

Any ideas?

EDIT: I found that the errors above persist up to and including Rust 1.68 and disappear starting from Rust 1.69.

01mf02 · 2025-07-22T13:26:57Z

A little note: We currently have many type signatures in jaq-std looking like this:

pub fn funs<D: DataT>() -> impl Iterator<Item = Filter<Native<D>>>
where
    for<'a> D::V<'a>: ValT,
{ ... }

This is a bit clunky. With associated type bounds, we could write this more compactly as follows, but that would increase MSRV to 1.79:

pub trait DataTx: for<'a> DataT<V<'a>: ValT> {}
impl<T: DataT> DataTx for T where for<'a> T::V<'a>: ValT {}

pub fn funs<D: DataTx>() -> impl Iterator<Item = Filter<Native<D>>> { ... }

It's probably not a big deal for now: the whole added `where for<'a> ...` stuff amounts to less than 30 lines in jaq-std, which is quite bearable and probably not worth increasing the MSRV by at least 10 versions.

01mf02 added 24 commits June 26, 2025 12:38

First version of path-recording evaluation.

c18b247

Remove unused struct.

3e5320e

More efficient (and public) iteration over RcList.

4fac2e2

Convert path parts to values.

c359397

Implement path filter!

005701b

Make it possible to define native path-returning filters.

34590b1

Promote first, last, limit to path-returning filters.

5323484

Satisfy Clippy.

5f1b660

Box relevant exception variants to decrease size of ValX.

5dfe8b8

Remove module-specific Results types.

e46e346

Change From<path::Part> bound on ValT to From<val::Range>.

b1bed66

Remove lots of lifetimes!

abdd7cd

Merge branch 'main' into paths

0832c9a

Eliminate FilterT to simplify the API.

51bd02b

Make Inputs public for better documentation usability.

80a3df3

Reshuffle.

0553e13

Capture input values for path expression errors.

aaa5a2d

Simplify a bit.

18a0310

Make tests compile again.

a26b347

Move key-value related filters from jaq-json to jaq-std.

bbedbfb

Shorten a bit.

dd79a68

Format.

0c422e7

Create def_run function for common definition calling logic.

49735ed

Move LUT into context.

2dd93ac

01mf02 force-pushed the paths branch from f0cc337 to 2dd93ac Compare July 2, 2025 17:19

01mf02 added 5 commits July 2, 2025 19:34

Merge branch 'main' into paths

4d7425f

Adapt bsearch to new calling conventions.

963814b

Adapt fuzzing.

f7b19b3

Make Filter::run more liberal with its lifetimes.

2432e89

"Rebase".

387573a

01mf02 added 2 commits July 10, 2025 21:02

Allow passing custom global data to native functions!

c214531

Promote input to a native filter & restructure.

ada006d

01mf02 force-pushed the paths branch from 4ab89f6 to ada006d Compare July 11, 2025 08:50

01mf02 added 4 commits July 11, 2025 12:21

Document DataT.

abba235

Make fuzzing targets compile again.

975bf79

Store Data without pointer to make it exchangeable during execution.

c5fa470

Introduce IdRunFn.

ca1e0ac

01mf02 force-pushed the paths branch from 45c957c to ca1e0ac Compare July 11, 2025 16:31

Move type of values V into DataT.

62e52a0

01mf02 force-pushed the paths branch from 3d4eb67 to 671348e Compare July 14, 2025 15:38

01mf02 added 4 commits July 15, 2025 06:38

Polishing.

e7b89b9

Make fuzz targets compile.

4237ca9

Documentation.

7e35190

REPL killing.

45bd91b

When the very first command to a REPL is "^D" (EOF), all subsequent REPL calls are ignored until control is given back to a REPL at a lower depth. This makes it possible to quit jaq when running something like `recurse(.) | repl`.

01mf02 force-pushed the paths branch from 671348e to 45bd91b Compare July 15, 2025 07:17

01mf02 added 2 commits July 22, 2025 12:18

Remove associated trait bounds to achieve MSRV = 1.69.

f8ff548

Formatting.

e87dcc7

01mf02 added 4 commits July 23, 2025 15:54

Remove Val::as_str to avoid conflicts with ValT::as_str.

7a92b54

Merge main into paths.

6f9415d

Make jaq-std compile with Rust 1.70.

c079ba8

Increase MSRV to 1.69 / 1.70.

f8414ea

This branch could actually do with 1.69 everywhere, but because we will require 1.70 for CBOR support eventually (due to `ciborium_ll` depending on `half`), we do it right away.

01mf02 force-pushed the paths branch from b5b3b57 to f8414ea Compare July 26, 2025 09:59

01mf02 changed the title ~~Path recording for filter execution~~ Support path/1 and value types with lifetimes Jul 30, 2025

01mf02 changed the title ~~Support path/1 and value types with lifetimes~~ Support path/1 and data with lifetimes Jul 30, 2025

01mf02 merged commit 47dd9e1 into main Jul 30, 2025
3 checks passed

01mf02 deleted the paths branch July 30, 2025 15:49

01mf02 mentioned this pull request Sep 25, 2025

Parallel processing with jaq_json #323

Closed

01mf02 merged 46 commits into main from paths
