
Support path/1 and data with lifetimes #296

Merged
01mf02 merged 46 commits into main from paths on Jul 30, 2025

01mf02 (Owner) commented Jun 27, 2025

This PR makes two large changes to jaq's core:

  • It adds support for the path(f) filter. This is a user-facing enhancement which should not impact the execution of previously available filters.
  • It makes it possible to use value types with a lifetime, as well as pass arbitrary data types to native filters. This is only relevant for you if you use jaq's API.

This PR changes the API of jaq-core in a backwards-incompatible way; therefore, it will be part of jaq 3.0.

path/1

Supporting path/1 makes several things possible in jaq, in particular path-based updates à la jq!
However, the execution of f |= g will keep using non-path-based updates due to their greater performance and resistance to iterator invalidation problems.
Still, it is possible to define def update(f; g): reduce path(f) as $p (.; getpath($p) |= g); and have update(f; g) in jaq do mostly the same thing as f |= g in jq. Be aware, however, that this does not attempt to work around iterator invalidation issues the same way as jq does; to avoid these issues, I advise you to use jaq's f |= g, which is more robust.
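To illustrate what the update(f; g) definition above does, here is a minimal Rust sketch that mirrors it on a toy JSON-like value. Val, Path, getpath, setpath and update are hypothetical helpers written for this illustration, not jaq's API; the paths of f are assumed to be computed already, and g is assumed to yield exactly one output.

```rust
use std::collections::BTreeMap;

/// Minimal JSON-like value (hypothetical, for illustration only).
#[derive(Clone, Debug, PartialEq)]
enum Val {
    Null,
    Num(i64),
    Obj(BTreeMap<String, Val>),
}

/// A path is a sequence of object keys (array indices are left out of this toy).
type Path = Vec<String>;

/// Like jq's getpath: missing keys and non-objects yield null.
fn getpath(v: &Val, p: &[String]) -> Val {
    match (v, p.split_first()) {
        (_, None) => v.clone(),
        (Val::Obj(o), Some((k, rest))) => o.get(k).map_or(Val::Null, |c| getpath(c, rest)),
        _ => Val::Null,
    }
}

/// Like jq's setpath, except that non-objects are silently replaced by objects.
fn setpath(v: Val, p: &[String], new: Val) -> Val {
    match p.split_first() {
        None => new,
        Some((k, rest)) => {
            let mut o = match v {
                Val::Obj(o) => o,
                _ => BTreeMap::new(),
            };
            let child = o.remove(k).unwrap_or(Val::Null);
            o.insert(k.clone(), setpath(child, rest, new));
            Val::Obj(o)
        }
    }
}

/// `reduce path(f) as $p (.; getpath($p) |= g)`, with the paths of f precomputed
/// and g assumed to yield exactly one output.
fn update(v: Val, paths: &[Path], g: impl Fn(Val) -> Val) -> Val {
    paths.iter().fold(v, |acc, p| {
        let old = getpath(&acc, p);
        setpath(acc, p, g(old))
    })
}

fn main() {
    let mut o = BTreeMap::new();
    o.insert("a".to_string(), Val::Num(1));
    o.insert("b".to_string(), Val::Num(2));
    // The paths that path(.a, .b) would yield.
    let paths = vec![vec!["a".to_string()], vec!["b".to_string()]];
    let inc = |v: Val| match v {
        Val::Num(n) => Val::Num(n + 1),
        other => other,
    };
    println!("{:?}", update(Val::Obj(o), &paths, inc));
}
```

Since all paths are computed before any of them is updated, an update that changes the structure of the value (for example removing array elements, which this toy does not model) can leave later paths pointing at the wrong place.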

Design

Before implementing this, I ran a small experiment to roughly estimate the performance overhead of changing jaq so that path(...) execution and normal execution share the same code.
For that, I modified the call to the range function

Box::new(range(Ok(from), to, by))
so that every value it yields carries along some kind of "bogus path". The results are:

$ hyperfine "target/release/jaq -n '[range(10000000)] | length'"

Box::new(range(Ok(from), to, by)) (original):
Time (mean ± σ): 418.0 ms ± 1.4 ms [User: 381.0 ms, System: 34.3 ms]
Range (min … max): 415.7 ms … 420.4 ms 10 runs

Box::new(range(Ok(from), to, by).map(|v| (v, None as Option<String>)).map(|(v, s)| v)):
Time (mean ± σ): 469.1 ms ± 4.7 ms [User: 432.0 ms, System: 34.5 ms]
Range (min … max): 460.3 ms … 475.9 ms 10 runs

Box::new(range(Ok(from), to, by).map(|v| (v, Some(String::new()))).map(|(v, s)| v)):
Time (mean ± σ): 472.4 ms ± 3.0 ms [User: 434.6 ms, System: 35.0 ms]
Range (min … max): 469.2 ms … 476.5 ms 10 runs
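A rough Rust re-creation of the shape of this experiment, with jaq's range(Ok(from), to, by) replaced by a plain 0..n and the iterators boxed as Box<dyn Iterator>, as in the original call. The function names are hypothetical, and timings from such a toy need not match the jaq numbers above, since jaq's values and filter machinery are far more involved.

```rust
use std::hint::black_box;
use std::time::Instant;

type Iter = Box<dyn Iterator<Item = u64>>;

/// Baseline: values only.
fn plain(n: u64) -> Iter {
    Box::new(0..n)
}

/// Each value is paired with a `None` "bogus path", which is then dropped.
fn with_none_path(n: u64) -> Iter {
    Box::new((0..n).map(|v| (v, None as Option<String>)).map(|(v, _path)| v))
}

/// Each value is paired with an empty `String` "bogus path", which is then dropped.
fn with_empty_path(n: u64) -> Iter {
    Box::new((0..n).map(|v| (v, Some(String::new()))).map(|(v, _path)| v))
}

/// Counts all values produced by `mk(n)` and reports the elapsed time.
fn measure(name: &str, mk: fn(u64) -> Iter, n: u64) {
    let start = Instant::now();
    let count = mk(black_box(n)).map(black_box).count();
    println!("{name}: {count} values in {:?}", start.elapsed());
}

fn main() {
    let n = 10_000_000;
    measure("plain", plain, n);
    measure("None path", with_none_path, n);
    measure("empty String path", with_empty_path, n);
}
```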

In a nutshell, calculating paths everywhere, even when they are not needed, costs about 12% of performance (469 ms / 418 ms ≈ 1.12). That is more overhead than I am willing to accept.

Therefore, I opted for a design where executing path(f) uses different code than normal execution, i.e. the code that runs when f is executed on its own. This entails some code duplication, but it is not that bad: I tried to share as much code as possible between path and normal execution, even if that required giving some helper functions even more complex signatures. As a side effect, these functions are harder to misuse, even if they are also harder to understand.
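The split between the two execution modes can be sketched on a toy filter language. This is a minimal Rust illustration under stated assumptions, not jaq's actual implementation: the Filter enum, run, paths and the two shared helpers index and pipe are hypothetical. Normal execution never touches paths, while path execution threads a path alongside each value; both reuse the same key lookup and the same generic plumbing for |.

```rust
use std::collections::BTreeMap;

#[derive(Clone, Debug, PartialEq)]
enum Val {
    Null,
    Num(i64),
    Obj(BTreeMap<String, Val>),
}

/// Toy filters: `.`, `.key`, `f | g` and `f, g`.
enum Filter {
    Id,
    Field(String),
    Pipe(Box<Filter>, Box<Filter>),
    Comma(Box<Filter>, Box<Filter>),
}

type Path = Vec<String>;

/// Shared helper: `.key` lookup (non-objects and missing keys yield null in this toy).
fn index(v: &Val, key: &str) -> Val {
    match v {
        Val::Obj(o) => o.get(key).cloned().unwrap_or(Val::Null),
        _ => Val::Null,
    }
}

/// Shared helper: feeds every output of the left side into the right side.
/// It is generic over the item type, so both execution modes reuse it.
fn pipe<T>(xs: Vec<T>, g: impl Fn(T) -> Vec<T>) -> Vec<T> {
    xs.into_iter().flat_map(g).collect()
}

/// Normal execution: values only, no path bookkeeping at all.
fn run(f: &Filter, v: Val) -> Vec<Val> {
    match f {
        Filter::Id => vec![v],
        Filter::Field(k) => vec![index(&v, k)],
        Filter::Pipe(l, r) => pipe(run(l, v), |x| run(r, x)),
        Filter::Comma(l, r) => {
            let mut out = run(l, v.clone());
            out.extend(run(r, v));
            out
        }
    }
}

/// Path execution: every value travels together with the path leading to it.
fn paths(f: &Filter, (v, mut p): (Val, Path)) -> Vec<(Val, Path)> {
    match f {
        Filter::Id => vec![(v, p)],
        Filter::Field(k) => {
            let child = index(&v, k);
            p.push(k.clone());
            vec![(child, p)]
        }
        Filter::Pipe(l, r) => pipe(paths(l, (v, p)), |x| paths(r, x)),
        Filter::Comma(l, r) => {
            let mut out = paths(l, (v.clone(), p.clone()));
            out.extend(paths(r, (v, p)));
            out
        }
    }
}

fn main() {
    let mut inner = BTreeMap::new();
    inner.insert("b".to_string(), Val::Num(1));
    let mut outer = BTreeMap::new();
    outer.insert("a".to_string(), Val::Obj(inner));
    outer.insert("c".to_string(), Val::Num(2));
    let v = Val::Obj(outer);
    // (.a | .b), .c
    let f = Filter::Comma(
        Box::new(Filter::Pipe(Box::new(Filter::Field("a".into())), Box::new(Filter::Field("b".into())))),
        Box::new(Filter::Field("c".into())),
    );
    println!("{:?}", run(&f, v.clone()));
    println!("{:?}", paths(&f, (v, Vec::new())));
}
```

Making pipe generic over the item type is one way to share code between the modes at the cost of more general signatures, which echoes the trade-off described above.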

Compatibility

There is currently a small difference between the path(f) semantics of jaq and jq: In jaq, an error is thrown as soon as a subexpression of f is executed and returns a non-path value, whereas in jq, an error is only thrown if such a non-path value is actually returned from f.
To see the difference: jq -n 'path([] | empty)' returns no output, whereas jaq -n 'path([] | empty)' returns an error.
This difference could be eliminated, but it would cost some performance and memory, because instead of returning a path RcList<V> (paths are implemented as linked lists), we would need to return an Option<RcList<V>> everywhere.
That would also make the implementation a bit more awkward and the API more complex.
Because I suspect that few people use such paths, I do not think this is worth the effort. But if you are concerned, feel free to speak up.
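A minimal Rust model of the two strategies, assuming a toy filter language with only ., .key, [], empty and |. It tracks only paths (no values) and uses Vec instead of jaq's RcList; all names are hypothetical. The eager variant errors as soon as a subexpression produces a non-path value, as described for jaq; the lazy variant has to carry an Option<Path> through every step and only checks the final outputs, as described for jq. It does not attempt to model any further checks jq may perform.

```rust
/// Paths are plain vectors of keys here (jaq uses a linked list instead).
type Path = Vec<&'static str>;

enum Filter {
    Id,
    Key(&'static str),
    Arr,   // `[]`: builds a fresh value, which has no path
    Empty, // `empty`: produces no outputs
    Pipe(Box<Filter>, Box<Filter>),
}

const ERR: &str = "invalid path expression";

/// Eager strategy: fail as soon as any subexpression yields a non-path value.
fn paths_eager(f: &Filter, p: Path) -> Result<Vec<Path>, String> {
    match f {
        Filter::Id => Ok(vec![p]),
        Filter::Key(k) => {
            let mut p = p;
            p.push(*k);
            Ok(vec![p])
        }
        Filter::Arr => Err(ERR.to_string()),
        Filter::Empty => Ok(vec![]),
        Filter::Pipe(l, r) => {
            let mut out = Vec::new();
            for q in paths_eager(l, p)? {
                out.extend(paths_eager(r, q)?);
            }
            Ok(out)
        }
    }
}

/// Lazy strategy: every intermediate result is an `Option<Path>`.
fn paths_lazy(f: &Filter, p: Option<Path>) -> Vec<Option<Path>> {
    match f {
        Filter::Id => vec![p],
        Filter::Key(k) => vec![p.map(|mut p| {
            p.push(*k);
            p
        })],
        Filter::Arr => vec![None],
        Filter::Empty => vec![],
        Filter::Pipe(l, r) => paths_lazy(l, p).into_iter().flat_map(|q| paths_lazy(r, q)).collect(),
    }
}

fn path_eager(f: &Filter) -> Result<Vec<Path>, String> {
    paths_eager(f, Vec::new())
}

/// Only the final outputs of the lazy strategy are checked for being paths.
fn path_lazy(f: &Filter) -> Result<Vec<Path>, String> {
    paths_lazy(f, Some(Vec::new()))
        .into_iter()
        .map(|p| p.ok_or_else(|| ERR.to_string()))
        .collect()
}

fn main() {
    // path([] | empty)
    let f = Filter::Pipe(Box::new(Filter::Arr), Box::new(Filter::Empty));
    println!("eager: {:?}", path_eager(&f));
    println!("lazy:  {:?}", path_lazy(&f));
}
```

With f = [] | empty, path_eager returns an error whereas path_lazy returns no paths, mirroring the path([] | empty) example.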

Short-lived value types & arbitrary data for native filters

The new DataT trait makes it possible to use value types whose lifetime is unknown at the time of filter compilation. For example, if you wanted to use a value type Val<'a>, where 'a is the lifetime of data that was loaded after filter compilation, you were previously out of luck. Now, this is supported.
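A minimal sketch of the idea, assuming a trait with a generic associated type for the value type. The DataT and ValT traits below are simplified, hypothetical stand-ins, not jaq's actual definitions: the "compiled" filter Field<D> mentions only D, and its run method works for whatever lifetime 'a the input turns out to have.

```rust
use std::marker::PhantomData;

/// Hypothetical, simplified stand-in for a DataT-like trait.
trait DataT: 'static {
    /// The value type, parametrised by the lifetime of the data it borrows from.
    type V<'a>;
}

/// Hypothetical value interface: object field access.
trait ValT: Sized {
    fn field(&self, key: &str) -> Option<Self>;
}

/// A value that borrows strings from input loaded at run time.
#[derive(Clone, Debug, PartialEq)]
enum Val<'a> {
    Str(&'a str),
    Obj(Vec<(&'a str, Val<'a>)>),
}

impl<'a> ValT for Val<'a> {
    fn field(&self, key: &str) -> Option<Self> {
        match self {
            Val::Obj(kvs) => kvs.iter().find(|(k, _)| *k == key).map(|(_, v)| v.clone()),
            Val::Str(_) => None,
        }
    }
}

struct BorrowedData;

impl DataT for BorrowedData {
    type V<'a> = Val<'a>;
}

/// A filter `.key` compiled before any input exists: its type does not mention 'a.
struct Field<D> {
    key: String,
    _data: PhantomData<D>,
}

impl<D: DataT> Field<D> {
    fn compile(key: &str) -> Self {
        Field { key: key.to_string(), _data: PhantomData }
    }

    /// Runs on values of any lifetime 'a, chosen only when the input is known.
    fn run<'a>(&self, v: D::V<'a>) -> Option<D::V<'a>>
    where
        D::V<'a>: ValT,
    {
        v.field(&self.key)
    }
}

fn main() {
    let filter = Field::<BorrowedData>::compile("name");
    // The input is created only after the filter has been compiled.
    let input = String::from("Alice");
    let v = Val::Obj(vec![("name", Val::Str(&input))]);
    println!("{:?}", filter.run(v));
}
```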

Furthermore, the DataT trait also allows passing arbitrary data to native filters. Previously, the inputs filter enjoyed some special treatment, because it was the only native filter that could obtain some kind of "global" data. At the same time, this also implied that even if one did not want to provide an implementation of inputs, it was still necessary to pass the data necessary for inputs when executing a filter. This was cumbersome and felt unclean.
The new machinery generalises the mechanism previously available for inputs. This makes the core of jaq completely unaware of side effects and makes it possible to realise variations of jaq that are completely pure! It is also possible to go in the other direction, namely to integrate more complex side effects than previously possible. For example, this paves the way towards resolving #144.
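The following sketch illustrates how native filters can receive arbitrary user-chosen data, with side effects expressed as capabilities of that data. HasInputs, WithInputs, Pure, inputs and add_one are hypothetical names chosen for illustration, not jaq's API: an inputs-like native filter demands data that can supply further inputs, whereas other native filters accept any data, including the completely pure Pure.

```rust
use std::cell::RefCell;

/// Hypothetical capability: data that can hand out further inputs.
trait HasInputs {
    fn next_input(&self) -> Option<i64>;
}

/// Side-effecting data: a queue of remaining inputs.
struct WithInputs {
    rest: RefCell<std::vec::IntoIter<i64>>,
}

impl WithInputs {
    fn new(inputs: Vec<i64>) -> Self {
        WithInputs { rest: RefCell::new(inputs.into_iter()) }
    }
}

impl HasInputs for WithInputs {
    fn next_input(&self) -> Option<i64> {
        self.rest.borrow_mut().next()
    }
}

/// Pure data: carries nothing and permits no side effects.
struct Pure;

/// An `inputs`-like native filter: only callable with data that supplies inputs.
fn inputs<D: HasInputs>(data: &D) -> Vec<i64> {
    std::iter::from_fn(|| data.next_input()).collect()
}

/// A native filter that ignores the data, so it works with any data type.
fn add_one<D>(_data: &D, v: i64) -> i64 {
    v + 1
}

fn main() {
    let data = WithInputs::new(vec![2, 3]);
    println!("{:?}", inputs(&data)); // consumes all remaining inputs
    println!("{}", add_one(&Pure, 1));
    // inputs(&Pure); // rejected at compile time: Pure does not implement HasInputs
}
```

Because the requirement is a trait bound, nobody who does not want inputs has to pass dummy input data; the pure configuration simply never satisfies the bound.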

The DataT trait uses GATs (generic associated types), which are available since Rust 1.65. However, early Rust versions supporting this feature were quite limited in their type inference, as I had to find out the hard way. Therefore, I increased the MSRV to 1.69, which is the first version that can compile the code without serious adaptations.

01mf02 added 4 commits July 15, 2025 06:38
When the very first command to a REPL is "^D" (EOF),
all subsequent REPL calls are ignored until control is given back to
a REPL at a lower depth.

This makes it possible to quit jaq when running something like `recurse(.) | repl`.
01mf02 (Owner, Author) commented Jul 22, 2025

Whoops, this breaks compilation with Rust 1.65: I now get lots of errors:

error: `<D as DataT>::V<'_>` does not live long enough

I can address some of these errors by:

-fn fold_run<'a, D: DataT, T: Clone + 'a>(
+fn fold_run<'a, V: 'a, D: DataT<V<'a> = V>, T: Clone + 'a>(

-fn fold_update<'a, D: DataT>(
+fn fold_update<'a, V: 'a, D: DataT<V<'a> = V>>(

Now there are only a few remaining:

$ cargo +1.65 check
    Checking jaq-core v2.2.1
error: `<D as DataT>::V<'_>` does not live long enough
   --> jaq-core/src/filter.rs:514:44
    |
514 |                 let u = move |x: D::V<'a>| box_once(op.run(x, y.clone()).map_err(Exn::from));
    |                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error: `<D as DataT>::V<'_>` does not live long enough
   --> jaq-core/src/filter.rs:518:44
    |
518 |                 let u = move |x: D::V<'a>| box_once(Ok(if x.as_bool() { x } else { y.clone() }));
    |                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error: `<D as DataT>::V<'_>` does not live long enough
   --> jaq-core/src/filter.rs:522:51
    |
522 |                 path.update(cv, Box::new(move |_| box_once(Ok(y.clone()))))
    |                                                   ^^^^^^^^^^^^^^^^^^^^^^^

error: `<D as DataT>::V<'_>` does not live long enough
   --> jaq-core/src/filter.rs:677:60
    |
677 |                     box_once(paths.try_fold(v, |acc, path| path?.update(acc, &f)))
    |                                                            ^^^^^^^^^^^^^^^^^^^^^

error: `<D as DataT>::V<'_>` does not live long enough
   --> jaq-core/src/filter.rs:677:21
    |
677 |                     box_once(paths.try_fold(v, |acc, path| path?.update(acc, &f)))
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

error: could not compile `jaq-core` due to 5 previous errors

Any ideas?

EDIT: I found that the errors above persist up to and including Rust 1.68 and disappear starting from Rust 1.69.
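For illustration, here is a self-contained sketch of the signature change from the diff above, assuming a simplified DataT with a single GAT. Nums, add and the function bodies are hypothetical; the sketch is intended to compile on recent stable Rust and is not meant to reproduce the errors reported for 1.65 to 1.68. The second form introduces a plain type parameter V and pins D::V<'a> = V with an equality constraint, so the body works with V directly instead of the projection.

```rust
/// Simplified, hypothetical stand-in for DataT.
trait DataT: 'static {
    type V<'a>;
}

/// Hypothetical data type whose values borrow integers.
struct Nums;

impl DataT for Nums {
    type V<'a> = &'a i64;
}

/// Original form: the value type is only named through the projection D::V<'a>.
fn fold_run<'a, D: DataT, T: Clone + 'a>(init: T, xs: Vec<D::V<'a>>, f: impl Fn(T, D::V<'a>) -> T) -> T {
    xs.into_iter().fold(init, f)
}

/// Workaround form: a plain type parameter V, tied to the projection by D: DataT<V<'a> = V>.
fn fold_run_v<'a, V: 'a, D: DataT<V<'a> = V>, T: Clone + 'a>(init: T, xs: Vec<V>, f: impl Fn(T, V) -> T) -> T {
    xs.into_iter().fold(init, f)
}

fn add(acc: i64, x: &i64) -> i64 {
    acc + x
}

fn main() {
    let (a, b) = (1_i64, 2_i64);
    // D cannot be inferred from a projection, so it is given explicitly.
    let s1 = fold_run::<Nums, i64>(0, vec![&a, &b], add);
    let s2 = fold_run_v::<_, Nums, i64>(0, vec![&a, &b], add);
    println!("{s1} {s2}");
}
```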

01mf02 (Owner, Author) commented Jul 22, 2025

A little note: We currently have many type signatures in jaq-std looking like this:

pub fn funs<D: DataT>() -> impl Iterator<Item = Filter<Native<D>>>
where
    for<'a> D::V<'a>: ValT,
{ ... }

This is a bit clunky. With associated type bounds, we could write this more compactly as follows, but that would increase MSRV to 1.79:

pub trait DataTx: for<'a> DataT<V<'a>: ValT> {}
impl<T: DataT> DataTx for T where for<'a> T::V<'a>: ValT {}

pub fn funs<D: DataTx>() -> impl Iterator<Item = Filter<Native<D>>> { ... }

It's probably not a big deal for now: all the added where for<'a> ... clauses amount to fewer than 30 lines in jaq-std, which is quite bearable and probably not worth increasing the MSRV by at least 10 versions.

01mf02 added 4 commits July 23, 2025 15:54
This branch could actually do with 1.69 everywhere,
but because we will require 1.70 for CBOR support eventually
(due to `ciborium_ll` depending on `half`), we do it right away.
01mf02 changed the title from "Path recording for filter execution" to "Support path/1 and value types with lifetimes" on Jul 30, 2025
01mf02 changed the title from "Support path/1 and value types with lifetimes" to "Support path/1 and data with lifetimes" on Jul 30, 2025
01mf02 merged commit 47dd9e1 into main on Jul 30, 2025
3 checks passed
01mf02 deleted the paths branch on July 30, 2025 15:49