cds-astro/cds-expreval-rust

expr-eval

Expreval is yet another expression evaluator in Rust, focused on astronomical data.

About

The purpose is to express constraints that select rows of a relational table (as in a SQL WHERE clause) and to compute new columns on the fly from other columns (as in a SQL SELECT). It supports part of the SQL syntax (e.g. IS NULL, LIKE, AND, OR) in addition to programming-language-oriented syntax (&&, ||, ...). It supports a fairly large set of datatypes: boolean, char, u8, u16, u32, u64, i8, i16, i32, i64, f32, f64, String, plus one nullable/optional type for each of those basic types.

Implementation insight

The expression "compilation" is made of 2 or 3 steps

  • Lexical analysis, based on nom.
  • Infix to postfix conversion using the shunting-yard algorithm. During this step, a basic evaluation of constant terms is also performed.

Then you can either:

  • evaluate the expression directly, using the values of a row as input
  • or first perform an additional pass to create an implicit AST based on closure objects (Box<dyn Closure>), which yields a (boxed) closure that takes a row as input
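
The two options above can be illustrated with a minimal sketch. It is not the crate's actual code: the `Token` enum, the `Compiled` alias, and the functions `eval_postfix` and `compile_postfix` are hypothetical names, and rows are assumed to be plain `&[f64]` slices. The first function walks the postfix tokens for every row; the second does one extra pass that folds the tokens into nested boxed closures, so that only closure calls remain at evaluation time.

```rust
/// Hypothetical postfix token (the real crate supports many more types and operators).
#[derive(Debug, Clone, Copy)]
pub enum Token {
    Const(f64),
    Col(usize), // index of a column in the row
    Add,
    Sub,
    Mul,
    Div,
}

/// Hypothetical "compiled" expression: a boxed closure taking a row as input.
pub type Compiled = Box<dyn Fn(&[f64]) -> f64>;

/// Maps an operator token to a plain function pointer; `None` for operands.
fn binary_op(t: Token) -> Option<fn(f64, f64) -> f64> {
    let f: fn(f64, f64) -> f64 = match t {
        Token::Add => |a: f64, b: f64| a + b,
        Token::Sub => |a: f64, b: f64| a - b,
        Token::Mul => |a: f64, b: f64| a * b,
        Token::Div => |a: f64, b: f64| a / b,
        Token::Const(_) | Token::Col(_) => return None,
    };
    Some(f)
}

/// Option 1: interpret the postfix tokens directly, using a stack of values.
pub fn eval_postfix(tokens: &[Token], row: &[f64]) -> Option<f64> {
    let mut stack: Vec<f64> = Vec::new();
    for &t in tokens {
        let v = match t {
            Token::Const(v) => v,
            Token::Col(i) => *row.get(i)?,
            op => {
                let f = binary_op(op)?;
                let r = stack.pop()?;
                let l = stack.pop()?;
                f(l, r)
            }
        };
        stack.push(v);
    }
    if stack.len() == 1 { stack.pop() } else { None }
}

/// Option 2: same traversal, but the stack holds closures instead of values.
/// The returned closure panics if the row is shorter than a referenced column index.
pub fn compile_postfix(tokens: &[Token]) -> Option<Compiled> {
    let mut stack: Vec<Compiled> = Vec::new();
    for &t in tokens {
        let node: Compiled = match t {
            Token::Const(v) => Box::new(move |_row: &[f64]| v),
            Token::Col(i) => Box::new(move |row: &[f64]| row[i]),
            op => {
                let f = binary_op(op)?;
                let r = stack.pop()?;
                let l = stack.pop()?;
                Box::new(move |row: &[f64]| f(l(row), r(row)))
            }
        };
        stack.push(node);
    }
    if stack.len() == 1 { stack.pop() } else { None }
}
```

With the tokens for `($0 + 2) * $1` (that is `Col(0), Const(2.0), Add, Col(1), Mul`), both paths give the same value for a given row; the closure version only pays the token dispatch once, which matters when the same expression is applied to many rows.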

WARNINGS

  • This library has not been thoroughly checked! So:
    • use at your own risk
    • help us add tests
  • using a non-optional parse (e.g. parseF64 instead of parseOptF64) on an optional field returns the default value (e.g. 0 for a float).
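
The second warning is easy to overlook, so here is a small sketch of the pitfall. The two functions below are hypothetical stand-ins written for this illustration, not the crate's implementation: they only assume that the non-optional variant falls back to the type's default value, as stated above, while the optional variant keeps the "missing" information.

```rust
/// Hypothetical stand-in for a non-optional parse: an empty or unparsable
/// field silently becomes the default value (0.0 for f64).
pub fn parse_f64(field: &str) -> f64 {
    field.trim().parse::<f64>().unwrap_or_default()
}

/// Hypothetical stand-in for an optional parse: an empty or unparsable
/// field becomes `None`.
pub fn parse_opt_f64(field: &str) -> Option<f64> {
    field.trim().parse::<f64>().ok()
}

/// Difference of two magnitudes with the non-optional parse:
/// a missing value is treated as 0.0, so the result looks valid but is wrong.
pub fn color_non_opt(bt: &str, vt: &str) -> f64 {
    parse_f64(bt) - parse_f64(vt)
}

/// Same difference with the optional parse: a missing value propagates as `None`.
pub fn color_opt(bt: &str, vt: &str) -> Option<f64> {
    Some(parse_opt_f64(bt)? - parse_opt_f64(vt)?)
}
```

For a row where the second field is empty, `color_non_opt("9.5", "")` returns `9.5`, whereas `color_opt("9.5", "")` returns `None`.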

TODO list

  • Write tests, tests and tests!! (Comparing e.g. with SQLite queries)
  • Add column index and column name: $i, ${name}, simple column name matching [_a-zA-Z0-9]+ (but not matching a number)
  • Add binary relational operators: <, <=, >, >=, =, ==, !=, <>, EQ, NE, LT, LE, GT, GE (case insensitive)
  • Add unary relational operators: IS NULL, IS NOT NULL, LIKE (case insensitive)
  • Add logical operators: &&, || (=CONCAT in SQL :o/ ), AND, OR (case insensitive)
  • Add unary logical operator: NOT, !
  • Add ternary operator: condition ? branch true : branch false
  • Add bitwise operators: &, |, ^, >>, <<
  • Add general functions (all case insensitive)
    • abs, ln, log, log10, pow, exp, sqrt
    • trigo: cos, sin, tan, acos, asin, atan, atan2
    • conversion: to_degrees, to_radians, to_deg, to_rad, toDegrees, toRadians, toDeg, toRad
    • [] String operations: substr, to_lower, to_higher, trim, ltrim, rtrim... ?
  • Add astrometric functions (case insensitive)
    • haversine(ra1_deg, dec1_deg, ra2_deg, dec2_deg)
    • pm_propagate(ra_deg, dec_deg, pmra_mas/yr, pmde_mas/yr, epoch_diff_yr), alias: pmPropagate
    • add pm_propagate_with_err
    • add motion_propagate and motion_propagate_with_err
    • add hpx(depth, lon, lat) function
  • Add JNAME function
  • Add Time functions:
    • mjd2jd, jd2mjd, mjd2julian_epoch, julian_epoch2mjd
    • ? mjd2gregorian_decimal_year, gregorian_calendar2mjd, ...
  • Add variadic functions
    • add string concatenation (str1 || str2 in SQL, here use concat(str1, str2,))
    • implement compiled version of cavg
    • implement cmin, cmax in compiled
  • Support pre-computation of constant terms: e.g. sin(to_radians(45)) is transformed into 0.70710678118654752
  • Support a kind of compilation for faster evaluation (but not as fast as real compilation)
  • Add String to types conversion:
    • regular types: parseBool, parse_bool, parseU8/16/32/64, parse_u8/16/32/64, parseI8/16/32/64, parse_i8/16/32/64, parseF32, parse_f32, parseF64, parse_f64
    • nullable types: parseOptBool, parse_opt_bool, parseOptU8/16/32/64, parse_opt_u8/16/32/64, parseOptI8/16/32/64, parse_opt_i8/16/32/64, parseOptF32, parse_opt_f32, parseOptF64, parse_opt_f64
  • Add format to transform numerical types into String (cast works, but the purpose is e.g. to allow for fixed precision in floats, ...)
    • fmt("%+010d", qty) for u8, u16, u32, u64, i8, i16, i32, i64
      • %+0wd for decimal notation (optional sign flag, 0 padding flag and width)
      • %wb for binary notation 0bxxx (optional width)
      • %wo for octal notation 0oxxx (optional width)
      • %wx for lower case hexadecimal notation 0xxxxx (optional width)
      • %wX for upper case hexadecimal notation 0xXXXX (optional width)
    • fmt("%+10.2g", qty) for f32 and f64
    • fmt_opt("%+010d", "null", opt_qty) for u8, u16, u32, u64, i8, i16, i32, i64; f32 and f64
      • same as fmt("%+010d", qty) but with the null placeholder for None values
    • fmt_b("%true/false", qty) for bool, returns either a String or an Option<String>: for regular true or false output, you can simply cast as String
    • fmt_i("%+0width", qty) for i8, i16, i32, returns either a String or an Option<String>
    • fmt_f("%+0width.prec", qty) for f32 or f64, returns either a String or an Option<String>
    • fmt_e("%+width.prec", qty) for f32 or f64, returns either a String or an Option<String>
    • fmt_g("%+width.prec", qty) for f32 or f64, returns either a String or an Option<String>
    • fmt_opt_s("null", qty) for Option<String>, returns either a String or an Option<String>
  • Add cast operations
  • Improve errors! (Make a real error type instead of returning a String)
  • Use procedural macros to automatically implement "Row" from a struct and "Table".
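
To make two of the items above concrete, here is a hedged sketch of what `haversine` and `mjd2jd`/`jd2mjd` compute. The function bodies are written for this illustration and are not taken from the crate; in particular, returning the haversine distance in degrees is an assumption (the list only states that the inputs are in degrees). The formulas themselves are the standard ones: the haversine formula for the angular distance on the sphere, and JD = MJD + 2400000.5.

```rust
/// Angular distance between two positions given in degrees, using the
/// haversine formula. Assumption: the result is returned in degrees.
pub fn haversine(ra1_deg: f64, dec1_deg: f64, ra2_deg: f64, dec2_deg: f64) -> f64 {
    let (ra1, dec1) = (ra1_deg.to_radians(), dec1_deg.to_radians());
    let (ra2, dec2) = (ra2_deg.to_radians(), dec2_deg.to_radians());
    let sin_half_ddec = ((dec2 - dec1) * 0.5).sin();
    let sin_half_dra = ((ra2 - ra1) * 0.5).sin();
    // h = sin^2(ddec / 2) + cos(dec1) * cos(dec2) * sin^2(dra / 2)
    let h = sin_half_ddec * sin_half_ddec
        + dec1.cos() * dec2.cos() * sin_half_dra * sin_half_dra;
    // Clamp to 1 to protect asin against rounding errors near antipodal points.
    (2.0 * h.sqrt().min(1.0).asin()).to_degrees()
}

/// Offset between the Julian Date and the Modified Julian Date.
const MJD_OFFSET: f64 = 2_400_000.5;

/// Modified Julian Date to Julian Date.
pub fn mjd2jd(mjd: f64) -> f64 {
    mjd + MJD_OFFSET
}

/// Julian Date to Modified Julian Date.
pub fn jd2mjd(jd: f64) -> f64 {
    jd - MJD_OFFSET
}
```

For instance, two points on the equator separated by 90 degrees in right ascension are 90 degrees apart, and MJD 51544.5 corresponds to JD 2451545.0 (the J2000.0 epoch).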

Compatibility with SQL types

type       PostgreSQL                  Microsoft
bool       boolean                     bit
char       char(1)                     nchar(1)
u8         -                           tinyint
u16        -                           -
u32        -                           -
u64        -                           -
i8         -                           -
i16        smallint                    smallint
i32        integer                     integer
i64        bigint                      bigint
f32        real                        real
f64        double precision            double precision
String     char/varchar(n)             nchar/nvarchar(n)
optBool    nullable boolean            nullable bit
optChar    nullable char(1)            nullable nchar(1)
optU8      -                           tinyint
optU16     -                           -
optU32     -                           -
optU64     -                           -
optI8      -                           -
optI16     nullable smallint           nullable smallint
optI32     nullable integer            nullable integer
optI64     nullable bigint             nullable bigint
optF32     nullable real               nullable real
optF64     nullable double precision   nullable double precision
optString  nullable char/varchar(n)    nullable nchar/nvarchar(n)
-          date                        date
-          timestamp                   datetimeoffset
-          decimal(p, s)               decimal(p, s)

A dash marks a cell with no corresponding type.

Sources:

Examples

See this blog post to add examples in a Rust project. Run examples:

cargo +nightly run --example csv_raw
cargo +nightly run --example csv_struct

Remark: they are also compiled by cargo test

E.g., run:

cargo run --example csv_raw examples/data/hip_main.csv --filter '${Proxy} is not null' \
  --col '${HIP}'     \
  --col '${Proxy}'   \
  --col '${Vmag}'    \
  --col '${VarFlag}' \
  --col '${r_Vmag}'  \
  --col '${RAICRS}'  \
  --col '${DEICRS}'  \
  --col '${BTmag}'   \
  --col '${e_BTmag}' \
  --col '${VTmag}'   \
  --col '${e_VTmag}' \
  --col '${B-V}'     \
  --col 'parse_opt_f32(${BTmag})-parse_opt_f32(${VTmag})'

or

cargo run --example csv_raw examples/data/hip_main.csv  \
  --col '${BTmag}'   \
  --col '${VTmag}'   \
  --col 'cavg(parse_opt_f32(${BTmag}), parse_opt_f32(${VTmag}))' \
  --col '${B-V}'     \
  --col 'parse_opt_f32(${BTmag})-parse_opt_f32(${VTmag})'

Benches

cargo +nightly test --release
cargo test test_dummy --release -- --nocapture

License

Like most projects in Rust, this project is licensed under either of

  • Apache License, Version 2.0 (LICENCE-APACHE)
  • MIT license (LICENCE-MIT)

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Some useful resources

  • Function pointers
    • type Binop = fn(i32, i32) -> i32;
  • Closures
    • "It's best to write functions using a generic type and one of the closure traits so your functions can accept either functions or closures"
  • Function composition on reddit, stackoverflow
fn compose<A, B, C, G, F>(f: F, g: G) -> impl Fn(A) -> C
where
  F: Fn(A) -> B,
  G: Fn(B) -> C,
{
  move |x| g(f(x))
}

About the shunting-yard algorithm: particularly important information found in the above links

Unary operators

A unary minus sign does not cause any operators to be popped from the stack. This is because, in the postfix output, the unary minus sign will always immediately follow its operand (whereas it always immediately precedes it in the infix), so no other operators can be popped before it at this point.

It should also be pointed out that the unary minus sign is usually treated as though it has higher precedence than * and /.
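
A minimal sketch of this rule, under stated assumptions: the input is already tokenized, the `Tok` enum and the functions `to_postfix` and `eval` are hypothetical names, the unary minus is written `~` in the postfix output to distinguish it from the binary one, and there is no error reporting for mismatched parentheses. A minus sign is considered unary when an operand is expected, i.e. at the start of the expression, after another operator, or after an opening parenthesis.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tok {
    Num(f64),
    Op(char), // '+', '-', '*', '/' in the input; '~' (unary minus) only in the output
    LParen,
    RParen,
}

/// Precedence: the unary minus ('~') binds tighter than '*' and '/'.
fn prec(op: char) -> u8 {
    match op {
        '+' | '-' => 1,
        '*' | '/' => 2,
        _ => 3,
    }
}

/// Infix to postfix conversion (shunting-yard) with unary minus support.
pub fn to_postfix(infix: &[Tok]) -> Vec<Tok> {
    let mut out = Vec::new();
    let mut ops: Vec<Tok> = Vec::new();
    let mut expect_operand = true;
    for &tok in infix {
        match tok {
            Tok::Num(_) => {
                out.push(tok);
                expect_operand = false;
            }
            // Unary minus: pushed directly, no operator is popped.
            Tok::Op('-') if expect_operand => ops.push(Tok::Op('~')),
            Tok::Op(op) => {
                // Binary (left-associative) operator: pop operators of
                // higher or equal precedence first.
                while let Some(&Tok::Op(top)) = ops.last() {
                    if prec(top) < prec(op) {
                        break;
                    }
                    out.push(Tok::Op(top));
                    ops.pop();
                }
                ops.push(tok);
                expect_operand = true;
            }
            Tok::LParen => {
                ops.push(tok);
                expect_operand = true;
            }
            Tok::RParen => {
                while let Some(top) = ops.pop() {
                    if top == Tok::LParen {
                        break;
                    }
                    out.push(top);
                }
                expect_operand = false;
            }
        }
    }
    while let Some(top) = ops.pop() {
        out.push(top);
    }
    out
}

/// Evaluates the postfix output; returns `None` on malformed input.
pub fn eval(postfix: &[Tok]) -> Option<f64> {
    let mut st: Vec<f64> = Vec::new();
    for &t in postfix {
        match t {
            Tok::Num(v) => st.push(v),
            Tok::Op('~') => {
                let v = st.pop()?;
                st.push(-v);
            }
            Tok::Op(op) => {
                let r = st.pop()?;
                let l = st.pop()?;
                st.push(match op {
                    '+' => l + r,
                    '-' => l - r,
                    '*' => l * r,
                    '/' => l / r,
                    _ => return None,
                });
            }
            Tok::LParen | Tok::RParen => return None,
        }
    }
    if st.len() == 1 { st.pop() } else { None }
}
```

With this sketch, `2 * -3` becomes `2 3 ~ *`: the unary minus is pushed on top of `*` without popping it and ends up immediately after its operand, as described above.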

Functions

Infix expressions often contain not only basic operands and parentheses but also more complex mathematical functions such as gcd(20,12). These are not hard to parse, either. When a function name like gcd is encountered, it is pushed onto the stack; immediately afterward, the following opening parenthesis will also be pushed onto the stack. Each comma behaves like a closing parenthesis, because it completes a subexpression—except that it does not cause the opening parenthesis to be popped. When the closing parenthesis is finally encountered, the opening parenthesis is popped, and if the top of the stack is now a function name, it too is popped, and transferred to the output stream.

Variadic functions

Variadic functions (that is, functions that do not have a fixed arity, but can take varying numbers of arguments) present an especial difficulty. Like the unary and binary minus signs, these may cause ambiguity once converted into postfix, so we must find some way to tag such functions in the postfix output so that the evaluator can determine how many of the preceding arguments belong to them. Note that, unlike in the case of the unary and binary minus signs, we cannot determine in advance while scanning the input how many arguments the function is taking. The easiest way to handle this is to maintain a second stack, which we might call the arity stack. Every time we encounter a function name, we push the number 1 onto the arity stack. Every time we encounter a comma, we increment the number on the top of the arity stack, since the comma indicates another argument to the function. Finally, when it comes time to pop off the function name from the operator stack, we also pop the number off the top of the arity stack; this tells us the arity of the function.
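
The function and arity-stack rules quoted above can be sketched as follows. This is an illustration under assumptions, not the crate's code: the `In` and `Out` enums and the functions `to_postfix` and `eval` are hypothetical, the two variadic functions `max` and `sum` are made up for the example, and, as in the quoted scheme, a function called with empty parentheses would still be counted as having one argument.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum In {
    Num(f64),
    Op(char), // binary '+', '-', '*', '/'
    Func(&'static str),
    LParen,
    RParen,
    Comma,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Out {
    Num(f64),
    Op(char),
    Call(&'static str, usize), // function name tagged with its arity
}

fn prec(op: char) -> u8 {
    if op == '*' || op == '/' { 2 } else { 1 }
}

/// Shunting-yard with function calls; returns `None` on malformed input.
pub fn to_postfix(infix: &[In]) -> Option<Vec<Out>> {
    let mut out = Vec::new();
    let mut ops: Vec<In> = Vec::new();
    let mut arity: Vec<usize> = Vec::new();
    for &tok in infix {
        match tok {
            In::Num(v) => out.push(Out::Num(v)),
            In::Func(_) => {
                ops.push(tok);
                arity.push(1); // one argument assumed until a comma is seen
            }
            In::LParen => ops.push(tok),
            In::Op(op) => {
                while let Some(&In::Op(top)) = ops.last() {
                    if prec(top) < prec(op) {
                        break;
                    }
                    out.push(Out::Op(top));
                    ops.pop();
                }
                ops.push(tok);
            }
            In::Comma => {
                // Like a closing parenthesis, but the '(' stays on the stack.
                while let Some(&In::Op(top)) = ops.last() {
                    out.push(Out::Op(top));
                    ops.pop();
                }
                *arity.last_mut()? += 1;
            }
            In::RParen => {
                loop {
                    match ops.pop()? {
                        In::LParen => break,
                        In::Op(top) => out.push(Out::Op(top)),
                        _ => return None,
                    }
                }
                // If a function name is now on top, emit it with its arity.
                if let Some(&In::Func(name)) = ops.last() {
                    ops.pop();
                    out.push(Out::Call(name, arity.pop()?));
                }
            }
        }
    }
    while let Some(top) = ops.pop() {
        match top {
            In::Op(op) => out.push(Out::Op(op)),
            _ => return None,
        }
    }
    Some(out)
}

/// Evaluates the postfix output; `max` and `sum` are hypothetical variadic functions.
pub fn eval(postfix: &[Out]) -> Option<f64> {
    let mut st: Vec<f64> = Vec::new();
    for &t in postfix {
        match t {
            Out::Num(v) => st.push(v),
            Out::Op(op) => {
                let r = st.pop()?;
                let l = st.pop()?;
                st.push(match op {
                    '+' => l + r,
                    '-' => l - r,
                    '*' => l * r,
                    '/' => l / r,
                    _ => return None,
                });
            }
            Out::Call(name, n) => {
                // The arity tells how many values on the stack belong to the call.
                let args = st.split_off(st.len().checked_sub(n)?);
                st.push(match name {
                    "max" => args.iter().copied().fold(f64::NEG_INFINITY, f64::max),
                    "sum" => args.iter().sum::<f64>(),
                    _ => return None,
                });
            }
        }
    }
    if st.len() == 1 { st.pop() } else { None }
}
```

For `max(1, 2 + 3, 4)` the postfix output is `1 2 3 + 4 max/3`, where `max/3` denotes the function tagged with arity 3; nested calls such as `sum(1, max(2, 3))` work because the arity stack always has the innermost open function on top.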

Some resources to be looked at for runtime compilation

Dynamic library loading

Plugins, using libloading

Other related projects in Rust
