Conversation

@pietroalbini

This branch contains my solutions with function interning.

@Robbepop

Thank you for the PR!
I forwarded it to Jack; I'm interested to hear what he thinks about it.

@eira-fransham
Owner

eira-fransham commented Nov 27, 2018

Looks good. Do you have the cargo benchcmp results? There are some theoretical ways you could improve performance further (a lock-free structure instead of RefCell<Vec<..>> should be possible), but it would require a lot of extra unsafe, since there doesn't seem to be a good implementation on crates.io.

@pietroalbini
Author

 name                           initial.bench ns/iter  exercise2.bench ns/iter  diff ns/iter   diff %  speedup 
 benches::parse_deep_nesting    57,797                 57,148                           -649   -1.12%   x 1.01 
 benches::parse_literals        39,164                 30,886                         -8,278  -21.14%   x 1.27 
 benches::parse_many_variables  211,446                183,455                       -27,991  -13.24%   x 1.15 
 benches::parse_nested_func     40,450                 39,380                         -1,070   -2.65%   x 1.03 
 benches::parse_real_code       5,224                  4,767                            -457   -8.75%   x 1.10 
 benches::run_deep_nesting      2,677                  693                            -1,984  -74.11%   x 3.86 
 benches::run_many_variables    44,658                 15,481                        -29,177  -65.33%   x 2.88 
 benches::run_nested_func       6,485                  1,839                          -4,646  -71.64%   x 3.53 
 benches::run_real_code         153,139                10,140                       -142,999  -93.38%  x 15.10 

@eira-fransham
Owner

An optimisation you could do on top of this is to convert Function from array-of-structs to struct-of-arrays, like so:

use std::ops::Range;

struct InternedFunction {
    // Indices into the flat vectors in Functions below.
    args: Range<usize>,
    body: Range<usize>,
}

struct Functions<'src> {
    args: Vec<&'src str>,
    body: Vec<Ast<'src>>,
}

I'd also add a FunctionsBuilder with an internal RefCell, and a .freeze method that converts it into a Functions that no longer has to do borrow checks on deref (since we never add a new interned function at runtime).

Actually, I'll make these changes quickly and then see what the benchmarks look like.

@eira-fransham
Owner

eira-fransham commented Dec 5, 2018

So I implemented this and the speedup is minimal (on the run_... benchmarks, which is where it should be giving a speedup); I guess the improved cache usage doesn't actually matter in this case. This is why you should always benchmark:

 name                           preopt.bench ns/iter  postopt.bench ns/iter  diff ns/iter  diff %  speedup 
 benches::parse_deep_nesting    53,015                51,942                       -1,073  -2.02%   x 1.02 
 benches::parse_literals        30,530                29,504                       -1,026  -3.36%   x 1.03 
 benches::parse_many_variables  164,479               167,721                       3,242   1.97%   x 0.98 
 benches::parse_nested_func     36,008                36,796                          788   2.19%   x 0.98 
 benches::parse_real_code       4,302                 4,358                            56   1.30%   x 0.99 
 benches::run_deep_nesting      735                   732                              -3  -0.41%   x 1.00 
 benches::run_many_variables    7,352                 7,412                            60   0.82%   x 0.99 
 benches::run_nested_func       950                   950                               0   0.00%   x 1.00 
 benches::run_real_code         5,884                 5,707                          -177  -3.01%   x 1.03 

If you want to see the code, you can check it out on my intern-functions branch. I had to write my own Range struct so that I could make it impl Copy; I don't know if it matters, though. It should only matter if we're Clone::cloneing these ranges, which we're not in this case. I was just trying different things to see what could improve the runtime.

Weirdly, it's faster on two of the parse_... benchmarks, when one would expect it to make parsing slower while making running faster.

