Skip to content

Improve performance#1

Open
tisztamo wants to merge 1 commit into
JuliaFolds:masterfrom
tisztamo:master
Open

Improve performance#1
tisztamo wants to merge 1 commit into
JuliaFolds:masterfrom
tisztamo:master

Conversation

@tisztamo
Copy link
Copy Markdown

A possible fix of the missing performance gain.

julia> demo_sum()
Baseline:
  4.649 s (0 allocations: 0 bytes)
Catwalk defaults:
  4.210 s (5607369 allocations: 116.47 MiB)
Catwalk tuned:
  4.007 s (2775829 allocations: 55.60 MiB)

The problem was that

@inline next(rf::R_{Map}, result, input) = next(inner(rf), result, xform(rf).f(input))

was called before the Catwalked method of next, resulting in a non-jitted dynamic dispatch.

I am not sure though if what I did is reasonable in the larger context, but I hope you can fix it based on this.

Also, the default batch size was too small, so I have increased it to 1e6, which may be more than ideal, more tests are needed.

When testing with @btime, initial overhead should be small, but I see a small amount of compilation in every Catwalked run, thats why the tested runtimes have to be several seconds. I will check that, but I like to test cold runs with @time anyway, because Catwalk adds significant compiling overhead, and not measuring it seems unfair.

@tkf
Copy link
Copy Markdown
Member

tkf commented Mar 29, 2021

The problem was that

@inline next(rf::R_{Map}, result, input) = next(inner(rf), result, xform(rf).f(input))

was called before the Catwalked method of next, resulting in a non-jitted dynamic dispatch.

Oooh, I see. Unfortunately, the patch in this PR is not generic enough since, in general, we can't assume any structure outer/left to OptimizeInner() (inner/right, too). For example, it's reasonable to have

xs |> Filter(x -> x > 0) |> Map(type_instability) |> OptimizeInner() |> Map(asint)

(But it does help me understand the problem. Thanks!)

I'm not sure what's the best strategy, though. I think we need something like

@please_inline Transducers.next(rf::R_{OptimizeXF}, acc, @nospecialize(input)) = ...

in the Julia compiler to fully solve the problem; i.e., the compiler inlines this even though input cannot be inferred.

Meanwhile, maybe I should stop trying to support OptimizeInner(). Maybe it's still possible to support the case where the instability happens on the iterator side

(type_instability(x) for x in xs) |> Map(asint)
#                                    ----------
#                                    JIT'ed

@tisztamo
Copy link
Copy Markdown
Author

Yeah, I had the fear that simply eliminating that call is not the way to go...

About the compiler support: I struggle a lot with inlining, and the possibility to force it would result in measurable performance improvements in the original target of Catwalk.jl, but I never was brave enough to ask for it...

Forcing inlining from the call site seems a bit less risky in terms of accidental compilation overhead, and now we have a real use case. Do you think it is time to open an issue?

@tkf
Copy link
Copy Markdown
Member

tkf commented Mar 30, 2021

My guess is that many Julia programmers wished there was a forced/more controllable inlining macro at least once. I couldn't find it in the issue tracker, though, which is kinda strange. Maybe everyone assumed there is already one 😄 . So yeah, I think it'd be nice to have an issue for this.

@tisztamo
Copy link
Copy Markdown
Author

tisztamo commented Jun 28, 2021

Great news, @tkf : JuliaLang/julia#41328 allows forced inlining! I have tested this case (only on non-folds, non-catwalk sample code for now, I have package installation issues after compiling 1.8-dev).

@tkf
Copy link
Copy Markdown
Member

tkf commented Jun 29, 2021

Thanks! Yeah, that's great news, esp. for packages heavily depend on higher-order function like Transducers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants