Description
What happened?
SDK Version: 2.63
SDK Language: Go
While there is no doubt some cost to registering DoFns and CombineFns with Beam, the cost as it stands seems excessively high. I am working on a pipeline that currently has around 30 DoFns. Builds have been taking a very long time for a while, so I spent some time debugging them and found that the calls to beam.Register* and register.* were what slowed them down. To illustrate this, I moved all of those calls into a separate package and used actiongraph to measure how long that package takes to build relative to everything else:
❯ actiongraph -f /tmp/actiongraph6 top
263.156s 58.50% build pcmig/pkg/start
26.691s 64.43% link pcmig/cmd/batch
15.179s 67.81% build github.com/apache/beam/sdks/v2/go/pkg/beam/io/fileio
As you can see, the package containing those calls (pcmig/pkg/start) takes roughly 10x longer to build than the next slowest step. It's possible that this is expected behavior, but I wanted to raise the issue just in case.
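For anyone who wants to check the ratio against the numbers above, here is a minimal sketch that parses the three rows and compares the slowest step with the others. The row layout (duration, cumulative percentage, action kind, target) is an assumption inferred from the sample output, not a documented format, and topEntry and parseTopLine are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// topEntry is one row of the `actiongraph top` output quoted in this issue.
type topEntry struct {
	Seconds float64 // wall time of the action
	Kind    string  // e.g. "build" or "link"
	Target  string  // package the action applies to
}

// parseTopLine assumes the layout "<seconds>s <cumulative %> <kind> <target>".
// That layout is inferred from the sample rows; it is not a documented format.
func parseTopLine(line string) (topEntry, error) {
	fields := strings.Fields(line)
	if len(fields) < 4 {
		return topEntry{}, fmt.Errorf("unexpected row: %q", line)
	}
	secs, err := strconv.ParseFloat(strings.TrimSuffix(fields[0], "s"), 64)
	if err != nil {
		return topEntry{}, fmt.Errorf("bad duration in %q: %w", line, err)
	}
	return topEntry{Seconds: secs, Kind: fields[2], Target: fields[3]}, nil
}

func main() {
	// Rows copied verbatim from the report.
	rows := []string{
		"263.156s 58.50% build pcmig/pkg/start",
		"26.691s 64.43% link pcmig/cmd/batch",
		"15.179s 67.81% build github.com/apache/beam/sdks/v2/go/pkg/beam/io/fileio",
	}

	entries := make([]topEntry, 0, len(rows))
	for _, row := range rows {
		e, err := parseTopLine(row)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		entries = append(entries, e)
	}
	if len(entries) < 2 {
		return
	}

	// The output is already sorted slowest-first, so compare against row 0.
	slowest := entries[0]
	for _, e := range entries[1:] {
		fmt.Printf("%s %s took %.1fx as long as %s %s\n",
			slowest.Kind, slowest.Target, slowest.Seconds/e.Seconds, e.Kind, e.Target)
	}
}
```

With the rows above, this reports about 9.9x relative to the link step and about 17.3x relative to the fileio build.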
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner