[Bug]: Registering DoFns and CombineFns Seems Excessively Slow #34693

Open
@JonathanHopeDMRC

What happened?

SDK Version: 2.63
SDK Language: Go

Registering DoFns and CombineFns with Beam no doubt has some cost, but the current cost seems excessively high. I am working on a pipeline that has around 30 DoFns. Builds have been taking a very long time for a while, so I spent some time debugging them and found that the calls to beam.Register* and register.* were what slowed the builds down. To illustrate this, I moved all of those calls to a separate package and used actiongraph to measure how long they take relative to everything else:

❯ actiongraph -f /tmp/actiongraph6 top             
263.156s  58.50%  build pcmig/pkg/start
 26.691s  64.43%  link  pcmig/cmd/batch
 15.179s  67.81%  build github.com/apache/beam/sdks/v2/go/pkg/beam/io/fileio

As you can see, the package containing those calls (pcmig/pkg/start) takes roughly 10x longer to build than the next slowest step. It's possible that this is expected behavior, but I wanted to raise the issue just in case.
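To make the "10x" figure concrete, here is a minimal sketch that parses the three rows quoted above and prints how much slower the top entry is than each of the others. The row layout (seconds, a percentage column, action, package) is assumed from those three lines only, and `step`, `parseTop`, and `sample` are hypothetical names introduced for illustration; none of this is part of actiongraph or Beam.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// step is one row of the `actiongraph top` output quoted in this issue.
type step struct {
	seconds float64
	action  string // "build" or "link"
	pkg     string
}

// parseTop parses rows shaped like "<seconds>s <percent>% <action> <package>".
// The layout is an assumption based on the sample rows; other lines are skipped.
func parseTop(out string) []step {
	var steps []step
	for _, line := range strings.Split(out, "\n") {
		f := strings.Fields(line)
		if len(f) < 4 || !strings.HasSuffix(f[0], "s") {
			continue
		}
		secs, err := strconv.ParseFloat(strings.TrimSuffix(f[0], "s"), 64)
		if err != nil {
			continue
		}
		steps = append(steps, step{seconds: secs, action: f[2], pkg: f[3]})
	}
	return steps
}

// sample holds the three rows copied from the report above.
const sample = `263.156s  58.50%  build pcmig/pkg/start
 26.691s  64.43%  link  pcmig/cmd/batch
 15.179s  67.81%  build github.com/apache/beam/sdks/v2/go/pkg/beam/io/fileio`

func main() {
	steps := parseTop(sample)
	if len(steps) == 0 {
		return
	}
	slowest := steps[0]
	// Compare the slowest step against every other row.
	for _, s := range steps[1:] {
		fmt.Printf("%s %s took %.1fx as long as %s %s\n",
			slowest.action, slowest.pkg, slowest.seconds/s.seconds, s.action, s.pkg)
	}
}
```

With the numbers above, the registration package takes a little under 10 times as long as the link step and more than 17 times as long as building fileio.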

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
