Open
Description
I checked the Beam code and found that Dataflow queries BigQuery and retrieves the results using pagination [1]. As we understand it, this means there is no parallelism when reading a BigQuery table, which contradicts what the documentation tells us [2].
Is this some kind of work in progress? I'm filing this as a bug because the documentation says the read uses GCS, while the code actually uses NativeSourceReader, which yields data row by row as an iterator.
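To illustrate the concern, here is a minimal sketch of paginated reading. It is not the Beam code: `FakePagedTable`, `get_page`, and `read_rows` are hypothetical names made up for this example. Each page request needs the token returned by the previous one, so a single reader has to fetch the pages one after another.

```python
from typing import Dict, Iterator, List, Optional, Tuple


class FakePagedTable:
    """Hypothetical in-memory stand-in for a paginated query API (not a real client)."""

    def __init__(self, rows: List[Dict], page_size: int) -> None:
        self._rows = rows
        self._page_size = page_size

    def get_page(self, page_token: Optional[int] = None) -> Tuple[List[Dict], Optional[int]]:
        """Return one page of rows plus the token for the next page (None when done)."""
        start = page_token or 0
        end = start + self._page_size
        next_token = end if end < len(self._rows) else None
        return self._rows[start:end], next_token


def read_rows(table: FakePagedTable) -> Iterator[Dict]:
    """Yield rows one at a time, fetching pages strictly in order."""
    page_token = None
    while True:
        # The next request cannot be issued until this one returns its token.
        rows, page_token = table.get_page(page_token)
        for row in rows:
            yield row
        if page_token is None:
            break


table = FakePagedTable([{"id": i} for i in range(5)], page_size=2)
ids = [row["id"] for row in read_rows(table)]
```

If the read really works this way, there is nothing for multiple workers to split, which is why we expected the export to GCS that the documentation describes.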
[1] beam/sdks/python/apache_beam/io/gcp/bigquery.py, line 1083 in 520b3a2
[2]
Imported from Jira BEAM-5352. Original Jira may contain additional context.
Reported by: rendybjunior.