Skip to content

document streaming optimization #346

Open
@st1page

Description

@st1page

Overview

Remind users that streaming queries differ from ad-hoc queries or other database queries. Once submitted, a streaming task will run long-running on compute nodes. Currently, RisingWave does not support modifying the plan of a submitted streaming query. Therefore, SQL statements with performance implications should be carefully considered before creation.
To view the streaming query plan, use EXPLAIN CREATE MATERIALIZED VIEW/SINK xxx .... Adding EXPLAIN (DISTSQL, VERBOSE) will display more details.

State

A key difference with streaming jobs is that they require persistent storage of operator states. In RisingWave, these states are stored remotely in cloud object storage like S3. For operators such as joins or over windows, the state can become very large and may grow indefinitely. The state primarily impacts performance in the following ways:

  • The cost and expense of storing data on S3.
  • For operators, fast access to the required state is crucial for efficient streaming execution. Larger states require more memory for caching to accelerate performance.

Users can use EXPLAIN (DISTSQL, VERBOSE) to examine the states associated with a streaming SQL job.
The command SHOW INTERNAL TABLE can be used to view the current state tables in streaming jobs (this feature requires completion of issue: risingwavelabs/risingwave#16520).

APPEND ONLY

Describes the concept of append-only streams.
Append-only streams can impact operator execution efficiency and state size. For example, the state for MAX() and MIN() operations.
A list should be provided to indicate whether each operator (SQL syntax) preserves the append-only property of the input. Example: DISTINCT ON on an append-only stream still produces append-only output.

Watermark

Refer to the current documentation, which is relatively comprehensive.

EOWC can transfer a non-append only but with watermark stream to append only

state information for each operator (SQL syntax)

Can be categorized into three types: stateless, limited, and unbounded. Explain the differences in state for each operator under conditions of append-only/watermark.

Impact of workload on streaming performance

Cache locality
Amplification

Special workload: backfilling

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions