Description
Overview
Remind users that streaming queries differ from ad-hoc queries or other database queries. Once submitted, a streaming job runs continuously on compute nodes. RisingWave does not currently support modifying the plan of a submitted streaming query, so SQL statements with performance implications should be considered carefully before the job is created.
To view the streaming query plan, use EXPLAIN CREATE MATERIALIZED VIEW/SINK xxx ....
Adding EXPLAIN (DISTSQL, VERBOSE) will display more details.
State
A key difference with streaming jobs is that they require persistent storage of operator states. In RisingWave, these states are stored remotely in cloud object storage such as S3. For operators such as joins or over-window functions, the state can become very large and may grow indefinitely. The state primarily impacts performance in the following ways:
- The cost of storing data on S3.
- For operators, fast access to the required state is crucial for efficient streaming execution. Larger states require more memory for caching to accelerate performance.
Users can use EXPLAIN (DISTSQL, VERBOSE) to examine the states associated with a streaming SQL job.
The command SHOW INTERNAL TABLE can be used to view the current state tables in streaming jobs (this feature requires completion of issue: risingwavelabs/risingwave#16520).
APPEND ONLY
Describe the concept of append-only streams.
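Append-only streams can impact operator execution efficiency and state size. For example, the state for MAX() and MIN() aggregations is much smaller on an append-only stream, because the operator only needs to remember the current extreme value instead of every value that could later be retracted.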
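The difference in state size for MAX() can be sketched in a few lines of Python. This is a conceptual illustration only, not RisingWave's implementation; the class names `AppendOnlyMax` and `RetractableMax` are hypothetical.

```python
from collections import Counter


class AppendOnlyMax:
    """MAX over an insert-only stream: a single value of state is enough."""

    def __init__(self):
        self.current = None

    def insert(self, value):
        if self.current is None or value > self.current:
            self.current = value
        return self.current

    def state_size(self):
        return 0 if self.current is None else 1


class RetractableMax:
    """MAX over a stream with deletes: every live value must be remembered,
    because deleting the current maximum requires finding the next one."""

    def __init__(self):
        self.live = Counter()  # value -> number of live copies

    def insert(self, value):
        self.live[value] += 1
        return max(self.live)

    def delete(self, value):
        self.live[value] -= 1
        if self.live[value] == 0:
            del self.live[value]
        return max(self.live) if self.live else None

    def state_size(self):
        # Number of distinct live values kept in state.
        return len(self.live)
```

The append-only variant keeps constant state regardless of input volume, while the retractable variant grows with the number of distinct live values.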
A list should be provided to indicate whether each operator (SQL syntax) preserves the append-only property of its input. Example: DISTINCT ON on an append-only stream still produces append-only output.
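To see why deduplication can preserve the append-only property, here is a minimal Python sketch. It assumes the first row seen for each key is the one kept; the function name `append_only_dedup` and the `("insert", row)` message shape are hypothetical and not taken from RisingWave.

```python
def append_only_dedup(rows, key):
    """Emit the first row seen for each key; never emits a delete or update."""
    seen = set()  # state: one entry per distinct key, never shrinks
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            yield ("insert", row)


# Usage: three input rows, two distinct users, so two inserts come out.
rows = [
    {"user": 1, "v": "a"},
    {"user": 2, "v": "b"},
    {"user": 1, "v": "c"},
]
output = list(append_only_dedup(rows, key=lambda r: r["user"]))
```

Because the input never retracts a row, an emitted row never has to be replaced, so the output contains inserts only. The state still grows with the number of distinct keys.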
Watermark
Refer to the current documentation, which is relatively comprehensive.
EOWC (emit on window close) can convert a stream that is not append-only but carries a watermark into an append-only stream.
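The idea can be sketched with a tumbling-window count that buffers results until the watermark passes the end of the window. This is a simplified illustration, assuming integer event times and no rows arriving behind the watermark; the class `EowcWindowCount` and its methods are hypothetical, not RisingWave APIs.

```python
from collections import defaultdict


class EowcWindowCount:
    """Tumbling-window COUNT that emits each window exactly once, after it closes."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.counts = defaultdict(int)  # window_start -> buffered row count

    def on_row(self, event_time):
        # Update the buffered count; nothing is emitted downstream yet.
        start = event_time - event_time % self.window_size
        self.counts[start] += 1

    def on_watermark(self, watermark):
        # A window [start, start + size) is closed once the watermark reaches its end.
        closed = sorted(s for s in self.counts if s + self.window_size <= watermark)
        # Emit the final count as a single insert and drop the window's state.
        return [("insert", s, self.counts.pop(s)) for s in closed]
```

Without EOWC, each incoming row would update the emitted count for its window, which is not append-only. Here each window yields one final insert, and its state is released as soon as it closes.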
State information for each operator (SQL syntax)
Operator state can be categorized into three types: stateless, limited, and unbounded. Explain how the state of each operator differs depending on whether the input is append-only or carries a watermark.
Impact of workload on streaming performance
Cache locality
Amplification