Skip to content

修复 HAVING 过滤不正确的问题 #281

@gongna-au

Description

@gongna-au

问题描述

在 Gaea 中,跨分片聚合在处理 HAVING 过滤条件时不正确。当执行带有 HAVING 子句的聚合操作时,过滤条件没有在所有分片上正确应用,导致结果不准确。

重现步骤

为验证Gaea是否在全局层重新应用了 HAVING 条件,设计一种场景,使分片执行的结果与全局层的结果不一致。

  1. namespace配置
{
    "open_general_log": true,
    "is_encrypt": true,
    "name": "test_namespace",
    "online": true,
    "read_only": false,
    "allowed_dbs": {
        "test": true
    },
    "default_phy_dbs": null,
    "slow_sql_time": "1000",
    "black_sql": [],
    "allowed_ip": null,
    "slices": [
        {
            "name": "slice-0",
            "user_name": "superroot",
            "password": "superroot",
            "master": "127.0.0.1:3319",
            "slaves": [],
            "statistic_slaves": [],
            "capacity": 1,
            "max_capacity": 1,
            "idle_timeout": 3600,
            "capability": 41479,
            "init_connect": "",
            "health_check_sql": ""
        },
        {
            "name": "slice-1",
            "user_name": "superroot",
            "password": "superroot",
            "master": "127.0.0.1:3349",
            "slaves": [],
            "statistic_slaves": [],
            "capacity": 1,
            "max_capacity": 1,
            "idle_timeout": 3600,
            "capability": 41479,
            "init_connect": "",
            "health_check_sql": ""
        }
    ],
    "shard_rules": [
        {
            "db": "test",
            "table": "example_table",
            "type": "mycat_long",
            "key": "id",
            "locations": [
                2,
                2
            ],
            "slices": [
                "slice-0",
                "slice-1"
            ],
            "databases": [
                "test_[0-3]"
            ],
            "partition_count": "4",
            "partition_length": "256"
        }
        
    ],
    "users": [
        {
            "user_name": "superroot",
            "password": "superroot",
            "namespace": "test_namespace",
            "rw_flag": 2,
            "rw_split": 1,
            "other_property": 0
        }
    ],
    "default_slice": "slice-0",
    "global_sequences": null,
    "default_charset": "",
    "default_collation": "",
    "max_sql_execute_time": 0,
    "max_sql_result_size": 0,
    "max_client_connections": 100000,
    "down_after_no_alive": 32,
    "seconds_behind_master": 32,
    "check_select_lock": false,
    "support_multi_query": false,
    "local_slave_read_priority": 0,
    "set_for_keep_session": false,
    "client_qps_limit": 0,
    "allowed_session_variables": {}
}
  1. 表结构
CREATE TABLE example_table (
    id INT NOT NULL,
    a VARCHAR(50) NOT NULL,
    b INT NOT NULL,
    PRIMARY KEY (id)
);
  1. 插入若干数据。

  2. 执行 SQLSELECT a AS c, SUM(LENGTH(a)) AS total_len FROM example_table HAVING total_len > 6

  3. 观察到 HAVING 过滤条件没有在跨分片后正确应用,结果不准确。

预期结果

mysql> SELECT a AS c, SUM(LENGTH(a)) AS total_len FROM example_table HAVING total_len > 6;
+---+-----------+
| c | total_len |
+---+-----------+
| a |        24 |
+---+-----------+
1 row in set (0.01 sec)

实际结果

SELECT a AS c, SUM(LENGTH(a)) AS total_len FROM example_table HAVING total_len > 6;

+----+-----------+
| c  | total_len |
+----+-----------+
| ab |         8 |
+----+-----------+
1 row in set (0.18 sec)

问题根源

在分库分表(Sharding)场景下,Gaea直接把带有 HAVING 的语句下推到每个分片执行,导致结果与预期不符。其根本原因是:HAVING 涉及的聚合操作(SUM、COUNT、AVG 等)无法简单地在各分片“独立计算”后再进行合并,否则就会出现“部分求和”的问题。

各分片独立执行后,只保留 total_len > 6 的结果,再合并到路由层,就会导致部分分片的聚合结果没有被正确“累加”。举个简单例子,如果某个分片里 SUM(LENGTH(a)) 的值只有 3,没有满足 6 就被过滤掉了;但其实如果所有分片加在一起,SUM(LENGTH(a)) 可能会大于 6,原本应该保留这一行。

版本信息
所有版本

问题标签: bug, enhancement, sharding, feature-request

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions