[doc](batch delete) address comment and translate en doc by LLM (apache#1863)
## Versions
- [x] dev
- [x] 3.0
- [x] 2.1
- [ ] 2.0
## Languages
- [x] Chinese
- [x] English
## Docs Checklist
- [x] Checked by AI
- [x] Test Cases Built
docs/data-operate/delete/batch-delete-manual.md
92 additions & 76 deletions
@@ -1,6 +1,6 @@
1
1
---
2
2
{
3
-
"title": "Batch Deletion",
3
+
"title": "Batch Deletion Based on Load",
4
4
"language": "en"
5
5
}
6
6
---
@@ -24,61 +24,124 @@ specific language governing permissions and limitations
24
24
under the License.
25
25
-->
26
26
27
-
Why do we need to introduce import-based Batch Delete when we have the Delete operation?
27
+
## Batch Deletion Based on Load
28
28
29
-
- **Limitations of Delete operation**
29
+
The delete operation is a special form of data update. In the primary key model (Unique Key) table, Doris supports deletion by adding a delete sign when loading data.
30
30
31
-
When you delete by Delete statement, each execution of Delete generates an empty rowset to record the deletion conditions and a new version of the data. Each time you read, you have to filter the deletion conditions. If you delete too often or have too many deletion conditions, it will seriously affect the query performance.
31
+
Compared to the `DELETE` statement, using delete signs offers better usability and performance in the following scenarios:
32
32
33
-
- **Insert data interspersed with Delete data**
33
+
1. **CDC Scenario**: When synchronizing data from an OLTP database to Doris, Insert and Delete operations in the binlog usually appear alternately. The `DELETE` statement cannot efficiently handle these operations. Using delete signs allows Insert and Delete operations to be processed uniformly, simplifying the CDC code for writing to Doris and improving data load and query performance.
34
+
2. **Batch Deletion of Specified Primary Keys**: If a large number of primary keys need to be deleted, using the `DELETE` statement is inefficient. Each execution of `DELETE` generates an empty rowset to record the delete condition and produces a new data version. Frequent deletions or too many delete conditions can severely affect query performance.
34
35
35
-
For scenarios like importing data from a transactional database via CDC, Insert and Delete are usually interspersed in the data. In this case, the current Delete operation cannot be implemented.
36
+
## Working Principle of Delete Signs
36
37
37
-
When importing data, there are several ways to merge it:
38
+
### Principle Explanation
38
39
39
-
1. APPEND: Append all data to existing data.
40
+
- **Table Structure**: The delete sign is stored as a hidden column `__DORIS_DELETE_SIGN__` in the primary key table. When the value of this column is 1, it indicates that the delete sign is effective.
41
+
- **Data Load**: Users can specify the mapping condition of the delete sign column in the load task. The usage varies for different load tasks, as detailed in the syntax explanation below.
42
+
- **Query**: During the query, Doris FE automatically adds the filter condition `__DORIS_DELETE_SIGN__ != true` in the query plan to filter out data with a delete sign value of 1.
43
+
- **Data Compaction**: Doris's background data compaction periodically cleans up data with a delete sign value of 1.
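
The four points above can be pictured with a toy in-memory model. This is only a sketch in Python, not Doris code: the class `UniqueKeyTable` and its methods are hypothetical names invented for illustration, and real storage, versioning, and compaction are far more involved.

```python
DELETE_SIGN = "__DORIS_DELETE_SIGN__"


class UniqueKeyTable:
    """Toy stand-in for a Unique Key table (hypothetical, not Doris code)."""

    def __init__(self):
        # primary key -> row dict, including the hidden delete sign column
        self.rows = {}

    def load(self, key, value, delete_sign=0):
        # A later write for the same key replaces the earlier row.
        self.rows[key] = {"id": key, "value": value, DELETE_SIGN: delete_sign}

    def query(self, show_hidden_columns=False):
        if show_hidden_columns:
            # Mirrors show_hidden_columns: marked rows and the hidden
            # column are both visible.
            return [dict(row) for row in self.rows.values()]
        # Mirrors the filter __DORIS_DELETE_SIGN__ != true added by FE.
        return [
            {"id": row["id"], "value": row["value"]}
            for row in self.rows.values()
            if row[DELETE_SIGN] != 1
        ]

    def compact(self):
        # Mirrors background compaction dropping rows marked as deleted.
        self.rows = {k: r for k, r in self.rows.items() if r[DELETE_SIGN] != 1}


table = UniqueKeyTable()
table.load(1, "foo")
table.load(2, "bar")
table.load(1, None, delete_sign=1)  # write a delete sign for id 1

visible = table.query()
with_hidden = table.query(show_hidden_columns=True)
table.compact()
remaining_keys = sorted(table.rows)
```

In this model the marked row stays in storage until `compact()` runs, which is why a plain query hides it while a query with hidden columns shown still returns it.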
40
44
41
-
2. DELETE: Delete all rows that have the same value as the key column of the imported data (when a `sequence` column exists in the table, it is necessary to satisfy the logic of having the same primary key as well as the size of the sequence column in order to delete it correctly, see Use Case 4 below for details).
45
+
### Data Example
42
46
43
-
3. MERGE: APPEND or DELETE according to DELETE ON decision
47
+
#### Table Structure
44
48
45
-
:::caution Warning
46
-
Batch Delete only works on Unique models.
47
-
:::
49
+
Create an example table:
48
50
49
-
## Fundamental
51
+
```sql
52
+
CREATE TABLE example_table (
53
+
id BIGINT NOT NULL,
54
+
value STRING
55
+
)
56
+
UNIQUE KEY(id)
57
+
DISTRIBUTED BY HASH(id) BUCKETS 10
58
+
PROPERTIES (
59
+
"replication_num"="3"
60
+
);
61
+
```
62
+
63
+
Use the session variable `show_hidden_columns` to view hidden columns:
This is achieved by adding a hidden column `__DORIS_DELETE_SIGN__` to the Unique table.
79
+
#### Data Load
52
80
53
-
When FE parses the query, it removes `__DORIS_DELETE_SIGN__` when expanding `*` and similar expressions, and adds the condition `__DORIS_DELETE_SIGN__ != true`. When reading, BE adds a column for judgement and determines whether to delete by the condition.
81
+
The table has the following existing data:
82
+
83
+
```sql
84
+
+------+-------+
85
+
| id | value |
86
+
+------+-------+
87
+
| 1 | foo |
88
+
| 2 | bar |
89
+
+------+-------+
90
+
```
91
+
92
+
Insert a delete sign for id 1 (this only demonstrates the principle; it does not cover the various ways of setting delete signs in load tasks):
93
+
94
+
```sql
95
+
mysql> insert into example_table (id, __DORIS_DELETE_SIGN__) values (1, 1);
96
+
```
54
97
55
-
- Import
98
+
#### Query
56
99
57
-
On import, the value of the hidden column is set to the value of the `DELETE ON` expression during the FE parsing stage.
100
+
Query the table directly, and the record with id 1 no longer appears:
58
101
59
-
- Read
102
+
```sql
103
+
mysql> select * from example_table;
104
+
+------+-------+
105
+
| id | value |
106
+
+------+-------+
107
+
| 2 | bar |
108
+
+------+-------+
109
+
```
60
110
61
-
The read adds the `__DORIS_DELETE_SIGN__ != true` condition; BE does not sense this process and executes normally.
111
+
Use the session variable `show_hidden_columns` to view hidden columns. The row with id 1 has not actually been deleted: its hidden column `__DORIS_DELETE_SIGN__` has the value 1, so the row is filtered out during the query:
In Cumulative Compaction, hidden columns are treated as normal columns and the Compaction logic remains unchanged.
124
+
## Syntax Explanation
66
125
67
-
- Base Compaction
126
+
Different load types use different syntax for setting delete signs. The syntax for each load type is described below.
68
127
69
-
When Base Compaction is performed, the rows marked for deletion are deleted to reduce the space occupied by the data.
128
+
### Load Merge Type Selection
70
129
71
-
## Syntax Description
130
+
There are several merge types when loading data:
72
131
73
-
The syntax design of the import is mainly to add a column mapping that specifies the field of the delete marker column, and it is necessary to add a column to the imported data. The syntax of various import methods is as follows:
132
+
1. **APPEND**: All data is appended to the existing data.
132
+
2. **DELETE**: Delete all rows with the same key column values as the loaded data.
133
+
3. **MERGE**: Decide whether to APPEND or DELETE based on the DELETE ON condition.
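
The three merge types can be sketched as a toy Python function, assuming they differ only in how the delete sign of each loaded row is set. The names `apply_load`, `visible_rows`, and the `op` field are hypothetical, sequence columns are ignored, and requiring a predicate for MERGE is a choice made for this sketch rather than documented Doris behavior.

```python
DELETE_SIGN = "__DORIS_DELETE_SIGN__"


def apply_load(table, batch, merge_type="APPEND", delete_on=None):
    """Apply a batch of rows to `table` (dict: id -> row) in place.

    `delete_on` is a predicate on a loaded row, standing in for DELETE ON.
    """
    if merge_type not in ("APPEND", "DELETE", "MERGE"):
        raise ValueError(f"unknown merge type: {merge_type}")
    if (merge_type == "MERGE") != (delete_on is not None):
        raise ValueError("delete_on must be given with MERGE, and only with MERGE")
    for row in batch:
        if merge_type == "APPEND":
            sign = 0
        elif merge_type == "DELETE":
            sign = 1
        else:
            # MERGE: the predicate decides row by row.
            sign = 1 if delete_on(row) else 0
        # The newest row for a key replaces the previous one.
        table[row["id"]] = {**row, DELETE_SIGN: sign}
    return table


def visible_rows(table):
    # Ids that a normal query would return (delete sign not set).
    return sorted(r["id"] for r in table.values() if r[DELETE_SIGN] != 1)


table = {}
apply_load(table, [{"id": 1, "value": "foo"}, {"id": 2, "value": "bar"}])
apply_load(table, [{"id": 1, "value": "foo"}], merge_type="DELETE")
after_delete = visible_rows(table)

apply_load(
    table,
    [{"id": 2, "value": "bar", "op": 1}, {"id": 3, "value": "baz", "op": 0}],
    merge_type="MERGE",
    delete_on=lambda row: row["op"] == 1,
)
after_merge = visible_rows(table)
```

Here the DELETE load hides id 1, and the MERGE load hides id 2 while adding id 3, because only the row whose `op` field equals 1 satisfies the predicate.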
74
135
75
136
### Stream Load
76
137
77
-
The writing method of `Stream Load` adds a field to set the delete label column in the columns field in the header. Example: `-H "columns: k1, k2, label_c3" -H "merge_type: [MERGE|APPEND|DELETE]" -H "delete: label_c3=1"`
138
+
For `Stream Load`, add a field for the delete sign column to the `columns` header, then set the merge type and the delete condition in their own headers, for example: `-H "columns: k1, k2, label_c3" -H "merge_type: [MERGE|APPEND|DELETE]" -H "delete: label_c3=1"`.
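
As a rough illustration, the three headers can be assembled and checked before a request is sent. This Python sketch only builds strings: `stream_load_headers` and `as_curl_flags` are hypothetical helpers, and the check that `delete` accompanies only `MERGE` is an assumption that mirrors the rule that `DELETE ON` can only be used with MERGE.

```python
VALID_MERGE_TYPES = ("APPEND", "DELETE", "MERGE")


def stream_load_headers(columns, merge_type="APPEND", delete=None):
    """Build the delete-sign related headers for a Stream Load request."""
    merge_type = merge_type.upper()
    if merge_type not in VALID_MERGE_TYPES:
        raise ValueError(f"unknown merge_type: {merge_type}")
    # Assumption: a delete condition is meaningful only for MERGE.
    if delete is not None and merge_type != "MERGE":
        raise ValueError("the delete condition is only valid with MERGE")
    if merge_type == "MERGE" and delete is None:
        raise ValueError("MERGE needs a delete condition")
    headers = {"columns": ", ".join(columns), "merge_type": merge_type}
    if delete is not None:
        headers["delete"] = delete
    return headers


def as_curl_flags(headers):
    # Render the headers in the -H "name: value" form used in this section.
    return " ".join(f'-H "{name}: {value}"' for name, value in headers.items())


headers = stream_load_headers(["k1", "k2", "label_c3"], "MERGE", "label_c3=1")
flags = as_curl_flags(headers)
```

The resulting `flags` string has the same shape as the header example in this section, with `MERGE` chosen as the merge type.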
139
+
140
+
For usage examples of Stream Load, please refer to the "Specify merge_type for Delete Operation" and "Specify merge_type for Merge Operation" sections in the [Stream Load Manual](../load/load-way/stream-load-manual.md).
78
141
79
142
### Broker Load
80
143
81
-
The writing method of `Broker Load` sets the field of the delete marker column at `PROPERTIES`. The syntax is as follows:
144
+
For `Broker Load`, set the delete sign column field in `PROPERTIES`, as follows:
82
145
83
146
```sql
84
147
LOAD LABEL db1.label1
@@ -107,7 +170,7 @@ PROPERTIES
107
170
108
171
### Routine Load
109
172
110
-
The writing method of `Routine Load` adds a mapping to the `columns` field. The mapping method is the same as above. The syntax is as follows:
173
+
For `Routine Load`, add a mapping in the `columns` field, using the same mapping method as above. The syntax is as follows:
@@ -131,50 +194,3 @@ CREATE ROUTINE LOAD example_db.test1 ON example_tbl
131
194
"kafka_offsets"="101,0,0,200"
132
195
);
133
196
```
134
-
135
-
## Note
136
-
137
-
1. Import operations other than Stream Load may be executed out of order inside Doris, so when importing with the `MERGE` method through anything other than Stream Load, use it together with a load sequence. For the specific syntax, refer to the documents on the `sequence` column.
138
-
139
-
2. The `DELETE ON` condition can only be used with MERGE.
140
-
141
-
:::tip Tip
142
-
If `SET show_hidden_columns = true` was executed before the import task (for example, to check whether the table supports the batch delete feature), running `select count(*) from xxx` in the same session after a `DELETE/MERGE` import task returns an unexpected result: the result set includes the deleted rows. To avoid this, execute `SET show_hidden_columns = false` before the select statement, or open a new session to run it.
Please refer to the sections "Specifying merge_type for DELETE operations" and "Specifying merge_type for MERGE operations" in the [Stream Load Manual](../import/import-way/stream-load-manual.md)