Commit d818d47

zhannngchenixzc authored and committed
[doc](batch delete) address comment and translate en doc by LLM (apache#1863)
## Versions
- [x] dev
- [x] 3.0
- [x] 2.1
- [ ] 2.0

## Languages
- [x] Chinese
- [x] English

## Docs Checklist
- [x] Checked by AI
- [x] Test Cases Built
1 parent 757b352 commit d818d47

6 files changed, +552 -441 lines changed

docs/data-operate/delete/batch-delete-manual.md

Lines changed: 92 additions & 76 deletions
@@ -1,6 +1,6 @@
11
---
22
{
3-
"title": "Batch Deletion",
3+
"title": "Batch Deletion Based on Load",
44
"language": "en"
55
}
66
---
@@ -24,61 +24,124 @@ specific language governing permissions and limitations
2424
under the License.
2525
-->
2626

27-
Why do we need to introduce import-based Batch Delete when we have the Delete operation?
27+
## Batch Deletion Based on Load
2828

29-
- **Limitations of Delete operation**
29+
The delete operation is a special form of data update. In the primary key model (Unique Key) table, Doris supports deletion by adding a delete sign when loading data.
3030

31-
When you delete by Delete statement, each execution of Delete generates an empty rowset to record the deletion conditions and a new version of the data. Each time you read, you have to filter the deletion conditions. If you delete too often or have too many deletion conditions, it will seriously affect the query performance.
31+
Compared to the `DELETE` statement, using delete signs offers better usability and performance in the following scenarios:
3232

33-
- **Insert data interspersed with Delete data**
33+
1. **CDC Scenario**: When synchronizing data from an OLTP database to Doris, Insert and Delete operations in the binlog usually appear alternately. The `DELETE` statement cannot efficiently handle these operations. Using delete signs allows Insert and Delete operations to be processed uniformly, simplifying the CDC code for writing to Doris and improving data load and query performance.
34+
2. **Batch Deletion of Specified Primary Keys**: If a large number of primary keys need to be deleted, using the `DELETE` statement is inefficient. Each execution of `DELETE` generates an empty rowset to record the delete condition and produces a new data version. Frequent deletions or too many delete conditions can severely affect query performance.
3435

35-
For scenarios like importing data from a transactional database via CDC, Insert and Delete are usually interspersed in the data. In this case, the current Delete operation cannot be implemented.
36+
## Working Principle of Delete Signs
3637

37-
When importing data, there are several ways to merge it:
38+
### Principle Explanation
3839

39-
1. APPEND: Append all data to existing data.
40+
- **Table Structure**: The delete sign is stored as a hidden column `__DORIS_DELETE_SIGN__` in the primary key table. When the value of this column is 1, it indicates that the delete sign is effective.
41+
- **Data Load**: Users can specify the mapping condition of the delete sign column in the load task. The usage varies for different load tasks, as detailed in the syntax explanation below.
42+
- **Query**: During the query, Doris FE automatically adds the filter condition `__DORIS_DELETE_SIGN__ != true` in the query plan to filter out data with a delete sign value of 1.
43+
- **Data Compaction**: Doris's background data compaction periodically cleans up data with a delete sign value of 1.
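The four points above can be sketched as a toy in-memory model. This is an illustration only, not Doris code: the class `UniqueKeyTableModel` and its method names are hypothetical, and the model assumes that the latest load for a key simply replaces the earlier row.

```python
DELETE_SIGN = "__DORIS_DELETE_SIGN__"  # hidden column name taken from the text


class UniqueKeyTableModel:
    """Hypothetical toy model of a Unique Key table; not the Doris implementation."""

    def __init__(self):
        # key -> (value, delete_sign); assumption: the latest load for a key wins
        self.rows = {}

    def load(self, key, value=None, delete_sign=0):
        """Write a row. A delete is just a load whose delete sign is 1."""
        self.rows[key] = (value, delete_sign)

    def query(self):
        """Mimic the filter added to the query plan: hide rows whose sign is 1."""
        return {k: v for k, (v, sign) in self.rows.items() if sign != 1}

    def query_with_hidden_columns(self):
        """Mimic show_hidden_columns=true: marked rows are still visible."""
        return {k: {"value": v, DELETE_SIGN: sign} for k, (v, sign) in self.rows.items()}

    def compact(self):
        """Mimic background compaction: physically drop rows whose sign is 1."""
        self.rows = {k: row for k, row in self.rows.items() if row[1] != 1}


table = UniqueKeyTableModel()
table.load(1, "foo")
table.load(2, "bar")
table.load(1, delete_sign=1)  # mark id 1 as deleted

print(table.query())                      # only id 2 is visible
print(table.query_with_hidden_columns())  # id 1 is still stored, with sign 1
table.compact()
print(table.query_with_hidden_columns())  # id 1 is now physically gone
```

The point of the model is that a delete is an ordinary write: the row stays on disk with its sign set to 1 until compaction removes it, which is why inserts and deletes can be mixed in one load.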
4044

41-
2. DELETE: Delete all rows that have the same value as the key column of the imported data (when a `sequence` column exists in the table, it is necessary to satisfy the logic of having the same primary key as well as the size of the sequence column in order to delete it correctly, see Use Case 4 below for details).
45+
### Data Example
4246

43-
3. MERGE: APPEND or DELETE according to DELETE ON decision
47+
#### Table Structure
4448

45-
:::caution Warning
46-
Batch Delete only works on Unique models.
47-
:::
49+
Create an example table:
4850

49-
## Fundamental
51+
```sql
52+
CREATE TABLE example_table (
53+
id BIGINT NOT NULL,
54+
value STRING
55+
)
56+
UNIQUE KEY(id)
57+
DISTRIBUTED BY HASH(id) BUCKETS 10
58+
PROPERTIES (
59+
"replication_num" = "3"
60+
);
61+
```
62+
63+
Use the session variable `show_hidden_columns` to view hidden columns:
64+
65+
```sql
66+
mysql> set show_hidden_columns=true;
67+
68+
mysql> desc example_table;
69+
+-----------------------+---------+------+-------+---------+-------+
70+
| Field | Type | Null | Key | Default | Extra |
71+
+-----------------------+---------+------+-------+---------+-------+
72+
| id | bigint | No | true | NULL | |
73+
| value | text | Yes | false | NULL | NONE |
74+
| __DORIS_DELETE_SIGN__ | tinyint | No | false | 0 | NONE |
75+
| __DORIS_VERSION_COL__ | bigint | No | false | 0 | NONE |
76+
+-----------------------+---------+------+-------+---------+-------+
77+
```
5078

51-
This is achieved by adding a hidden column `__DORIS_DELETE_SIGN__` to the Unique table.
79+
#### Data Load
5280

53-
When FE parses the query, `__DORIS_DELETE_SIGN__` is removed when it encounters * and so on, and `__DORIS_DELETE_SIGN__ !` `= true`, BE will add a column for judgement when reading, and determine whether to delete by the condition.
81+
The table has the following existing data:
82+
83+
```sql
84+
+------+-------+
85+
| id | value |
86+
+------+-------+
87+
| 1 | foo |
88+
| 2 | bar |
89+
+------+-------+
90+
```
91+
92+
Insert a delete sign for id 1. This statement only demonstrates the principle; the ways to set delete signs in each load method are covered in the syntax section.
93+
94+
```sql
95+
mysql> insert into example_table (id, __DORIS_DELETE_SIGN__) values (1, 1);
96+
```
5497

55-
- Import
98+
#### Query
5699

57-
On import, the value of the hidden column is set to the value of the `DELETE ON` expression during the FE parsing stage.
100+
Query the table directly, and the record with id 1 no longer appears:
58101

59-
- Read
102+
```sql
103+
mysql> select * from example_table;
104+
+------+-------+
105+
| id | value |
106+
+------+-------+
107+
| 2 | bar |
108+
+------+-------+
109+
```
60110

61-
The read adds `__DORIS_DELETE_SIGN__ !` `= true` condition, BE does not sense this process and executes normally.
111+
Use the session variable `show_hidden_columns` to view hidden columns, and you can see that the row with id 1 has not actually been deleted. Its hidden column `__DORIS_DELETE_SIGN__` is 1, so the row is filtered out of normal queries:
62112

63-
- Cumulative Compaction
113+
```sql
114+
mysql> set show_hidden_columns=true;
115+
mysql> select * from example_table;
116+
+------+-------+-----------------------+-----------------------+
117+
| id | value | __DORIS_DELETE_SIGN__ | __DORIS_VERSION_COL__ |
118+
+------+-------+-----------------------+-----------------------+
119+
| 1 | NULL | 1 | 3 |
120+
| 2 | bar | 0 | 2 |
121+
+------+-------+-----------------------+-----------------------+
122+
```
64123

65-
In Cumulative Compaction, hidden columns are treated as normal columns and the Compaction logic remains unchanged.
124+
## Syntax Explanation
66125

67-
- Base Compaction
126+
Each load type has its own syntax for setting delete signs. The syntax for each load type is described below.
68127

69-
When Base Compaction is performed, the rows marked for deletion are deleted to reduce the space occupied by the data.
128+
### Load Merge Type Selection
70129

71-
## Syntax Description
130+
There are several merge types when loading data:
72131

73-
The syntax design of the import is mainly to add a column mapping that specifies the field of the delete marker column, and it is necessary to add a column to the imported data. The syntax of various import methods is as follows:
132+
1. **APPEND**: All data is appended to the existing data.
133+
2. **DELETE**: Delete all rows with the same key column values as the loaded data.
134+
3. **MERGE**: Decide whether to APPEND or DELETE based on the DELETE ON condition.
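A minimal sketch of how the three merge types could map to the delete sign of each loaded row. The function `delete_sign_for_row` is hypothetical and only models the behaviour described above. Rejecting `MERGE` without a `DELETE ON` predicate is an assumption of this sketch.

```python
def delete_sign_for_row(merge_type, row, delete_on=None):
    """Return the delete sign (0 or 1) that a loaded row carries in this toy model."""
    if merge_type == "APPEND":
        return 0  # every row is kept
    if merge_type == "DELETE":
        return 1  # every row marks its key as deleted
    if merge_type == "MERGE":
        if delete_on is None:
            # Assumption of this sketch: MERGE needs a DELETE ON predicate.
            raise ValueError("MERGE requires a DELETE ON predicate")
        return 1 if delete_on(row) else 0  # decided row by row
    raise ValueError(f"unknown merge type: {merge_type}")


# Example rows use the column names from the Stream Load example (k1, k2, label_c3).
rows = [
    {"k1": 1, "k2": "a", "label_c3": 0},
    {"k1": 2, "k2": "b", "label_c3": 1},
]
signs = [delete_sign_for_row("MERGE", r, lambda r: r["label_c3"] == 1) for r in rows]
print(signs)  # [0, 1]
```

Under this reading, `APPEND` and `DELETE` are the two constant cases of `MERGE`: the predicate is always false or always true.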
74135

75136
### Stream Load
76137

77-
The writing method of `Stream Load` adds a field to set the delete label column in the columns field in the header. Example: `-H "columns: k1, k2, label_c3" -H "merge_type: [MERGE|APPEND|DELETE]" -H "delete: label_c3=1"`
138+
With `Stream Load`, add a field for the delete sign column to the `columns` header, then set the merge type and the delete condition in their own headers, for example: `-H "columns: k1, k2, label_c3" -H "merge_type: [MERGE|APPEND|DELETE]" -H "delete: label_c3=1"`.
139+
140+
For usage examples of Stream Load, please refer to the "Specify merge_type for Delete Operation" and "Specify merge_type for Merge Operation" sections in the [Stream Load Manual](../load/load-way/stream-load-manual.md).
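As a rough illustration, the three headers shown above can be assembled and sanity-checked before sending a request. The helper `stream_load_headers` is hypothetical and sends nothing over the network. Defaulting to `APPEND` and rejecting a `delete` condition outside `MERGE` are assumptions of this sketch, so check them against the Stream Load Manual.

```python
ALLOWED_MERGE_TYPES = {"APPEND", "DELETE", "MERGE"}


def stream_load_headers(columns, merge_type="APPEND", delete=None):
    """Build the columns / merge_type / delete headers as a plain dict."""
    if merge_type not in ALLOWED_MERGE_TYPES:
        raise ValueError(f"unknown merge_type: {merge_type}")
    if delete is not None and merge_type != "MERGE":
        # Assumption: a delete condition is only meaningful together with MERGE.
        raise ValueError("the delete condition can only be used with MERGE")
    headers = {"columns": ", ".join(columns), "merge_type": merge_type}
    if delete is not None:
        headers["delete"] = delete
    return headers


headers = stream_load_headers(["k1", "k2", "label_c3"], "MERGE", "label_c3=1")
for name, value in headers.items():
    print(f'-H "{name}: {value}"')
```

The resulting dict can be passed to whichever HTTP client performs the load.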
78141

79142
### Broker Load
80143

81-
The writing method of `Broker Load` sets the field of the delete marker column at `PROPERTIES`. The syntax is as follows:
144+
With `Broker Load`, set the delete sign column field in `PROPERTIES`, as follows:
82145

83146
```sql
84147
LOAD LABEL db1.label1
@@ -107,7 +170,7 @@ PROPERTIES
107170

108171
### Routine Load
109172

110-
The writing method of `Routine Load` adds a mapping to the `columns` field. The mapping method is the same as above. The syntax is as follows:
173+
With `Routine Load`, add a mapping in the `columns` field, using the same mapping method as above:
111174

112175
```sql
113176
CREATE ROUTINE LOAD example_db.test1 ON example_tbl
@@ -131,50 +194,3 @@ CREATE ROUTINE LOAD example_db.test1 ON example_tbl
131194
"kafka_offsets" = "101,0,0,200"
132195
);
133196
```
134-
135-
## Note
136-
137-
1. Since import operations other than stream load may be executed out of order inside doris, if it is not stream load when importing using the `MERGE` method, it needs to be used with load sequence. For the specific syntax, please refer to the `sequence` column related documents
138-
139-
2. `DELETE ON` condition can only be used with MERGE.
140-
141-
:::tip Tip
142-
if session variable `SET show_hidden_columns = true` was executed before running import task to show whether table support batch delete feature, then execute `select count(*) from xxx` statement in the same session after finishing `DELETE/MERGE` import task, it will result in a unexpected result that the statement result set will include the deleted results. To avoid this problem, you should execute `SET show_hidden_columns = false` before selecting statement or open a new session to run the select statement.
143-
:::
144-
145-
## Usage Examples
146-
147-
### Check if Batch Delete Support is Enabled
148-
149-
```sql
150-
mysql> CREATE TABLE IF NOT EXISTS table1 (
151-
-> siteid INT,
152-
-> citycode INT,
153-
-> username VARCHAR(64),
154-
-> pv BIGINT
155-
-> ) UNIQUE KEY (siteid, citycode, username)
156-
-> DISTRIBUTED BY HASH(siteid) BUCKETS 10
157-
-> PROPERTIES (
158-
-> "replication_num" = "3"
159-
-> );
160-
Query OK, 0 rows affected (0.34 sec)
161-
162-
mysql> SET show_hidden_columns=true;
163-
Query OK, 0 rows affected (0.00 sec)
164-
165-
mysql> DESC table1;
166-
+-----------------------+-------------+------+-------+---------+-------+
167-
| Field | Type | Null | Key | Default | Extra |
168-
+-----------------------+-------------+------+-------+---------+-------+
169-
| siteid | int | Yes | true | NULL | |
170-
| citycode | int | Yes | true | NULL | |
171-
| username | varchar(64) | Yes | true | NULL | |
172-
| pv | bigint | Yes | false | NULL | NONE |
173-
| __DORIS_DELETE_SIGN__ | tinyint | No | false | 0 | NONE |
174-
| __DORIS_VERSION_COL__ | bigint | No | false | 0 | NONE |
175-
+-----------------------+-------------+------+-------+---------+-------+
176-
6 rows in set (0.01 sec)
177-
```
178-
179-
### Stream Load Usage Examples
180-
Please refer to the sections "Specifying merge_type for DELETE operations" and "Specifying merge_type for MERGE operations" in the [Stream Load Manual](../import/import-way/stream-load-manual.md)
