Commit 0ca9f53

typo(docs): fix some docs problems

- `ln -s <doris_meta_original> <doris_meta_created>` is the wrong usage; `ln` takes the target first and the link name second:

  ```text
  Usage: ln [OPTION]... [-T] TARGET LINK_NAME
    or:  ln [OPTION]... TARGET
    or:  ln [OPTION]... TARGET... DIRECTORY
    or:  ln [OPTION]... -t DIRECTORY TARGET...
  In the 1st form, create a link to TARGET with the name LINK_NAME.
  In the 2nd form, create a link to TARGET in the current directory.
  In the 3rd and 4th forms, create links to each TARGET in DIRECTORY.
  Create hard links by default, symbolic links with --symbolic.
  By default, each destination (name of new link) should not already exist.
  When creating hard links, each TARGET must exist.
  Symbolic links can hold arbitrary text; if later resolved, a relative link is interpreted in relation to its parent directory.
  ```

- `print(adbc_driver_flightsql.__version__)` throws an error.
- Fix `Can not found function 'sum_merge'`.
- Fix `Can not found function 'sum_union'`.
- Fix:

  ```text
  errCode = 2, detailMessage = Table testdb.test_partition is not a dynamic partition table. Use command `HELP ALTER TABLE` to see how to change a normal table to a dynamic partition table.
  ```

- Change `SHOW INDEX FROM idx_name` to `SHOW INDEX FROM table_name` in the docs.
- Fix `errCode = 2, detailMessage = Can not drop key column when table has value column with REPLACE aggregation method`.

1 parent 7de452b commit 0ca9f53

87 files changed (+300 −263 lines)

blog/arrow-flight-sql-in-apache-doris-for-10x-faster-data-transfer.md

Lines changed: 15 additions & 17 deletions
````diff
@@ -29,8 +29,7 @@ specific language governing permissions and limitations
 under the License.
 -->
 
-
-For years, JDBC and ODBC have been commonly adopted norms for database interaction. Now, as we gaze upon the vast expanse of the data realm, the rise of data science and data lake analytics brings bigger and bigger datasets. Correspondingly, we need faster and faster data reading and transmission, so we start to look for better answers than JDBC and ODBC. Thus, we include **Arrow Flight SQL protocol** into [Apache Doris 2.1](https://doris.apache.org), which provides **tens-fold speedups for data transfer**.
+For years, JDBC and ODBC have been commonly adopted norms for database interaction. Now, as we gaze upon the vast expanse of the data realm, the rise of data science and data lake analytics brings bigger and bigger datasets. Correspondingly, we need faster and faster data reading and transmission, so we start to look for better answers than JDBC and ODBC. Thus, we include **Arrow Flight SQL protocol** into [Apache Doris 2.1](https://doris.apache.org), which provides **tens-fold speedups for data transfer**.
 
 :::tip Tip
 A [demo](https://www.youtube.com/watch?v=zIqy24gI8DE) of loading data from Apache Doris to Python using Arrow Flight SQL.
@@ -42,25 +41,23 @@ As a column-oriented data warehouse, Apache Doris arranges its query results in
 
 Apache Doris 2.1 has a data transmission channel built on [Arrow Flight SQL](https://arrow.apache.org/docs/format/FlightSql.html). ([Apache Arrow](https://arrow.apache.org/) is a software development platform designed for high data movement efficiency across systems and languages, and the Arrow format aims for high-performance, lossless data exchange.) It allows **high-speed, large-scale data reading from Doris via SQL in various mainstream programming languages**. For target clients that also support the Arrow format, the whole process will be free of serialization/deserialization, thus no performance loss. Another upside is, Arrow Flight can make full use of multi-node and multi-core architecture and implement parallel data transfer, which is another enabler of high data throughput.
 
-For example, if a Python client reads data from Apache Doris, Doris will first convert the column-oriented Blocks to Arrow RecordBatch. Then in the Python client, Arrow RecordBatch will be converted to Pandas DataFrame. Both conversions are fast because the Doris Blocks, Arrow RecordBatch, and Pandas DataFrame are all column-oriented.
+For example, if a Python client reads data from Apache Doris, Doris will first convert the column-oriented Blocks to Arrow RecordBatch. Then in the Python client, Arrow RecordBatch will be converted to Pandas DataFrame. Both conversions are fast because the Doris Blocks, Arrow RecordBatch, and Pandas DataFrame are all column-oriented.
 
 ![img](/images/high-speed-data-transfer-based-on-doris-arrow-flight-sql.png)
 
-
-In addition, Arrow Flight SQL provides a general JDBC driver to facilitate seamless communication between databases that supports the Arrow Flight SQL protocol. This unlocks the the potential of Doris to be connected to a wider ecosystem and to be used in more cases.
+In addition, Arrow Flight SQL provides a general JDBC driver to facilitate seamless communication between databases that supports the Arrow Flight SQL protocol. This unlocks the the potential of Doris to be connected to a wider ecosystem and to be used in more cases.
 
 ## Performance test
 
 The "tens-fold speedups" conclusion is based on our benchmark tests. We tried reading data from Doris using PyMySQL, Pandas, and Arrow Flight SQL, and jotted down the durations, respectively. The test data is the ClickBench dataset.
 
 ![Performance test](/images/doris-performance-test.png)
 
-
-Results on various data types are as follows:
+Results on various data types are as follows:
 
 ![Performance test results](/images/doris-performance-test-2.png)
 
-**As shown, Arrow Flight SQL outperforms PyMySQL and Pandas in all data types by a factor ranging from 20 to several hundreds**.
+**As shown, Arrow Flight SQL outperforms PyMySQL and Pandas in all data types by a factor ranging from 20 to several hundreds**.
 
 ![Arrow Flight SQL outperforms PyMySQL and Pandas](/images/doris-performance-test-3.png)
@@ -70,17 +67,18 @@ With support for Arrow Flight SQL, Apache Doris can leverage the Python ADBC Dri
 
 ### 01 Install library
 
-The relevant library is already published on PyPI. It can be installed simply as follows:
+The relevant library is already published on PyPI. It can be installed simply as follows:
 
 ```C++
 pip install adbc_driver_manager
 pip install adbc_driver_flightsql
 ```
 
-Import the following module/library to interact with the installed library:
+Import the following module/library to interact with the installed library:
 
 ```Python
 import adbc_driver_manager
+import adbc_driver_flightsql
 import adbc_driver_flightsql.dbapi as flight_sql
 
 >>> print(adbc_driver_manager.__version__)
````
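The added import line is the substance of the `print(adbc_driver_flightsql.__version__)` fix listed in the commit message: `import adbc_driver_flightsql.dbapi as flight_sql` imports the package but binds only the alias `flight_sql`, so the bare name `adbc_driver_flightsql` stays undefined. A minimal sketch of the behavior, assuming both PyPI packages are installed:

```Python
import adbc_driver_manager
import adbc_driver_flightsql  # the fix: binds the package name itself
import adbc_driver_flightsql.dbapi as flight_sql  # binds only `flight_sql`

# Without the explicit `import adbc_driver_flightsql`, the second print
# raises: NameError: name 'adbc_driver_flightsql' is not defined
print(adbc_driver_manager.__version__)
print(adbc_driver_flightsql.__version__)
```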
````diff
@@ -95,9 +93,9 @@ Create a client for interacting with the Doris Arrow Flight SQL service. Prerequ
 
 Configure parameters for Doris frontend (FE) and backend (BE):
 
-- In `fe/conf/fe.conf`, set `arrow_flight_sql_port ` to an available port, such as 9090.
+- In `fe/conf/fe.conf`, set `arrow_flight_sql_port` to an available port, such as 9090.
 
-- In `be/conf/be.conf`, set `arrow_flight_sql_port ` to an available port, such as 9091.
+- In `be/conf/be.conf`, set `arrow_flight_sql_port` to an available port, such as 9091.
 
 `Note: The arrow_flight_sql_port port number configured in fe.conf and be.conf is different`
 
````
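Once the ports are configured, a client connects to the FE's Arrow Flight SQL port. The following is a minimal sketch, not taken from the commit: the host `127.0.0.1`, port `9090`, and `root`/empty-password credentials are all assumptions to adapt to your deployment.

```Python
import adbc_driver_manager
import adbc_driver_flightsql.dbapi as flight_sql

# Assumed FE address and credentials; use the arrow_flight_sql_port from your fe.conf.
conn = flight_sql.connect(
    uri="grpc://127.0.0.1:9090",
    db_kwargs={
        adbc_driver_manager.DatabaseOptions.USERNAME.value: "root",
        adbc_driver_manager.DatabaseOptions.PASSWORD.value: "",
    },
)
cursor = conn.cursor()
cursor.execute("show databases;")  # any quick statement to verify the connection
print(cursor.fetchallarrow().to_pandas())
cursor.close()
conn.close()
```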
````diff
@@ -370,7 +368,7 @@ dbapi_adbc_execute_fetch_df()
 dbapi_adbc_execute_partitions()
 ```
 
-Results are as follows (omitting the repeated outputs). **It only takes 3s** to load a Clickbench dataset containing 1 million rows and 105 columns.
+Results are as follows (omitting the repeated outputs). **It only takes 3s** to load a Clickbench dataset containing 1 million rows and 105 columns.
 
 ```Python
 ##################
@@ -399,9 +397,9 @@ None
 
 ### 02 JDBC
 
-The open-source JDBC driver for the Arrow Flight SQL protocol provides compatibility with the standard JDBC API. It allows most BI tools to access Doris via JDBC and supports high-speed transfer of Apache Arrow data.
+The open-source JDBC driver for the Arrow Flight SQL protocol provides compatibility with the standard JDBC API. It allows most BI tools to access Doris via JDBC and supports high-speed transfer of Apache Arrow data.
 
-Usage of this driver is similar to using that for the MySQL protocol. You just need to replace `jdbc:mysql` in the connection URL with `jdbc:arrow-flight-sql`. The returned result will be in the JDBC ResultSet data structure.
+Usage of this driver is similar to using that for the MySQL protocol. You just need to replace `jdbc:mysql` in the connection URL with `jdbc:arrow-flight-sql`. The returned result will be in the JDBC ResultSet data structure.
 
 ```Java
 import java.sql.Connection;
@@ -458,6 +456,6 @@ For Spark users, apart from connecting to Flight SQL Server using JDBC and JAVA,
 
 ## Hop on the trend train
 
-A number of enterprise users of Doris has tried loading data from Doris to Python, Spark, and Flink using Arrow Flight SQL and enjoyed much faster data reading speed. In the future, we plan to include the support for Arrow Flight SQL in data writing, too. By then, most systems built with mainstream programming languages will be able to read and write data from/to Apache Doris by an ADBC client. That's high-speed data interaction which opens up numerous possibilities. On our to-do list, we also envision leveraging Arrow Flight to implement parallel data reading by multiple backends and facilitate federated queries across Doris and Spark.
+A number of enterprise users of Doris has tried loading data from Doris to Python, Spark, and Flink using Arrow Flight SQL and enjoyed much faster data reading speed. In the future, we plan to include the support for Arrow Flight SQL in data writing, too. By then, most systems built with mainstream programming languages will be able to read and write data from/to Apache Doris by an ADBC client. That's high-speed data interaction which opens up numerous possibilities. On our to-do list, we also envision leveraging Arrow Flight to implement parallel data reading by multiple backends and facilitate federated queries across Doris and Spark.
 
-Download [Apache Doris 2.1](https://doris.apache.org/download/) and get a taste of 100 times faster data transfer powered by Arrow Flight SQL. If you need assistance, come find us in the [Apache Doris developer and user community](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2unfw3a3q-MtjGX4pAd8bCGC1UV0sKcw).
+Download [Apache Doris 2.1](https://doris.apache.org/download/) and get a taste of 100 times faster data transfer powered by Arrow Flight SQL. If you need assistance, come find us in the [Apache Doris developer and user community](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2unfw3a3q-MtjGX4pAd8bCGC1UV0sKcw).
````

docs/db-connect/arrow-flight-sql-connect.md

Lines changed: 3 additions & 1 deletion
````diff
@@ -34,7 +34,6 @@ In Doris, query results are organized in columnar format as Blocks. In versions
 
 To install Apache Arrow, you can find detailed installation instructions in the official documentation [Apache Arrow](https://arrow.apache.org/install/). For more information on how Doris implements the Arrow Flight protocol, you can refer to [Doris support Arrow Flight SQL protocol](https://github.com/apache/doris/issues/25514).
 
-
 ## Python Usage
 
 Use Python's ADBC Driver to connect to Doris to achieve extremely fast data reading. The following steps use Python (version >= 3.9) ADBC Driver to perform a series of common database syntax operations, including DDL, DML, setting Session variables, and Show statements.
@@ -52,6 +51,7 @@ Import the following modules/libraries in the code to use the installed Library:
 
 ```Python
 import adbc_driver_manager
+import adbc_driver_flightsql
 import adbc_driver_flightsql.dbapi as flight_sql
 
 >>> print(adbc_driver_manager.__version__)
@@ -281,6 +281,7 @@ cursor.close()
 The open source JDBC driver of Arrow Flight SQL protocol is compatible with the standard JDBC API, which can be used by most BI tools to access Doris through JDBC and supports high-speed transmission of Apache Arrow data. The usage is similar to connecting to Doris through the JDBC driver of MySQL protocol. You only need to replace the jdbc:mysql protocol in the link URL with the jdbc:arrow-flight-sql protocol. The query results are still returned in the JDBC ResultSet data structure.
 
 POM dependency:
+
 ```Java
 <properties>
     <arrow.version>17.0.0</arrow.version>
@@ -325,6 +326,7 @@ conn.close();
 In addition to using JDBC, similar to Python, JAVA can also create a Driver to read Doris and return data in Arrow format. The following are how to use AdbcDriver and JdbcDriver to connect to Doris Arrow Flight Server.
 
 POM dependency:
+
 ```Java
 <properties>
     <adbc.version>0.12.0</adbc.version>
````

docs/install/deploy-manually/integrated-storage-compute-deploy-manually.md

Lines changed: 12 additions & 10 deletions
````diff
@@ -30,11 +30,11 @@ The integrated storage-compute architecture is shown below, and the deployment o
 [integrated-storage-compute-architecture](/images/getting-started/apache-doris-technical-overview.png)
 
 1. **Deploy FE Master Node**: Deploy the first FE node as the Master node;
-
+
 2. **Deploy FE Cluster**: Deploy the FE cluster by adding Follower or Observer FE nodes;
-
+
 3. **Deploy BE Nodes**: Register BE nodes to the FE cluster;
-
+
 4. **Verify Cluster Correctness**: After deployment, connect to and verify the cluster's correctness.
 
 ## Step 1: Deploy FE Master Node
@@ -44,13 +44,16 @@ The integrated storage-compute architecture is shown below, and the deployment o
 When deploying FE, it is recommended to store metadata on a different hard drive from the BE node data storage.
 
 When extracting the installation package, a doris-meta directory is included by default. It is recommended to create a separate metadata directory and link it to the doris-meta directory. In production, it's highly advised to use a separate directory outside the Doris installation folder, preferably on an SSD. For testing and development environments, you can use the default configuration.
-
+
 ```sql
 ## Use a separate disk for FE metadata
 mkdir -p <doris_meta_created>
 
 ## Create FE metadata directory symlink
-ln -s <doris_meta_original> <doris_meta_created>
+rm -rf <doris_meta_original>
+
+ln -s <doris_meta_created> <doris_meta_original>
+
 ```
 
 2. **Modify FE Configuration File**
````
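The corrected commands follow `ln -s TARGET LINK_NAME`: the newly created metadata directory is the target, and the `doris-meta` path inside the installation directory becomes the link, after the original directory is removed. The sketch below restates the argument order with Python's `os.symlink(src, dst)`, purely as an illustration; the two paths are the docs' placeholders, not real values.

```Python
import os
import shutil

doris_meta_created = "<doris_meta_created>"    # new directory, ideally on a separate SSD
doris_meta_original = "<doris_meta_original>"  # default doris-meta inside the install dir

os.makedirs(doris_meta_created, exist_ok=True)          # mkdir -p <doris_meta_created>
shutil.rmtree(doris_meta_original, ignore_errors=True)  # rm -rf <doris_meta_original>
# os.symlink(src, dst) mirrors `ln -s TARGET LINK_NAME`: after this call,
# <doris_meta_original> is a symlink pointing at <doris_meta_created>.
os.symlink(doris_meta_created, doris_meta_original)
```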
````diff
@@ -72,16 +75,16 @@ The integrated storage-compute architecture is shown below, and the deployment o
 ## modify Java Home
 JAVA_HOME = <your-java-home-path>
 ```
-
+
 Parameter Descriptions: For more details, refer to the [FE Configuration](../../admin-manual/config/fe-config)
 
 | Parameter | Suggestion |
 | ------------------------------------------------------------ | --------------------------------------------------------- |
 | JAVA_OPTS | Specify the `-Xmx` parameter to adjust the Java Heap. It is recommended to set it to above 16G in production environments. |
-| [lower_case_table_names ](../../admin-manual/config/fe-config#lower_case_table_names) | Set case sensitivity. It is recommended to adjust it to 1, meaning case-insensitive. |
-| [priority_networks ](../../admin-manual/config/fe-config#priority_networks) | Network CIDR is specified based on the network IP address. It can be ignored in an FQDN environment. |
+| [lower_case_table_names](../../admin-manual/config/fe-config#lower_case_table_names) | Set case sensitivity. It is recommended to adjust it to 1, meaning case-insensitive. |
+| [priority_networks](../../admin-manual/config/fe-config#priority_networks) | Network CIDR is specified based on the network IP address. It can be ignored in an FQDN environment. |
 | JAVA_HOME | It is recommended to use a JDK environment independent of the operating system for Doris. |
-
+
 3. **Start FE Process**
 
 You can start the FE process using the following command:
@@ -233,7 +236,6 @@ In production, it is recommended to deploy at least 3 nodes. After deploying the
 
 - `TabletNum` represents the number of shards on the node. Newly added nodes will undergo data balancing, and the `TabletNum` will gradually become more evenly distributed.
 
-
 ## Step 4: Verify Cluster Integrity
 
 1. **Log in to the Database**
````

docs/table-design/data-model/aggregate.md

Lines changed: 5 additions & 5 deletions
````diff
@@ -179,7 +179,7 @@ select group_concat_merge(v2) from aggstate;
 If you do not want the final aggregation result, you can use `union` to combine multiple intermediate aggregation results and generate a new intermediate result.
 
 ```sql
-insert into aggstate select 3,sum_union(k2),group_concat_union(k3) from aggstate;
+insert into aggstate select 3,sum(v1),group_concat_union(v2) from aggstate;
 ```
 
 The calculations in the table are as follows:
@@ -189,16 +189,16 @@ The calculations in the table are as follows:
 The query result is as follows:
 
 ```sql
-mysql> select sum_merge(k2) , group_concat_merge(k3) from aggstate;
+mysql> select sum(v1), group_concat_merge(v2) from aggstate;
 +---------------+------------------------+
-| sum_merge(k2) | group_concat_merge(k3) |
+| sum(v1)       | group_concat_merge(v2) |
 +---------------+------------------------+
 | 20            | c,b,a,d,c,b,a,d        |
 +---------------+------------------------+
 
-mysql> select sum_merge(k2) , group_concat_merge(k3) from aggstate where k1 != 2;
+mysql> select sum(v1), group_concat_merge(v2) from aggstate where k1 != 2;
 +---------------+------------------------+
-| sum_merge(k2) | group_concat_merge(k3) |
+| sum(v1)       | group_concat_merge(v2) |
 +---------------+------------------------+
 | 16            | c,b,a,d,c,b,a          |
 +---------------+------------------------+
````
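The corrected statements can also be replayed through the Python ADBC driver covered elsewhere in this commit. A sketch under stated assumptions: the connection details are placeholders, and the example `aggstate` table is assumed to exist in the session's current database.

```Python
import adbc_driver_manager
import adbc_driver_flightsql.dbapi as flight_sql

conn = flight_sql.connect(
    uri="grpc://127.0.0.1:9090",  # assumed FE arrow_flight_sql_port
    db_kwargs={
        adbc_driver_manager.DatabaseOptions.USERNAME.value: "root",
        adbc_driver_manager.DatabaseOptions.PASSWORD.value: "",
    },
)
cursor = conn.cursor()
# union combines agg_state intermediates without finalizing them ...
cursor.execute("insert into aggstate select 3,sum(v1),group_concat_union(v2) from aggstate;")
# ... while merge (and plain sum over the value column) produces the final result.
cursor.execute("select sum(v1), group_concat_merge(v2) from aggstate;")
print(cursor.fetchallarrow().to_pandas())
cursor.close()
conn.close()
```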

docs/table-design/data-partitioning/dynamic-partitioning.md

Lines changed: 2 additions & 1 deletion
````diff
@@ -86,12 +86,13 @@ When using the ALTER TABLE statement to modify dynamic partitioning, the changes
 In the example below, the ALTER TABLE statement is used to modify a non-dynamic partitioned table to a dynamic partitioned table:
 
 ```sql
-CREATE TABLE test_dynamic_partition(
+CREATE TABLE test_partition(
     order_id BIGINT,
     create_dt DATE,
     username VARCHAR(20)
 )
 DUPLICATE KEY(order_id)
+PARTITION BY RANGE(create_dt) ()
 DISTRIBUTED BY HASH(order_id) BUCKETS 10;
 
 ALTER TABLE test_partition SET (
````
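The fix works because `dynamic_partition` properties can only be attached to a table that is already range-partitioned, hence the added `PARTITION BY RANGE(create_dt) ()` clause; without it, the ALTER fails with the `Table testdb.test_partition is not a dynamic partition table` error quoted in the commit message. Below is a sketch of the corrected flow through the Python ADBC driver; the connection details, the `testdb` database, and the property list in the ALTER (which is truncated in the hunk above) are illustrative assumptions.

```Python
import adbc_driver_manager
import adbc_driver_flightsql.dbapi as flight_sql

conn = flight_sql.connect(
    uri="grpc://127.0.0.1:9090",  # assumed FE arrow_flight_sql_port
    db_kwargs={
        adbc_driver_manager.DatabaseOptions.USERNAME.value: "root",
        adbc_driver_manager.DatabaseOptions.PASSWORD.value: "",
    },
)
cursor = conn.cursor()
# An empty RANGE partition spec is enough to make the table partitioned,
# which is the precondition for enabling dynamic partitioning afterwards.
cursor.execute("""
CREATE TABLE testdb.test_partition(
    order_id BIGINT,
    create_dt DATE,
    username VARCHAR(20)
)
DUPLICATE KEY(order_id)
PARTITION BY RANGE(create_dt) ()
DISTRIBUTED BY HASH(order_id) BUCKETS 10;
""")
# The property names are the standard dynamic_partition options; the hunk
# above is truncated before this list, so treat these values as illustrative.
cursor.execute("""
ALTER TABLE testdb.test_partition SET (
    "dynamic_partition.enable" = "true",
    "dynamic_partition.time_unit" = "DAY",
    "dynamic_partition.end" = "3",
    "dynamic_partition.prefix" = "p",
    "dynamic_partition.buckets" = "10"
);
""")
cursor.close()
conn.close()
```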

docs/table-design/index/inverted-index.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -264,7 +264,7 @@ ALTER TABLE table_name DROP INDEX idx_name;
 SHOW CREATE TABLE table_name;
 
 -- Syntax 2: IndexType as INVERTED indicates an inverted index
-SHOW INDEX FROM idx_name;
+SHOW INDEX FROM table_name;
 
 ## Using Indexes
 
@@ -398,7 +398,7 @@ PROPERTIES ("replication_num" = "1");
 ```
 wget https://qa-build.oss-cn-beijing.aliyuncs.com/regression/index/hacknernews_1m.csv.gz
 
-curl --location-trusted -u root: -H "compress_type:gz" -T hacknernews_1m.csv.gz http://127.0.0.1:8030/api/test_inverted_index/hackernews_1m/_stream_load
+curl --location-trusted -u root: -H "compress_type:gz" -T hacknernews_1m.csv.gz -XPUT http://127.0.0.1:8030/api/test_inverted_index/hackernews_1m/_stream_load
 {
     "TxnId": 2,
     "Label": "a8a3e802-2329-49e8-912b-04c800a461a6",
````

docs/table-design/index/ngram-bloomfilter-index.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -74,7 +74,7 @@ SHOW CREATE TABLE table_name;
 
 -- Syntax 2: IndexType as NGRAM_BF indicates an inverted index
 ```sql
-SHOW INDEX FROM idx_name;
+SHOW INDEX FROM table_name;
 ```
 
 ### Deleting an NGram BloomFilter Index
````

docs/table-design/schema-change.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -266,7 +266,7 @@ MODIFY COLUMN col1 BIGINT KEY DEFAULT "1" AFTER col2;
 
 Note: Whether modifying a key column or a value column, the complete column information must be declared.
 
-3. Modify the maximum length of the `val1` column in the base table. The original `val1` was (val1 VARCHAR(32) REPLACE DEFAULT "abc")
+3. Modify the maximum length of the `val5` column in the base table. The original `val5` was (val5 VARCHAR(32) REPLACE DEFAULT "abc")
 
 ```sql
 ALTER TABLE example_db.my_table
````
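The rest of the ALTER statement is truncated in the hunk above. For illustration only, the sketch below completes it from the surrounding text and runs it through the Python ADBC driver; the widened length `VARCHAR(64)` and the connection details are assumptions. `val5` is a value column, and value columns are freer to change than key columns; for example, a key column cannot be dropped while the table has a REPLACE value column, which is the `errCode = 2` message this commit addresses.

```Python
import adbc_driver_manager
import adbc_driver_flightsql.dbapi as flight_sql

conn = flight_sql.connect(
    uri="grpc://127.0.0.1:9090",  # assumed FE arrow_flight_sql_port
    db_kwargs={
        adbc_driver_manager.DatabaseOptions.USERNAME.value: "root",
        adbc_driver_manager.DatabaseOptions.PASSWORD.value: "",
    },
)
cursor = conn.cursor()
# Widening a VARCHAR value column is an allowed schema change; note that the
# complete column definition, aggregation type included, must be restated.
cursor.execute(
    'ALTER TABLE example_db.my_table MODIFY COLUMN val5 VARCHAR(64) REPLACE DEFAULT "abc";'
)
cursor.close()
conn.close()
```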

docs/table-design/tiered-storage/remote-storage.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -203,7 +203,7 @@ To optimize query performance and save object storage resources, local Cache has
 
 - The Cache is managed through LRU and does not support TTL.
 
-For specific configurations, please refer to (../../lakehouse/filecache).
+For specific configurations, please refer to [Data Cache](../../lakehouse/filecache).
 
 ## FAQ
 
````

i18n/zh-CN/docusaurus-plugin-content-docs/current/admin-manual/auth/authentication-and-authorization.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -129,9 +129,9 @@ Doris supports the following password policies to help users manage passwords
 - Set user properties: [SET PROPERTY](../../sql-manual/sql-statements/account-management/SET-PROPERTY)
 - View user properties: [SHOW PROPERTY](../../sql-manual/sql-statements/account-management/SHOW-PROPERTY)
 - Change a password: [SET PASSWORD](../../sql-manual/sql-statements/account-management/SET-PASSWORD)
-- View all supported privilege items: [SHOW PRIVILEGES]
-- View row policies: [SHOW ROW POLICY]
-- Create a row policy: [CREATE ROW POLICY]
+- View all supported privilege items: [SHOW PRIVILEGES](../../../../sql-manual/sql-statements/account-management/SHOW-PRIVILEGES)
+- View row policies: [SHOW ROW POLICY](../../../../sql-manual/sql-statements/data-governance/SHOW-ROW-POLICY)
+- Create a row policy: [CREATE ROW POLICY](../../../../sql-manual/sql-statements/data-governance/CREATE-ROW-POLICY)
 
 ### Privilege Types
 
````
