typo(docs): fix some docs problem #2227

@@ -29,8 +29,7 @@ specific language governing permissions and limitations
under the License.
-->


For years, JDBC and ODBC have been the standard interfaces for database interaction. Now, as the data landscape expands, the rise of data science and data lake analytics brings ever-larger datasets, which call for ever-faster data reading and transmission, so we started looking for better answers than JDBC and ODBC. That is why we have introduced the **Arrow Flight SQL protocol** in [Apache Doris 2.1](https://doris.apache.org), which provides **tens-fold speedups for data transfer**.

:::tip Tip
Here is a [demo](https://www.youtube.com/watch?v=zIqy24gI8DE) of loading data from Apache Doris into Python using Arrow Flight SQL.
@@ -42,25 +41,23 @@ As a column-oriented data warehouse, Apache Doris arranges its query results in

Apache Doris 2.1 has a data transmission channel built on [Arrow Flight SQL](https://arrow.apache.org/docs/format/FlightSql.html). ([Apache Arrow](https://arrow.apache.org/) is a software development platform designed for high data movement efficiency across systems and languages, and the Arrow format aims for high-performance, lossless data exchange.) It allows **high-speed, large-scale data reading from Doris via SQL in various mainstream programming languages**. For target clients that also support the Arrow format, the whole process is free of serialization and deserialization, so there is no performance loss. Another upside is that Arrow Flight can make full use of multi-node and multi-core architectures and implement parallel data transfer, which is another enabler of high data throughput.

For example, if a Python client reads data from Apache Doris, Doris first converts the column-oriented Blocks to Arrow RecordBatches. Then, in the Python client, the Arrow RecordBatches are converted to a Pandas DataFrame. Both conversions are fast because Doris Blocks, Arrow RecordBatches, and Pandas DataFrames are all column-oriented.
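To make the second conversion concrete, here is a minimal, Doris-independent sketch (the column names and values are invented for illustration) of turning an Arrow RecordBatch into a Pandas DataFrame with `pyarrow`:

```Python
# Both structures are columnar, so the conversion needs no per-value
# serialization. Assumes pyarrow and pandas are installed.
import pyarrow as pa

batch = pa.record_batch(
    [pa.array([1, 2, 3]), pa.array(["a", "b", "c"])],
    names=["id", "tag"],  # hypothetical columns
)
df = batch.to_pandas()  # columnar-to-columnar; zero-copy for many numeric types
print(df)
```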

![img](/images/high-speed-data-transfer-based-on-doris-arrow-flight-sql.png)


In addition, Arrow Flight SQL provides a general JDBC driver to facilitate seamless communication between databases that support the Arrow Flight SQL protocol. This unlocks the potential of Doris to be connected to a wider ecosystem and to be used in more cases.

## Performance test

The "tens-fold speedups" conclusion is based on our benchmark tests. We tried reading data from Doris using PyMySQL, Pandas, and Arrow Flight SQL, and jotted down the durations, respectively. The test data is the ClickBench dataset.

![Performance test](/images/doris-performance-test.png)


Results on various data types are as follows:

![Performance test results](/images/doris-performance-test-2.png)

**As shown, Arrow Flight SQL outperforms PyMySQL and Pandas on all data types, by factors ranging from 20 to several hundred**.

![Arrow Flight SQL outperforms PyMySQL and Pandas](/images/doris-performance-test-3.png)

@@ -70,17 +67,18 @@ With support for Arrow Flight SQL, Apache Doris can leverage the Python ADBC Dri

### 01 Install library

The relevant libraries are already published on PyPI and can be installed as follows:

```shell
pip install adbc_driver_manager
pip install adbc_driver_flightsql
```

Import the following modules to interact with the installed libraries:

```Python
import adbc_driver_manager
import adbc_driver_flightsql
import adbc_driver_flightsql.dbapi as flight_sql

>>> print(adbc_driver_manager.__version__)
```

@@ -95,9 +93,9 @@ Create a client for interacting with the Doris Arrow Flight SQL service. Prerequ

Configure parameters for Doris frontend (FE) and backend (BE):

-- In `fe/conf/fe.conf`, set `arrow_flight_sql_port ` to an available port, such as 9090.
+- In `fe/conf/fe.conf`, set `arrow_flight_sql_port` to an available port, such as 9090.

-- In `be/conf/be.conf`, set `arrow_flight_sql_port ` to an available port, such as 9091.
+- In `be/conf/be.conf`, set `arrow_flight_sql_port` to an available port, such as 9091.

Note: The `arrow_flight_sql_port` configured in `fe.conf` and the one in `be.conf` must be different ports.
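With the ports configured, creating the client looks roughly like the sketch below. It assumes the FE host is 127.0.0.1 with `arrow_flight_sql_port` set to 9090, and a `root` user with an empty password; adjust these to your deployment.

```Python
# Minimal sketch: open an ADBC connection to the Doris Arrow Flight SQL
# endpoint. URI and credentials below are placeholder assumptions.
import adbc_driver_manager
import adbc_driver_flightsql.dbapi as flight_sql

conn = flight_sql.connect(
    uri="grpc://127.0.0.1:9090",  # FE host : arrow_flight_sql_port (fe.conf)
    db_kwargs={
        adbc_driver_manager.DatabaseOptions.USERNAME.value: "root",
        adbc_driver_manager.DatabaseOptions.PASSWORD.value: "",
    },
)
```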

@@ -370,7 +368,7 @@ dbapi_adbc_execute_fetch_df()
dbapi_adbc_execute_partitions()
```
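The helper functions listed above are defined in the elided part of this post. As a rough sketch of what such a fetch boils down to (reusing the `conn` created earlier; the table name is an assumption):

```Python
# Minimal sketch of a bulk read: results stay in Arrow format end to end,
# then convert to Pandas. The table name is illustrative only.
import time

cursor = conn.cursor()
start = time.time()
cursor.execute("SELECT * FROM clickbench.hits")
arrow_table = cursor.fetch_arrow_table()
df = arrow_table.to_pandas()
print(f"{len(df)} rows in {time.time() - start:.1f}s")
cursor.close()
```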

Results are as follows (omitting the repeated outputs). **It only takes 3s** to load a ClickBench dataset containing 1 million rows and 105 columns.

```Python
##################
```

@@ -399,9 +397,9 @@ None

### 02 JDBC

The open-source JDBC driver for the Arrow Flight SQL protocol provides compatibility with the standard JDBC API. It allows most BI tools to access Doris via JDBC and supports high-speed transfer of Apache Arrow data.

Usage of this driver is similar to that of the MySQL-protocol JDBC driver: just replace `jdbc:mysql` in the connection URL with `jdbc:arrow-flight-sql`. The returned result will be in the JDBC ResultSet data structure.

```Java
import java.sql.Connection;
```

@@ -458,6 +456,6 @@ For Spark users, apart from connecting to Flight SQL Server using JDBC and JAVA,

## Hop on the trend train

A number of enterprise users of Doris have tried loading data from Doris to Python, Spark, and Flink using Arrow Flight SQL and enjoyed much faster data reading. In the future, we plan to add Arrow Flight SQL support for data writing, too. By then, most systems built with mainstream programming languages will be able to read and write data from/to Apache Doris via an ADBC client. Such high-speed data interaction opens up numerous possibilities. On our to-do list, we also envision leveraging Arrow Flight to implement parallel data reading by multiple backends and to facilitate federated queries across Doris and Spark.

Download [Apache Doris 2.1](https://doris.apache.org/download/) and get a taste of 100 times faster data transfer powered by Arrow Flight SQL. If you need assistance, come find us in the [Apache Doris developer and user community](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2unfw3a3q-MtjGX4pAd8bCGC1UV0sKcw).
4 changes: 3 additions & 1 deletion docs/db-connect/arrow-flight-sql-connect.md
@@ -34,7 +34,6 @@ In Doris, query results are organized in columnar format as Blocks. In versions

Detailed installation instructions for Apache Arrow can be found in the [official documentation](https://arrow.apache.org/install/). For more information on how Doris implements the Arrow Flight protocol, refer to [Doris support Arrow Flight SQL protocol](https://github.com/apache/doris/issues/25514).


## Python Usage

Use Python's ADBC Driver to connect to Doris for extremely fast data reading. The following steps use the Python ADBC Driver (Python >= 3.9) to perform a series of common database operations, including DDL, DML, setting session variables, and `SHOW` statements.
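As a preview of those operations, here is a minimal end-to-end sketch; the URI, credentials, and all object names are illustrative placeholders (see the configuration steps in this guide for the real port):

```Python
# Minimal sketch: connect, then run DDL, DML, a session variable, and a
# query via the ADBC DB-API cursor. Names below are made-up placeholders.
import adbc_driver_manager
import adbc_driver_flightsql.dbapi as flight_sql

conn = flight_sql.connect(
    uri="grpc://127.0.0.1:9090",
    db_kwargs={
        adbc_driver_manager.DatabaseOptions.USERNAME.value: "root",
        adbc_driver_manager.DatabaseOptions.PASSWORD.value: "",
    },
)
cursor = conn.cursor()
cursor.execute("CREATE DATABASE IF NOT EXISTS adbc_demo")
cursor.execute("USE adbc_demo")
cursor.execute(
    "CREATE TABLE IF NOT EXISTS t (k INT, v STRING) "
    "DISTRIBUTED BY HASH(k) BUCKETS 1 PROPERTIES ('replication_num' = '1')"
)
cursor.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b')")
cursor.execute("SET exec_mem_limit = 2147483648")  # session variable
cursor.execute("SELECT * FROM t ORDER BY k")
print(cursor.fetchall())
cursor.close()
conn.close()
```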
@@ -52,6 +51,7 @@ Import the following modules/libraries in the code to use the installed Library:

```Python
import adbc_driver_manager
import adbc_driver_flightsql
import adbc_driver_flightsql.dbapi as flight_sql

>>> print(adbc_driver_manager.__version__)
```

@@ -281,6 +281,7 @@ cursor.close()
The open-source JDBC driver for the Arrow Flight SQL protocol is compatible with the standard JDBC API and can be used by most BI tools to access Doris through JDBC, with high-speed transmission of Apache Arrow data. Usage is similar to connecting to Doris through the JDBC driver for the MySQL protocol: simply replace the jdbc:mysql protocol in the connection URL with jdbc:arrow-flight-sql. Query results are still returned in the JDBC ResultSet data structure.

POM dependency:

```xml
<properties>
<arrow.version>17.0.0</arrow.version>
```

@@ -325,6 +326,7 @@ conn.close();
In addition to using JDBC, Java, like Python, can also create an ADBC driver to read Doris and return data in Arrow format. The following shows how to use AdbcDriver and JdbcDriver to connect to the Doris Arrow Flight Server.

POM dependency:

```xml
<properties>
<adbc.version>0.12.0</adbc.version>
```
@@ -30,11 +30,11 @@ The integrated storage-compute architecture is shown below, and the deployment o
![integrated-storage-compute-architecture](/images/getting-started/apache-doris-technical-overview.png)

1. **Deploy FE Master Node**: Deploy the first FE node as the Master node;

2. **Deploy FE Cluster**: Deploy the FE cluster by adding Follower or Observer FE nodes;

3. **Deploy BE Nodes**: Register BE nodes to the FE cluster;

4. **Verify Cluster Correctness**: After deployment, connect to and verify the cluster's correctness.

## Step 1: Deploy FE Master Node
@@ -44,13 +44,16 @@ The integrated storage-compute architecture is shown below, and the deployment o
When deploying FE, it is recommended to store metadata on a different hard drive from the BE node data storage.

When extracting the installation package, a doris-meta directory is included by default. It is recommended to create a separate metadata directory and link it to the doris-meta directory. In production, it's highly advised to use a separate directory outside the Doris installation folder, preferably on an SSD. For testing and development environments, you can use the default configuration.

```shell
## Use a separate disk for FE metadata
mkdir -p <doris_meta_created>

## Create FE metadata directory symlink
-ln -s <doris_meta_original> <doris_meta_created>
+rm -rf <doris_meta_original>
+ln -s <doris_meta_created> <doris_meta_original>

```

2. **Modify FE Configuration File**
@@ -72,16 +75,16 @@ The integrated storage-compute architecture is shown below, and the deployment o
## modify Java Home
JAVA_HOME = <your-java-home-path>
```

Parameter Descriptions: For more details, refer to the [FE Configuration](../../admin-manual/config/fe-config):

| Parameter | Suggestion |
| ------------------------------------------------------------ | --------------------------------------------------------- |
| JAVA_OPTS | Specify the `-Xmx` parameter to adjust the Java Heap. It is recommended to set it to above 16G in production environments. |
-| [lower_case_table_names ](../../admin-manual/config/fe-config#lower_case_table_names) | Set case sensitivity. It is recommended to adjust it to 1, meaning case-insensitive. |
-| [priority_networks ](../../admin-manual/config/fe-config#priority_networks) | Network CIDR is specified based on the network IP address. It can be ignored in an FQDN environment. |
+| [lower_case_table_names](../../admin-manual/config/fe-config#lower_case_table_names) | Set case sensitivity. It is recommended to adjust it to 1, meaning case-insensitive. |
+| [priority_networks](../../admin-manual/config/fe-config#priority_networks) | Network CIDR is specified based on the network IP address. It can be ignored in an FQDN environment. |
| JAVA_HOME | It is recommended to use a JDK environment independent of the operating system for Doris. |

3. **Start FE Process**

You can start the FE process using the following command:
@@ -233,7 +236,6 @@ In production, it is recommended to deploy at least 3 nodes. After deploying the

- `TabletNum` represents the number of shards on the node. Newly added nodes will undergo data balancing, and the `TabletNum` will gradually become more evenly distributed.


## Step 4: Verify Cluster Integrity

1. **Log in to the Database**
10 changes: 5 additions & 5 deletions docs/table-design/data-model/aggregate.md
@@ -179,7 +179,7 @@ select group_concat_merge(v2) from aggstate;
If you do not want the final aggregation result, you can use `union` to combine multiple intermediate aggregation results and generate a new intermediate result.

```sql
-insert into aggstate select 3,sum_union(k2),group_concat_union(k3) from aggstate;
+insert into aggstate select 3,sum(v1),group_concat_union(v2) from aggstate;
```

The calculations in the table are as follows:
@@ -189,16 +189,16 @@ The calculations in the table are as follows:
The query result is as follows:

```sql
-mysql> select sum_merge(k2) , group_concat_merge(k3)from aggstate;
+mysql> select sum(v1), group_concat_merge(v2) from aggstate;
+---------------+------------------------+
-| sum_merge(k2) | group_concat_merge(k3) |
+| sum(v1) | group_concat_merge(v2) |
+---------------+------------------------+
| 20 | c,b,a,d,c,b,a,d |
+---------------+------------------------+

-mysql> select sum_merge(k2) , group_concat_merge(k3)from aggstate where k1 != 2;
+mysql> select sum(v1), group_concat_merge(v2) from aggstate where k1 != 2;
+---------------+------------------------+
-| sum_merge(k2) | group_concat_merge(k3) |
+| sum(v1) | group_concat_merge(v2) |
+---------------+------------------------+
| 16 | c,b,a,d,c,b,a |
+---------------+------------------------+
```
3 changes: 2 additions & 1 deletion docs/table-design/data-partitioning/dynamic-partitioning.md
@@ -86,12 +86,13 @@ When using the ALTER TABLE statement to modify dynamic partitioning, the changes
In the example below, the ALTER TABLE statement is used to modify a non-dynamic partitioned table to a dynamic partitioned table:

```sql
-CREATE TABLE test_dynamic_partition(
+CREATE TABLE test_partition(
order_id BIGINT,
create_dt DATE,
username VARCHAR(20)
)
DUPLICATE KEY(order_id)
+PARTITION BY RANGE(create_dt) ()
DISTRIBUTED BY HASH(order_id) BUCKETS 10;

ALTER TABLE test_partition SET (
-- ...
```
4 changes: 2 additions & 2 deletions docs/table-design/index/inverted-index.md
@@ -264,7 +264,7 @@ ALTER TABLE table_name DROP INDEX idx_name;
SHOW CREATE TABLE table_name;

-- Syntax 2: IndexType as INVERTED indicates an inverted index
-SHOW INDEX FROM idx_name;
+SHOW INDEX FROM table_name;

## Using Indexes

@@ -398,7 +398,7 @@ PROPERTIES ("replication_num" = "1");
```
wget https://qa-build.oss-cn-beijing.aliyuncs.com/regression/index/hacknernews_1m.csv.gz

-curl --location-trusted -u root: -H "compress_type:gz" -T hacknernews_1m.csv.gz http://127.0.0.1:8030/api/test_inverted_index/hackernews_1m/_stream_load
+curl --location-trusted -u root: -H "compress_type:gz" -T hacknernews_1m.csv.gz -XPUT http://127.0.0.1:8030/api/test_inverted_index/hackernews_1m/_stream_load
{
"TxnId": 2,
"Label": "a8a3e802-2329-49e8-912b-04c800a461a6",
```
2 changes: 1 addition & 1 deletion docs/table-design/index/ngram-bloomfilter-index.md
@@ -74,7 +74,7 @@ SHOW CREATE TABLE table_name;

-- Syntax 2: IndexType as NGRAM_BF indicates an NGram BloomFilter index
```sql
-SHOW INDEX FROM idx_name;
+SHOW INDEX FROM table_name;
```

### Deleting an NGram BloomFilter Index
2 changes: 1 addition & 1 deletion docs/table-design/schema-change.md
@@ -266,7 +266,7 @@ MODIFY COLUMN col1 BIGINT KEY DEFAULT "1" AFTER col2;

Note: Whether modifying a key column or a value column, the complete column information must be declared.

-3. Modify the maximum length of the `val1` column in the base table. The original `val1` was (val1 VARCHAR(32) REPLACE DEFAULT "abc")
+3. Modify the maximum length of the `val5` column in the base table. The original `val5` was (val5 VARCHAR(32) REPLACE DEFAULT "abc")

```sql
ALTER TABLE example_db.my_table
-- ...
```
2 changes: 1 addition & 1 deletion docs/table-design/tiered-storage/remote-storage.md
@@ -203,7 +203,7 @@ To optimize query performance and save object storage resources, local Cache has

- The Cache is managed through LRU and does not support TTL.

-For specific configurations, please refer to (../../lakehouse/filecache).
+For specific configurations, please refer to [Data Cache](../../lakehouse/filecache).

## FAQ

@@ -129,9 +129,9 @@ Doris supports the following password policies to help users better manage their passwords
- Set user properties: [SET PROPERTY](../../sql-manual/sql-statements/account-management/SET-PROPERTY)
- View user properties: [SHOW PROPERTY](../../sql-manual/sql-statements/account-management/SHOW-PROPERTY)
- Change password: [SET PASSWORD](../../sql-manual/sql-statements/account-management/SET-PASSWORD)
-- View all supported privileges: [SHOW PRIVILEGES]
-- View row policies: [SHOW ROW POLICY]
-- Create a row policy: [CREATE ROW POLICY]
+- View all supported privileges: [SHOW PRIVILEGES](../../../../sql-manual/sql-statements/account-management/SHOW-PRIVILEGES)
+- View row policies: [SHOW ROW POLICY](../../../../sql-manual/sql-statements/data-governance/SHOW-ROW-POLICY)
+- Create a row policy: [CREATE ROW POLICY](../../../../sql-manual/sql-statements/data-governance/CREATE-ROW-POLICY)

### Privilege Types
