+ Fix wrong usage of `ln -s <doris_meta_original> <doris_meta_created>` (the TARGET and LINK_NAME arguments were reversed).
```text
Usage: ln [OPTION]... [-T] TARGET LINK_NAME
or: ln [OPTION]... TARGET
or: ln [OPTION]... TARGET... DIRECTORY
or: ln [OPTION]... -t DIRECTORY TARGET...
In the 1st form, create a link to TARGET with the name LINK_NAME.
In the 2nd form, create a link to TARGET in the current directory.
In the 3rd and 4th forms, create links to each TARGET in DIRECTORY.
Create hard links by default, symbolic links with --symbolic.
By default, each destination (name of new link) should not already exist.
When creating hard links, each TARGET must exist. Symbolic links
can hold arbitrary text; if later resolved, a relative link is
interpreted in relation to its parent directory.
```
+ Fix `print(adbc_driver_flightsql.__version__)` throwing an error.
+ Fix `Can not found function 'sum_merge'`
+ Fix `Can not found function 'sum_union'`
+ Fix ```errCode = 2, detailMessage = Table testdb.test_partition is not a dynamic partition table. Use command `HELP ALTER TABLE` to see how to change a normal table to a dynamic partition table.```
+ Docs: change `SHOW INDEX FROM idx_name` to `SHOW INDEX FROM table_name`
+ Fix `errCode = 2, detailMessage = Can not drop key column when table has value column with REPLACE aggregation method`
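For the `ln -s` fix in the first bullet, here is a minimal sketch of the correct argument order, using throwaway paths (not the real Doris metadata directories):

```shell
# Throwaway stand-ins: the target must already exist, the link name must not.
target_dir=$(mktemp -d)          # plays the role of <doris_meta_created>
link_name="${target_dir}.link"   # plays the role of <doris_meta_original>

# First form of ln: TARGET comes first, LINK_NAME second.
ln -s "$target_dir" "$link_name"

# The symlink resolves back to the target directory.
resolved=$(readlink "$link_name")
echo "$resolved"
```

Reversing the arguments, as the old docs did, would instead try to create a link inside (or named after) the existing directory, which is why the usage was wrong.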
`blog/arrow-flight-sql-in-apache-doris-for-10x-faster-data-transfer.md` (+15 lines, -17 lines)
under the License.
-->

For years, JDBC and ODBC have been commonly adopted norms for database interaction. Now, as we gaze upon the vast expanse of the data realm, the rise of data science and data lake analytics brings bigger and bigger datasets. Correspondingly, we need faster and faster data reading and transmission, so we started looking for better answers than JDBC and ODBC. Thus, we have included the **Arrow Flight SQL protocol** in [Apache Doris 2.1](https://doris.apache.org), which provides **tens-fold speedups for data transfer**.
:::tip Tip
A [demo](https://www.youtube.com/watch?v=zIqy24gI8DE) of loading data from Apache Doris to Python using Arrow Flight SQL.
:::
Apache Doris 2.1 has a data transmission channel built on [Arrow Flight SQL](https://arrow.apache.org/docs/format/FlightSql.html). ([Apache Arrow](https://arrow.apache.org/) is a software development platform designed for high data movement efficiency across systems and languages, and the Arrow format aims for high-performance, lossless data exchange.) It allows **high-speed, large-scale data reading from Doris via SQL in various mainstream programming languages**. For target clients that also support the Arrow format, the whole process is free of serialization/deserialization, thus no performance loss. Another upside is that Arrow Flight can make full use of multi-node and multi-core architectures and implement parallel data transfer, which is another enabler of high data throughput.
For example, if a Python client reads data from Apache Doris, Doris will first convert the column-oriented Blocks to Arrow RecordBatch. Then in the Python client, Arrow RecordBatch will be converted to Pandas DataFrame. Both conversions are fast because the Doris Blocks, Arrow RecordBatch, and Pandas DataFrame are all column-oriented.
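To see why column-oriented hand-offs are cheap, here is a minimal pure-Python sketch; the dict-of-lists layout is only a stand-in for Doris Blocks and Arrow RecordBatch (not the real structures). A columnar-to-columnar copy is one pass per column, while a row-oriented protocol must reassemble every row value by value:

```python
# Columnar result set: one list per column (a stand-in for a Doris Block).
columns = {"id": [1, 2, 3], "city": ["NY", "LA", "SF"]}

# Columnar -> columnar: a cheap per-column pass (Arrow can often do this zero-copy).
arrow_like = {name: list(values) for name, values in columns.items()}

# Row-oriented hand-off: every row must be assembled from all columns.
rows = [dict(zip(columns, vals)) for vals in zip(*columns.values())]

print(rows[0])
```

The row-oriented path touches every cell individually, which is the per-row cost that JDBC/ODBC-style transfer pays and the Arrow path avoids.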
In addition, Arrow Flight SQL provides a general JDBC driver to facilitate seamless communication between databases that support the Arrow Flight SQL protocol. This unlocks the potential of Doris to be connected to a wider ecosystem and to be used in more cases.
## Performance test
The "tens-fold speedups" conclusion is based on our benchmark tests. We tried reading data from Doris using PyMySQL, Pandas, and Arrow Flight SQL, and recorded the durations. The test data is the ClickBench dataset.

**As shown, Arrow Flight SQL outperforms PyMySQL and Pandas in all data types by a factor ranging from 20 to several hundred**.
### 01 Install library

The relevant library is already published on PyPI. It can be installed simply as follows:
```shell
pip install adbc_driver_manager
pip install adbc_driver_flightsql
```
Import the following modules/libraries to interact with the installed library:
```Python
import adbc_driver_manager
import adbc_driver_flightsql
import adbc_driver_flightsql.dbapi as flight_sql

>>> print(adbc_driver_manager.__version__)
```
Create a client for interacting with the Doris Arrow Flight SQL service.

Configure parameters for Doris frontend (FE) and backend (BE):
- In `fe/conf/fe.conf`, set `arrow_flight_sql_port` to an available port, such as 9090.

- In `be/conf/be.conf`, set `arrow_flight_sql_port` to an available port, such as 9091.
`Note: The arrow_flight_sql_port configured in fe.conf and be.conf must be different.`
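Put together, the two configuration files might look like this (the port numbers are just the examples above, not mandated values):

```ini
# fe/conf/fe.conf
arrow_flight_sql_port = 9090

# be/conf/be.conf
arrow_flight_sql_port = 9091
```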
```Python
dbapi_adbc_execute_fetch_df()
dbapi_adbc_execute_partitions()
```
Results are as follows (omitting the repeated outputs). **It only takes 3s** to load a ClickBench dataset containing 1 million rows and 105 columns.
```Python
##################
```
### 02 JDBC
The open-source JDBC driver for the Arrow Flight SQL protocol provides compatibility with the standard JDBC API. It allows most BI tools to access Doris via JDBC and supports high-speed transfer of Apache Arrow data.
Usage of this driver is similar to that of the MySQL-protocol driver. You just need to replace `jdbc:mysql` in the connection URL with `jdbc:arrow-flight-sql`. The returned result will be in the JDBC ResultSet data structure.
```Java
import java.sql.Connection;
```
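A minimal, compilable sketch of that URL swap; the host and port are placeholder values, and the snippet only manipulates the string, it does not open a real connection:

```java
public class UrlSwap {
    public static void main(String[] args) {
        // A MySQL-protocol JDBC URL with a hypothetical host and port.
        String mysqlUrl = "jdbc:mysql://127.0.0.1:9030/demo";

        // Swapping only the scheme yields the Arrow Flight SQL URL.
        String flightUrl = mysqlUrl.replace("jdbc:mysql", "jdbc:arrow-flight-sql");

        System.out.println(flightUrl);  // jdbc:arrow-flight-sql://127.0.0.1:9030/demo
    }
}
```

Everything else in the JDBC code (DriverManager, Statement, ResultSet) stays exactly as it would with the MySQL driver.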
## Hop on the trend train
A number of enterprise users of Doris have tried loading data from Doris to Python, Spark, and Flink using Arrow Flight SQL and enjoyed much faster data reading. In the future, we plan to include support for Arrow Flight SQL in data writing, too. By then, most systems built with mainstream programming languages will be able to read and write data from/to Apache Doris via an ADBC client. That's high-speed data interaction that opens up numerous possibilities. On our to-do list, we also envision leveraging Arrow Flight to implement parallel data reading by multiple backends and facilitate federated queries across Doris and Spark.
Download [Apache Doris 2.1](https://doris.apache.org/download/) and get a taste of 100 times faster data transfer powered by Arrow Flight SQL. If you need assistance, come find us in the [Apache Doris developer and user community](https://join.slack.com/t/apachedoriscommunity/shared_invite/zt-2unfw3a3q-MtjGX4pAd8bCGC1UV0sKcw).
`docs/db-connect/arrow-flight-sql-connect.md` (+3 lines, -1 line)
To install Apache Arrow, you can find detailed installation instructions in the official documentation [Apache Arrow](https://arrow.apache.org/install/). For more information on how Doris implements the Arrow Flight protocol, you can refer to [Doris support Arrow Flight SQL protocol](https://github.com/apache/doris/issues/25514).

## Python Usage
Use Python's ADBC Driver to connect to Doris for extremely fast data reading. The following steps use the Python (version >= 3.9) ADBC Driver to perform a series of common database operations, including DDL, DML, setting session variables, and Show statements.
Import the following modules/libraries in the code to use the installed library:

```Python
import adbc_driver_manager
import adbc_driver_flightsql
import adbc_driver_flightsql.dbapi as flight_sql

>>> print(adbc_driver_manager.__version__)
```
The open-source JDBC driver for the Arrow Flight SQL protocol is compatible with the standard JDBC API, which allows most BI tools to access Doris through JDBC and supports high-speed transmission of Apache Arrow data. The usage is similar to connecting to Doris through the JDBC driver of the MySQL protocol: you only need to replace the `jdbc:mysql` protocol in the connection URL with the `jdbc:arrow-flight-sql` protocol. The query results are still returned in the JDBC ResultSet data structure.

POM dependency:
```xml
<properties>
    <arrow.version>17.0.0</arrow.version>
</properties>
```
In addition to using JDBC, similar to Python, Java can also create a driver to read Doris and return data in Arrow format. The following shows how to use AdbcDriver and JdbcDriver to connect to the Doris Arrow Flight Server.

1. **Deploy FE Master Node**: Deploy the first FE node as the Master node;
2. **Deploy FE Cluster**: Deploy the FE cluster by adding Follower or Observer FE nodes;
3. **Deploy BE Nodes**: Register BE nodes to the FE cluster;
4. **Verify Cluster Correctness**: After deployment, connect to and verify the cluster's correctness.

## Step 1: Deploy FE Master Node
When deploying FE, it is recommended to store metadata on a different hard drive from the BE node data storage.

When extracting the installation package, a doris-meta directory is included by default. It is recommended to create a separate metadata directory and link it to the doris-meta directory. In production, it is highly advised to use a separate directory outside the Doris installation folder, preferably on an SSD. For testing and development environments, you can use the default configuration.
```shell
## Use a separate disk for FE metadata
mkdir -p <doris_meta_created>

## Create FE metadata directory symlink
rm -rf <doris_meta_original>
ln -s <doris_meta_created> <doris_meta_original>
```
2. **Modify FE Configuration File**
```shell
## modify Java Home
JAVA_HOME = <your-java-home-path>
```
Parameter Descriptions: For more details, refer to the [FE Configuration](../../admin-manual/config/fe-config):

| Parameter | Description |
| --- | --- |
| JAVA_OPTS | Specify the `-Xmx` parameter to adjust the Java Heap. It is recommended to set it to above 16G in production environments. |
| [lower_case_table_names](../../admin-manual/config/fe-config#lower_case_table_names) | Set case sensitivity. It is recommended to adjust it to 1, meaning case-insensitive. |
| [priority_networks](../../admin-manual/config/fe-config#priority_networks) | Network CIDR is specified based on the network IP address. It can be ignored in an FQDN environment. |
| JAVA_HOME | It is recommended to use a JDK environment independent of the operating system for Doris. |
3. **Start FE Process**

You can start the FE process using the following command:
In production, it is recommended to deploy at least 3 nodes.

`TabletNum` represents the number of shards on the node. Newly added nodes will undergo data balancing, and the `TabletNum` will gradually become more evenly distributed.
`docs/table-design/data-model/aggregate.md` (+5 lines, -5 lines)
If you do not want the final aggregation result, you can use `union` to combine multiple intermediate aggregation results and generate a new intermediate result.
```sql
insert into aggstate select 3, sum_union(v1), group_concat_union(v2) from aggstate;
```
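Doris also provides matching `_merge` functions (e.g. `sum_merge`, mentioned in the fix list above) to fold an intermediate state into a final value. The query below is a hedged sketch: `k1 = 3` is assumed to be the key of the newly inserted row, matching the insert above:

```sql
select sum_merge(v1), group_concat_merge(v2) from aggstate where k1 = 3;
```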
The calculations in the table are as follows: