typo(docs): fix some docs problem #2227

Open · wants to merge 1 commit into base: master
4 changes: 3 additions & 1 deletion docs/db-connect/arrow-flight-sql-connect.md
@@ -15,7 +15,6 @@ In Doris, query results are organized in columnar format as Blocks. In versions

To install Apache Arrow, you can find detailed installation instructions in the official documentation [Apache Arrow](https://arrow.apache.org/install/). For more information on how Doris implements the Arrow Flight protocol, you can refer to [Doris support Arrow Flight SQL protocol](https://github.com/apache/doris/issues/25514).


## Python Usage

Use Python's ADBC Driver to connect to Doris for extremely fast data reading. The following steps use the Python (version >= 3.9) ADBC Driver to perform a series of common database operations, including DDL, DML, setting session variables, and SHOW statements.
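As a minimal sketch of what such a connection looks like (the host, port, and credentials below are placeholders; the Arrow Flight SQL port is whatever `arrow_flight_sql_port` is set to on the FE):

```python
def doris_flight_uri(host: str, port: int) -> str:
    # Arrow Flight SQL is served over gRPC on the FE's arrow_flight_sql_port (assumed below)
    return f"grpc://{host}:{port}"

def query_doris(uri: str, user: str, password: str, sql: str):
    # Deferred import so the URI helper above works even without the driver installed
    import adbc_driver_flightsql.dbapi as flight_sql

    conn = flight_sql.connect(uri=uri, db_kwargs={"username": user, "password": password})
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    # Requires a running Doris FE with Arrow Flight SQL enabled
    print(query_doris(doris_flight_uri("127.0.0.1", 9090), "root", "", "SELECT 1"))
```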
@@ -33,6 +32,7 @@ Import the following modules/libraries in the code to use the installed library:

```Python
import adbc_driver_manager
import adbc_driver_flightsql
import adbc_driver_flightsql.dbapi as flight_sql

>>> print(adbc_driver_manager.__version__)
@@ -264,6 +264,7 @@ cursor.close()
The open-source JDBC driver for the Arrow Flight SQL protocol is compatible with the standard JDBC API, so most BI tools can use it to access Doris through JDBC with high-speed transfer of Apache Arrow data. Usage is similar to connecting to Doris through the MySQL-protocol JDBC driver: simply replace the jdbc:mysql protocol in the connection URL with jdbc:arrow-flight-sql. Query results are still returned in the JDBC ResultSet data structure.
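The protocol swap can be sketched as a tiny helper (a hedged sketch: in a real deployment the port must also be changed to the FE's arrow_flight_sql_port, which the example host/port merely stand in for):

```python
def to_arrow_flight_sql_url(mysql_jdbc_url: str) -> str:
    # Swap only the protocol prefix; the rest of the URL stays the same
    prefix = "jdbc:mysql:"
    if not mysql_jdbc_url.startswith(prefix):
        raise ValueError("expected a jdbc:mysql: URL")
    return "jdbc:arrow-flight-sql:" + mysql_jdbc_url[len(prefix):]

print(to_arrow_flight_sql_url("jdbc:mysql://127.0.0.1:9090/mydb"))
# jdbc:arrow-flight-sql://127.0.0.1:9090/mydb
```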

POM dependency:

```Java
<properties>
<arrow.version>17.0.0</arrow.version>
@@ -321,6 +322,7 @@ conn.close();
In addition to JDBC, and similar to Python, Java can also create a driver to read from Doris and return data in Arrow format. The following shows how to use AdbcDriver and JdbcDriver to connect to the Doris Arrow Flight Server.

POM dependency:

```Java
<properties>
<adbc.version>0.15.0</adbc.version>
10 changes: 5 additions & 5 deletions docs/table-design/data-model/aggregate.md
@@ -160,7 +160,7 @@ select group_concat_merge(v2) from aggstate;
If you do not want the final aggregation result, you can use `union` to combine multiple intermediate aggregation results and generate a new intermediate result.

```sql
insert into aggstate select 3,sum_union(k2),group_concat_union(k3) from aggstate;
insert into aggstate select 3,sum(v1),group_concat_union(v2) from aggstate;
```
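The state/union/merge semantics can be modeled in plain Python (an illustrative analogy, not Doris internals): a state is the intermediate representation of the inputs, union combines states into a new state, and merge finalizes a state into the aggregate result.

```python
def group_concat_state(values):
    # state: the intermediate representation (here simply the list of inputs)
    return list(values)

def group_concat_union(*states):
    # union: combine several intermediate states into a new intermediate state
    combined = []
    for s in states:
        combined.extend(s)
    return combined

def group_concat_merge(state):
    # merge: finalize the intermediate state into the aggregate result
    return ",".join(state)

s1 = group_concat_state(["a", "b"])
s2 = group_concat_state(["c", "d"])
print(group_concat_merge(group_concat_union(s1, s2)))  # a,b,c,d
```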

The calculations in the table are as follows:
@@ -170,16 +170,16 @@ The calculations in the table are as follows:
The query result is as follows:

```sql
mysql> select sum_merge(k2) , group_concat_merge(k3)from aggstate;
mysql> select sum(v1), group_concat_merge(v2) from aggstate;
+---------------+------------------------+
| sum_merge(k2) | group_concat_merge(k3) |
| sum(v1) | group_concat_merge(v2) |
+---------------+------------------------+
| 20 | c,b,a,d,c,b,a,d |
+---------------+------------------------+

mysql> select sum_merge(k2) , group_concat_merge(k3)from aggstate where k1 != 2;
mysql> select sum(v1), group_concat_merge(v2) from aggstate where k1 != 2;
+---------------+------------------------+
| sum_merge(k2) | group_concat_merge(k3) |
| sum(v1) | group_concat_merge(v2) |
+---------------+------------------------+
| 16 | c,b,a,d,c,b,a |
+---------------+------------------------+
3 changes: 2 additions & 1 deletion docs/table-design/data-partitioning/dynamic-partitioning.md
@@ -67,12 +67,13 @@ When using the ALTER TABLE statement to modify dynamic partitioning, the changes
In the example below, the ALTER TABLE statement is used to modify a non-dynamic partitioned table to a dynamic partitioned table:

```sql
CREATE TABLE test_dynamic_partition(
CREATE TABLE test_partition(
order_id BIGINT,
create_dt DATE,
username VARCHAR(20)
)
DUPLICATE KEY(order_id)
PARTITION BY RANGE(create_dt) ()
DISTRIBUTED BY HASH(order_id) BUCKETS 10;

ALTER TABLE test_partition SET (
4 changes: 2 additions & 2 deletions docs/table-design/index/inverted-index.md
@@ -244,7 +244,7 @@ ALTER TABLE table_name DROP INDEX idx_name;
SHOW CREATE TABLE table_name;

-- Syntax 2: IndexType as INVERTED indicates an inverted index
SHOW INDEX FROM idx_name;
SHOW INDEX FROM table_name;

## Using Indexes

@@ -395,7 +395,7 @@ PROPERTIES ("replication_num" = "1");
```
wget https://qa-build.oss-cn-beijing.aliyuncs.com/regression/index/hacknernews_1m.csv.gz

curl --location-trusted -u root: -H "compress_type:gz" -T hacknernews_1m.csv.gz http://127.0.0.1:8030/api/test_inverted_index/hackernews_1m/_stream_load
curl --location-trusted -u root: -H "compress_type:gz" -T hacknernews_1m.csv.gz -XPUT http://127.0.0.1:8030/api/test_inverted_index/hackernews_1m/_stream_load
{
"TxnId": 2,
"Label": "a8a3e802-2329-49e8-912b-04c800a461a6",
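The corrected Stream Load call can be sketched in Python (a hedged sketch: the host, port, and empty root password are placeholders mirroring the curl flags `--location-trusted -u root: -H "compress_type:gz" -XPUT`):

```python
import base64

def stream_load_request(fe_host, http_port, db, table, user="root", password=""):
    # Build the PUT target and headers equivalent to the curl command above
    url = f"http://{fe_host}:{http_port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "compress_type": "gz",     # tells Doris the payload is gzip-compressed
        "Expect": "100-continue",  # Stream Load may redirect FE -> BE; keep the body
    }
    return url, headers

url, headers = stream_load_request("127.0.0.1", 8030, "test_inverted_index", "hackernews_1m")
print(url)
```

Sending the file body with an HTTP client that follows redirects while preserving credentials completes the load.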
2 changes: 1 addition & 1 deletion docs/table-design/index/ngram-bloomfilter-index.md
@@ -54,7 +54,7 @@ SHOW CREATE TABLE table_name;

-- Syntax 2: IndexType as NGRAM_BF indicates an NGram BloomFilter index
```sql
SHOW INDEX FROM idx_name;
SHOW INDEX FROM table_name;
```

### Deleting an NGram BloomFilter Index
2 changes: 1 addition & 1 deletion docs/table-design/schema-change.md
@@ -247,7 +247,7 @@ MODIFY COLUMN col1 BIGINT KEY DEFAULT "1" AFTER col2;

Note: Whether modifying a key column or a value column, the complete column information must be declared.

3. Modify the maximum length of the `val1` column in the base table. The original `val1` was (val1 VARCHAR(32) REPLACE DEFAULT "abc")
3. Modify the maximum length of the `val5` column in the base table. The original `val5` was (val5 VARCHAR(32) REPLACE DEFAULT "abc")

```sql
ALTER TABLE example_db.my_table
2 changes: 1 addition & 1 deletion docs/table-design/tiered-storage/remote-storage.md
@@ -184,7 +184,7 @@ To optimize query performance and save object storage resources, local Cache has

- The Cache is managed through LRU and does not support TTL.

For specific configurations, please refer to (../../lakehouse/filecache).
For specific configurations, please refer to [Data Cache](../../lakehouse/filecache).
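As background on the eviction policy mentioned above (an illustrative model, not Doris's cache code), LRU keeps the most recently used entries and evicts the least recently used one when capacity is exceeded, with no time-based expiry:

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used entry

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now most recently used
cache.put("c", 3)      # capacity exceeded: evicts "b"
print(cache.get("b"))  # None
```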

## FAQ

@@ -110,9 +110,9 @@ Doris supports the following password policies to help users better manage their passwords
- Set user properties: [SET PROPERTY](../../sql-manual/sql-statements/account-management/SET-PROPERTY)
- View user properties: [SHOW PROPERTY](../../sql-manual/sql-statements/account-management/SHOW-PROPERTY)
- Change a password: [SET PASSWORD](../../sql-manual/sql-statements/account-management/SET-PASSWORD)
- View all supported privilege items: [SHOW PRIVILEGES]
- View row policies [SHOW ROW POLICY]
- Create a row policy [CREATE ROW POLICY]
- View all supported privilege items: [SHOW PRIVILEGES](../../../../sql-manual/sql-statements/account-management/SHOW-PRIVILEGES)
- View row policies: [SHOW ROW POLICY](../../../../sql-manual/sql-statements/data-governance/SHOW-ROW-POLICY)
- Create a row policy: [CREATE ROW POLICY](../../../../sql-manual/sql-statements/data-governance/CREATE-ROW-POLICY)

### Privilege Types

@@ -79,7 +79,7 @@ The Orphan Memory Tracker is the default Memory Tracker; whether its value is positive or negative, it means

- If no Memory Tracker is bound in TLS when a thread starts, Doris Allocator records the memory to the Orphan Memory Tracker by default, meaning the owner of this memory is unknown. For how Doris Allocator records memory, see [Memory Tracking Principles] above.

- If a Query or Load task Memory Tracker has a non-zero value when destructed, it usually means that this memory has not been released. The remaining memory is then recorded to the Orphan Memory Tracker, effectively handing it over to the Orphan Memory Tracker to keep tracking. This guarantees that the sum of the Orphan Memory Tracker and all other Memory Trackers equals all memory allocated by Doris Allocator.

Ideally, the value of the Orphan Memory Tracker should be close to 0. We therefore want every thread to attach a Memory Tracker other than Orphan (such as a Query or Load Memory Tracker) when it starts, and every Query or Load Memory Tracker to be 0 when destructed, meaning all memory used during query or load execution has been released by then.
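The accounting invariant described above, that the Orphan tracker plus all task trackers cover everything the allocator handed out, can be modeled as (illustrative, not Doris's C++):

```python
class MemoryTracker:
    def __init__(self, name):
        self.name = name
        self.consumed = 0

    def consume(self, n):
        self.consumed += n

    def release(self, n):
        self.consumed -= n

orphan = MemoryTracker("Orphan")
query = MemoryTracker("Query#1")

query.consume(100)  # allocations during query execution
query.release(60)   # partial release before the tracker is destructed

# On destruction, any leftover is transferred to Orphan so totals still balance
leftover = query.consumed
orphan.consume(leftover)
query.release(leftover)

print(orphan.consumed)  # 40
```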

@@ -15,7 +15,6 @@

To install Apache Arrow, you can find detailed installation instructions in the official documentation [Apache Arrow](https://arrow.apache.org/install/). For more on how Doris implements the Arrow Flight protocol, see [Doris support Arrow Flight SQL protocol](https://github.com/apache/doris/issues/25514).


## Python Usage

Use Python's ADBC Driver to connect to Doris for extremely fast data reading. The following steps use the Python (version >= 3.9) ADBC Driver to perform a series of common database operations, including DDL, DML, setting session variables, and SHOW statements.
@@ -33,6 +32,7 @@ pip install adbc_driver_flightsql

```Python
import adbc_driver_manager
import adbc_driver_flightsql
import adbc_driver_flightsql.dbapi as flight_sql

>>> print(adbc_driver_manager.__version__)
@@ -264,6 +264,7 @@ cursor.close()
The open-source JDBC driver for the Arrow Flight SQL protocol is compatible with the standard JDBC API, so most BI tools can use it to access Doris through JDBC with high-speed transfer of Apache Arrow data. Usage is similar to connecting to Doris through the MySQL-protocol JDBC driver: simply replace the jdbc:mysql protocol in the connection URL with jdbc:arrow-flight-sql. Query results are still returned in the JDBC ResultSet data structure.

POM dependency:

```Java
<properties>
<arrow.version>17.0.0</arrow.version>
Expand Down Expand Up @@ -321,6 +322,7 @@ conn.close();
In addition to JDBC, and similar to Python, Java can also create a driver to read from Doris and return data in Arrow format. The following shows how to use AdbcDriver and JdbcDriver to connect to the Doris Arrow Flight Server.

POM dependency:

```Java
<properties>
<adbc.version>0.15.0</adbc.version>
@@ -170,7 +170,7 @@ queue.mem:
flush.timeout: 10s

# 4. The output section is responsible for data output
# The doris output writes data to Doris using the Stream Load HTTP interface. The headers parameter specifies JSON as the Stream Load data format, and the codec_format_string parameter formats the data written to Doris in a printf-like way. For example, the configuration below formats a JSON document from filebeat's internal fields; these can be filebeat built-in fields such as agent.hostname, or fields produced by a processor such as dissect (e.g. day), referenced with %{[a][b]}. Stream Load automatically writes the JSON fields into the matching columns of the Doris table.
output.doris:
fenodes: [ "http://fehost1:http_port", "http://fehost2:http_port", "http://fehost3:http_port" ]
user: "your_username"
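The %{[a][b]} reference style can be illustrated with a tiny resolver (a sketch of the idea, not filebeat's implementation):

```python
import re

def resolve_refs(template, event):
    # Replace %{[a][b]} references with values looked up in a nested dict
    def lookup(match):
        node = event
        for key in re.findall(r"\[([^\]]+)\]", match.group(1)):
            node = node[key]
        return str(node)
    return re.sub(r"%\{((?:\[[^\]]+\])+)\}", lookup, template)

event = {"agent": {"hostname": "web01"}, "day": "2024-06-01"}
print(resolve_refs('{"host":"%{[agent][hostname]}","day":"%{[day]}"}', event))
# {"host":"web01","day":"2024-06-01"}
```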
@@ -30,7 +30,7 @@ password_policy:

> The unique identifier of a user, with the syntax: 'user_name'@'host'
> `user_identity` consists of two parts, user_name and host, where user_name is the username and host identifies the host address the client connects from. The host part supports fuzzy matching with %. If host is not specified, it defaults to '%', meaning the user can connect to Doris from any host.
> The host part can also be specified as a domain, i.e. enclosed in square brackets, in which case Doris treats it as a domain and tries to resolve its IP address.
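The % fuzzy match on the host part behaves like a wildcard; a tiny model of the matching rule (illustrative, not Doris's matcher):

```python
import fnmatch

def host_matches(pattern, host):
    # '%' in a user_identity host works as a wildcard; model it with fnmatch's '*'
    return fnmatch.fnmatch(host, pattern.replace("%", "*"))

print(host_matches("%", "10.0.0.7"))             # True: the default host matches anywhere
print(host_matches("192.168.%", "192.168.1.5"))  # True
print(host_matches("192.168.%", "10.0.0.7"))     # False
```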

## Optional Parameters

@@ -30,7 +30,7 @@ password_policy:

> The unique identifier of a user, with the syntax: 'user_name'@'host'
> `user_identity` consists of two parts, user_name and host, where user_name is the username and host identifies the host address the client connects from. The host part supports fuzzy matching with %. If host is not specified, it defaults to '%', meaning the user can connect to Doris from any host.
> The host part can also be specified as a domain, i.e. enclosed in square brackets, in which case Doris treats it as a domain and tries to resolve its IP address.

## Optional Parameters

@@ -302,7 +302,7 @@ DISTRIBUTED BY HASH(`siteid`) BUCKETS 10;
4. For datasets above the hundreds-of-millions scale, if fuzzy matching is needed, use an inverted index or an NGram BloomFilter
:::

### 2.6 Bitmap Index
### 6 Bitmap Index

To speed up data queries, Doris lets users add Bitmap indexes on certain columns, which suits equality or range queries on low-cardinality columns.

@@ -116,7 +116,7 @@ AGGREGATE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 3;
```

In this example, `agg_state` declares the data type, and `sum/group_concat` is the aggregate function signature. agg_state is a data type, like int, array, or string. agg_state can only be used together with the [state](../../sql-manual/sql-functions/combinators/state), [merge](../../sql-manual/sql-functions/combinators/merge), and [union](../../sql-manual/sql-functions/combinators/union) function combinators. It represents the intermediate result of an aggregate function, for example the intermediate state of `group_concat`, rather than the final result.

A value of the agg_state type must be generated with the state function; for this table, `group_concat_state` is used:

@@ -145,7 +145,7 @@ select group_concat_merge(v2) from aggstate;
If you do not want the final aggregation result but want to keep the intermediate result, you can use the `union` operation:

```sql
insert into aggstate select 3,sum_union(k2),group_concat_union(k3) from aggstate;
insert into aggstate select 3,sum(v1),group_concat_union(v2) from aggstate;
```

The calculations in the table are now as follows:
@@ -155,16 +155,16 @@ insert into aggstate select 3,sum(v1),group_concat_union(v2) from aggstate
The query result is as follows:

```sql
mysql> select sum_merge(k2) , group_concat_merge(k3)from aggstate;
mysql> select sum(v1), group_concat_merge(v2) from aggstate;
+---------------+------------------------+
| sum_merge(k2) | group_concat_merge(k3) |
| sum(v1) | group_concat_merge(v2) |
+---------------+------------------------+
| 20 | c,b,a,d,c,b,a,d |
+---------------+------------------------+

mysql> select sum_merge(k2) , group_concat_merge(k3)from aggstate where k1 != 2;
mysql> select sum(v1), group_concat_merge(v2) from aggstate where k1 != 2;
+---------------+------------------------+
| sum_merge(k2) | group_concat_merge(k3) |
| sum(v1) | group_concat_merge(v2) |
+---------------+------------------------+
| 16 | c,b,a,d,c,b,a |
+---------------+------------------------+
@@ -29,7 +29,7 @@

* **Merge-on-write**: since version 1.2, Doris has used merge-on-write mode by default. Records with the same key are merged immediately at write time, so storage always holds the latest data. Merge-on-write balances query and write performance, avoids merging multiple data versions at read time, and supports predicate pushdown to the storage layer. It is recommended for most scenarios;

* **Merge-on-read**: before version 1.2, the primary key model in Doris used merge-on-read mode by default. Data is not merged at write time but appended incrementally, so multiple versions are kept in Doris. At query or compaction time, versions with the same key are merged. Merge-on-read suits write-heavy, read-light scenarios; queries must merge multiple versions and predicates cannot be pushed down, which may affect query speed.
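The two modes can be contrasted with a small model (an analogy, not Doris's storage engine): merge-on-write keeps only the latest row per key at write time, while merge-on-read appends every version and merges at read time.

```python
def mow_write(table, row):
    # merge-on-write: overwrite the previous version of the key immediately
    table[row["k"]] = row

def mor_write(versions, row):
    # merge-on-read: just append; nothing is merged at write time
    versions.append(row)

def mor_read(versions):
    # at read (or compaction) time, keep only the latest version per key
    latest = {}
    for row in versions:  # rows are in write order, so later rows win
        latest[row["k"]] = row
    return latest

rows = [{"k": 1, "v": "a"}, {"k": 1, "v": "b"}]
mow, versions = {}, []
for r in rows:
    mow_write(mow, r)
    mor_write(versions, r)
print(mow[1]["v"], mor_read(versions)[1]["v"])  # b b
```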

There are two update semantics based on the primary key model in Doris:

@@ -69,12 +69,13 @@ PROPERTIES(
In the example below, an ALTER TABLE statement changes a non-dynamic-partitioned table into a dynamic-partitioned table:

```sql
CREATE TABLE test_dynamic_partition(
CREATE TABLE test_partition(
order_id BIGINT,
create_dt DATE,
username VARCHAR(20)
)
DUPLICATE KEY(order_id)
PARTITION BY RANGE(create_dt) ()
DISTRIBUTED BY HASH(order_id) BUCKETS 10;

ALTER TABLE test_partition SET (
Expand Down