Release Notes 3.0.4

## Behavior Changes

- In the Audit log, the `force` flag is retained for `drop table` and `drop database` statements. (#43227)
- When exporting data to Parquet/ORC formats, the `bitmap`, `quantile_state`, and `hll` types are exported in Binary format. Additionally, support has been added for exporting `jsonb` and `variant` types, which are exported as `string`. (#44041)  
  - Related documentation: [Export Overview - Apache Doris](https://doris.apache.org/docs/3.0/data-operate/export/export-overview)
- The Hudi JNI Scanner has been switched from the Spark API to the Hadoop API to enhance compatibility. Users can choose between the two implementations by setting the session variable `set hudi_jni_scanner=spark/hadoop`. (#44267)
- The use of `auto bucket` in Colocate tables is prohibited. (#44396)
- A Paimon cache has been added to the Catalog, so data is no longer queried in real time. (#44911)
- The default value of `max_broker_concurrency` has been increased to improve performance for large-scale data imports with Broker Load. (#44929)
- Partitions created by Auto Partition now default to the `storage medium` attribute of the current table, rather than the system default value. (#45955)
- Column updates are prohibited during Schema Change execution for Key columns. (#46347)
- For Key columns containing auto-increment columns, support has been added to allow column updates without providing the auto-increment column. (#44528)
- The FE ID generator strategy has been switched to a time-based approach, and IDs no longer start from 10000. (#44790)
- In the compute-storage separation mode, the default stale rowset recycling delay for Compaction has been reduced to 1800 seconds to decrease the recycling interval. This may cause large queries to fail in extreme scenarios, and adjustments can be made as needed. (#45460)
- The `show cache hotspot` statement has been disabled in compute-storage separation mode; query the system tables directly instead. (#47332)
- Deleting the system-created `admin` user is prohibited. (#44751)
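
As a minimal sketch of two of the changes above, the statements below switch the Hudi JNI scanner for the current session and issue forced drops whose `force` flag should now appear in the Audit log. The names `example_db` and `example_tbl` are placeholders, and quoting the variable value as a string is an assumption; the note itself only gives the form `set hudi_jni_scanner=spark/hadoop`.

```sql
-- Select the Hudi JNI scanner implementation for this session.
-- Allowed values per the note above: spark or hadoop.
-- Quoting the value is an assumption.
SET hudi_jni_scanner = 'hadoop';

-- Forced drops: the FORCE keyword is now retained in the Audit log.
-- example_db and example_tbl are hypothetical names.
DROP TABLE IF EXISTS example_db.example_tbl FORCE;
DROP DATABASE IF EXISTS example_db FORCE;
```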

## Improvements

### Storage

- Mitigated frequent Routine Load task timeouts caused by a small `max_batch_interval` setting. (#46292)
- Improved performance for Broker Load when importing multiple compressed files. (#43975)
- Increased the default value of `webserver_num_workers` to enhance Stream Load performance. (#46593)
- Reduced load imbalance among Routine Load import tasks during BE node scaling. (#44798)
- Improved the use of Routine Load thread pools to prevent timeouts from affecting queries. (#45039)

### Compute-Storage Separation

- Enhanced the stability and observability of the Meta-service. (#44036, #45617, #45255, #45068)
- Optimized File Cache by adding an early eviction strategy, reducing lock time, and improving query performance. (#47473, #45678, #47472)
- Improved initialization checks and queue transitions for File Cache to enhance stability. (#44004, #44429, #45057, #47229)
- Increased the speed of HDFS data recycling. (#46393)
- Improved FE performance when acquiring compute groups during ultra-high-frequency imports. (#47203)
- Improved several import-related parameters for primary key tables in compute-storage separation to enhance the stability of real-time high-concurrency imports. (#47295, #46750, #46365)

### Lakehouse

- Supported reading Hive tables in JSON format. (#43469)  
  - Related documentation: [Text/CSV/JSON - Apache Doris](https://doris.apache.org/docs/dev/lakehouse/file-formats/text#json)
- Introduced the session variable `enable_text_validate_utf8` to skip UTF-8 encoding checks for CSV formats. (#45537)  
  - Related documentation: [Text/CSV/JSON - Apache Doris](https://doris.apache.org/docs/dev/lakehouse/file-formats/text#character-set)
- Updated the Hudi version to 0.15 and optimized query planning performance for Hudi tables.
- Improved read performance for MaxCompute partitioned tables. (#45148)
- Optimized performance for Parquet file delayed materialization under high filter rates. (#46183)
- Supported delayed materialization for complex Parquet types. (#44098)
- Optimized predicate pushdown logic for ORC types, supporting more predicate conditions for index filtering. (#43255)
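
The session variable mentioned above might be used as follows. This is a sketch only: it assumes that setting the variable to `false` is what skips the UTF-8 check, which the note implies but does not state, so confirm the semantics against the linked documentation.

```sql
-- Skip UTF-8 validation when reading CSV-format data in this session.
-- Assumption: false disables the check.
SET enable_text_validate_utf8 = false;
```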

### Asynchronous Materialized Views

- Supported more scenarios for aggregate roll-up rewriting. (#44412)

### Query Optimizer

- Improved partition pruning performance. (#46261)
- Added rules to eliminate `group by` keys based on data characteristics. (#43391)
- Adaptively adjusted the wait time for Runtime Filters based on the target table size. (#42640)
- Improved the ability to push down aggregations in joins to fit more scenarios. (#43856, #43380)
- Improved Limit pushdown for aggregations to fit more scenarios. (#44042)

### Others

- Optimized startup scripts for FE, BE, and MS processes to provide clearer output. (#45610, #45490, #45883)
- The case sensitivity of table names in `show tables` now matches MySQL behavior. (#46030)
- `show index` now supports arbitrary target table types. (#45861)
- `information_schema.columns` now supports displaying default values. (#44849)
- `information_schema.views` now supports displaying view definitions. (#45857)
- Supported the MySQL protocol `COM_RESET_CONNECTION` command. (#44747)
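
The two `information_schema` additions above can be inspected with ordinary queries. The sketch below assumes the MySQL-compatible column names `COLUMN_DEFAULT` and `VIEW_DEFINITION`; `example_db` and `example_tbl` are placeholder names.

```sql
-- Default values of the columns of one table.
SELECT COLUMN_NAME, COLUMN_DEFAULT
FROM information_schema.columns
WHERE TABLE_SCHEMA = 'example_db'
  AND TABLE_NAME = 'example_tbl';

-- Definitions of the views in one database.
SELECT TABLE_NAME, VIEW_DEFINITION
FROM information_schema.views
WHERE TABLE_SCHEMA = 'example_db';
```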

## Bug Fixes

### Storage

- Fixed potential memory errors during the import process for aggregate table models. (#46997)
- Resolved the issue of Routine Load offset loss during FE master node restarts in compute-storage separation mode. (#46566)
- Fixed memory leaks in FE Observer nodes during batch import scenarios in compute-storage separation mode. (#47244)
- Resolved the issue of Cumulative Point rollback during Full Compaction with Order Data Compaction. (#44359)
- Fixed the issue where Delete operations could temporarily prevent Tablet Compaction scheduling. (#43466)
- Resolved incorrect Tablet states after Schema Change in multi-compute-cluster scenarios. (#45821)
- Fixed the potential NPE error when performing Column Rename Schema Change on primary key tables with `sequence_type`. (#46906)
- **Data Correctness**: Fixed correctness issues for primary key tables when importing partial column updates containing DELETE SIGN columns. (#46194)
- Resolved potential memory leaks in FE when Publish tasks for primary key tables were continuously stuck. (#44846)

### Compute-Storage Separation

- Fixed the issue where File Cache size could exceed the table data size. (#46561, #46390)
- Resolved upload failures at the 5MB boundary during data uploads. (#47333)
- Enhanced robustness by adding more parameter checks for several `alter` operations in Storage Vault. (#45155, #45156, #46625, #47078, #45685, #46779)
- Resolved issues with data recycling failures or slow recycling due to improper Storage Vault configurations. (#46798, #47536, #47475, #47324, #45072)
- Fixed the issue where data recycling could stall, preventing timely recycling. (#45760)
- Resolved incorrect retries for MTTM-230 errors in compute-storage separation mode. (#47370, #47326)
- Fixed the issue where Group Commit WAL was not fully replayed during BE decommissioning in compute-storage separation mode. (#47187)
- Resolved the issue where Tablet Meta exceeding 2GB rendered MS unavailable. (#44780)
- **Data Correctness**: Fixed two duplicate Key issues in primary key tables in compute-storage separation mode. (#46039, #44975)
- Resolved the issue where Base Compaction could continuously fail due to large Delete Bitmaps in primary key tables during high-frequency real-time imports. (#46969)
- Modified incorrect retry logic for Schema Change in primary key tables in compute-storage separation mode to enhance robustness. (#46748)

### Lakehouse

#### Hive

- Fixed the issue where Hive views created by Spark could not be queried. (#43553)
- Resolved the issue where certain Hive Transaction tables could not be read correctly. (#45753)
- Fixed the issue where partition pruning failed for Hive tables with special characters in partitions. (#42906)

#### Iceberg

- Fixed the issue where Iceberg tables could not be created in Kerberos authentication environments. (#43445)
- Resolved the issue where `count(*)` queries were inaccurate for Iceberg tables with dangling deletes. (#44039)
- Fixed the issue where query errors occurred due to mismatched column names in Iceberg tables. (#44470)
- Resolved the issue where Iceberg tables could not be read after partition modifications. (#45367)

#### Paimon

- Fixed the issue where Paimon Catalog could not access Alibaba Cloud OSS-HDFS. (#42585)

#### Hudi

- Fixed the issue where partition pruning failed for Hudi tables in certain scenarios. (#44669)

#### JDBC

- Fixed the issue where tables could not be retrieved using JDBC Catalog after enabling case-insensitive table names.

#### MaxCompute

- Fixed the issue where partition pruning failed for MaxCompute tables in certain scenarios. (#44508)

#### Others

- Fixed the issue where Export tasks caused memory leaks in FE. (#44019)
- Resolved the issue where S3 object storage could not be accessed via HTTPS protocol. (#44242)
- Fixed the issue where Kerberos authentication tickets could not be automatically refreshed. (#44916)
- Resolved the issue where reading Hadoop Block compressed format files failed. (#45289)
- When querying ORC format data, CHAR type predicates are no longer pushed down to avoid potential result errors. (#45484)

### Asynchronous Materialized Views

- Fixed the issue where transparent query rewriting could lead to planning or result errors in extreme scenarios. (#44575, #45744)
- Resolved the issue where multiple build tasks could be generated during asynchronous materialized view scheduling in extreme scenarios. (#46020, #46280)

### Query Optimizer

- Fixed the issue where some expression rewrites could produce incorrect expressions. (#44770, #44920, #45922, #45596)
- Resolved occasional incorrect results from SQL Cache. (#44782, #44631, #46443, #47266)
- Fixed the issue where Limit pushdown for aggregation operators could produce incorrect results in some scenarios. (#45369)
- Resolved the issue where delayed materialization optimization could produce incorrect execution plans in some scenarios. (#45693, #46551)

### Query Execution

- Fixed the issue where regular expressions and `like` functions produced incorrect results with special characters. (#44547)
- Resolved the issue where SQL Cache results could be incorrect when switching databases. (#44782)
- Fixed a series of Arrow Flight-related issues. (#45023, #43929)
- Resolved the issue where results were incorrect when the Hash table for HashJoin exceeded 4GB in some cases. (#46461)
- Fixed the overflow issue of the `convert_to` function with Chinese characters. (#46405)
- Resolved the issue where results could be incorrect in extreme scenarios when `group by` was used with Limit. (#47844)
- Fixed the issue where results could be incorrect when accessing certain system tables. (#47498)
- Resolved the issue where the `percentile` function could cause system crashes. (#47068)
- Fixed the performance degradation issue for single-table queries with Limit. (#46090)
- Resolved the issue where `StDistanceSphere` and `StAngleSphere` functions caused system crashes. (#45508)
- Fixed the issue where `map_agg` results were incorrect. (#40454)

### Semi-structured Data Management

#### BloomFilter Index

- Fixed the exception caused by large parameters in BloomFilter Index. (#45780)
- Resolved the issue of high memory usage during BloomFilter Index writes. (#45833)
- Fixed the issue where BloomFilter Index was not correctly deleted when columns were dropped. (#44361, #43378)

#### Inverted Index

- Fixed the occasional crash during inverted index construction. (#43246)
- Resolved the issue where words with zero occurrences occupied space during inverted index merging. (#43113)
- Prevented abnormal large values in Index Size statistics. (#46549)
- Fixed the issue with inverted indexes for VARIANT type fields. (#43375)
- Optimized local cache locality for inverted indexes to improve cache hit rates. (#46518)
- Added the metric `NumInvertedIndexRemoteIOTotal` to query profiles for remote storage reads of inverted indexes. (#45675, #44863)

#### Others

- Fixed the crash issue of the `ipv6_cidr_to_range` function with special NULL data. (#44700)

### Permissions

- When granting `CREATE_PRIV`, the existence of the corresponding resource is no longer checked. (#45125)
- Fixed the issue where queries on views with permissions could fail due to missing permissions for referenced tables in extreme scenarios. (#44621)
- Resolved the issue where permission checks for `use db` did not distinguish between internal and external Catalogs. (#45720)
