Description
Behavior Changes
- In the Audit log, the `force` flag is retained for `drop table` and `drop database` statements. ([fix](drop sql) add force in the tosql for drop table and drop database #43227)
- When exporting data to Parquet/ORC formats, the `bitmap`, `quantile_state`, and `hll` types are exported in Binary format. Additionally, support has been added for exporting `jsonb` and `variant` types, which are exported as `string`. ([fix](Outfile) Fix the data type mapping for complex types in Doris to the ORC and Parquet file formats. #44041)
  - Related documentation: [Export Overview - Apache Doris](https://doris.apache.org/docs/3.0/data-operate/export/export-overview)
- The Hudi JNI Scanner has been switched from the Spark API to the Hadoop API to enhance compatibility. Users can switch between the two with the session variable `set hudi_jni_scanner=spark/hadoop`. ([fix](hudi) upgrade hudi to 0.15.0 #44267)
- The use of `auto bucket` in Colocate tables is prohibited. ([fix](table) Disable create, alter auto bucket table with colocate #44396)
- A Paimon cache has been added to the Catalog, so queries read cached data rather than always fetching the latest data. ([feat](mtmv)Paimon queries the data in the cache instead of querying the latest data #44911)
- The default value of `max_broker_concurrency` has been increased to improve performance for large-scale data imports with Broker Load. ([performance](load) increase max_broker_concurrency to 100 #44929)
- The default `storage medium` of partitions created by Auto Partition now follows the `storage medium` property of the current table, rather than the system default value. ([Bug](auto-partition) fix auto partition could set storage_medium properties #45955)
- Column updates are prohibited during Schema Change execution for Key columns. ([Fix](partial update) abort partial update on shadow index's tablet when the including columns miss key columns on new schema #46347)
- For Key columns containing auto-increment columns, support has been added to allow column updates without providing the auto-increment column. ([opt](auto-inc) Allow to miss auto-increment column and other value columns in partial update #44528)
- The FE ID generator strategy has been switched to a time-based approach, and IDs no longer start from 10000. ([chore](conf) Set enable_advance_next_id=true by default #44790)
- In the compute-storage separation mode, the default stale rowset recycling delay for Compaction has been reduced to 1800 seconds to decrease the recycling interval. This may cause large queries to fail in extreme scenarios, and adjustments can be made as needed. ([Enhancement](config) Modify cloud default stale rowset recycle time #45460)
- The `show cache hotspot` statement has been disabled in compute-storage separation mode; the system tables must be queried directly instead. ([chore](file cache) Disable show cache hotspot stmt #47332)
- Deleting the system-created `admin` user is prohibited. ([fix](auth)Prohibit deleting admin user #44751)
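Two of the changes above are easiest to see as statements. The sketch below is illustrative only: `example_db` and `example_tbl` are hypothetical names, and the scanner values are assumed from the `spark/hadoop` choice given in the list.

```sql
-- Hypothetical database and table names, used only for illustration.

-- Switch the Hudi JNI scanner for the current session.
-- Assumption: the accepted values are 'spark' and 'hadoop', as listed above.
SET hudi_jni_scanner = 'hadoop';

-- FORCE is now kept when these statements are recorded in the Audit log.
DROP TABLE IF EXISTS example_db.example_tbl FORCE;
DROP DATABASE IF EXISTS example_db FORCE;
```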
Improvements
Storage
- Reduced frequent Routine Load task timeouts caused by a small `max_batch_interval` setting. ([improve](routine load) introduce routine load task min timeout #46292)
- Improved performance for Broker Load when importing multiple compressed files. (branch-3.0: [performance](load) fix broker load scan ranges for unsplittable files #43161 #43975)
- Increased the default value of `webserver_num_workers` to enhance Stream Load performance. (branch-3.0: [opt](load) Increase the default value of webserver_num_workers #46304 #46593)
- Improved load balancing of Routine Load import tasks during BE node scaling. (branch-3.0: [improve](routine load) ensure load balance after scaling up BE nodes #44693 #44798)
- Improved the use of Routine Load thread pools to prevent timeouts from affecting queries. (branch-3.0: [fix](routine load) replace heavy work pool with routine load thread pool for metadata fetching #44907 #45039)
Compute-Storage Separation
- Enhanced the stability and observability of the Meta-service. ([improve][ms] Bvars add the FDB get_count_normalized indicator #44036, [opt](recycler) Improve robustness and observability #45617, [Fix](cloud) Cloud enable fe deploy mode from master-observers to multi-followers #45255, [opt](cloud) reset ms response message in case it is reused for retry #45068)
- Optimized File Cache by adding an early eviction strategy, reducing lock time, and improving query performance. ([enhancement](cloud) file cache evict in advance #47473, [enhancement](cloud) add profile counter for file cache #45678, [fix](cloud) shorten cache lock held time and add metrics #47472)
- Improved initialization checks and queue transitions for File Cache to enhance stability. ([fix](cloud) explicit message when parse file_cache_path error #44004, [fix](cloud) serialize cache init to avoid unstable cache pick #44429, [fix](cloud) fix CHECK failed when transmit from non-TTL to normal #45057, [chore](file_cache) Set enbale_dump_error_file to false by default #47229)
- Increased the speed of HDFS data recycling. ([Fix](recycler) Fix recycler Hdfs accessor list all bug #46393)
- Optimized performance issues when the FE acquires compute groups during ultra-high-frequency imports. ([opt](cloud) Fix frequent rlock for SystemInfoService.getClusterXxx() #47203)
- Improved several import-related parameters for primary key tables in compute-storage separation to enhance the stability of real-time high-concurrency imports. ([fix](cloud-mow) Make delete bitmap cache expired time more reasonable #47295, [improve](cloud-mow)Make some mow calculating delete bitmap config more reasonable #46750, [fix](cloud-mow) Make some timeout about mow more reasonable #46365)
Lakehouse
- Supported reading Hive tables in JSON format. ([feature](hive)support hive catalog read json table. #43469)
  - Related documentation: [Text/CSV/JSON - Apache Doris](https://doris.apache.org/docs/dev/lakehouse/file-formats/text#json)
- Introduced the session variable `enable_text_validate_utf8` to skip UTF-8 encoding checks for CSV formats. ([enchement](utf8)import enable_text_validate_utf8 session var #45537)
  - Related documentation: [Text/CSV/JSON - Apache Doris](https://doris.apache.org/docs/dev/lakehouse/file-formats/text#character-set)
- Updated the Hudi version to 0.15 and optimized query planning performance for Hudi tables.
- Improved read performance for MaxCompute partitioned tables. ([enchement](mc)Optimize reading of maxcompute partition tables. #45148)
- Optimized performance for Parquet file delayed materialization under high filter rates. (branch-2.1: [fix](parquet-reader) Fixed the issue of excessive scanning data in late materialization case of parquet reader #46121 #46183)
- Supported delayed materialization for complex Parquet types. ([opt](parquet-reader)Implement late materialization of parquet complex types. #44098)
- Optimized predicate pushdown logic for ORC types, supporting more predicate conditions for index filtering. ([enhance](orc) Optimize ORC Predicate Pushdown for OR-connected Predicate #43255)
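As a rough illustration of the `enable_text_validate_utf8` switch, the statements below turn validation off before scanning a text-format table. This assumes that setting the variable to `false` is what skips the check; the catalog, database, and table names are hypothetical.

```sql
-- Assumption: false disables UTF-8 validation for text/CSV scans in this session.
SET enable_text_validate_utf8 = false;

-- Hypothetical Hive catalog table stored as CSV.
SELECT * FROM example_hive_catalog.example_db.example_csv_tbl LIMIT 10;
```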
Asynchronous Materialized Views
- Supported more scenarios for aggregate roll-up rewriting. ([opt](mtmv) Support any_value aggregate function rollup rewrite when exsits aggregate materialized view #44412)
Query Optimizer
- Improved partition pruning performance. ([opt](nereids) improve prune partition with lots of `in (xxx)` #46261)
- Added rules to eliminate `group by` keys based on data characteristics. ([feat](nereids) add rewrite rule :EliminateGroupByKeyByUniform #43391)
- Adaptively adjusted the wait time for Runtime Filters based on the target table size. ([feat](nereids)set runtime filter wait time according to table row count and table type #42640)
- Improved the ability to push down aggregations in joins to fit more scenarios. ([opt](nereids) enhance PUSH_DOWN_AGG_THROUGH_JOIN_ONE_SIDE #43856, [opt](nereids) support pushing down aggr distinct through join #43380)
- Improved Limit pushdown for aggregations to fit more scenarios. ([opt](nereids) optimize push limit to agg #44042)
Others
- Optimized startup scripts for FE, BE, and MS processes to provide clearer output. ([chore](script) fix `start_fe.sh --version` not work and MetaService scripts occur error in Debian GNU/Linux 11 (bullseye) #45610, [chore](version) Show binary version in metrics: fe be ms #45490, [chore](bash) optimize output information when doris_cloud startup #45883)
- The case sensitivity of table names in `show tables` now matches MySQL behavior. ([fix](show)show tables should be case insensitive when lowerCaseTableNames is 1 or 2. #46030)
- `show index` now supports arbitrary target table types. ([opt](show) let all types table support show index #45861)
- `information_schema.columns` now supports displaying default values. ([improvement](information_schema)Support show default value in information_schema. #44849)
- `information_schema.views` now supports displaying view definitions. ([improvement](information_schema)Show view definition in information_schema.views. #45857)
- Supported the MySQL protocol `COM_RESET_CONNECTION` command. ([improvement](mysql)Support mysql COM_RESET_CONNECTION command. #44747)
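The metadata changes above can be checked with ordinary queries. A small sketch, using hypothetical `example_db` and `example_tbl` names; `COLUMN_DEFAULT` and `VIEW_DEFINITION` are the standard MySQL column names and are assumed to be where these values appear:

```sql
-- Hypothetical database and table names, used only for illustration.
SHOW INDEX FROM example_db.example_tbl;

-- Default values per column.
SELECT COLUMN_NAME, COLUMN_DEFAULT
FROM information_schema.columns
WHERE TABLE_SCHEMA = 'example_db' AND TABLE_NAME = 'example_tbl';

-- View definitions for every view in the database.
SELECT TABLE_NAME, VIEW_DEFINITION
FROM information_schema.views
WHERE TABLE_SCHEMA = 'example_db';
```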
Bug Fixes
Storage
- Fixed potential memory errors during the import process for aggregate table models. ([bugfix](memtable) arena is freed early and will cause use after free #46997)
- Resolved the issue of Routine Load offset loss during FE master node restarts in compute-storage separation mode. (branch-3.0: [fix](cloud) fix routine load loss data when fe master node restart #46149 #46566)
- Fixed memory leaks in FE Observer nodes during batch import scenarios in compute-storage separation mode. (branch-3.0: [fix](cloud)(bulk load) fix memory leak in FE observer node #47074 #47244)
- Resolved the issue of Cumulative Point rollback during Full Compaction with Order Data Compaction. ([Fix](full compaction) Full compaction should not do ordered data compaction #44359)
- Fixed the issue where Delete operations could temporarily prevent Tablet Compaction scheduling. ([Enhancement](compaction) Do not set failure time when cumulative compaction dealing with delete rowset #43466)
- Resolved incorrect Tablet states after Schema Change in multi-compute-cluster scenarios. ([enhancement](meta) Sync tablet meta even if local state is not running #45821)
- Fixed the potential NPE error when performing Column Rename Schema Change on primary key tables with `sequence_type`. ([Fix](schema change) Fix NPE when rename column on table which has sequence type column #46906)
- Data Correctness: Fixed correctness issues for primary key tables when importing partial column updates containing DELETE SIGN columns. ([Fix](partial update) Fix incorrect result when partial update include delete sign columns #46194)
- Resolved potential memory leaks in FE when Publish tasks for primary key tables were continuously stuck. ([Fix](TPartitionVersionInfo) Fix duplicate `TPartitionVersionInfo` in `PublishVersionTask.partitionVersionInfos` #44846)
Compute-Storage Separation
- Fixed the issue where File Cache size could exceed the table data size. ([fix](cloud) fix file cache potential leakage #46561, [fix](cloud) Support clean tablet file cache when tablet drop #46390)
- Resolved upload failures at the 5MB boundary during data uploads. ([fix](s3filewriter) Fix s3_write_buffer_size boundary issue #47333)
- Enhanced robustness by adding more parameter checks for several `alter` operations in Storage Vault. ([fix](storage vault) Fix missing use_path_style when create storage vault #45155, [fix](vault) avoid encrypt twice when altering vault #45156, [fix](vault) fix `CreateTableLikeStmt` cannot work in stoarge vault mode #46625, [opt](storage vault) Check `s3.root.path` cannot be empty #47078, [fix](vault) Fix bugs about altering storage vault name #45685, [fix](vault) Fix creating storage vault failed with azure backend #46779)
- Resolved issues with data recycling failures or slow recycling due to improper Storage Vault configurations. ([fix](recycler) Fix premature exit recycling when there is an invalid storage vault #46798, [Fix](recycler) Fix retain inverted indexes in tmp rowset recycling #47536, [Fix](recycler) Delete again to double check when recycle tablet failed by some bugs #47475, [Fix](recycler) Fix recycler fail when dealing with rowset [0-1] #47324, [fix](recycler) Process self-defined domain names for s3 storage vault #45072)
- Fixed the issue where data recycling could stall, preventing timely recycling. ([fix](recycler) Fix CountdownEvent error and hang #45760)
- Resolved incorrect retries for MTTM-230 errors in compute-storage separation mode. ([fix](cloud) Fix async mtmv job retry when meet -230 in cloud #47370, [fix](cloud) Fix cloud -230 retry not reset ctx state #47326)
- Fixed the issue where Group Commit WAL was not fully replayed during BE decommissioning in compute-storage separation mode. ([fix](cloud) Fix cloud decomission and check wal #47187)
- Resolved the issue where Tablet Meta exceeding 2GB rendered MS unavailable. ([fix](meta-service) Avoid rowset meta exceeds 2G result in protobuf fatal #44780)
- Data Correctness: Fixed two duplicate Key issues in primary key tables in compute-storage separation mode. ([Fix](merge-on-write) Should update pending delete bitmap KVs in MS when no need to calc delete bitmaps in publish phase #46039, [Fix](merge-on-write) Should clear `GetDeleteBitmapUpdateLockResponse` when geting delete bitmap update lock fail and retry #44975)
- Resolved the issue where Base Compaction could continuously fail due to large Delete Bitmaps in primary key tables during high-frequency real-time imports. ([fix](cloud-mow) Fix the issue of inaccurate estimation of txn size when updating delete bitmap #46969)
- Modified incorrect retry logic for Schema Change in primary key tables in compute-storage separation mode to enhance robustness. ([fix](cloud-mow) schema change should retry when encouter TXN_CONFILCT in cloud mode #46748)
Lakehouse
Hive
- Fixed the issue where Hive views created by Spark could not be queried. (branch-2.1: [fix](hive) support query hive view created by spark #43553)
- Resolved the issue where certain Hive Transaction tables could not be read correctly. ([fix](hive)fix hive insert only translaction table. #45753)
- Fixed the issue where partition pruning failed for Hive tables with special characters in partitions. ([fix](hive)fix hive catalog miss partition that have special characters. #42906)
Iceberg
- Fixed the issue where Iceberg tables could not be created in Kerberos authentication environments. ([feat](catalog)Support Pre-Execution Authentication for HMS Type Iceberg Catalog Operations. #43445)
- Resolved the issue where `count(*)` queries were inaccurate for Iceberg tables with dangling deletes. ([fix](iceberg)Fix count(*) error with dangling delete problem #44039)
- Fixed the issue where query errors occurred due to mismatched column names in Iceberg tables. ([fix](iceberg)Bring field_id with parquet files And fix map type's key optional #44470)
- Resolved the issue where Iceberg tables could not be read after partition modifications. ([enchement](iceberg)support read iceberg partition evolution table. #45367)
Paimon
- Fixed the issue where Paimon Catalog could not access Alibaba Cloud OSS-HDFS. ([Fix](PaimonCatalog) fix the problem that paimon catalog can not access to OSS-HDFS #42585)
Hudi
- Fixed the issue where partition pruning failed for Hudi tables in certain scenarios. ([fix](hudi)Add hudi catalog read partition table partition prune #44669)
JDBC
- Fixed the issue where tables could not be retrieved using JDBC Catalog after enabling case-insensitive table names.
MaxCompute
- Fixed the issue where partition pruning failed for MaxCompute tables in certain scenarios. ([enchement](maxcompute)add mc catalog read partition table partition prune #44508)
Others
- Fixed the issue where Export tasks caused memory leaks in FE. ([fix](Export) fix a memory leak in the FE because of the ExportJob #44019)
- Resolved the issue where S3 object storage could not be accessed via HTTPS protocol. ([fix](s3) do not replace https scheme if specified #44242)
- Fixed the issue where Kerberos authentication tickets could not be automatically refreshed. ([feat](catalog)Replace HadoopUGI with HadoopKerberosAuthenticator to Support Kerberos Ticket Auto-Renewal #44916)
- Resolved the issue where reading Hadoop Block compressed format files failed. ([fix](hive) fix block decompressor bug #45289)
- When querying ORC format data, CHAR type predicates are no longer pushed down to avoid potential result errors. ([Fix](ORC) Not push down fixed char type in orc reader #45484)
Asynchronous Materialized Views
- Fixed the issue where transparent query rewriting could lead to planning or result errors in extreme scenarios. ([fix](mtmv) Fix filter position different but same causing rewritten by materialized view fail #44575, [fix](mtmv) Fix mv is deleted in nested mv causing query err and fix some test #45744)
- Resolved the issue where multiple build tasks could be generated during asynchronous materialized view scheduling in extreme scenarios. ([fix](mtmv)The refresh method for MTMV is commit. If the status is PAUSED, no more tasks should be generated #46020, [Improve](mtmv) skip the generation of invalid task for refresh mtmv #46280)
Query Optimizer
- Fixed the issue where some expression rewrites could produce incorrect expressions. ([fix](nereids) fix UnknownValue's reference in simplify range rule #44637 #44770, branch-3.0: [fix](nereids) fix compare with long min for simplify comparison rule #44920, branch-3.0: [fix](nereids) fix compare with date like overflow #45868 #45922, branch-3.0: [fix](nereids) fix compare with date like literal #45382 #45596)
- Resolved occasional incorrect results from SQL Cache. ([fix](cache) fix same sql return wrong result when switch database with `use db` and enable sql cache #44782, [fix](sql_cache) fix sql cache result wrong of from_unixtime(col, 'yyyy-MM-dd HH:mm:ss') #44631, [fix](nereids) fix sql cache bug and some tests #46443, branch-3.0: [fix](cache) fix sql cache throw npe in cloud mode #47221 #47266)
- Fixed the issue where Limit pushdown for aggregation operators could produce incorrect results in some scenarios. ([fix](Nereids) set correct sort key for aggregate #45369)
- Resolved the issue where delayed materialization optimization could produce incorrect execution plans in some scenarios. ([fix](nereids) support one phase DeferMaterializeTopN #45693, [fix](nereids) topN filter: use ObjectId as map key instead of Topn node #46551)
Query Execution
- Fixed the issue where regular expressions and `like` functions produced incorrect results with special characters. ([fix](hyperscan) Fix hyper scan fall back to re2 #44547)
- Resolved the issue where SQL Cache results could be incorrect when switching databases. ([fix](cache) fix same sql return wrong result when switch database with `use db` and enable sql cache #44782)
- Fixed a series of Arrow Flight-related issues. ([fix](arrow-flight-sql) Fix query result is empty and not return query error message #45023, [fix](arrow-flight-sql) Fix Doris NULL column conversion to arrow batch #43929)
- Resolved the issue where results were incorrect when the Hash table for HashJoin exceeded 4GB in some cases. ([Bug](join) fix columnstr64's offset overflow on serialize_value_into_arena #46461)
- Fixed the overflow issue of the `convert_to` function with Chinese characters. ([fix](mem) heap-buffer-overflow for function convert_to #46405)
- Resolved the issue where results could be incorrect in extreme scenarios when `group by` was used with Limit. ([Bug](fix) Fix topn agg limit may get error result in when refresh heap #47844)
- Fixed the issue where results could be incorrect when accessing certain system tables. ([fix](local shuffle) Set serial execution for schema scan operator #47498)
- Resolved the issue where the `percentile` function could cause system crashes. ([Fix](bug) Percentile* func core when percent args is negative number #47068)
- Fixed the performance degradation issue for single-table queries with Limit. (branch-3.0: [fix](scan) Fix scan with limit #46035 #46090)
- Resolved the issue where `StDistanceSphere` and `StAngleSphere` functions caused system crashes. ([Fix](function) fix coredump of function StDistanceSphere and StAngleSphere #45508)
- Fixed the issue where `map_agg` results were incorrect. ([Bug](map) fix wrong result on map_agg with streaming agg #40454)
Semi-structured Data Management
BloomFilter Index
- Fixed the exception caused by large parameters in BloomFilter Index. ([fix](bf index) add ngram bf index validation in nereids index definition check #45780)
- Resolved the issue of high memory usage during BloomFilter Index writes. ([opt](bloomfilter index) optimize memory usage for bloom filter index writer #45833)
- Fixed the issue where BloomFilter Index was not correctly deleted when columns were dropped. ([fix](bloom filter)Fix drop column with bloom filter index #44361, [fix](bloom filter)Fix drop column with bloom filter (#41369) #43378)
Inverted Index
- Fixed the occasional crash during inverted index construction. ([fix] (build index) fix build index coredump #43246)
- Resolved the issue where words with zero occurrences occupied space during inverted index merging. ([fix](index compaction)Skip writing terms with a doc frequency of 0 #43113)
- Prevented abnormal large values in Index Size statistics. ([fix](index size) discard index size when meta size is invalid #46549)
- Fixed the issue with inverted indexes for VARIANT type fields. ([fix](variant) fix index in variant #43375)
- Optimized local cache locality for inverted indexes to improve cache hit rates. ([fix](inverted index) inverted Index File Cache Queue Optimization #46518)
- Added the metric `NumInvertedIndexRemoteIOTotal` to query profiles for remote storage reads of inverted indexes. ([opt](profile) add index page profile for io #45675, [opt](inverted index) Add NumInvertedIndexRemoteIOTotal statistics in profile #44863)
Others
- Fixed the crash issue of the `ipv6_cidr_to_range` function with special NULL data. ([fix](ip) fix ip nullable param without check #44700)
Permissions
- When granting `CREATE_PRIV`, the existence of the corresponding resource is no longer checked. ([enhance](auth)When authorization includes create, not check if resources exist #45125)
- Fixed the issue where queries on views with permissions could fail due to missing permissions for referenced tables in extreme scenarios. ([fix](auth)Fix the need for low-level table permissions when querying views in certain situations #44621)
- Resolved the issue where permission checks for `use db` did not distinguish between internal and external Catalogs. ([fix](auth) fix use database stmt access unauthorized catalog #45720)