Merge tikv 2025 10 30 #447
base: raftstore-proxy
close tikv#18441 When the secondary commit failed with error `CommitTsExpired`, collect the MVCC info for debugging. Signed-off-by: Chao Wang <[email protected]>
…ikv#18448) close tikv#18441 In the previous PR, we only collected MVCC info when `commit_role` is `Secondary` and `commit_ts < min_commit_ts`. However, when resolving a lock, the `commit_role` of `commit` is `None`, so no MVCC info is collected when this error happens. This PR enhances the check: it tests whether the resolved key is the primary key of the lock and, if it is not, still collects MVCC info for further debugging. Signed-off-by: Chao Wang <[email protected]> Signed-off-by: 王超 <[email protected]> Co-authored-by: cfzjywxk <[email protected]>
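The decision described in the two commits above can be sketched as follows. This is a minimal illustration, not TiKV's actual code: `CommitRole`, `should_collect_mvcc_info` and all parameter names are hypothetical, and the sketch assumes that a `Primary` role never triggers collection.

```rust
/// Hypothetical stand-in for the commit role carried by a commit request.
#[derive(Clone, Copy, PartialEq)]
enum CommitRole {
    Primary,
    Secondary,
}

/// Returns true when MVCC info should be collected for debugging.
/// Assumption: the `CommitTsExpired` condition is `commit_ts < min_commit_ts`.
fn should_collect_mvcc_info(
    commit_role: Option<CommitRole>,
    key: &[u8],
    lock_primary: &[u8],
    commit_ts: u64,
    min_commit_ts: u64,
) -> bool {
    if commit_ts >= min_commit_ts {
        // No `CommitTsExpired` error, nothing to collect.
        return false;
    }
    match commit_role {
        // First PR: collect for secondary commits.
        Some(CommitRole::Secondary) => true,
        Some(CommitRole::Primary) => false,
        // Follow-up PR: lock resolution carries no role, so compare the
        // resolved key with the primary key recorded in the lock.
        None => key != lock_primary,
    }
}
```

With these assumptions, a lock-resolve commit (`None`) on a non-primary key with an expired `commit_ts` returns `true`, while the same call on the primary key returns `false`.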
ref tikv#17290 Signed-off-by: “EricZequan” <[email protected]>
ref tikv#17465 Following tikv#17605, another attempt to update the Rust toolchain.
Changes:
- Language
  - After rust-lang/rust#134258, we can't manually impl both `ToString` and `fmt::Display`, so this PR adds a new trait `ToStringValue` to work around types that produce different results between `ToString` and `Display`.
- Clippy
  - `Option::map_or(false, ...)` --> `Option::is_some_and(...)`
  - `Option::map_or(true, ...)` --> `Option::is_none_or(...)`
  - `(a + b - 1) / b` --> `a.div_ceil(b)`
  - `io::Error::new(ErrorKind::Other, ...)` --> `io::Error::other(...)`
  - `Slice::group_by` --> `Slice::chunk_by`
  - `Result::map_err(|e| {...; e})` --> `Result::inspect_err(|e| { ... })`
  - `Map::get(&key).is_{some, none}()` --> `Map::contains_key()`
- Formatter
  - The import order now follows ASCII order, e.g. before it was "use crate::{a, b, c, A, B, C}", after it is "use crate::{A, B, C, a, b, c}". Most changes are due to this.
  - Lists in Rust doc comments should be properly aligned.
- cargo-deny
  - `vulnerability`, `notice` and `unsound` can't be configured in version 2, and `unmaintained` can't be allowed anymore (but setting `workspace` is supported, to allow indirect packages). So some unmaintained packages are replaced with the suggested alternatives. (See: tikv#18416)
Signed-off-by: glorv <[email protected]>
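The Clippy rewrites listed in this commit can be seen side by side in a small sketch. The helper functions below are hypothetical and exist only to show each replacement in the standard library form; they are not taken from the TiKV code base. `Option::is_none_or` needs a reasonably recent stable toolchain.

```rust
use std::collections::HashMap;
use std::io;

/// `Option::map_or(false, ...)` becomes `Option::is_some_and(...)`.
fn is_positive(v: Option<i32>) -> bool {
    v.is_some_and(|x| x > 0)
}

/// `Option::map_or(true, ...)` becomes `Option::is_none_or(...)`.
fn unset_or_positive(v: Option<i32>) -> bool {
    v.is_none_or(|x| x > 0)
}

/// `(a + b - 1) / b` becomes `a.div_ceil(b)`.
fn batches_needed(items: usize, batch_size: usize) -> usize {
    items.div_ceil(batch_size)
}

/// `io::Error::new(ErrorKind::Other, ...)` becomes `io::Error::other(...)`.
fn other_error(msg: &str) -> io::Error {
    io::Error::other(msg.to_owned())
}

/// `Slice::group_by` becomes `Slice::chunk_by`: split into runs of equal values.
fn runs_of_equal(values: &[i32]) -> Vec<Vec<i32>> {
    values.chunk_by(|a, b| a == b).map(|run| run.to_vec()).collect()
}

/// `Result::map_err(|e| {...; e})` becomes `Result::inspect_err(|e| {...})`:
/// the error is observed (here: appended to a log) and passed on unchanged.
fn parse_logged(s: &str, log: &mut Vec<String>) -> Result<i32, std::num::ParseIntError> {
    s.parse::<i32>().inspect_err(|e| log.push(e.to_string()))
}

/// `Map::get(&key).is_some()` becomes `Map::contains_key(&key)`.
fn has_key(map: &HashMap<String, i32>, key: &str) -> bool {
    map.contains_key(key)
}
```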
close tikv#18450 Signed-off-by: hongyunyan <[email protected]>
…v#18454) ref tikv#18434 Close some background schedulers before shutting down. We need to restart both the KV engine and the Raft engine in a test to set some non-online configs, e.g. turning off Titan. These background workers hold references to either the KV engine or the Raft engine, and they are also self-referencing, so the KV engine and the Raft engine never get closed on shutdown. We need this change to interrupt the infinite loop and release the references to the DBs. The best way to resolve this would be to use tokio's unbounded channel, which allows downgrading to a weak pointer. However, TiKV has too many circular `Arc` dependencies, and it is nearly impossible to untangle them. Signed-off-by: Yang Zhang <[email protected]>
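The weak-pointer idea mentioned above can be illustrated with the standard library alone. This sketch swaps in `std::sync::Arc` and `Weak` for the tokio channel the commit refers to, and `Engine`, `Worker` and `spawn_worker` are hypothetical names, not TiKV types. It only shows why a worker that holds a weak reference cannot keep an engine alive.

```rust
use std::sync::{Arc, Weak};

/// Hypothetical stand-in for a KV or Raft engine handle.
struct Engine {
    name: String,
}

/// Hypothetical background worker that holds only a weak reference,
/// so it cannot keep the engine alive by itself.
struct Worker {
    engine: Weak<Engine>,
}

impl Worker {
    /// Runs one unit of work. Returns `None` once the engine has been
    /// dropped, which is the signal for the worker loop to exit.
    fn tick(&self) -> Option<String> {
        let engine = self.engine.upgrade()?;
        Some(format!("working on {}", engine.name))
    }
}

/// Creates a worker without increasing the engine's strong count.
fn spawn_worker(engine: &Arc<Engine>) -> Worker {
    Worker {
        engine: Arc::downgrade(engine),
    }
}
```

Once the last strong `Arc<Engine>` is dropped, `tick` returns `None` and the worker can stop, which is the behaviour the commit obtains by closing the schedulers explicitly instead.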
close tikv#18469 Signed-off-by: Chao Wang <[email protected]>
…18439) close tikv#18463 Enhances the detection mechanism to cover I/O jitters on the kvdb disk when TiKV is deployed with separate mount paths. Signed-off-by: lucasliang <[email protected]>
close tikv#18474 bump the version of pprof-rs to 0.15 Signed-off-by: Yang Keao <[email protected]>
close tikv#18465 Downgrade the Rust toolchain to fix the arm64 build (a workaround for the Rust bug rust-lang/rust#141306). Signed-off-by: glorv <[email protected]>
ref tikv#15990 Fix the issue where the `yatp_task_wait_duration` metric has no data point because it was not registered to prometheus. Signed-off-by: Bisheng Huang <[email protected]>
…ikv#18484) close tikv#18490
- "cdc register for a not found region": this indicates that the region leader might have been transferred to another node.
- "cdc failed to schedule barrier for delta before delta scan": this always happens if the channel is disconnected.
- "cdc send scan event failed": this always happens if the channel is disconnected or full.
All these errors are temporary, so set their log level to WARN. Signed-off-by: 3AceShowHand <[email protected]>
close tikv#18434 The real bug has been fixed by distinguishing the Titan blob index from the RocksDB blob index during the RocksDB upgrade effort. This PR just adds a regression test to prove it is fixed. We reproduced the error with the same test code in 7.5. Signed-off-by: Yang Zhang <[email protected]>
close tikv#18497, ref pingcap/tidb#61318 backup_stream: encode ts related field into meta file path. Signed-off-by: 3pointer <[email protected]> Signed-off-by: 3pointer <[email protected]> Co-authored-by: 山岚 <[email protected]>
close tikv#18506 Signed-off-by: bufferflies <[email protected]>
close tikv#18493 Some non-fatal error-level logs emitted during backup are now warn-level. Signed-off-by: Juncen Yu <[email protected]>
ref tikv#18081 Signed-off-by: Neil Shen <[email protected]>
close tikv#18434 Fix the bug where Titan blob indices cause snapshot apply failures after Titan is turned off. Signed-off-by: Yang Zhang <[email protected]>
…nfig file. (tikv#18505) close tikv#18503 Make TiKV inherit the last `region-size` configuration to avoid unexpectedly changing the default region size. Signed-off-by: lucasliang <[email protected]>
close tikv#18541 Run the RPC function `switch_mode` on a thread where blocking is acceptable. Signed-off-by: Jianjun Liao <[email protected]>
…slowlog. (tikv#18562) close tikv#18561 Fix incorrect and misleading index logging in StoreMsg of slowlog. Signed-off-by: lucasliang <[email protected]>
close tikv#18533 To mitigate the impact of stalls when awakening too many regions, we break up all regions into small batches. Signed-off-by: lucasliang <[email protected]>
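A minimal sketch of the batching idea in the commit above, assuming regions are identified by `u64` ids. `split_into_batches` and the batch size are hypothetical, and how TiKV actually schedules each batch is not shown here.

```rust
/// Splits region ids into batches of at most `batch_size` entries, so that
/// each batch can be woken up separately instead of all regions at once.
fn split_into_batches(region_ids: &[u64], batch_size: usize) -> Vec<Vec<u64>> {
    assert!(batch_size > 0, "batch_size must be positive");
    region_ids
        .chunks(batch_size)
        .map(|batch| batch.to_vec())
        .collect()
}
```

A caller would process one batch at a time, leaving room for other work between batches.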
…r. (tikv#18565) close tikv#18532 Optimize the handling of `CompactedEvent` in raftstore by moving it to `split-check` worker. Signed-off-by: lucasliang <[email protected]>
close tikv#18573 Fix issue tikv#18573 (an error occurred while running `make doc`) by modifying the Makefile. The solution is described in the issue. Signed-off-by: DogDu <[email protected]> Co-authored-by: DogDu <[email protected]>
…itters. (tikv#18590) close tikv#18549 Removes the logging for "sst ingest is too slow" to avoid latency jitters. Signed-off-by: lucasliang <[email protected]>
close tikv#18506 Replace some error logs with warning logs. Signed-off-by: bufferflies <[email protected]>
close tikv#10047 Register missing TiKV configs to Prometheus. Due to recent iterations and refactoring in TiKV, some important module configurations were not reported via metrics. This PR registers the configuration metrics for these key modules. Signed-off-by: exit-code-1 <[email protected]> Signed-off-by: zhy <[email protected]> Co-authored-by: lucasliang <[email protected]>
ref tikv#15990 build: bump tikv pkg version Signed-off-by: ti-chi-bot <[email protected]>
…he last config file (tikv#18626) ref tikv#18503 Avoid inheriting unexpected configurations from the last config file. Signed-off-by: lucasliang <[email protected]>
close tikv#18605 Optimizes `fetch_entries_to` in Raft-Engine to reduce contention and improve performance under mixed workloads. Signed-off-by: lucasliang <[email protected]>
tikv#18967) close tikv#18743 Optimize async snapshot and write tail latency with many SSTs Signed-off-by: Connor1996 <[email protected]>
close tikv#18955 Reduce frequency of store size reporting Signed-off-by: Yang Zhang <[email protected]>
close tikv#18846 Signed-off-by: kennytm <[email protected]>
ref tikv#18498 Add more duplicate entry checks in the write path, and panic if there are unexpected results. Note that the panic should be removed in a new release or in production. Signed-off-by: cfzjywxk <[email protected]> Signed-off-by: cfzjywxk <[email protected]>
…rage (tikv#18845) close tikv#18840 Signed-off-by: kennytm <[email protected]>
…#18940) close tikv#18939 disable the buggy auto priority quota limiter Signed-off-by: glorv <[email protected]>
…usly (tikv#18984) close tikv#18983 Fix the bug where the `raftstore` thread panics when accessing the Raft logs of an asynchronously destroyed peer in `on_raft_log_gc_tick()`. Signed-off-by: lucasliang <[email protected]>
ref tikv#18498 Disable ENABLE_DUP_KEY_DEBUG by default Signed-off-by: ekexium <[email protected]>
…l thresholds. (tikv#18710) close tikv#18708 This PR addresses performance stability issues caused by increasing storage.flow-control.l0-file-threshold and storage.flow-control.soft-pending-compaction-bytes-limit. Previously, raising these values could reduce the effectiveness of RocksDB’s compaction speed-up mechanism, because the RocksDB internal thresholds (level0-slowdown-writes-trigger and soft-pending-compaction-bytes-limit) would be overridden, delaying compaction acceleration.
Key improvements:
1. Conditional override of RocksDB thresholds:
   - level0-slowdown-writes-trigger is overridden by l0-file-threshold only if it is smaller.
   - soft-pending-compaction-bytes-limit is overridden only if it is smaller than storage.flow-control.soft-pending-compaction-bytes-limit.
   This ensures that increasing flow-control settings does not weaken compaction acceleration, while user-configured RocksDB thresholds that are larger than the flow-control limits are overridden, allowing compaction speed-up to trigger before write flow control.
2. Updated write stall check:
   - ingest_maybe_slowdown_writes now uses level0-stop-writes-trigger instead of level0-slowdown-writes-trigger to determine whether ingest may trigger a write stall.
   - This keeps the original behavior, since `l0-file-threshold` overrides `level0-stop-writes-trigger`, just like the previous behavior with `level0-slowdown-writes-trigger`. Ideally, flow-control settings would be used directly to determine write stalls, but `ingest_maybe_slowdown_writes` cannot access the flow-control module configuration because this function resides inside the Engine module.
After this change, write control effectively has three stages:
1. Compaction acceleration: triggered when RocksDB thresholds are reached.
2. Flow control: triggered at storage.flow-control.l0-file-threshold and storage.flow-control.soft-pending-compaction-bytes-limit.
3. Stop writes: triggered at storage.flow-control.hard-pending-compaction-bytes-limit.
Signed-off-by: hhwyt <[email protected]>
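The three stages of write control described above can be sketched as a simple classification. This is an illustration under stated assumptions, not TiKV's implementation: the `Thresholds` struct, the `Stage` enum, `classify` and the use of `>=` comparisons are all hypothetical, and the sketch assumes the RocksDB thresholds are configured no higher than the flow-control ones.

```rust
/// Hypothetical snapshot of the thresholds named in the commit message.
struct Thresholds {
    /// RocksDB level0-slowdown-writes-trigger.
    rocksdb_l0_slowdown_trigger: u64,
    /// RocksDB soft-pending-compaction-bytes-limit.
    rocksdb_soft_pending_bytes: u64,
    /// storage.flow-control.l0-file-threshold.
    flow_l0_file_threshold: u64,
    /// storage.flow-control.soft-pending-compaction-bytes-limit.
    flow_soft_pending_bytes: u64,
    /// storage.flow-control.hard-pending-compaction-bytes-limit.
    flow_hard_pending_bytes: u64,
}

#[derive(Debug, PartialEq)]
enum Stage {
    Normal,
    CompactionAcceleration,
    FlowControl,
    StopWrites,
}

/// Maps the current L0 file count and pending compaction bytes to a stage,
/// checking the most severe stage first.
fn classify(t: &Thresholds, l0_files: u64, pending_bytes: u64) -> Stage {
    if pending_bytes >= t.flow_hard_pending_bytes {
        Stage::StopWrites
    } else if l0_files >= t.flow_l0_file_threshold
        || pending_bytes >= t.flow_soft_pending_bytes
    {
        Stage::FlowControl
    } else if l0_files >= t.rocksdb_l0_slowdown_trigger
        || pending_bytes >= t.rocksdb_soft_pending_bytes
    {
        Stage::CompactionAcceleration
    } else {
        Stage::Normal
    }
}
```

As long as the RocksDB thresholds sit below the flow-control thresholds, compaction acceleration is reached before flow control as load grows, which is the ordering the PR aims to preserve.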
close tikv#18999 fix external storage cache block Signed-off-by: Jianjun Liao <[email protected]>
…ikv#18923) close tikv#18815 Add network/IO info collection for TopSQL:
1. Introduce the resource-metering.enable-network-io-collection config to control whether this new feature is enabled. It is disabled by default.
2. Collect network_in, network_out, logical_read and logical_write execution info and record it in TopSQL:
   i. The LocalStorage of the TopSQL recorder is TLS and can only be accessed inside the thread with the attached tag, so we use GLOBAL_TRACKERS to help record the network_in data size.
   ii. For network_out, we can only directly get the response size for Coprocessor requests, so we need to collect this data one by one for all requests. Since we only care about requests that potentially generate large responses, we bypass some "write requests" whose responses only contain "commit_ts" data.
   iii. Use the processed_size (https://github.com/tikv/tikv/blob/1deb3a135dc41c3ca227e3d5a29712526b492a4c/components/tikv_kv/src/stats.rs#L195) as the logical read size.
   iv. Use the scheduled tasks' write_bytes() as the logical write size.
Signed-off-by: yibin87 <[email protected]>
close tikv#17221 When a SIGTERM signal is received, TiKV tells PD that it is stopping via StoreHeartbeat. PD then tries to move all the leaders away from that TiKV instance before it fully shuts down. Signed-off-by: hujiatao0 <[email protected]>
ref tikv#15990 Propagates errors in get_next_region_context up the call stack Signed-off-by: Yang Zhang <[email protected]>
…kUp (tikv#19008) close tikv#19007 Support basic summary and metrics for push down IndexLookUp Signed-off-by: Chao Wang <[email protected]>
close tikv#18949 Check the memory locks in `ExtraSnapStoreAccessor::get_local_region_storage` to make sure that `IndexLookUp` can read consistent rows for 1PC or async commit. Signed-off-by: Chao Wang <[email protected]>
…nder is delayed. (tikv#19015) close tikv#19004 Address the corner case where the `raftstore` thread panics when handling `ReadyToDestroyPeer`. Signed-off-by: lucasliang <[email protected]>
tikv#19025) close tikv#18498 Signed-off-by: Tharanga Gamaethige <[email protected]> Co-authored-by: Tharanga Gamaethige <[email protected]>
…troying. (tikv#19030) ref tikv#19004, close tikv#19034 Fix the bug introduced by the previous PR#19015, which prevented a peer that is being destroyed from handling `ApplyRes::(...)` as expected. Signed-off-by: lucasliang <[email protected]>
close tikv#19048 Fix a potential panic that may happen when subscribing to a region and encountering rollback and prewrite entries. Signed-off-by: 3AceShowHand <[email protected]>
close tikv#19006 Update the Azure SDK to 0.18, the highest version compatible with TiKV's Rust version. Adapt to the new interfaces and to Azure managed identity. Signed-off-by: RidRisR <[email protected]>
close tikv#18800 Signed-off-by: squalfof <[email protected]>
…ords also (tikv#19029) close tikv#18814 When "enable_network_io_collection" is set: 1. Pick the top n records for network and the top n records for logical IO. Each record is picked at most once. 2. Add a new aggregator for region_id, pick the top n records for CPU, network and logical IO, and report the final results. Signed-off-by: yibin87 <[email protected]>
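One way to read "each record is picked at most once" is sketched below: records already chosen for the network ranking are skipped when choosing for logical IO. This reading is an assumption, and `Record`, `top_n_by` and `pick_top` are hypothetical names that do not come from the TopSQL code.

```rust
use std::collections::HashSet;

/// Hypothetical TopSQL record keyed by a tag.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    tag: String,
    network: u64,
    logical_io: u64,
}

/// Picks up to `n` not-yet-picked records with the largest `key` value
/// and marks them as picked. Ties keep their input order.
fn top_n_by<F: Fn(&Record) -> u64>(
    records: &[Record],
    n: usize,
    key: F,
    picked: &mut HashSet<String>,
) -> Vec<Record> {
    let mut candidates: Vec<&Record> = records
        .iter()
        .filter(|r| !picked.contains(&r.tag))
        .collect();
    // Stable sort in descending order of the chosen dimension.
    candidates.sort_by(|a, b| key(b).cmp(&key(a)));
    candidates
        .into_iter()
        .take(n)
        .map(|r| {
            picked.insert(r.tag.clone());
            r.clone()
        })
        .collect()
}

/// Top n by network first, then top n by logical IO among the rest.
fn pick_top(records: &[Record], n: usize) -> Vec<Record> {
    let mut picked = HashSet::new();
    let mut out = top_n_by(records, n, |r| r.network, &mut picked);
    out.extend(top_n_by(records, n, |r| r.logical_io, &mut picked));
    out
}
```

Under this reading the result holds at most 2n distinct records. The region_id aggregator mentioned in item 2 would add a CPU dimension in the same way.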
close tikv#18604 In testing we noticed that if an SST download fails halfway for some reason, the files are not deleted and keep occupying space. We should clean them up. Also fix broken BR metrics and add one more for download failures. Signed-off-by: Wenqi Mou <[email protected]>
close tikv#18843, close tikv#18950 1. Remove read_buf_exact_size for s3 hyper client 2. Use cloud::blob::read_to_end to read migrations from futures::io::AsyncRead 3. Use bytes::Bytes to speed up deallocating MetaFile Signed-off-by: Jianjun Liao <[email protected]> Signed-off-by: Jianjun Liao <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED
Signed-off-by: Calvin Neo <[email protected]>
9fccb1e to 5aa0a4d
@CalvinNeo: The following test failed, say
What is changed and how it works?
Issue Number: Close #xxx
What's Changed:
Related changes
pingcap/docs/pingcap/docs-cn:
Check List
Tests
Side effects
Release note