
Commit 793f1cd

skygragon, wl177541 and claude authored
[Store] Add P2P local metrics (#1945)
* [Store] Add P2P local storage metrics for DataManager

Add LocalStorageMetric to track P2P client local storage operations:

LocalStorageMetric:
- put_requests_total, put_failures_total, put_bytes_total
- put_latency_us (histogram)
- get_requests_total, get_failures_total, get_misses_total
- get_bytes_total, get_latency_us (histogram)

Changes:
- Add LocalStorageMetric struct in client_metric.h
- Add SetMetrics() method to DataManager for metrics injection
- Record metrics in DataManager::Put() and DataManager::Get() methods
- Wire up metrics in P2PClientService initialization

* [Store] Improve local storage metrics accuracy and maintainability

- Add put_failures and get_failures to LocalStorageMetric::summary_metrics()
- Refactor PutViaMemcpy metrics recording into a helper lambda to avoid duplication
- Move Get metrics recording into async task handle for accurate latency measurement
- Record get_failures when async transfer fails (previously missed)

Co-Authored-By: Claude (antchat/GLM-5) <noreply@anthropic.com>

* [Store] Add Stopwatch class and simplify metrics timing code

- Add Stopwatch class in client_metric.h for elapsed time measurement
- Use Stopwatch to simplify latency recording in PutViaTe, PutViaMemcpy, and Get functions
- Method name elapsed_us() follows snake_case convention in codebase

Co-Authored-By: Claude (antchat/GLM-5) <noreply@anthropic.com>

* [Store] Move P2P local storage metrics from DataManager to P2PClientService

- Add P2PClientMetric struct for tracking local Get/Put operations
- Remove LocalStorageMetric integration from DataManager
- Record metrics in P2PClientService at CreateGetHandle/CreateLocalPutHandle
- Add GetSummaryMetrics() and SerializeMetrics() for P2P metrics export

Co-Authored-By: Claude (antchat/GLM-5) <noreply@anthropic.com>

* [Store] Fix P2P client metrics timing and recording positions

- Move latency measurement to Wait() calls for accurate timing
- Record requests at entry points (Put/Get/BatchPut/BatchGet)
- Record bytes only on successful operations
- Record failures at appropriate error points
- Add CalculateTotalBytes helper function
- Remove redundant metrics recording from CreateGetHandle/CreateLocalPutHandle

Co-Authored-By: Claude (antchat/GLM-5) <noreply@anthropic.com>

* [Store] Fix P2P client metrics timing to include handle creation

Move Stopwatch initialization before CreatePutHandle/CreateGetHandle calls to accurately measure the full operation latency including handle creation time. Also remove redundant CalculateTotalBytes helper in favor of CalculateSliceSize.

Co-Authored-By: Claude (antchat/GLM-5) <noreply@anthropic.com>

---------

Co-authored-by: wl177541 <wl177541@antgroup.com>
Co-authored-by: Claude (antchat/GLM-5) <noreply@anthropic.com>
1 parent 036075c commit 793f1cd

11 files changed

Lines changed: 735 additions & 5 deletions

mooncake-store/include/client_metric.h

Lines changed: 26 additions & 0 deletions
@@ -22,6 +22,22 @@ const std::vector<double> kLatencyBucket = {
     // safeguards for long tails
     50000, 100000, 200000, 500000, 1000000};
 
+// Simple stopwatch for measuring elapsed time in microseconds
+class Stopwatch {
+   public:
+    Stopwatch() : start_time_(std::chrono::steady_clock::now()) {}
+
+    int64_t elapsed_us() const {
+        auto now = std::chrono::steady_clock::now();
+        return std::chrono::duration_cast<std::chrono::microseconds>(
+                   now - start_time_)
+            .count();
+    }
+
+   private:
+    std::chrono::steady_clock::time_point start_time_;
+};
+
 static inline std::string get_env_or_default(
     const char* env_var, const std::string& default_val = "") {
     const char* val = getenv(env_var);
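The commit message says the Stopwatch is started before CreatePutHandle/CreateGetHandle so the recorded latency covers handle creation as well as Wait(). The following is a minimal, self-contained sketch of that timing pattern. `FakeCreateHandle`, `FakeWait` and `TimedPut` are hypothetical stand-ins invented for illustration, and the elapsed value is returned instead of being observed into a `local_put_latency` histogram.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Same shape as the Stopwatch added to client_metric.h.
class Stopwatch {
   public:
    Stopwatch() : start_time_(std::chrono::steady_clock::now()) {}

    int64_t elapsed_us() const {
        auto now = std::chrono::steady_clock::now();
        return std::chrono::duration_cast<std::chrono::microseconds>(
                   now - start_time_)
            .count();
    }

   private:
    std::chrono::steady_clock::time_point start_time_;
};

// Hypothetical stand-ins for handle creation and Wait(); each only sleeps.
void FakeCreateHandle() {
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
}
void FakeWait() { std::this_thread::sleep_for(std::chrono::milliseconds(2)); }

// The stopwatch is constructed before handle creation, so the measured
// latency includes both the creation step and the wait.
int64_t TimedPut() {
    Stopwatch sw;
    FakeCreateHandle();
    FakeWait();
    return sw.elapsed_us();
}
```

Using `std::chrono::steady_clock` (rather than `system_clock`) keeps the measurement monotonic, so wall-clock adjustments cannot produce negative or inflated latencies.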
@@ -276,6 +292,16 @@ struct ClientMetric {
     TransferMetric transfer_metric;
     MasterClientMetric master_client_metric;
 
+    /**
+     * @brief Check if metrics are enabled via environment variable
+     * @return true if enabled, false if disabled
+     *
+     * Environment variable:
+     * - MC_STORE_CLIENT_METRIC: Enable/disable metrics (enabled by default,
+     *   set to 0/false to disable)
+     */
+    static bool IsEnabled();
+
     /**
      * @brief Creates a ClientMetric instance based on environment variables
      * @return std::unique_ptr<ClientMetric> containing the instance if enabled,
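The doc comment states the rule for `MC_STORE_CLIENT_METRIC`: metrics are on by default and off when the variable is set to 0/false. The body of `parseMetricsEnabled()` is not part of this diff, so the following is only a sketch of the documented behavior. The function name `ParseMetricsEnabledSketch` is hypothetical, and treating the comparison as case-insensitive is an assumption, not something the source confirms.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical re-implementation of the documented rule: enabled unless
// MC_STORE_CLIENT_METRIC is "0" or "false". Case-insensitive matching is an
// assumption made for this sketch.
bool ParseMetricsEnabledSketch() {
    const char* raw = std::getenv("MC_STORE_CLIENT_METRIC");
    if (raw == nullptr) return true;  // unset: enabled by default
    std::string value(raw);
    std::transform(value.begin(), value.end(), value.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::tolower(c));
                   });
    return !(value == "0" || value == "false");
}
```

In the commit itself, `ClientMetric::IsEnabled()` simply forwards to `parseMetricsEnabled()`, which lets other components check the flag without constructing a `ClientMetric`.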

mooncake-store/include/client_service.h

Lines changed: 2 additions & 2 deletions
@@ -279,7 +279,7 @@ class ClientService {
         const std::vector<std::string>& keys) = 0;
 
     // For human-readable metrics
-    tl::expected<std::string, ErrorCode> GetSummaryMetrics() {
+    virtual tl::expected<std::string, ErrorCode> GetSummaryMetrics() {
         if (metrics_ == nullptr) {
             return tl::make_unexpected(ErrorCode::INVALID_PARAMS);
         }
@@ -297,7 +297,7 @@ class ClientService {
     }
 
     // For Prometheus-style metrics
-    tl::expected<std::string, ErrorCode> SerializeMetrics() {
+    virtual tl::expected<std::string, ErrorCode> SerializeMetrics() {
         if (metrics_ == nullptr) {
             return tl::make_unexpected(ErrorCode::INVALID_PARAMS);
         }
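Making `GetSummaryMetrics()` and `SerializeMetrics()` virtual lets `P2PClientService` override them, so callers that hold only a `ClientService` reference still get the P2P-specific output. The following sketch shows that dispatch in isolation. The class names are hypothetical, and `std::optional<std::string>` stands in for `tl::expected<std::string, ErrorCode>` to keep the example self-contained.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Stand-in for ClientService; std::optional replaces tl::expected here.
class BaseService {
   public:
    virtual ~BaseService() = default;
    // Without `virtual`, a call through BaseService& would always run this.
    virtual std::optional<std::string> GetSummaryMetrics() {
        return std::nullopt;  // e.g. base metrics not configured
    }
};

// Stand-in for P2PClientService overriding the now-virtual method.
class P2PService final : public BaseService {
   public:
    std::optional<std::string> GetSummaryMetrics() override {
        return std::string("=== P2P Local Storage Metrics ===\n");
    }
};

// Caller written against the base type; dispatch picks the override.
std::optional<std::string> Summarize(BaseService& service) {
    return service.GetSummaryMetrics();
}
```

Before this change, the non-virtual base methods would have hidden rather than overridden any same-named method in the derived class, and calls through the base type would have ignored the P2P metrics.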
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+#pragma once
+
+#include "client_metric.h"
+
+namespace mooncake {
+
+// P2P client metrics for local storage operations
+struct P2PClientMetric {
+    // Local Get metrics
+    ylt::metric::counter_t local_get_requests;
+    ylt::metric::counter_t local_get_hits;
+    ylt::metric::counter_t local_get_misses;
+    ylt::metric::counter_t local_get_failures;
+    ylt::metric::counter_t local_get_bytes;
+    ylt::metric::histogram_t local_get_latency;
+
+    // Local Put metrics
+    ylt::metric::counter_t local_put_requests;
+    ylt::metric::counter_t local_put_failures;
+    ylt::metric::counter_t local_put_bytes;
+    ylt::metric::histogram_t local_put_latency;
+
+    explicit P2PClientMetric(std::map<std::string, std::string> labels = {});
+
+    void serialize(std::string& str);
+    std::string summary_metrics();
+};
+
+} // namespace mooncake
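The commit message describes where these counters are updated: requests at the entry points, bytes only on successful operations, and failures at the error points. The following sketch illustrates that recording order for a local Get. Plain `uint64_t` fields stand in for `ylt::metric::counter_t`, and `LocalGet`/`LocalGetCountersSketch` are hypothetical. Using an empty stored value to simulate a failed transfer, and counting such a case as a hit as well as a failure, are conventions invented for this example. They do not describe the real service.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Plain stand-ins for the Get-side counters in P2PClientMetric.
struct LocalGetCountersSketch {
    uint64_t requests = 0, hits = 0, misses = 0, failures = 0, bytes = 0;
};

// Hypothetical local Get over an in-memory map. An empty value simulates a
// failed transfer (an assumption made only for this sketch).
bool LocalGet(const std::map<std::string, std::string>& store,
              const std::string& key, std::string& out,
              LocalGetCountersSketch& m) {
    ++m.requests;  // counted at the entry point, before any outcome
    auto it = store.find(key);
    if (it == store.end()) {
        ++m.misses;  // not found in local storage
        return false;
    }
    ++m.hits;
    if (it->second.empty()) {
        ++m.failures;  // error path: no bytes recorded
        return false;
    }
    out = it->second;
    m.bytes += out.size();  // bytes only on success
    return true;
}
```

Counting the request first means `requests` always equals the number of calls, while `bytes` reflects only data that was actually delivered.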

mooncake-store/include/p2p_client_service.h

Lines changed: 7 additions & 0 deletions
@@ -17,6 +17,7 @@
 #include "client_rpc_service.h"
 #include "ha_recovery_manager.h"
 #include "peer_client.h"
+#include "p2p_client_metric.h"
 #include "p2p_master_client.h"
 #include "route_cache.h"
 #include "task_handle.h"
@@ -173,6 +174,9 @@ class P2PClientService final : public ClientService {
 
     std::string GetHealthStatus() const override;
 
+    tl::expected<std::string, ErrorCode> GetSummaryMetrics() override;
+    tl::expected<std::string, ErrorCode> SerializeMetrics() override;
+
    private:
     /**
      * @brief init TieredBackend and DataManager
@@ -352,6 +356,9 @@ class P2PClientService final : public ClientService {
 
     // HA recovery manager
     std::unique_ptr<HARecoveryManager> ha_manager_;
+
+    // P2P local storage metrics
+    std::unique_ptr<P2PClientMetric> p2p_metrics_;
 };
 
 } // namespace mooncake
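Because `p2p_metrics_` is held in a `std::unique_ptr`, it can presumably stay null when metrics are disabled. This is an assumption, since the construction code is not in this excerpt. In that case, every recording site and both overrides need a null check, mirroring the base class, which returns `INVALID_PARAMS` when `metrics_` is null. The following sketch shows that guard. `ServiceSketch` and `MetricSketch` are hypothetical, and `std::nullopt` plays the role of `tl::make_unexpected(ErrorCode::INVALID_PARAMS)`.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <string>

// Minimal stand-in for P2PClientMetric.
struct MetricSketch {
    uint64_t local_put_requests = 0;
    std::string summary_metrics() const {
        return "Local Put: " + std::to_string(local_put_requests) +
               " requests\n";
    }
};

class ServiceSketch {
   public:
    // The flag would plausibly come from something like
    // ClientMetric::IsEnabled(); here it is just a parameter.
    explicit ServiceSketch(bool metrics_enabled) {
        if (metrics_enabled) p2p_metrics_ = std::make_unique<MetricSketch>();
    }

    void Put() {
        if (p2p_metrics_) ++p2p_metrics_->local_put_requests;  // no-op if off
        // ... actual put work would go here ...
    }

    // std::nullopt stands in for an INVALID_PARAMS error result.
    std::optional<std::string> GetSummaryMetrics() const {
        if (p2p_metrics_ == nullptr) return std::nullopt;
        return p2p_metrics_->summary_metrics();
    }

   private:
    std::unique_ptr<MetricSketch> p2p_metrics_;
};
```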

mooncake-store/src/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ set(MOONCAKE_STORE_SOURCES
     client_service.cpp
     centralized_client_service.cpp
     p2p_client_service.cpp
+    p2p_client_metric.cpp
     client_metric.cpp
     types.cpp
     master_client.cpp

mooncake-store/src/client_metric.cpp

Lines changed: 2 additions & 0 deletions
@@ -67,6 +67,8 @@ ClientMetric::ClientMetric(uint64_t interval_seconds,
 
 ClientMetric::~ClientMetric() { StopMetricsReportingThread(); }
 
+bool ClientMetric::IsEnabled() { return parseMetricsEnabled(); }
+
 std::unique_ptr<ClientMetric> ClientMetric::Create(
     std::map<std::string, std::string> labels) {
     if (!parseMetricsEnabled()) {
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+#include "p2p_client_metric.h"
+
+namespace mooncake {
+
+P2PClientMetric::P2PClientMetric(std::map<std::string, std::string> labels)
+    : local_get_requests("mooncake_p2p_local_get_requests_total",
+                         "Total number of local Get requests", labels),
+      local_get_hits("mooncake_p2p_local_get_hits_total",
+                     "Total number of local Get hits (found in local storage)",
+                     labels),
+      local_get_misses("mooncake_p2p_local_get_misses_total",
+                       "Total number of local Get misses (not found locally)",
+                       labels),
+      local_get_failures("mooncake_p2p_local_get_failures_total",
+                         "Total number of failed local Get requests", labels),
+      local_get_bytes("mooncake_p2p_local_get_bytes_total",
+                      "Total bytes read by local Get", labels),
+      local_get_latency("mooncake_p2p_local_get_latency_us",
+                        "Local Get latency (us)", kLatencyBucket, labels),
+      local_put_requests("mooncake_p2p_local_put_requests_total",
+                         "Total number of local Put requests", labels),
+      local_put_failures("mooncake_p2p_local_put_failures_total",
+                         "Total number of failed local Put requests", labels),
+      local_put_bytes("mooncake_p2p_local_put_bytes_total",
+                      "Total bytes written by local Put", labels),
+      local_put_latency("mooncake_p2p_local_put_latency_us",
+                        "Local Put latency (us)", kLatencyBucket, labels) {}
+
+void P2PClientMetric::serialize(std::string& str) {
+    local_get_requests.serialize(str);
+    local_get_hits.serialize(str);
+    local_get_misses.serialize(str);
+    local_get_failures.serialize(str);
+    local_get_bytes.serialize(str);
+    local_get_latency.serialize(str);
+    local_put_requests.serialize(str);
+    local_put_failures.serialize(str);
+    local_put_bytes.serialize(str);
+    local_put_latency.serialize(str);
+}
+
+std::string P2PClientMetric::summary_metrics() {
+    std::stringstream ss;
+    ss << "=== P2P Local Storage Metrics ===\n";
+
+    ss << "Local Get: " << local_get_requests.value() << " requests, "
+       << local_get_hits.value() << " hits, " << local_get_misses.value()
+       << " misses, " << local_get_failures.value() << " failures, "
+       << byte_size_to_string(local_get_bytes.value()) << " read\n";
+
+    ss << "Local Put: " << local_put_requests.value() << " requests, "
+       << local_put_failures.value() << " failures, "
+       << byte_size_to_string(local_put_bytes.value()) << " written\n";
+
+    return ss.str();
+}
+
+} // namespace mooncake
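`summary_metrics()` builds one human-readable line per operation type from the counter values. The following sketch reproduces that layout with plain integers so the resulting text can be checked directly. `SummaryInputs`, `SummarySketch` and `FormatBytesSketch` are hypothetical. In particular, `FormatBytesSketch` is only a placeholder for `byte_size_to_string()`, whose real output format is not shown in this diff.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical placeholder for byte_size_to_string(); real output may differ.
std::string FormatBytesSketch(uint64_t bytes) {
    return std::to_string(bytes) + " B";
}

// Plain snapshot of the counter values summary_metrics() reads.
struct SummaryInputs {
    uint64_t get_requests, get_hits, get_misses, get_failures, get_bytes;
    uint64_t put_requests, put_failures, put_bytes;
};

// Mirrors the line layout of P2PClientMetric::summary_metrics().
std::string SummarySketch(const SummaryInputs& in) {
    std::stringstream ss;
    ss << "=== P2P Local Storage Metrics ===\n";
    ss << "Local Get: " << in.get_requests << " requests, " << in.get_hits
       << " hits, " << in.get_misses << " misses, " << in.get_failures
       << " failures, " << FormatBytesSketch(in.get_bytes) << " read\n";
    ss << "Local Put: " << in.put_requests << " requests, " << in.put_failures
       << " failures, " << FormatBytesSketch(in.put_bytes) << " written\n";
    return ss.str();
}
```

For example, `SummarySketch({3, 2, 1, 0, 10, 4, 1, 20})` yields a `Local Get: 3 requests, 2 hits, 1 misses, 0 failures, 10 B read` line followed by `Local Put: 4 requests, 1 failures, 20 B written`.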
