
feat: Refactor the log library of milvus, support scoped context logging #47610

Open
chyezh wants to merge 8 commits into milvus-io:master from chyezh:issue-35917


@chyezh (Contributor) commented Feb 6, 2026

issue: #35917
design doc: milvus-io/milvus-design-docs#14

Summary

This PR implements the mlog package, a context-aware logging library built on zap
(https://github.com/uber-go/zap) and designed specifically for the distributed Milvus system.

Key Features

  • Mandatory Context Passing - Every logging call takes a context, so each entry can be traced back to its request
  • Zero-Overhead Abstraction - Uses type aliases for zap types instead of wrappers, so performance is comparable to
    using zap directly
  • Automatic Field Accumulation - Fields attached to a context accumulate along the call chain; child contexts
    inherit their parent's fields
  • Cross-Service Propagation - Propagates key fields through gRPC metadata for distributed tracing
  • Lazy Encoding - Uses zap's WithLazy to defer field encoding, avoiding the encoding cost when the log level
    is disabled
  • Component-Level Logger - Provides an optimized Logger type that selects whichever underlying logger has more
    pre-encoded fields, minimizing encoding work at log time

@chyezh added this to the 3.0 milestone Feb 6, 2026
@sre-ci-robot added the size/XXL label (denotes a PR that changes 1000+ lines) Feb 6, 2026
@sre-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chyezh
To complete the pull request process, please assign jiaoew1991 after the PR has been reviewed.
You can assign the PR to them by writing /assign @jiaoew1991 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mergify
Contributor

mergify bot commented Feb 6, 2026

@chyezh Thanks for your contribution. Please submit with DCO, see the contributing guide https://github.com/milvus-io/milvus/blob/master/CONTRIBUTING.md#developer-certificate-of-origin-dco.

@mergify bot added the needs-dco label (DCO is missing in this pull request) Feb 6, 2026
@mergify
Contributor

mergify bot commented Feb 6, 2026

@chyezh This is a feature PR (feat:). Please provide a design document.

How to resolve:
Link a design doc in the PR description:

design doc: https://github.com/milvus-io/milvus-design-docs/blob/main/design_docs/your_design.md

Design documents location: https://github.com/milvus-io/milvus-design-docs/tree/main/design_docs

@mergify bot added the do-not-merge/missing-design-doc and kind/feature (issues related to feature requests from users) labels Feb 6, 2026
@sre-ci-robot
Contributor

[ci-v2-notice]
Notice: New ci-v2 system is enabled for this PR.

To rerun ci-v2 checks, comment with:

  • /ci-rerun-code-check // for ci-v2/code-check
  • /ci-rerun-build // for ci-v2/build
  • /ci-rerun-ut-integration // for ci-v2/ut-integration, will rerun ci-v2/build
  • /ci-rerun-ut-go // for ci-v2/ut-go, will rerun ci-v2/build
  • /ci-rerun-ut-cpp // for ci-v2/ut-cpp
  • /ci-rerun-ut // for all ci-v2/ut-integration, ci-v2/ut-go, ci-v2/ut-cpp, will rerun ci-v2/build
  • /ci-rerun-e2e-arm // for ci-v2/e2e-arm
  • /ci-rerun-e2e-default // for ci-v2/e2e-default

If you have any questions or requests, please contact @zhikunyao.

@mergify bot added the dco-passed label (DCO check passed) and removed the needs-dco label (DCO is missing in this pull request) Feb 6, 2026
@github-actions
Contributor

github-actions bot commented Feb 6, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.


@codecov

codecov bot commented Feb 6, 2026

Codecov Report

❌ Patch coverage is 89.60843% with 69 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.86%. Comparing base (e0b4376) to head (1146a07).
⚠️ Report is 164 commits behind head on master.

Files with missing lines | Patch % | Lines
pkg/mlog/logger.go | 74.75% | 43 missing and 9 partials ⚠️
pkg/mlog/rated.go | 92.43% | 6 missing and 3 partials ⚠️
pkg/mlog/context.go | 92.68% | 4 missing and 2 partials ⚠️
pkg/mlog/level.go | 75.00% | 2 missing ⚠️

❌ Your project check has failed because the head coverage (76.86%) is below the target coverage (77.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files


@@            Coverage Diff             @@
##           master   #47610      +/-   ##
==========================================
- Coverage   76.98%   76.86%   -0.13%     
==========================================
  Files        2057     2062       +5     
  Lines      337572   336832     -740     
==========================================
- Hits       259894   258906     -988     
- Misses      69566    69760     +194     
- Partials     8112     8166      +54     
Components | Coverage Δ
Client | 78.89% <ø> (ø)
Core | 83.38% <ø> (ø)
Go | 75.08% <89.60%> (-0.20%) ⬇️

Files with missing lines | Coverage Δ
pkg/mlog/field.go | 100.00% <100.00%> (ø)
pkg/mlog/field_enum.go | 100.00% <100.00%> (ø)
pkg/mlog/interceptor.go | 100.00% <100.00%> (ø)
pkg/mlog/level.go | 75.00% <75.00%> (ø)
pkg/mlog/context.go | 92.68% <92.68%> (ø)
pkg/mlog/rated.go | 92.43% <92.43%> (ø)
pkg/mlog/logger.go | 74.75% <74.75%> (ø)

... and 58 files with indirect coverage changes


@sre-ci-robot added and then removed the low-code-coverage label (add test-label from zhikun, diff coverage > 80%) Feb 6, 2026
chyezh and others added 6 commits February 25, 2026 10:51
Implement the mlog package - a context-aware logging library built on zap,
designed specifically for Milvus distributed systems.

Key Features:
- Mandatory context passing for all logging operations
- Zero-overhead abstraction using type aliases
- Automatic field accumulation through call chain
- Cross-service field propagation via gRPC metadata
- Lazy encoding with WithLazy for deferred field encoding
- Component-level Logger with field optimization
- OpenTelemetry TraceID/SpanID extraction

Package Structure:
- pkg/mlog/logger.go: Global logging functions and component-level Logger
- pkg/mlog/context.go: Context field management and caching
- pkg/mlog/field.go: Field constructors (type aliases to zap)
- pkg/mlog/field_enum.go: Well-known keys and FieldXXX helpers
- pkg/mlog/level.go: Dynamic log level management
- pkg/mlog/grpc/interceptor.go: gRPC interceptors for field propagation

Test Coverage: 100% (115 tests)

Related Issue: milvus-io#35917

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: chyezh <chyezh@outlook.com>
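Lazy encoding, as provided by zap's `Logger.WithLazy`, defers the cost of encoding a child logger's fields until an entry is actually written. The sketch below reproduces that behavior with `sync.Once` and no zap dependency; `newLazyLogger`, its integer levels, and the `encodes` counter are hypothetical and exist only to make the deferral observable.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// lazyLogger keeps its fields raw and encodes them at most once, on first write.
type lazyLogger struct {
	minLevel int      // entries below this level are dropped
	fields   []string // raw "key=value" pairs, not yet encoded
	once     sync.Once
	encoded  string // cached encoding, filled on the first written entry
	encodes  int    // how many times encoding ran (for demonstration)
}

// newLazyLogger records fields without encoding them.
func newLazyLogger(minLevel int, fields ...string) *lazyLogger {
	return &lazyLogger{minLevel: minLevel, fields: fields}
}

// encode stands in for the expensive field encoding and runs exactly once.
func (l *lazyLogger) encode() string {
	l.once.Do(func() {
		l.encodes++
		l.encoded = strings.Join(l.fields, " ")
	})
	return l.encoded
}

// Log returns the formatted entry and whether it was written.
func (l *lazyLogger) Log(level int, msg string) (string, bool) {
	if level < l.minLevel {
		return "", false // disabled level: fields are never encoded
	}
	return msg + " " + l.encode(), true
}

func main() {
	l := newLazyLogger(1, "collection_id=1001", "vchannel=ch-0")
	l.Log(0, "debug detail") // dropped before any encoding
	fmt.Println(l.encodes)   // 0
	out, _ := l.Log(1, "info entry")
	l.Log(2, "warn entry")      // reuses the cached encoding
	fmt.Println(out, l.encodes) // info entry collection_id=1001 vchannel=ch-0 1
}
```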
Add RatedDebug/Info/Warn/Error and RatedLog functions with per-call-site
rate limiting using lazily initialized token-bucket limiters. Suppressed
entries are counted and reported via an _ignored field on the next allowed log entry.
Also expose Logger.log as Logger.Log for public use.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Replace intermediate log()/logWithLogger()/ratedLog() with prepareLog()
that returns logger+fields before the zap call. Each public function now
directly calls the zap method, ensuring exactly 1 frame between user
code and zap's caller capture. AddCallerSkip(1) now correctly reports
the user's call site for all 14 API paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: chyezh <chyezh@outlook.com>
Add comprehensive benchmark suite comparing mlog vs native zap.Logger
across all API paths (package-level, Logger methods, rated functions).
Update README with benchmark tables, rate-limited API docs, and key
performance takeaways.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: chyezh <chyezh@outlook.com>
- Add rated.go and Log() to architecture diagram
- Fix lowercase `log` to uppercase `Log` in performance example
- Add missing Log() to Logger Methods table
- Add Uintptrp/Uintptrs to field constructors table
- Add GetAtomicLevel() to API reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: chyezh <chyezh@outlook.com>
…, and merge grpc interceptors

- Add DPanic, Panic, Fatal log level functions (package-level and Logger methods)
- Add FieldOption/OptPropagated() for opt-in RPC field propagation on well-known fields
- Add well-known field keys: index_id, field_id, task_id, broadcast_id, job_id, build_id
- Move gRPC interceptors from pkg/mlog/grpc/ into pkg/mlog/ package
- Privatize propagatedStringField/propagatedInt64Field and all key constants

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: chyezh <chyezh@outlook.com>
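Opt-in propagation of well-known fields over RPC can be sketched without a gRPC dependency. This is an assumption-laden illustration, not the PR's interceptor code: `Field.Propagated` stands in for `OptPropagated()`, `MD` copies the shape of gRPC's `metadata.MD` (`map[string][]string`), and the `mlog-` key prefix is invented. Only fields marked for propagation leave the process; the server side rebuilds them from incoming metadata.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Field is a hypothetical string field; Propagated marks opt-in RPC propagation.
type Field struct {
	Key        string
	Value      string
	Propagated bool
}

// MD has the same shape as gRPC's metadata.MD.
type MD map[string][]string

// keyPrefix namespaces propagated fields in metadata (an assumption).
const keyPrefix = "mlog-"

// inject copies only opt-in fields into outgoing metadata, as a client
// interceptor might do before invoking the RPC.
func inject(fields []Field) MD {
	md := MD{}
	for _, f := range fields {
		if f.Propagated {
			k := keyPrefix + strings.ToLower(f.Key) // gRPC metadata keys are lowercase
			md[k] = append(md[k], f.Value)
		}
	}
	return md
}

// extract rebuilds fields from incoming metadata, as a server interceptor might
// do before attaching them to the handler's context.
func extract(md MD) []Field {
	var out []Field
	for k, vs := range md {
		if name, ok := strings.CutPrefix(k, keyPrefix); ok && len(vs) > 0 {
			out = append(out, Field{Key: name, Value: vs[0], Propagated: true})
		}
	}
	// Map iteration order is random; sort for deterministic output.
	sort.Slice(out, func(i, j int) bool { return out[i].Key < out[j].Key })
	return out
}

func main() {
	client := []Field{
		{Key: "collection_id", Value: "1001", Propagated: true},
		{Key: "segment_id", Value: "42"}, // local only, not propagated
		{Key: "task_id", Value: "t-7", Propagated: true},
	}
	md := inject(client)
	fmt.Println(len(md))     // 2
	fmt.Println(extract(md)) // [{collection_id 1001 true} {task_id t-7 true}]
}
```

Making propagation opt-in keeps high-cardinality or purely local fields out of RPC headers, where every byte is paid on every call.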
…gent guide

- Fix prepareLog to call getLogContext once instead of twice, removing 2
  unnecessary heap allocations per log call (MlogInfo: 2 allocs → 0,
  MlogInfoWithFields: 3 allocs → 1)
- Replace per-call &logContext{} with shared emptyLogContext sentinel
- Remove unused loggerFromContext function
- Add LevelEnabled() global function and (*Logger).LevelEnabled() method
  for guarding expensive field construction on hot paths
- Remove caller-related benchmarks (ZapInfoWithCaller, ZapInfoWithCallerAndFields)
- Update README.md with new benchmark data and LevelEnabled API docs
- Add README_AGENT.md as a concise AI Agent logging guide

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: chyezh <chyezh@outlook.com>
@mergify
Contributor

mergify bot commented Feb 25, 2026

@chyezh The go-sdk check failed; comment `rerun go-sdk` to trigger the job again.

@sre-ci-robot added and then removed the low-code-coverage label (add test-label from zhikun, diff coverage > 80%) Feb 25, 2026
- Replace nil context with context.Background() in all tests
- Replace "errors" import with "github.com/cockroachdb/errors" (depguard)
- Fix import grouping order for gci linter
- Remove nil-context benchmark (BenchmarkMlogInfoNilContext)
- Update stale comment referencing removed loggerFromContext function

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: chyezh <chyezh@outlook.com>
@mergify
Contributor

mergify bot commented Feb 25, 2026

@chyezh The go-sdk check failed; comment `rerun go-sdk` to trigger the job again.

@sre-ci-robot added and then removed the low-code-coverage label (add test-label from zhikun, diff coverage > 80%) Feb 25, 2026

Labels: dco-passed, do-not-merge/missing-design-doc, kind/feature, size/XXL
