@kalleep kalleep commented Dec 8, 2025

PR Description

loki.source.docker suffered from the same issue as loki.source.file: scheduling new targets could take some time.

In this PR I move Scheduler to the source package so it can be reused by both components. I also moved "target" into the docker package and renamed it to tailer; this tailer now implements the Source interface.

I also fixed an issue where stopping the component could deadlock if nothing was reading from the handler channel.
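The shutdown deadlock described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `Entry`, `producer`, and `drain` are hypothetical names and do not reflect the actual code in this PR. The idea is that a goroutine blocked on an unbuffered send can only exit if something keeps reading while the stop call is in progress.

```go
package main

import "sync"

// Entry is a hypothetical stand-in for a log entry.
type Entry struct{ Line string }

// producer is a hypothetical source that sends on an unbuffered channel.
type producer struct {
	out  chan<- Entry
	quit chan struct{}
	wg   sync.WaitGroup
}

// start launches the send loop. Each send blocks until someone reads.
func (p *producer) start() {
	p.wg.Add(1)
	go func() {
		defer p.wg.Done()
		for {
			select {
			case <-p.quit:
				return
			default:
			}
			p.out <- Entry{Line: "x"}
		}
	}()
}

// stop signals the loop to exit and waits for it. Called on its own with no
// reader on the channel, it can hang: the goroutine may be stuck in a send.
func (p *producer) stop() {
	close(p.quit)
	p.wg.Wait()
}

// drain runs stop in the background and discards entries until stop returns,
// so a producer blocked on a send can always make progress and exit.
func drain(ch <-chan Entry, stop func()) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		stop()
	}()
	for {
		select {
		case <-ch: // discard the entry
		case <-done:
			return
		}
	}
}
```

Calling `drain(ch, p.stop)` instead of `p.stop()` directly is what lets the stop complete when no other reader is attached to the channel.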

Which issue(s) this PR fixes

Related to: #4729

Notes to the Reviewer

  • Moved target and metrics from the internal package to tailer.go and metrics.go; they are no longer exported.

  • Created a shared structure, Fanout, which implements the common pattern of aborting a send operation if the context is canceled.

  • Created a shared function, Consume, that runs the consume loop and aborts if the context is canceled.

  • Created a shared function, Drain, that can be used when a component is stopped.

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

@kalleep kalleep requested a review from a team as a code owner December 8, 2025 13:16
@kalleep kalleep force-pushed the kalleep/loki-source-docker-scheduling branch from 18c614f to bf16415 Compare December 8, 2025 13:17
@kalleep kalleep requested a review from Copilot December 9, 2025 12:22
Copilot AI left a comment

Pull request overview

This PR refactors loki.source.docker to use a shared Scheduler pattern (previously only used by loki.source.file) to improve container scheduling performance and fix potential deadlocks during component shutdown. The scheduler and related utilities have been moved to the shared source package for reuse across components.

Key changes:

  • Moved Scheduler from file package to shared source package, making it reusable by both loki.source.file and loki.source.docker
  • Created new shared utilities (Fanout, Consume, Drain) that implement common patterns for entry forwarding and graceful shutdown
  • Replaced the docker component's custom manager/runner implementation with the Scheduler pattern

Reviewed changes

Copilot reviewed 17 out of 19 changed files in this pull request and generated 10 comments.

| File | Description |
| --- | --- |
| internal/component/loki/source/scheduler.go | Package renamed from file to source; removed IsRunning() method; added DebugSource interface |
| internal/component/loki/source/scheduler_test.go | Package renamed from file to source |
| internal/component/loki/source/drain.go | New shared utility for draining log entries during shutdown to prevent deadlocks |
| internal/component/loki/source/consume.go | New shared utility for consuming and forwarding log entries with context cancellation |
| internal/component/common/loki/fanout.go | New shared utility for distributing log entries to multiple receivers with thread-safe updates |
| internal/component/loki/source/file/file.go | Updated to use shared scheduler, fanout, consume, and drain utilities; removed custom receiver management |
| internal/component/loki/source/file/tailer.go | Removed IsRunning() method; added DebugInfo() method to implement DebugSource interface |
| internal/component/loki/source/file/tailer_test.go | Updated test to access running field directly instead of through removed IsRunning() method |
| internal/component/loki/source/file/decompresser.go | Removed IsRunning() method; added DebugInfo() method to implement DebugSource interface |
| internal/component/loki/source/docker/docker.go | Major refactoring to use scheduler pattern; replaced manager with Scheduler; uses shared Fanout for receiver management |
| internal/component/loki/source/docker/tailer.go | Moved from internal/dockertarget package; implements Source and DebugSource interfaces; added Run() method for scheduler integration |
| internal/component/loki/source/docker/tailer_test.go | Updated tests for new tailer structure; added restart and stress tests |
| internal/component/loki/source/docker/metrics.go | Changed from exported Metrics to unexported metrics; package comment updated |
| internal/component/loki/source/docker/runner.go | Deleted (replaced by scheduler) |
| internal/component/loki/source/docker/internal/dockertarget/target.go | Deleted (moved to tailer.go) |
| internal/component/loki/source/docker/docker_test.go | Updated component tests to work with new scheduler-based implementation |
| internal/component/loki/source/docker/testdata/flog_after_restart.log | New test data file for container restart tests |
| CHANGELOG.md | Added entries for scheduling improvement and deadlock fix |

Comments suppressed due to low confidence (1)

internal/component/loki/source/scheduler.go:104

  • Inconsistent type parameter naming: The generic type parameter is lowercase k in DebugSource[k comparable], but it's uppercase K in all other generic types in this file (e.g., Source[K comparable], Scheduler[K comparable], SourceWithRetry[K comparable]). For consistency, it should be DebugSource[K comparable].
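To make the naming point concrete, here is a small sketch with the type parameter written as `K` throughout. Only the interface names come from the comment above; the method sets and the `fileSource` type are hypothetical and are not taken from scheduler.go.

```go
package main

// Source and DebugSource reuse the names from the review comment; the
// methods declared here are hypothetical.
type Source[K comparable] interface {
	Key() K
}

// DebugSource uses the same uppercase K as Source for consistency.
type DebugSource[K comparable] interface {
	Source[K]
	DebugInfo() any
}

// fileSource is a hypothetical implementation keyed by its path.
type fileSource struct{ path string }

func (f fileSource) Key() string    { return f.path }
func (f fileSource) DebugInfo() any { return "path=" + f.path }
```

The rename is purely cosmetic: Go treats `k` and `K` the same way as type parameter names, so the change affects readability only.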

Copilot AI left a comment

Pull request overview

Copilot reviewed 19 out of 21 changed files in this pull request and generated 2 comments.

@kalleep kalleep force-pushed the kalleep/loki-source-docker-scheduling branch from a8b6fa7 to 627a9cc Compare December 9, 2025 16:15
@dehaansa dehaansa left a comment

A lot to process here! A first pass at review, but will definitely take another.

@kalleep kalleep force-pushed the kalleep/loki-source-docker-scheduling branch 3 times, most recently from 257e395 to 9eae721 Compare December 11, 2025 14:17
@thampiotr thampiotr left a comment

It looks correct, but I do feel like it's a bit of a risky change. I know you've done manual testing, so I'm approving this, but another pair of eyes will be a good idea.

Comment on lines 90 to +103
func (s *Scheduler[K]) Stop() {
	s.cancel()
	s.running.Wait()
	s.sources = make(map[K]scheduledSource[K])
}

// Reset will stop all running sources and wait for them to finish and reset
// Scheduler to a usable state.
func (s *Scheduler[K]) Reset() {
	s.cancel()
	s.running.Wait()
	s.sources = make(map[K]scheduledSource[K])
	s.ctx, s.cancel = context.WithCancel(context.Background())
}

These two are quite similar... Can we reduce to have just one?

Contributor Author

Maybe I can think of a better way to do it, but we don't want to create a new context when we stop a component; that would leak resources.


- func (d *decompressor) IsRunning() bool {
- 	return d.running.Load()
+ func (d *decompressor) DebugInfo() any {

Another gotcha with DebugInfo functions in the past was that the lock order was not consistent and we had a deadlock. Happened at least 2x. I looked and I don't see this issue here, but the fact that we need to depend on manual review worries me here. Maybe we can refactor in the future to avoid the multiple locks sequencing issues.

Contributor Author

Yes, that is an issue for sure.
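The lock-ordering concern raised in this thread can be sidestepped structurally by never holding two locks at once. The sketch below shows one such approach and is an assumption rather than a description of the current code: `component`, `tailer`, and their fields are hypothetical.

```go
package main

import "sync"

// tailer is a hypothetical source with its own lock.
type tailer struct {
	mu     sync.Mutex
	offset int64
}

// DebugInfo takes only the tailer's lock.
func (t *tailer) DebugInfo() int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.offset
}

// component is a hypothetical owner of several tailers.
type component struct {
	mu      sync.Mutex
	tailers map[string]*tailer
}

// DebugInfo snapshots the tailer set under c.mu, releases it, and only then
// calls into each tailer, so c.mu and t.mu are never held together.
func (c *component) DebugInfo() map[string]int64 {
	c.mu.Lock()
	snapshot := make(map[string]*tailer, len(c.tailers))
	for k, t := range c.tailers {
		snapshot[k] = t
	}
	c.mu.Unlock()

	info := make(map[string]int64, len(snapshot))
	for k, t := range snapshot {
		info[k] = t.DebugInfo()
	}
	return info
}
```

Because no code path acquires one lock while holding the other, there is no ordering to get wrong; the cost is that the debug view may be slightly stale.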

kalleep commented Dec 11, 2025

> It looks correct, but I do feel like it's a bit of a risky change. I know you've done manual testing, so I'm approving this, but another pair of eyes will be a good idea.

Sure, I can wait for @dehaansa to do a review too.

@kalleep kalleep force-pushed the kalleep/loki-source-docker-scheduling branch from 2f46b74 to 797a5ae Compare December 11, 2025 15:12