Description
Package Name
No response
Package Version(s)
No response
Describe the feature you'd like
We are using a NestJS app and BullMQ for jobs. Recently, dd-trace added first-class support for BullMQ, which is great because it means we no longer have to inject the Datadog trace headers into the jobs manually.
However, now we face a different issue: There are too many spans in one trace!
Here is an example:
- We have a cron job that triggers via a scheduled BullMQ job.
- This job queries the DB for entries to process; there can be thousands of them.
- We then create a new BullMQ job for each fetched entry. To not overload the system, the queue is throttled quite heavily.
Now the issue:
- The cron job starts and creates a new trace (or worse, inherits it from whatever put the cron job into the queue).
- We push a sub-job into the processing queue. The job will get the current (cron job's) trace context injected into its data.
- Repeat for around 10_000 jobs.
This means we can have traces in our trace view that live for hours and contain thousands of spans. This makes actually doing trace analysis very hard or even impossible.
What we have done previously is skip our custom DD context injection for those tasks. Now there seems to be no way to prevent it. I do not want to revert to our custom solution, though, since in my opinion this should be the library's job anyway, not our business logic's.
What I would imagine is something like:

```typescript
tracer.runOutsideContext(async () => {
  // No tracing will be done here
});
```

In my opinion, this could be a versatile addition to the API, since sometimes you can't control whether a function will cause traces or not.
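To illustrate the semantics I have in mind, here is a minimal, dependency-free sketch using Node's `AsyncLocalStorage` as a stand-in for the tracer's internal context store. `runOutsideContext` and the store shape are hypothetical, not dd-trace API:

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Stand-in for the tracer's internal context storage (hypothetical).
const activeSpan = new AsyncLocalStorage<{ traceId: string }>();

function runOutsideContext<T>(fn: () => T): T {
  // exit() invokes fn with no active store, so code inside sees no
  // trace context and would start a fresh trace instead of inheriting one.
  return activeSpan.exit(fn);
}

let insideId: string | undefined;
let outsideId: string | undefined;

activeSpan.run({ traceId: "abc123" }, () => {
  insideId = activeSpan.getStore()?.traceId;    // context is active here
  runOutsideContext(() => {
    outsideId = activeSpan.getStore()?.traceId; // no context in here
  });
});
```

With such a helper, the job-creation loop could be wrapped so the sub-jobs do not inherit the cron job's trace.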
Is your feature request related to a problem?
We are unable to properly filter traces to help us debug/understand issues or workflows.
Describe alternatives you've considered
I did find a workaround for now, though it is rather ugly. If Datadog traces should be skipped, I define a custom `toJSON` on the data that excludes the `_datadog` field:
```typescript
private prepareJobData(options: Options | undefined, data: object): object {
  if (options && options.connectDatadogTraces === false) {
    // Prevent Datadog trace propagation. This is pretty ugly, but it works.
    // And as far as I could figure out, there (currently) is no better way
    // to do this.
    // Custom JSON serializer that will exclude the Datadog trace info.
    (data as any).toJSON = excludeDatadogTraces;
  }
  return data;
}

function excludeDatadogTraces(this: any) {
  const { _datadog: _, ...sansDatadog } = this;
  return sansDatadog;
}
```

This util is called every time before we create BullMQ jobs.
Additional context
On a side note: injecting the `_datadog` field into every job's data nearly caused an issue for us. To prevent malformed jobs in the queue, we use zod to validate each job's data before it is passed to the handler. This is especially important in case we push a new job schema into the queue: then we want the old instances to reject the data and let it be handled by the new instances.
We had some internal discussion about whether our job schemas should be "strict" for this reason, i.e. reject objects that have additional, unexpected fields, as this would indicate that a new version was deployed with a new schema. We decided against it, so the objects with the `_datadog` field were still accepted and consumed.
If we had set the strict flag, we might have run into serious trouble, since all our job processors would have rejected the jobs after updating Datadog.
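To make that hazard concrete, here is a dependency-free illustration of strict validation rejecting the injected field (zod's `.strict()` behaves analogously; the `JobData` shape and validator are hypothetical):

```typescript
type JobData = { entryId: number };

// Strict validator: rejects any key not in the schema, like zod's .strict().
function validateStrict(input: Record<string, unknown>): JobData {
  const allowed = new Set(["entryId"]);
  for (const key of Object.keys(input)) {
    if (!allowed.has(key)) {
      throw new Error(`Unexpected key in job data: ${key}`);
    }
  }
  if (typeof input.entryId !== "number") {
    throw new Error("entryId must be a number");
  }
  return { entryId: input.entryId };
}

// A job enqueued after the dd-trace upgrade carries _datadog,
// so a strict validator rejects it even though the payload is fine:
let rejected = false;
try {
  validateStrict({ entryId: 1, _datadog: {} });
} catch {
  rejected = true;
}

// A clean payload still passes:
const ok = validateStrict({ entryId: 2 });
```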
Maybe it would make sense to remove the `_datadog` field from `job.data` before passing it along to the consumer? A simple `delete job.data['_datadog']` would do the trick.
And yes, I will reference the appropriate XKCD comic for myself 😄