feat(orchestrator): Add dropped task metric for capped groups #5865
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 624cf4ead4
```ts
}
const droppedCountPerPrimitive = new Map<string, number>();
for (const [groupKey, droppedCount] of cappedGroupCounts.entries()) {
  const primitive = groupKey.split(':')[0] || 'unknown';
```
Avoid high-cardinality tags in dropped-task metric
This derives the primitive tag from groupKey.split(':')[0], which means any group key without a colon is used verbatim as a metric tag value. In this codebase, group.key is accepted as an arbitrary string (packages/orchestrator/lib/routes/v1/postImmediate.ts, z.string().min(1)), so capped tasks for user-specific/random keys (e.g., UUID/nanoid-style keys) will emit one Datadog series per key and can quickly create unbounded cardinality. Please normalize to a bounded set (or fall back to unknown when the expected prefix format is absent) before incrementing ORCH_TASKS_DROPPED.
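A minimal sketch of the normalization Codex is suggesting, assuming a small allow-list of primitives (the `KNOWN_PRIMITIVES` values and the `<primitive>:<rest>` key format are illustrative, not the actual codebase constants):

```typescript
// Hypothetical sketch: map an arbitrary group key to a bounded tag set before
// using it as a Datadog tag. KNOWN_PRIMITIVES is an assumed allow-list.
const KNOWN_PRIMITIVES = new Set(['sync', 'action', 'webhook', 'on-event']);

function primitiveTag(groupKey: string): string {
  const prefix = groupKey.split(':')[0];
  // Fall back to 'unknown' when the key does not match the expected
  // '<primitive>:<rest>' format, so arbitrary keys cannot explode cardinality.
  return prefix !== undefined && KNOWN_PRIMITIVES.has(prefix) ? prefix : 'unknown';
}
```

With this, a UUID-style key with no recognized prefix collapses into a single `unknown` series instead of one series per key.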
The number of primitives is low cardinality, but thanks for the heads-up.
```ts
  });
} else {
  cappedGroupKeys.add(props.groupKey);
  cappedGroupCounts.set(props.groupKey, (cappedGroupCounts.get(props.groupKey) ?? 0) + 1);
```
Not sure of the value of this metric, to be honest. Dropping tasks is usually the result of some other problem upstream that we already have plenty of signal for.
If you really want it, though, just increment here; no need to aggregate, the DD lib will do it for you.
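A minimal sketch of what "just increment here" could look like, assuming a StatsD-style client with an `increment` method (the `MetricsClient` interface and the metric/tag names are illustrative, not the actual codebase API):

```typescript
// Hypothetical interface standing in for a dd-trace / StatsD-style client.
interface MetricsClient {
  increment(metric: string, value: number, tags?: string[]): void;
}

// Increment at the drop site; the client batches/aggregates before flushing
// to Datadog, so no local Map<string, number> bookkeeping is needed.
function onTaskDropped(metrics: MetricsClient, groupKey: string): void {
  const primitive = groupKey.split(':')[0] || 'unknown';
  metrics.increment('ORCH_TASKS_DROPPED', 1, [`primitive:${primitive}`]);
}
```

One call per dropped task replaces the `cappedGroupCounts` map and the later flush loop.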
I could be wrong here, but I had no visibility on this during the latest incident: from the proxy metrics we could only suspect drops were happening, because there was no explicit datapoint for them (as said, I could be wrong, so happy to learn what other signals we have for this).
Also, it's not only about incidents. This is similar to the max backpressure signal we added to know when customers were reaching their max concurrency threshold. With this new metric we can pinpoint specific time windows to know whether one or multiple customers were impacted (we will still need the logging part to frame specific issues for specific customers).
Note: just for clarification, upstream -> dependencies of our services, downstream -> users of our services? Or the other way around? Just asking because I've seen them used in both directions, and it's worth getting all of us aligned.
Dropped tasks are the result of tasks being created faster than they are executed, which we already have several metrics for. Knowing and understanding why tasks are created too fast and/or not executed fast enough is the valuable piece of information. Dropped tasks is basically the same information reported as another metric.
> Dropped tasks are the result of tasks being created faster than they are executed, which we already have several metrics for.

Sorry, but I kind of disagree here. Dropped tasks is the canonical source of truth; the others are proxy metrics from which you can infer that dropping is happening, but without explicit datapoints (the ones provided by the dropped metric) it's just an educated guess.

> Knowing and understanding why tasks are created too fast and/or not executed fast enough is the valuable piece of information.

Related to the previous point: you know a customer is sending data too fast, or the system is going too slow, precisely when data is being discarded; the creation rate and execution rate are only proxy metrics for that.
Yes, knowing the cause is fundamental, but IMO that's a secondary step; first you need the data to know that this is happening at all.
Also, tasks.dropped gives us an explicit measure of discarded work volume. Creation and execution rates are only proxy metrics for pressure. With a direct dropped-tasks metric, we could even monitor absolute drops or dropped percentage relative to created tasks.
To have the %, we would need a total counter. IMO this would be ideal for paths like scheduling, where we can emit total attempted-to-schedule vs. dropped; that would help quantify the impact of an incident.
Happy to also add this counter in this PR.
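A minimal sketch of the attempted-vs-dropped idea, assuming two counters on the scheduling path (the class and metric names here are illustrative, not the actual codebase ones):

```typescript
// Hypothetical sketch: track both total scheduling attempts and drops so a
// dropped percentage can be derived (in code or in a Datadog formula query).
class SchedulingStats {
  attempted = 0;
  dropped = 0;

  recordAttempt(wasDropped: boolean): void {
    this.attempted += 1;
    if (wasDropped) this.dropped += 1;
  }

  // Dropped percentage relative to everything we tried to schedule.
  droppedPct(): number {
    return this.attempted === 0 ? 0 : (this.dropped / this.attempted) * 100;
  }
}
```

In practice both counts would be emitted as separate metrics and the ratio computed in the dashboard, but the arithmetic is the same.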
Feel free to add the metric. Just let ddtrace do the aggregation.
I disagree that dropped tasks is the canonical source of truth. With it alone you cannot even infer whether this is a "too many creations" or a "not enough executions" problem.
Adds a new metric so we know when we are silently dropping tasks. This can be useful during incidents.