Skip to content

AmirhosseinHonardoust/Decision-Safety

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 

Repository files navigation

Decision Safety

Article Topic Focus Includes Audience Level Reading%20Time

The missing layer between dashboards and decisions

A true story

A team sees a dip in conversions.

Not a small dip. A real dip. Big enough that people stop what they’re doing.

Someone opens the KPI dashboard. The chart is clean. The line goes down. The alert is red. The comments start:

  • “We should pause the campaign.”
  • “This is a product regression.”
  • “Let’s roll back the release.”

By noon, the team has reacted. Spend is paused. A feature flag is flipped. A post-mortem doc is already started.

By evening, someone finds the real issue:
A data pipeline change introduced an inner join that quietly dropped a segment.
Conversions didn’t fall. Observation fell.

No one lied. No one was incompetent.
The dashboard was working as designed. The decision was not.

The failure mode wasn’t “bad analytics”.

The failure mode was unsafe analytics.


The problem with analytics today

Most teams treat analytics like reporting.

  • “Is the dashboard up?”
  • “Did the query run?”
  • “Do we have numbers?”

But decisions are not reports. Decisions are interventions. They have cost.

If your dashboard can be wrong in ways that look believable and you don’t measure that risk, then you don’t have analytics. You have a decision hazard.

This is where Decision Safety comes in.


What is Decision Safety?

Decision Safety is the minimum trust threshold a metric must meet before it’s allowed to influence decisions.

It answers a simple question:

“If we act on this metric today, how likely is it that we would make the same decision if the data were perfectly observed?”

Notice the wording:

  • It’s not “is it accurate?”
  • It’s not “is the pipeline green?”
  • It’s not “is there missing data?”

It’s: is it safe to act?

Because the worst analytics failures don’t look like errors.
They look like insight.


The key idea: stop shipping dashboards, start shipping decision systems

Dashboards are outputs. Decisions are consequences.

A decision system needs safety constraints:

  • guardrails
  • thresholds
  • pre-flight checks
  • clear “do not use” conditions

We already do this in software engineering:

  • unit tests prevent shipping broken code
  • feature flags prevent risky rollouts
  • monitoring prevents silent failures

But analytics?
Analytics often ships with one test:

  • “does the query return rows?”

That’s not safety. That’s survival.

Decision Safety is the missing layer:
a trust gate between metrics and action.


The 4 pillars of Decision Safety

Decision Safety becomes practical when you break it into four measurable pillars:

  1. Coverage
  2. Freshness
  3. Stability
  4. Measurement Risk

If you measure these four consistently, you can turn “trust” into something visible, auditable, and enforceable.

Let’s go one by one.


1) Coverage: are we observing what we think we’re observing?

Coverage is the most common silent killer because it creates convincing lies.

If 20% of your data disappears, your dashboards may still look smooth.
But the conclusions become wrong.

Coverage has two forms

A) Time coverage

Do we have all expected time windows?

Examples:

  • yesterday’s data is incomplete but still shown
  • the last 3 hours are missing due to ingestion lag
  • a timezone shift caused partial-day duplication

B) Entity coverage

Do we have all expected segments/entities?

Examples:

  • one country dropped
  • one platform stopped sending events
  • one payment provider’s data is missing
  • one marketing channel is excluded by a filter

The dangerous part

Coverage failures don’t always look like missingness.

Sometimes the data is “complete” relative to the pipeline but incomplete relative to reality.

A KPI can be 100% non-null and still be 60% coverage.

Coverage checks that matter

  • expected number of partitions present (by day/hour)
  • distinct segment count over time (country/platform/channel)
  • event volume ratio by segment (today vs trailing baseline)
  • join coverage (left table rows vs joined result rows)

Coverage rule (practical)

If you only adopt one rule from this article, adopt this:

Never interpret a KPI without a coverage indicator next to it.

Coverage should be shown like a battery icon:

  • 100%: good
  • 70%: warning
  • 40%: dangerous

2) Freshness: is the data recent enough for the decision?

Freshness isn’t about whether data exists. It’s about whether it’s usable for this decision context.

A dashboard refreshed daily might be fine for monthly strategy.
It’s unsafe for incident response.

Freshness is decision-relative

The same metric can be:

  • safe for quarterly reporting
  • unsafe for real-time experimentation

Because the decision requirements differ.

Freshness checks that matter

  • max event timestamp observed
  • lag distribution (now - latest event time)
  • pipeline runtime success time
  • SLA compliance (did we meet the expected update window?)

Freshness rule (practical)

Define freshness tiers:

  • Green: within SLA
  • Yellow: outside SLA but within tolerance
  • Red: stale enough to invalidate decisions

If you don’t define this, you will always “just use what you have”, which is how unsafe decisions happen.


3) Stability: does yesterday’s truth change tomorrow?

Stability is the most under-measured dimension of analytics trust.

Many datasets backfill:

  • refunds post later
  • late events arrive
  • attribution updates
  • warehouses reprocess

So the metric you saw yesterday might not be the metric you’ll see next week.

Stability is about revision

If numbers move after “finalization,” then:

  • trend comparisons are unstable
  • experiment results can flip
  • stakeholder trust erodes quietly

Stability checks that matter

  • revision rate: how much does a day’s metric change after 24h, 72h, 7d?
  • late-arrival rate: what fraction of events arrive late?
  • backfill volume: how often are partitions rewritten?
  • “finalization window”: when does a day become stable?

Stability rule (practical)

For every KPI, define:

  • When is it final?
  • How much can it change after final?
  • What is the revision policy?

If you cannot answer these, your KPI is not decision-safe for anything time-sensitive.


4) Measurement Risk: are we measuring what we think we’re measuring?

Measurement Risk is where analytics fails even when data is “complete”.

Coverage can be perfect and you still make unsafe decisions because the metric itself is biased or ambiguous.

Examples:

  • “active users” definition changed
  • instrumentation counts only certain platforms
  • survey scale forces people upward
  • conversion is measured as pageview, not purchase

Measurement Risk comes from:

  • definition ambiguity
  • instrumentation gaps
  • incentives (people gaming the metric)
  • proxy metrics (measuring what’s easy, not what matters)
  • selection bias (who appears in the data at all)

Measurement Risk checks that matter

  • definition versioning (did metric definition change?)
  • instrument parity (web vs mobile event parity)
  • funnel integrity (step transitions make sense)
  • proxy validation (correlation to the true outcome)
  • guardrails against gaming

Measurement Risk rule (practical)

Every metric must have:

  • a definition
  • known failure modes
  • a list of assumptions
  • a version history

If your metric has no written definition, it’s not a metric.
It’s a rumor with a chart.


The Decision Safety Score (DSS)

Now we turn the framework into something operational.

A simple scoring system works because it gives a shared language:

  • “This KPI is red, don’t act on it.”
  • “This decision is yellow, add caveats.”
  • “Green is safe to execute.”

DSS: a simple 0–100 score

Give each pillar up to 25 points:

  • Coverage: 0–25
  • Freshness: 0–25
  • Stability: 0–25
  • Measurement Risk: 0–25

Total = 0–100.

DSS thresholds (example)

  • Green (80–100): safe to use for decisions
  • Yellow (60–79): use with explicit caveats; avoid irreversible actions
  • Red (<60): block decisions; metric can be displayed but should not drive action

Why a score works

Because “trust” is otherwise negotiated socially:

  • the loudest person wins
  • the most confident person wins
  • the person with status wins

A score makes trust auditable:

  • explain why it’s red
  • show what to fix
  • track improvements over time

How to score each pillar (practical rules you can copy)

You don’t need perfect math. You need consistent criteria.

Coverage (0–25)

Start with expected coverage: segments × time.

Example scoring:

  • 25: all expected segments present; time partitions complete
  • 18: minor drops in low-impact segments
  • 10: major segment missing or large join drop
  • 0: coverage unknown or obviously broken

Coverage red flags

  • join reduces rows by >X% unexpectedly
  • one segment drops to near zero
  • distinct user count drops but sessions do not (or vice versa)

Freshness (0–25)

Example scoring:

  • 25: within SLA
  • 15: slightly late but still within decision tolerance
  • 5: stale; likely incomplete
  • 0: freshness unknown

Stability (0–25)

You can estimate stability by comparing snapshots:

  • how much does yesterday’s number change tomorrow?

Example scoring:

  • 25: <1% revision after 24h
  • 15: 1–5% revision
  • 5: 5–15% revision
  • 0: unstable or unknown revision behavior

Measurement Risk (0–25)

This is the “qualitative but structured” part.

Example scoring:

  • 25: clear definition; instrumentation validated; known assumptions; versioning tracked
  • 15: definition exists but instrumentation parity unknown
  • 5: proxy metric or frequent definition changes
  • 0: undefined metric or known mismeasurement

“Block the decision, not the dashboard”

This is a core concept of Decision Safety.

Dashboards are allowed to show numbers even when unsafe.
But decisions should be blocked based on DSS.

Because stakeholders may still want visibility:

  • to debug pipelines
  • to communicate issues
  • to track recovery

But you should not allow a metric to:

  • trigger rollbacks
  • change budgets
  • end experiments
  • fire alerts without safety checks

Example policy

  • Dashboard can display KPI always.
  • Alerts can fire only when DSS ≥ 80.
  • Decisions require DSS ≥ 80, or an explicit override note.

That override note matters. It creates accountability:

  • “We acted on a yellow metric because …” This makes unsafe behavior visible instead of normalized.

Decision Safety as a “pre-flight checklist”

This is the mindset shift.

Before you act on a metric, you run pre-flight checks:

  • coverage OK?
  • freshness OK?
  • stability OK?
  • measurement known?

If any are red, you don’t “interpret harder”. You fix the measurement.

The hidden benefit

Decision Safety reduces wasted work.

Without it:

  • teams chase phantom regressions
  • experiments “fail” due to tracking gaps
  • stakeholders lose trust in analytics

With it:

  • you stop reacting to noise
  • you reduce false alarms
  • you make analytics predictable

The 10 most common decision-unsafe failure modes (and what they look like)

These are the patterns you should train yourself to recognize.

  1. Segment drop disguised as KPI drop
  • Country/platform/channel disappears → KPI collapses
  1. Inner join that deletes reality
  • new join logic turns observations into a subset
  1. Late-arriving data creates false dips
  • today looks bad; tomorrow it “recovers” magically
  1. Metric definition drift
  • “Active” changes but chart continues
  1. Timezone boundary errors
  • day-to-day comparisons break
  1. Instrumentation parity mismatch
  • web tracked, mobile not (or vice versa)
  1. Sampling or client-side loss
  • ad blockers, privacy settings, offline usage
  1. Attribution model changes
  • conversions “move” between channels
  1. Survey measurement forcing
  • scale misses “Never”, pushes responses upward
  1. Data freshness mismatch
  • dashboard updates daily; decision required hourly

The dangerous part is not that these happen.
The dangerous part is that they happen silently.

Decision Safety is the system to make them loud.


What Decision Safety is NOT

Decision Safety is not:

  • perfection
  • “zero errors”
  • “a magical trust score”

Decision Safety is:

  • a practical gate
  • a shared vocabulary
  • a set of checks that prevent known failure modes

It doesn’t eliminate uncertainty.
It prevents acting like uncertainty doesn’t exist.


The Decision Safety Contract (copy-paste template)

If you want a single artifact that upgrades your analytics maturity fast, it’s this:

Metric: ____________________

Purpose: why it exists and what decision it supports
Definition: plain language definition + formula
Grain: user/day? session/day? transaction?
Refresh SLA: expected update time
Finalization window: when it becomes stable
Coverage requirements: expected segments + partitions
Known failure modes: what commonly breaks it
Owner: who is responsible
DSS thresholds: green/yellow/red rules
Allowed decisions: what you can do when green
Blocked decisions: what you must not do when yellow/red
Last definition change: date + reason

This is boring. That’s why it works.
Boring is safe.


Decision Safety: a practical checklist

Use this before you act on any dashboard.

Coverage

  • Do all expected segments exist today?
  • Did any segment drop to near zero?
  • Did joins drop rows unexpectedly?
  • Do today’s counts match typical ranges?

Freshness

  • Is the latest event timestamp recent enough?
  • Are we within SLA for this decision?
  • Is the “current day” complete or partial?

Stability

  • Do numbers backfill?
  • How much does yesterday change after 24h?
  • Are we comparing finalized windows?

Measurement Risk

  • Do we have a written definition?
  • Did the definition change recently?
  • Is instrumentation consistent across platforms?
  • Is this a proxy metric? If yes, is it validated?

If you can’t answer these, you are not blocked by lack of data.
You’re blocked by lack of safety.


“Decision Safety” is how you become a senior analyst

Senior analytics is not about:

  • more charts
  • more SQL tricks
  • more dashboards

Senior analytics is about:

  • preventing wrong decisions
  • making trust measurable
  • designing systems that fail safely

When you introduce Decision Safety to a team, you upgrade analytics from “reporting” to “engineering”.

And the best part is:

  • you don’t need perfect data
  • you need a consistent safety mindset

If you only remember one sentence

A dashboard can be correct and still be unsafe.

Decision Safety is what turns “numbers” into decisions you can stand behind.

About

Long-form article introducing Decision Safety: a trust gate between dashboards and actions. Defines four pillars (coverage, freshness, stability, measurement risk), proposes a Decision Safety Score (0–100), shows common failure modes, and includes a copy-paste “Decision Safety Contract”+checklist to block unsafe decisions without hiding dashboards.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors