Alert Noise Score

Score your alerting hygiene from 0-100 across noise rate, flap rate, and after-hours skew. Get prioritized recommendations to cut alert fatigue and reclaim your on-call rotation.

Number of distinct alert rule names. Helps detect whether a few noisy rules dominate.

Every alert / page / notification fired in the period

Alerts where someone took a remediation action

Alerts that fired and auto-cleared without intervention

Alerts fired outside business hours (10pm-8am, weekends)

Noise Score

65/100

D — Heavy noise

Weighted composite of noise rate (50%), flap rate (30%), and after-hours skew (20%). Lower is better.

Noise rate

89%

372 non-actionable

Flap rate

43%

180 self-resolved

After-hours

38%

160 off-hours pages
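
To see how the headline number follows from the three rates, here is a minimal Python sketch of the weighted composite. The function name `noise_score` and the input counts (420 total alerts, 48 of them actionable) are assumptions chosen to be consistent with the percentages shown above; the tool's actual inputs and rounding may differ.

```python
def noise_score(total, actionable, auto_resolved, after_hours):
    """Return (noise_pct, flap_pct, after_hours_pct, composite) for one period."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Noise rate: share of alerts where nobody took a remediation action.
    noise_pct = 100 * (total - actionable) / total
    # Flap rate: share of alerts that auto-cleared without intervention.
    flap_pct = 100 * auto_resolved / total
    # After-hours skew: share of alerts fired outside business hours.
    after_hours_pct = 100 * after_hours / total
    # Weights taken from the description above: 50% / 30% / 20%.
    composite = 0.5 * noise_pct + 0.3 * flap_pct + 0.2 * after_hours_pct
    return noise_pct, flap_pct, after_hours_pct, composite


# Assumed example inputs, consistent with the panel above.
noise, flap, off_hours, score = noise_score(
    total=420, actionable=48, auto_resolved=180, after_hours=160
)
print(round(noise), round(flap), round(off_hours), round(score))  # 89 43 38 65
```

Because noise rate carries half the weight, making more of your alerts actionable moves the score faster than fixing flapping or after-hours skew alone.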

Recommendations

Most alerts are not actionable

89% of alerts in this period required no action. Audit the top alerting rules and either delete them or raise the threshold so they fire only when something actually needs to be done.

High flap rate

43% of alerts auto-resolved without intervention. Add or extend the "for:" clause on Prometheus rules (typically 5-15 min) so transient blips do not page. Consider switching to multi-window burn rate alerts — see the Burn Rate Calculator.
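
To illustrate why a hold duration suppresses flapping alerts, the sketch below models a "for:"-style clause in Python. It is a simplified simulation, not Prometheus's implementation; the function `count_fires`, the 60-second evaluation interval, and the sample series are all hypothetical.

```python
def count_fires(breaches, eval_interval_s, for_s):
    """Count how often an alert would fire.

    breaches: one boolean per rule evaluation (True = threshold breached).
    eval_interval_s: seconds between evaluations.
    for_s: how long the breach must persist before the alert fires.
    """
    fires = 0
    held_s = None   # seconds the breach has persisted; None = not breaching
    firing = False
    for breached in breaches:
        if not breached:
            # Condition cleared: reset both the hold timer and the firing state.
            held_s, firing = None, False
            continue
        held_s = 0 if held_s is None else held_s + eval_interval_s
        if not firing and held_s >= for_s:
            firing = True
            fires += 1
    return fires


# A two-minute blip followed by a sustained seven-minute breach.
series = [False, True, True, False, False] + [True] * 7 + [False]
print(count_fires(series, eval_interval_s=60, for_s=0))    # 2
print(count_fires(series, eval_interval_s=60, for_s=300))  # 1
```

With no hold duration both episodes page; with a 5-minute hold only the sustained breach does, at the cost of firing five minutes later.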

Repeat offenders concentrated

Each unique alert rule fired 12.0 times on average. A small number of rules likely produce most of your noise. Rank your rules by fire count and fix the top 5 worst; overall page volume usually drops 50%+.

Why alert noise is the metric that matters

Most engineering orgs measure alert volume and response speed: pages per week, MTTA, MTTR. None of those numbers tells you whether the alerts were worth firing in the first place. Alert noise, the fraction of alerts that were not actionable, is the leading indicator for alert fatigue, on-call attrition, and incidents missed because everyone tuned out the noise.

The 80/20 rule of alert hygiene

In every team we have audited, a small number of alert rules produce most of the noise. Group your alerts by rule name over the last 30 days, rank them by fire count, and look at the top 5. For each, ask:

  • Does this rule fire on a transient blip without a for: clause?
  • Was the threshold copied from a tutorial without tuning to this service?
  • When this fires, does anyone do anything? Or do we ack and move on?
  • Could this be a daily digest instead of a real-time page?

Killing or fixing the top 5 worst alerts typically cuts overall page volume by 40-60% with no negative effect on incident detection.
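
The ranking step can be sketched in a few lines of Python. The rule names and counts below are made up for illustration, and `top_offenders` is a hypothetical helper, not part of any alerting tool.

```python
from collections import Counter


def top_offenders(rule_names, n=5):
    """Return (top, share): the n most-fired rules with their counts, and the
    fraction of all alerts those rules account for."""
    counts = Counter(rule_names)
    top = counts.most_common(n)
    total = sum(counts.values())
    share = sum(count for _, count in top) / total if total else 0.0
    return top, share


# One entry per fired alert, e.g. the rule-name column of a 30-day export.
fired = (
    ["HighCPU"] * 40 + ["DiskLatency"] * 25 + ["PodRestart"] * 15
    + ["Heartbeat"] * 10 + ["CertExpiry"] * 5 + ["QueueDepth"] * 5
)
top, share = top_offenders(fired, n=3)
print(top)    # [('HighCPU', 40), ('DiskLatency', 25), ('PodRestart', 15)]
print(share)  # 0.8
```

If the share for your top 5 is well above half, tuning those few rules is likely a better use of time than reviewing every rule.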

From measurement to automation

Cutting alert noise manually has a ceiling. Even a well-tuned rule set produces correlated alerts when something fails: a database goes down and 30 dependent services all alert simultaneously. Uptimes.ai automatically correlates these alert storms into a single incident, and the AI SRE agent investigates root cause across all signals before paging anyone. Most customers see another 50% reduction on top of whatever manual tuning they have done.

Frequently Asked Questions

How is the score calculated?
The composite score weights three factors: noise rate (50%), the percentage of alerts that did not require action; flap rate (30%), the percentage that auto-resolved without intervention; and after-hours skew (20%), the percentage fired outside business hours. A higher score means more noise. Below 25 is healthy; 50-65 is significant noise; above 80 means your team is in alert fatigue territory.
What counts as an "actionable" alert?
An alert is actionable if a human took at least one remediation step in response: ran a script, restarted a service, escalated to another team, opened a ticket, made a code change, or even confirmed it was a known false positive. An alert is NOT actionable if the responder looked at it, decided nothing needed to be done, and went back to sleep. That alert should not have fired.
How do I find these numbers?
In PagerDuty, look at "Notifications by status" reports; alerts that resolved without acknowledgment are usually flaps. In Datadog, the alert review dashboard shows fire-to-resolve durations. In Prometheus Alertmanager, the alert-fatigue exporter gives most of these. Or get them retroactively by exporting the last 30-90 days of alerts and labeling them yourself. Even rough estimates produce a useful score.
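
If you take the export-and-label route, the four inputs reduce to a simple tally. The sketch below assumes each exported alert has been turned into a record with a local-time `fired_at` timestamp and two hand-labeled booleans; those field names are hypothetical, and the after-hours window follows the 10pm-8am and weekends definition used by this tool.

```python
from datetime import datetime


def is_after_hours(fired_at):
    """True on weekends, or on weekdays before 08:00 or from 22:00 onward."""
    return fired_at.weekday() >= 5 or fired_at.hour >= 22 or fired_at.hour < 8


def tally(alerts):
    """Reduce labeled alert records to the four counts the score needs."""
    counts = {"total": 0, "actionable": 0, "auto_resolved": 0, "after_hours": 0}
    for alert in alerts:
        counts["total"] += 1
        # Booleans add as 0 or 1.
        counts["actionable"] += alert["actionable"]
        counts["auto_resolved"] += alert["auto_resolved"]
        counts["after_hours"] += is_after_hours(alert["fired_at"])
    return counts


# Three hand-labeled records (hypothetical data, timestamps in local time).
alerts = [
    {"fired_at": datetime(2024, 3, 4, 14, 0), "actionable": True, "auto_resolved": False},   # Monday afternoon
    {"fired_at": datetime(2024, 3, 5, 23, 30), "actionable": False, "auto_resolved": True},  # Tuesday night
    {"fired_at": datetime(2024, 3, 9, 11, 0), "actionable": False, "auto_resolved": False},  # Saturday
]
print(tally(alerts))
# {'total': 3, 'actionable': 1, 'auto_resolved': 1, 'after_hours': 2}
```

Timestamps exported in UTC would need converting to the on-call team's local time first, otherwise the after-hours count will be skewed.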
My noise rate is high. What do I fix first?
Group alerts by rule name and look at the 5 most-fired. In most teams, 5 alert rules produce 50%+ of the volume. For each: (1) is there a "for:" clause to filter transient blips? (2) is the threshold actually meaningful, or just an arbitrary number copied from a tutorial? (3) does anyone respond when this fires? Often the right answer is to delete the rule or convert it to a daily digest instead of a page.
What is the connection to Uptimes.ai?
Alert noise is the problem Uptimes.ai exists to solve at scale. Our platform automatically correlates and deduplicates alerts (typical reduction of 90-94%), and our AI SRE agent runs the first 30 minutes of investigation before paging a human. Customers commonly cut both alert volume and after-hours pages by ~50% within the first month. This tool helps you measure where you are starting from.