Alert Noise Score
Score your alerting hygiene from 0 to 100 across noise rate, flap rate, and after-hours skew. Get prioritized recommendations to cut alert fatigue and reclaim your on-call rotation.
Distinct rule names. Helps detect if a few noisy alerts dominate.
Every alert / page / notification fired in the period
Alerts where someone took a remediation action
Alerts that fired and auto-cleared without intervention
Alerts fired outside business hours (10pm-8am, weekends)
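The after-hours definition above (10pm-8am, plus weekends) can be sketched as a small classifier. This is an illustrative sketch, not the calculator's actual code: the function name `is_after_hours` is hypothetical, and it assumes the timestamp is already in the on-call engineer's local time.

```python
from datetime import datetime


def is_after_hours(fired_at: datetime) -> bool:
    """Return True for weekend alerts or alerts between 10pm and 8am.

    Assumption: fired_at is already expressed in the responder's local time.
    """
    if fired_at.weekday() >= 5:  # Monday is 0, so Saturday is 5 and Sunday is 6
        return True
    return fired_at.hour >= 22 or fired_at.hour < 8
```

Counting the alerts for which this returns True gives the off-hours figure the calculator asks for.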
Noise Score
65/100
Weighted composite of noise rate (50%), flap rate (30%), and after-hours skew (20%). Lower is better.
Noise rate
89%
372 non-actionable
Flap rate
43%
180 self-resolved
After-hours
38%
160 off-hours pages
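To make the weighting concrete, here is a minimal sketch of how such a composite could be computed. It assumes each component is a plain fraction of total alerts and that the weights are the 50/30/20 split stated above. The function name `noise_score` is hypothetical, and the total of 420 alerts is inferred from the displayed counts and percentages rather than shown on the page.

```python
def noise_score(total: int, actionable: int, self_resolved: int, after_hours: int) -> dict:
    """Compute the three rates and a weighted 0-100 composite (lower is better).

    Assumption: every component is a simple fraction of total alerts.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of alerts")

    noise_rate = (total - actionable) / total   # share of non-actionable alerts
    flap_rate = self_resolved / total           # share that auto-cleared
    after_hours_rate = after_hours / total      # share fired off-hours

    # Weights taken from the description above: 50% / 30% / 20%.
    composite = 100 * (0.5 * noise_rate + 0.3 * flap_rate + 0.2 * after_hours_rate)
    return {
        "noise_pct": round(100 * noise_rate),
        "flap_pct": round(100 * flap_rate),
        "after_hours_pct": round(100 * after_hours_rate),
        "score": round(composite),
    }


# Inferred inputs: 420 total alerts, of which 48 were actionable (372 were not).
result = noise_score(total=420, actionable=48, self_resolved=180, after_hours=160)
```

Under these assumptions the inputs reproduce the figures shown above: 89% noise, 43% flap, 38% after-hours, and a score of 65.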
Recommendations
Most alerts are not actionable
89% of alerts in this period required no action. Audit the top alerting rules and either delete them or raise the threshold so they fire only when something actually needs to be done.
High flap rate
43% of alerts auto-resolved without intervention. Add or extend the "for:" clause on Prometheus rules (typically 5-15 min) so transient blips do not page. Consider switching to multi-window burn rate alerts — see the Burn Rate Calculator.
Repeat offenders concentrated
Average 12.0 fires per unique alert. A small number of rules likely produce most of your noise — sort alerts by name, fix the top 5 worst, and the overall page volume usually drops 50%+.
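Ranking rules by fire count takes only a few lines if you can export one rule name per fired alert. The sketch below is illustrative: the function name `top_offenders` and the rule names in the sample are hypothetical, not taken from any real export.

```python
from collections import Counter


def top_offenders(rule_names, n=5):
    """Return the n most frequent rule names and their share of all fires."""
    counts = Counter(rule_names)
    total = sum(counts.values())
    top = counts.most_common(n)
    share = sum(count for _, count in top) / total if total else 0.0
    return top, share


# Hypothetical export: one entry per fired alert.
sample = ["DiskFull"] * 6 + ["HighCPU"] * 3 + ["PodRestart"] * 1
worst, share = top_offenders(sample, n=2)
```

If the share covered by the top few rules is high, those rules are where tuning effort pays off first.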
Why alert noise is the metric that matters
Most engineering orgs measure alert volume and response speed: pages per week, MTTA, MTTR. None of those numbers tell you whether the alerts were worth firing in the first place. Alert noise, the fraction of alerts that were not actionable, is the leading indicator for alert fatigue, on-call attrition, and incidents missed because everyone had tuned out the noise.
The 80/20 rule of alert hygiene
In every team we have audited, a small number of alert rules produce most of the noise. Sort your alerts by name over the last 30 days and look at the top 5. For each, ask:
- Does this rule fire on a transient blip without a for: clause?
- Was the threshold copied from a tutorial without tuning to this service?
- When this fires, does anyone do anything? Or do we ack and move on?
- Could this be a daily digest instead of a real-time page?
Killing or fixing the top 5 worst alerts typically cuts overall page volume by 40-60% with no negative effect on incident detection.
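The first question in the list, whether a rule fires on a transient blip, is easier to reason about with a small simulation. The sketch below is a simplified model of a hold duration like the Prometheus for: clause, not Prometheus code: the condition must be true at every evaluation for at least `hold_seconds` before the alert fires. The function name `firing_times` and the sample timeline are hypothetical.

```python
def firing_times(evaluations, hold_seconds):
    """Return the timestamps at which the alert transitions to firing.

    evaluations: (timestamp_seconds, breaching) pairs sorted by time.
    The alert fires only once the condition has held continuously for
    hold_seconds; any non-breaching evaluation resets the timer.
    """
    fired = []
    pending_since = None
    firing = False
    for ts, breaching in evaluations:
        if not breaching:
            pending_since = None
            firing = False
            continue
        if pending_since is None:
            pending_since = ts
        if not firing and ts - pending_since >= hold_seconds:
            firing = True
            fired.append(ts)
    return fired


# Evaluated every 60s: a one-sample blip at t=60, then a sustained breach from t=300.
timeline = [(t, t == 60 or t >= 300) for t in range(0, 960, 60)]

without_hold = firing_times(timeline, hold_seconds=0)    # pages on the blip too
with_hold = firing_times(timeline, hold_seconds=300)     # pages only on the real breach
```

In this timeline the rule without a hold fires twice (at t=60 and t=300), while the 5-minute hold fires once at t=600. The trade-off is that the real breach is reported five minutes later.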
From measurement to automation
Cutting alert noise manually has a ceiling. Even a well-tuned rule set produces correlated alerts when something fails: a database goes down and 30 dependent services all alert simultaneously. Uptimes.ai automatically correlates these alert storms into a single incident, and the AI SRE agent investigates root cause across all signals before paging anyone. Most customers see another 50% reduction on top of whatever manual tuning they have done.