Incident Severity Levels: A Complete Guide to Classifying and Responding to Incidents

StatusRay Team

July 18, 2025

Last updated: February 10, 2026

Not every incident deserves the same response. A button misaligned on your settings page is not the same as your API returning 500 errors to every customer.

Incident severity levels give your team a shared language for classifying incidents by impact — so you respond to a full outage differently than a cosmetic bug, without debating priority in the middle of a crisis.

This guide covers how to define incident severity levels that work for growing SaaS teams, with a ready-to-use severity matrix you can copy and adapt today.

What Are Incident Severity Levels?

Incident severity levels are a classification system that categorizes incidents based on their impact on users and business operations. Think of it as triage: when multiple things break at once, severity levels tell your team what to fix first.

Most teams use a 4-level system (SEV-1 through SEV-4), though some use 3 or 5 levels. The exact number matters less than having clear definitions that everyone on your team agrees on.

The goal is simple: when someone on your team says "this is a SEV-2," everyone should understand exactly what that means — how many users are affected, how fast to respond, who gets paged, and how often to communicate.

The 4-Level Incident Severity Framework

Here's a severity level framework designed for SaaS teams. Adapt the specifics to your product, but keep the structure.

SEV-1 — Critical

Definition: Complete service outage or critical functionality unavailable for all or most users.

Examples:

Your application is completely down
API returning errors for all requests
Data loss or data corruption
Security breach with active exploitation
Payment processing failure

Response:

All hands on deck — drop everything
Page the on-call engineer immediately
Post a status page update within 5 minutes
Update customers every 15-30 minutes until resolved
Executive notification required

SLA target: Acknowledge within 5 minutes. Resolve or mitigate within 1 hour.

SEV-2 — High

Definition: Major feature degraded or unavailable. A significant portion of users are impacted, but the service is not completely down.

Examples:

Dashboard loading but showing stale data
Email notifications not sending
Login working but extremely slow (>10s response times)
A critical integration (Slack, PagerDuty) is broken
Monitoring checks failing intermittently

Response:

On-call engineer begins work immediately
Post a status page update within 15 minutes
Update customers every 30-60 minutes
May escalate to SEV-1 if impact grows

SLA target: Acknowledge within 15 minutes. Resolve within 4 hours.

SEV-3 — Medium

Definition: Minor feature impaired or a workaround exists. A small subset of users is affected.

Examples:

A single monitoring check returning false positives
CSV export timing out for large datasets
UI rendering issue in one browser
Non-critical API endpoint responding slowly
Scheduled report delayed by 30+ minutes

Response:

Addressed during business hours
No status page update unless customers report it
Fix in the next deployment cycle, or expedite if the issue is worsening

SLA target: Acknowledge within 2 hours. Resolve within 1 business day.

SEV-4 — Low

Definition: Cosmetic issue, minor inconvenience, or improvement request. No meaningful user impact.

Examples:

Typo in the UI
Tooltip displaying incorrect text
Minor CSS alignment issue
Documentation outdated
Feature request disguised as a bug report

Response:

Added to backlog
Fixed when convenient or as part of planned work
No status page update needed

SLA target: No SLA. Fix at team discretion.

Incident Severity Matrix (Copy and Use)

Use this matrix as a starting point. Print it, pin it in Slack, or add it to your runbook.

Level	Name	User Impact	Response Time	Update Cadence	Status Page?	Who's Involved
SEV-1	Critical	All / most users	5 minutes	Every 15-30 min	Yes, within 5 min	All engineers + leadership
SEV-2	High	Large subset	15 minutes	Every 30-60 min	Yes, within 15 min	On-call + relevant team
SEV-3	Medium	Small subset	2 hours	As needed	Only if reported	Assigned engineer
SEV-4	Low	Minimal / none	Next business day	None	No	Backlog
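
If you also want tooling to read the same definitions your team does, the matrix can live in code as well as in the runbook. The following is a minimal Python sketch, not a prescribed format: `SeverityPolicy`, `SEVERITY_MATRIX`, and `describe` are hypothetical names, and the values are copied from the table above. SEV-4's "next business day" is stored as `None` because it has no minute-level target.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SeverityPolicy:
    """One row of the severity matrix (hypothetical structure)."""
    name: str
    user_impact: str
    ack_minutes: Optional[int]                 # None: no minute-level target
    update_minutes: Optional[Tuple[int, int]]  # (fastest, slowest); None: no fixed cadence
    status_page: str                           # "yes", "if_reported", or "no"
    involved: str


# Values copied from the matrix above.
SEVERITY_MATRIX = {
    "SEV-1": SeverityPolicy("Critical", "All / most users", 5, (15, 30),
                            "yes", "All engineers + leadership"),
    "SEV-2": SeverityPolicy("High", "Large subset", 15, (30, 60),
                            "yes", "On-call + relevant team"),
    "SEV-3": SeverityPolicy("Medium", "Small subset", 120, None,
                            "if_reported", "Assigned engineer"),
    "SEV-4": SeverityPolicy("Low", "Minimal / none", None, None,
                            "no", "Backlog"),
}


def describe(level: str) -> str:
    """Build a one-line summary suitable for a chat message or runbook header."""
    p = SEVERITY_MATRIX[level]
    if p.ack_minutes is not None:
        ack = f"acknowledge within {p.ack_minutes} min"
    else:
        ack = "no minute-level acknowledgement target"
    return f"{level} ({p.name}): {p.user_impact}; {ack}; involves {p.involved}"
```

Calling `describe("SEV-2")` produces a summary line your bot or runbook template could reuse, so the wording stays in one place.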

How to Classify Incidents: The Impact-Urgency Model

When an alert fires or a customer reports an issue, your team needs to assign a severity fast — often within the first 2 minutes. Use two dimensions:

Impact — How many users are affected and how badly?

All users, core functionality broken → High impact
Subset of users, major feature degraded → Medium impact
Few users, workaround available → Low impact

Urgency — Is the situation getting worse?

Revenue loss, data at risk, or security exposure → High urgency
Worsening slowly, with no immediate risk to revenue, data, or security → Medium urgency
Stable, not worsening → Low urgency

	High Urgency	Medium Urgency	Low Urgency
High Impact	SEV-1	SEV-1	SEV-2
Medium Impact	SEV-2	SEV-2	SEV-3
Low Impact	SEV-2	SEV-3	SEV-4

Rule of thumb: When in doubt, classify higher. You can always downgrade a SEV-1 to a SEV-2 as you learn more. You can't un-ignore a critical incident.
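
The impact-urgency table and the rule of thumb can be sketched as a small lookup. This is an illustrative Python example, assuming hypothetical names (`Level`, `classify`, and the `unsure` flag); the cell values come straight from the table above.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2


# Keys are (impact, urgency); values are SEV numbers from the table above.
_MATRIX = {
    (Level.HIGH, Level.HIGH): 1,
    (Level.HIGH, Level.MEDIUM): 1,
    (Level.HIGH, Level.LOW): 2,
    (Level.MEDIUM, Level.HIGH): 2,
    (Level.MEDIUM, Level.MEDIUM): 2,
    (Level.MEDIUM, Level.LOW): 3,
    (Level.LOW, Level.HIGH): 2,
    (Level.LOW, Level.MEDIUM): 3,
    (Level.LOW, Level.LOW): 4,
}


def classify(impact: Level, urgency: Level, unsure: bool = False) -> str:
    """Return the severity for an impact/urgency pair.

    If the responder is unsure, bump one level more severe
    ("when in doubt, classify higher"), never past SEV-1.
    """
    sev = _MATRIX[(impact, urgency)]
    if unsure:
        sev = max(1, sev - 1)
    return f"SEV-{sev}"
```

For example, `classify(Level.MEDIUM, Level.LOW)` returns `"SEV-3"`, and the same call with `unsure=True` returns `"SEV-2"`.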

Severity Levels vs. Priority Levels

These terms get confused constantly. They're related but different:

Severity measures impact — how bad is this for users right now?
Priority measures order — when should we fix this relative to other work?

A SEV-4 cosmetic bug on your pricing page might get P1 priority because it's costing conversions. A SEV-2 bug affecting 5% of users on an obscure feature might get P3 priority because the workaround is simple.

Severity is set during the incident based on impact. Priority is set after the incident based on business judgment.

Implementing Severity Levels for Your Team

Step 1: Define Your Criteria

Take the framework above and customize it for your product. The key questions:

What does "all users affected" mean for your product? (If you have 50 customers, one enterprise customer being down might be SEV-1)
Which features are "critical"? (Your core value proposition — the thing customers pay for)
What constitutes data loss or security exposure in your context?

Write it down. Put it in your runbook or internal wiki. If it's not written down, it doesn't exist.

Step 2: Build Response Playbooks

For each severity level, document:

Who gets paged and through what channel
Response time expectation — how fast should someone acknowledge
Communication protocol — when to update the status page, who writes the update
Escalation triggers — when does a SEV-2 become a SEV-1

For detailed communication templates, see our guide on incident communication best practices.

Step 3: Integrate with Your Monitoring

Your monitoring tool should map alert conditions to severity levels automatically where possible:

HTTP 5xx error rate > 50% → SEV-1 alert
Response time > 5s for 5 minutes → SEV-2 alert
SSL certificate expiring in < 7 days → SEV-3 alert
Uptime check failure from single location → SEV-3 (could be false positive)
Uptime check failure from multiple locations → SEV-1
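
The mapping above could be expressed as a rule function like the one below. This is a sketch in Python under stated assumptions: `Signals` and `severity_for` are hypothetical names, the field layout is invented for illustration, and it does not represent any particular monitoring tool's API. When several rules match, it returns the most severe one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    """Snapshot of monitoring signals (hypothetical shape)."""
    error_rate_5xx: float = 0.0          # fraction of requests, 0.0 to 1.0
    slow_minutes: float = 0.0            # minutes response time has stayed above 5 s
    ssl_days_left: Optional[int] = None  # None if not checked
    failing_locations: int = 0           # uptime check locations currently failing


def severity_for(s: Signals) -> Optional[str]:
    """Apply the alert rules listed above; return None if nothing matches."""
    matched = []
    if s.error_rate_5xx > 0.5:
        matched.append(1)   # HTTP 5xx error rate > 50%
    if s.failing_locations >= 2:
        matched.append(1)   # uptime failure from multiple locations
    if s.slow_minutes >= 5:
        matched.append(2)   # response time > 5 s for 5 minutes
    if s.failing_locations == 1:
        matched.append(3)   # single location: could be a false positive
    if s.ssl_days_left is not None and s.ssl_days_left < 7:
        matched.append(3)   # SSL certificate expiring in < 7 days
    return f"SEV-{min(matched)}" if matched else None
```

Treat the result as an initial severity only; a human responder can still upgrade or downgrade it as they learn more.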

With StatusRay, your monitoring and status page are in one tool: when monitoring detects an issue, you update your status page in one click. No context-switching between your monitoring dashboard and a separate status page tool.

Step 4: Practice and Iterate

Run tabletop exercises quarterly. Present a scenario ("Your database primary just went down, read replicas are serving stale data") and have the team classify it, assign roles, and walk through the response. This sounds like overkill for a 10-person team, but it takes 30 minutes and prevents confusion during real incidents.

After every SEV-1 and SEV-2 incident, review the severity classification in your post-mortem. Was it classified correctly? Did the response match the severity?

Common Severity Classification Mistakes

The "Everything Is SEV-1" Problem

When every issue gets classified as critical, nothing is critical. Your team gets alert fatigue, response quality drops, and actual SEV-1 incidents get slower responses because everyone is already tired.

Fix it: Track your severity distribution monthly. A healthy ratio looks roughly like:

Level	Expected % of Total Incidents
SEV-1	5-10%
SEV-2	15-25%
SEV-3	40-50%
SEV-4	20-30%

If more than 20% of your incidents are SEV-1, your definitions are too loose.
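
A monthly check of this kind can be a few lines of Python. The sketch below assumes you can export the month's incidents as a list of severity labels; `severity_distribution` and `too_many_sev1` are hypothetical helper names, and the 20% default is the threshold stated above.

```python
from collections import Counter
from typing import Dict, List

LEVELS = ("SEV-1", "SEV-2", "SEV-3", "SEV-4")


def severity_distribution(levels: List[str]) -> Dict[str, float]:
    """Return the percentage of incidents at each severity level."""
    total = len(levels)
    counts = Counter(levels)
    if total == 0:
        return {sev: 0.0 for sev in LEVELS}
    return {sev: 100.0 * counts[sev] / total for sev in LEVELS}


def too_many_sev1(levels: List[str], threshold_pct: float = 20.0) -> bool:
    """True when SEV-1 incidents exceed the threshold share of all incidents."""
    return severity_distribution(levels)["SEV-1"] > threshold_pct
```

With 3 SEV-1 incidents out of 10, the SEV-1 share is 30%, so `too_many_sev1` would flag the month for a definitions review.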

Ignoring Business Context

A technical issue that seems minor can have outsized business impact. Your checkout flow being slow during a product launch is not a SEV-3 just because the service is "technically up."

Always consider: who is affected, when it's happening, and what they're trying to do. A 30-second delay on your status page during a customer's outage investigation is more impactful than a 30-second delay on your blog at 3am.

Never Adjusting Severity

Severity isn't permanent. As you investigate, new information changes the picture:

You thought it was a minor issue but discover data corruption → Upgrade to SEV-1
You classified it as SEV-1 but found only 3 users are affected → Downgrade to SEV-3

Make it explicit when you change severity and communicate why.
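
One way to make severity changes explicit is to refuse any change that arrives without a reason and to keep a history. The following Python sketch illustrates the idea; `Incident`, `SeverityChange`, and `change_severity` are hypothetical names, not part of any specific incident tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class SeverityChange:
    old: str
    new: str
    reason: str
    at: datetime


def _rank(level: str) -> int:
    """'SEV-2' -> 2. Lower numbers are more severe."""
    return int(level.split("-")[1])


@dataclass
class Incident:
    title: str
    severity: str
    history: List[SeverityChange] = field(default_factory=list)

    def change_severity(self, new: str, reason: str) -> str:
        """Record the change and return a message to post to the team."""
        if not reason.strip():
            raise ValueError("a severity change needs a reason")
        if new == self.severity:
            raise ValueError("severity is unchanged")
        old = self.severity
        self.history.append(
            SeverityChange(old, new, reason, datetime.now(timezone.utc))
        )
        self.severity = new
        verb = "Upgraded" if _rank(new) < _rank(old) else "Downgraded"
        return f"{verb} from {old} to {new}: {reason}"
```

The returned message can be pasted into the incident channel, which keeps the "communicate why" step from being skipped.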

Measuring the Effectiveness of Your Severity Framework

Track these metrics quarterly to know if your severity levels are working:

Metric	What It Tells You	Watch For
MTTR by severity level	Whether response matches severity	SEV-1 MTTR > 1 hour
Severity distribution	Whether definitions are calibrated	> 20% SEV-1 incidents
Reclassification rate	How often initial classification changes	> 30% reclassification
Time to classify	Whether criteria are clear enough	> 5 minutes to assign severity
Customer complaints vs severity	Whether your levels match customer perception	Complaints on incidents classified SEV-3/4
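
Two of these metrics, MTTR by severity and reclassification rate, are simple to compute from incident records. The Python sketch below assumes a hypothetical `IncidentRecord` shape with initial severity, final severity, and minutes to resolve; adapt the fields to whatever your incident tracker exports.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class IncidentRecord:
    initial_severity: str
    final_severity: str
    minutes_to_resolve: float


def mttr_by_severity(records: List[IncidentRecord]) -> Dict[str, float]:
    """Mean time to resolve, in minutes, grouped by final severity."""
    grouped: Dict[str, List[float]] = {}
    for r in records:
        grouped.setdefault(r.final_severity, []).append(r.minutes_to_resolve)
    return {sev: mean(times) for sev, times in grouped.items()}


def reclassification_rate(records: List[IncidentRecord]) -> float:
    """Percentage of incidents whose severity changed after initial triage."""
    if not records:
        return 0.0
    changed = sum(1 for r in records if r.initial_severity != r.final_severity)
    return 100.0 * changed / len(records)
```

Grouping by final severity is a design choice in this sketch; grouping by initial severity instead would measure how well the first classification predicted the response.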

FAQ

How many severity levels should we use?

Four is the sweet spot for most SaaS teams. Three levels lack nuance (you'll argue about borderline cases). Five or more add complexity without improving decisions. Start with four and only add more if you have a clear need.

Who decides the severity level?

The first responder assigns an initial severity based on available information. Anyone can escalate the severity at any time. Only the incident commander (or equivalent) should downgrade severity.

How often should we review our severity definitions?

Quarterly at minimum, and after every major incident. As your product grows and your customer base changes, what counts as "critical" evolves too.

Should different services have different severity levels?

Yes, if the services have different user impact. Your API being slow is probably more severe than your admin dashboard being slow. Document service-specific criteria in your runbook so there's no ambiguity.

What's the difference between severity and priority?

Severity measures current impact (how bad is it for users). Priority measures importance relative to other work (when should we fix it). A low-severity bug can have high priority if it affects a key customer or revenue. They're related but not the same.

How do incident severity levels relate to SLAs?

Your severity levels should map directly to your SLA response and resolution commitments. If your SLA promises 99.9% uptime, any incident causing downtime is at minimum a SEV-2 because it's consuming your error budget.
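
To see how quickly downtime consumes an error budget, it helps to work through the arithmetic. The Python sketch below assumes a 30-day measurement window, which is an illustrative choice and not something your SLA necessarily uses; `downtime_budget_minutes` and `budget_remaining_minutes` are hypothetical helper names.

```python
def downtime_budget_minutes(uptime_percent: float, days: int = 30) -> float:
    """Total downtime allowed by an uptime target over the window."""
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


def budget_remaining_minutes(
    uptime_percent: float, downtime_so_far: float, days: int = 30
) -> float:
    """Budget left after the downtime already incurred (negative if overspent)."""
    return downtime_budget_minutes(uptime_percent, days) - downtime_so_far
```

At 99.9% uptime over 30 days, the budget is about 43.2 minutes (0.1% of 43,200 minutes), so a single 30-minute outage uses roughly 70% of it.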

Start Classifying Incidents with Confidence

A clear severity framework eliminates the "how bad is this?" debate during incidents. Your team responds faster, communicates consistently, and resolves issues with less chaos.

The framework in this guide works for most SaaS teams out of the box. Copy the severity matrix, customize the examples for your product, and put it where your team can find it during an incident.

And when incidents happen, make sure your customers know about it. StatusRay gives you a professional status page with built-in monitoring — so you detect issues automatically and communicate them in one click.

Create your status page — free →
