Use case

Incident triage, classification, and RCA.

From the field: an AI-native redesign of the incident triage and RCA workflow inside an ITSM IT function.

Convolving expertise

A senior Convolving delivery team partnered with the IT operations function for one sprint. Operators from our expert network – with forty combined years inside enterprise SRE, ITSM, and incident response – reviewed the redesign at each checkpoint. Forward-deployed engineers built inside the team's observability, ticketing, and CMDB stack. One flat fee, artifact out, no retainer creep.

Situation

Today incidents arrive as raw alerts. The on-call engineer triages, classifies, finds the runbook, and starts the RCA from scratch.

Observability lives in Datadog, Splunk, and CloudWatch. Tickets live in ServiceNow or Jira SM. Runbooks live in Confluence. The engineer stitches them all together under deadline pressure. Novant Health automated sixty-three percent of incidents and cut MTTR roughly thirty percent over eighty-seven thousand predictions in four months; the legacy stack does not get there because the signal does not flow between these systems.

Auto-classification: <10% (on legacy ITSM stacks)
MTTR: baseline (engineer-led triage and RCA)
Runbook hit rate: variable (depends on engineer attention)
RCA completeness: sampled (major incidents only)


Complication

Largest obstacles and inefficiencies.

Triage eats the first ten minutes.

Engineers stitch alert, CMDB, and recent changes by hand under page pressure. MTTR pays the tax.
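The hand-stitching described above can be sketched as a single enrichment step. This is a minimal illustration, not the engagement's actual spec: the `Alert`, `Change`, and `EnrichedAlert` types, the dictionary-shaped CMDB, and the 24-hour lookback window are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    service: str
    message: str
    fired_at: datetime


@dataclass
class Change:
    service: str
    summary: str
    deployed_at: datetime


@dataclass
class EnrichedAlert:
    alert: Alert
    owner: str
    dependencies: list[str]
    recent_changes: list[Change] = field(default_factory=list)


def enrich(alert, cmdb, changes, window=timedelta(hours=24)):
    """Attach CMDB ownership and recent changes to a raw alert.

    `cmdb` is a hypothetical mapping: service -> {"owner": ..., "depends_on": [...]}.
    """
    record = cmdb.get(alert.service, {})
    deps = record.get("depends_on", [])
    # Look at changes to the alerting service and its direct dependencies.
    in_scope = {alert.service, *deps}
    recent = [
        c for c in changes
        if c.service in in_scope
        # Keep only changes deployed before the alert and inside the window.
        and timedelta(0) <= alert.fired_at - c.deployed_at <= window
    ]
    return EnrichedAlert(alert, record.get("owner", "unknown"), list(deps), recent)
```

The point of the sketch is that the engineer receives one object with owner, dependencies, and candidate changes already attached, rather than assembling them during the page.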

Runbooks lag the system.

Confluence runbooks describe last quarter's architecture. Engineers learn that mid-incident.

RCAs file but rarely teach.

RCA writeups land late, single-author, and rarely get cross-referenced into runbooks or alerting rules.

Resolution

The AI-native cycle.

Same five steps.

Auto-classification: 60–90% (up from <10% today)
MTTR: down ~30% (Novant-equivalent band)
Runbook hit rate: uniform (right runbook, every page)
RCA completeness: every incident (from sampled to corpus-wide)

Key changes

What the redesign actually shifts.

Cycle compression

Auto-classification moves from under ten percent toward sixty to ninety.
MTTR drops roughly thirty percent in the Novant Health band.
Engineers pick up tickets with hypothesis and runbook attached.

Runbook discipline

Live runbooks surface against current state, not last quarter's.
Drafted RCAs feed back into runbooks and alert rules.
Tribal knowledge stops gating response.
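One way to picture "current state, not last quarter's" is a staleness check that compares each runbook's last review date with the last change to the service it covers. The sketch below is an assumption-laden illustration: the runbook dictionaries, the `reviewed_on` field, and the per-service change dates are hypothetical and do not reflect any Confluence or CMDB API.

```python
from datetime import date


def stale_runbooks(runbooks, last_change_by_service):
    """Return titles of runbooks last reviewed before their service last changed.

    `runbooks` is a hypothetical list of dicts with "title", "service",
    and "reviewed_on" keys; `last_change_by_service` maps service -> date.
    """
    return [
        rb["title"]
        for rb in runbooks
        if rb["service"] in last_change_by_service
        and rb["reviewed_on"] < last_change_by_service[rb["service"]]
    ]
```

Runbooks flagged this way would be candidates for refresh before an engineer relies on them mid-incident.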

Engineer capacity

Triage stops eating the first ten minutes of every page.
Repeat noise is suppressed after disposition.
Senior engineers concentrate on novel incidents and design.
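Suppressing repeat noise after disposition can be sketched as a small mute table keyed by alert fingerprint. This is a simplified illustration under stated assumptions: the `NoiseSuppressor` class, the `service:check` fingerprint, the `"noise"` disposition label, and the six-hour mute window are all hypothetical choices for the example.

```python
from datetime import datetime, timedelta


class NoiseSuppressor:
    """Mute repeats of an alert once an engineer has dispositioned it as noise."""

    def __init__(self, ttl=timedelta(hours=6)):
        self.ttl = ttl
        self._muted_until = {}  # fingerprint -> datetime when muting ends

    @staticmethod
    def fingerprint(service, check):
        return f"{service}:{check}"

    def record_disposition(self, service, check, disposition, at):
        # Only a "noise" disposition mutes; anything else leaves paging untouched.
        if disposition == "noise":
            self._muted_until[self.fingerprint(service, check)] = at + self.ttl

    def should_page(self, service, check, at):
        until = self._muted_until.get(self.fingerprint(service, check))
        return until is None or at >= until
```

The mute expires on its own, so a dispositioned alert that keeps firing past the window pages again instead of staying silent indefinitely.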

Audit and control

Every classification logs the rule and the data line.
Every RCA logs the model version and reviewer override.
Service owners read the same trail as audit.
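The audit trail described above could take the shape of one structured line per decision. The sketch below is illustrative only: the record types, field names, and JSON-lines layout are assumptions, not the format used on engagements.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class ClassificationRecord:
    incident_id: str
    category: str
    rule_id: str   # which rule produced the classification
    evidence: str  # the data the rule matched on


@dataclass(frozen=True)
class RcaRecord:
    incident_id: str
    model_version: str
    reviewer: Optional[str] = None       # set when a human reviewed the draft
    override_note: Optional[str] = None  # set when the reviewer changed it


def to_audit_line(record):
    """Serialize a record as one JSON line, tagged with its record type."""
    payload = {"type": type(record).__name__, **asdict(record)}
    return json.dumps(payload, sort_keys=True)
```

Because both record types serialize through the same function, service owners and auditors would be reading the same lines rather than separate reports.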

Deploy this in your team.

The redesign above ships as a step-by-step playbook. It includes the alert enrichment spec, classification rule library, runbook ingestion pipeline, RCA prompt library, and the rollout cadence we use on engagements.
