Use case

Incident triage, classification, and RCA.

From the field: an AI-native redesign of the incident triage and RCA workflow within the ITSM function.

Convolving expertise

A senior Convolving delivery team partnered with the IT operations function for one sprint. Operators from our expert network, with forty combined years inside enterprise SRE, ITSM, and incident response, reviewed the redesign at each checkpoint. Forward-deployed engineers built inside the team's observability, ticketing, and CMDB stack. One flat fee, artifact out, no retainer creep.

Situation

Today incidents arrive as raw alerts. The on-call engineer triages, classifies, finds the runbook, and starts the RCA from scratch.

Observability lives in Datadog, Splunk, and CloudWatch. Tickets live in ServiceNow or Jira SM. Runbooks live in Confluence. The engineer stitches these systems together under deadline pressure. For comparison, Novant Health automated sixty-three percent of incidents and cut MTTR by roughly thirty percent across eighty-seven thousand predictions in four months. The legacy stack does not get there because the signal does not flow between systems.

Auto-classification: <10% (on legacy ITSM stacks)
MTTR: baseline (engineer-led triage and RCA)
Runbook hit rate: variable (depends on engineer attention)
RCA completeness: sampled (major incidents only)

Complication

Largest obstacles and inefficiencies.

Triage eats the first ten minutes.

Engineers stitch together the alert, the CMDB record, and recent changes by hand under page pressure. MTTR pays the tax.
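
A minimal sketch of the enrichment step the redesign automates, assuming hypothetical record shapes (`Alert`, `ConfigItem`, `Change`, `EnrichedAlert`) and a two-hour change lookback window chosen purely for illustration. Real Datadog, ServiceNow, or Jira SM payloads look different; the point is that the join the engineer does by hand is a mechanical lookup.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


# Hypothetical record shapes for illustration; real Datadog, ServiceNow,
# or Jira SM payloads carry many more fields under different names.
@dataclass
class Alert:
    alert_id: str
    service: str
    fired_at: datetime
    message: str


@dataclass
class ConfigItem:
    service: str
    owner_team: str
    tier: int


@dataclass
class Change:
    change_id: str
    service: str
    deployed_at: datetime
    summary: str


@dataclass
class EnrichedAlert:
    alert: Alert
    config_item: Optional[ConfigItem]
    recent_changes: list[Change] = field(default_factory=list)


def enrich(
    alert: Alert,
    cmdb: dict[str, ConfigItem],
    changes: list[Change],
    lookback: timedelta = timedelta(hours=2),  # assumed window, tune per service
) -> EnrichedAlert:
    """Attach the CMDB record and any changes to the same service that
    landed inside the lookback window before the alert fired."""
    ci = cmdb.get(alert.service)  # None if the service has no CMDB entry
    window_start = alert.fired_at - lookback
    recent = sorted(
        (c for c in changes
         if c.service == alert.service and window_start <= c.deployed_at <= alert.fired_at),
        key=lambda c: c.deployed_at,
        reverse=True,  # most recent change first, usually the first suspect
    )
    return EnrichedAlert(alert=alert, config_item=ci, recent_changes=recent)
```

Called with the alert, a CMDB lookup keyed by service, and the change feed, `enrich` returns the owning team and the changes deployed to that service inside the window, most recent first: the context the on-call engineer otherwise assembles mid-page.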

Runbooks lag the system.

Confluence runbooks describe last quarter's architecture. Engineers learn that mid-incident.

RCAs file but rarely teach.

RCA writeups land late, single-author, and rarely get cross-referenced into runbooks or alerting rules.

Resolution

The AI-native cycle.

Same five steps.

Auto-classification: 60–90% (▲ from <10% today)
MTTR: ▼ ~30% (Novant-equivalent band)
Runbook hit rate: uniform (right runbook, every page)
RCA completeness: every incident (from sampled to corpus-wide)

Key changes

What the redesign actually shifts.

Cycle compression

  • Auto-classification moves from under ten percent toward sixty to ninety percent.
  • MTTR drops roughly thirty percent in the Novant Health band.
  • Engineers pick up tickets with hypothesis and runbook attached.

Runbook discipline

  • Live runbooks surface against the current system state, not last quarter's architecture.
  • Drafted RCAs feed back into runbooks and alert rules.
  • Tribal knowledge stops gating response.

Engineer capacity

  • Triage stops eating the first ten minutes of every page.
  • Repeat noise is suppressed once an engineer has dispositioned it.
  • Senior engineers concentrate on novel incidents and design.
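
One way noise suppression after disposition could work, sketched with hypothetical names (`fingerprint`, `SuppressionRegistry`) and an assumed seven-day TTL. Repeats of an alert collapse to one fingerprint; once an engineer dispositions that fingerprint as noise, matching alerts stop paging until the TTL lapses or someone re-dispositions it.

```python
from __future__ import annotations

import hashlib
import re
from datetime import datetime, timedelta


def fingerprint(service: str, message: str) -> str:
    """Collapse volatile tokens (hex ids, numbers) so repeats of the same
    alert on the same service share one key."""
    normalized = re.sub(r"0x[0-9a-f]+|\d+", "<n>", message.lower()).strip()
    return hashlib.sha256(f"{service}|{normalized}".encode()).hexdigest()[:16]


class SuppressionRegistry:
    """Hypothetical store of engineer dispositions, keyed by fingerprint."""

    def __init__(self, ttl: timedelta = timedelta(days=7)):  # assumed TTL
        self.ttl = ttl
        self._noise: dict[str, datetime] = {}  # fingerprint -> when marked noise

    def disposition(self, fp: str, verdict: str, at: datetime) -> None:
        """Record an engineer's verdict; anything other than "noise" lifts suppression."""
        if verdict == "noise":
            self._noise[fp] = at
        else:
            self._noise.pop(fp, None)

    def should_page(self, fp: str, at: datetime) -> bool:
        """Page unless this fingerprint was marked noise within the TTL."""
        marked = self._noise.get(fp)
        return marked is None or at - marked > self.ttl
```

The TTL is a deliberate choice in this sketch: a noise verdict that never expires could hide a real regression that happens to share the fingerprint.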

Audit and control

  • Every classification logs the rule and the data line.
  • Every RCA logs the model version and reviewer override.
  • Service owners read the same trail as audit.
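
A sketch of what logging the rule and the data line can mean in practice, assuming a hypothetical regex rule library (`Rule`, `RULES`, `classify`, `AuditRecord`) rather than any specific vendor feature. Each classification returns an audit record naming the rule that fired and the exact payload line that triggered it; an RCA record could carry model version and reviewer override fields in the same way.

```python
from __future__ import annotations

import json
import re
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Rule:
    rule_id: str
    pattern: str  # regex matched against each line of the alert payload
    category: str


@dataclass
class AuditRecord:
    incident_id: str
    category: str
    rule_id: str | None
    data_line: str | None
    classified_at: str


# Illustrative rules only; a real library would be built from past incidents.
RULES = [
    Rule("R-001", r"connection pool exhausted|too many connections", "database"),
    Rule("R-002", r"certificate (has )?expired", "tls"),
    Rule("R-003", r"OOMKilled|out of memory", "capacity"),
]


def classify(incident_id: str, payload: str, rules: list[Rule] = RULES) -> AuditRecord:
    """Scan payload lines in order; the first line matching any rule decides
    the category, with rule priority following list order."""
    now = datetime.now(timezone.utc).isoformat()
    for line in payload.splitlines():
        for rule in rules:
            if re.search(rule.pattern, line, re.IGNORECASE):
                return AuditRecord(incident_id, rule.category, rule.rule_id, line.strip(), now)
    # No rule fired: route to a human and record that fact explicitly.
    return AuditRecord(incident_id, "unclassified", None, None, now)


def to_log_line(record: AuditRecord) -> str:
    """Serialize one audit record as a JSON line for the shared trail."""
    return json.dumps(asdict(record), sort_keys=True)
```

Recording "unclassified" with no rule, rather than writing nothing, keeps the trail complete, so service owners and audit see every decision, including the ones the rules declined to make.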

Deploy this in your team.

The redesign above ships as a step-by-step playbook: the alert enrichment spec, classification rule library, runbook ingestion pipeline, RCA prompt library, and the rollout cadence we use on engagements.