Scope: This process covers IT service incidents for managed clients. For security incidents (data breach, ransomware, etc.), refer to the Security Incident Response Plan.

Incident Lifecycle

Trigger: Incident Detected
Step 1: Logging & Categorisation
Step 2: Prioritisation (Impact × Urgency)
Step 3: Initial Diagnosis (AI-Assisted)
Decision: Can L1/AI Resolve?
Step 4: Escalation (if needed)
Step 5: Investigation & Diagnosis
Step 6: Resolution & Recovery
Step 7: Closure & Documentation
Complete: Incident Closed
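
The lifecycle above can be read as a small state machine. The following Python is a minimal sketch only: `Stage` and `next_stage` are hypothetical names, not part of any ASI tooling. It also assumes that a "yes" at the "Can L1/AI Resolve?" decision skips escalation and investigation and moves straight to Resolution & Recovery, which the flow implies but does not state.

```python
from enum import Enum


class Stage(Enum):
    DETECTED = "Incident Detected"
    LOGGING = "Logging & Categorisation"
    PRIORITISATION = "Prioritisation (Impact x Urgency)"
    INITIAL_DIAGNOSIS = "Initial Diagnosis (AI-Assisted)"
    ESCALATION = "Escalation"
    INVESTIGATION = "Investigation & Diagnosis"
    RESOLUTION = "Resolution & Recovery"
    CLOSURE = "Closure & Documentation"
    CLOSED = "Incident Closed"


def next_stage(current: Stage, l1_can_resolve: bool = False) -> Stage:
    """Return the stage that follows `current`.

    Assumption: when L1/AI can resolve the incident, escalation and
    investigation are skipped and the flow moves to resolution.
    """
    if current is Stage.CLOSED:
        raise ValueError("incident is already closed")
    if current is Stage.INITIAL_DIAGNOSIS and l1_can_resolve:
        return Stage.RESOLUTION
    order = list(Stage)  # Enum members iterate in definition order
    return order[order.index(current) + 1]
```

Calling `next_stage(Stage.INITIAL_DIAGNOSIS, l1_can_resolve=True)` returns `Stage.RESOLUTION`; without the flag it returns `Stage.ESCALATION`.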

Incident Detection

Incidents enter the process through three primary channels:

AI Monitoring (Automated)
ASI AI Sentinel detects anomalies, threshold breaches, and infrastructure failures. Automated alerts generate ServiceNow tickets with pre-populated diagnostics. AI auto-remediation resolves approximately 40% of detected issues without human intervention.
User Report
End users report incidents via the self-service portal, email (support@asiaisolutions.com.au), phone (1300-ASI-HELP), or Microsoft Teams bot. The AI chatbot attempts to resolve common issues (password resets, how-to queries) before creating a ticket.
Proactive Detection
Scheduled health checks, vulnerability scans, and predictive analytics identify potential incidents before they cause impact. These are logged as proactive incidents and addressed during maintenance windows where possible.

Classification & Prioritisation

Incident Categories

Category | Sub-Categories | Typical Priority
Network | Connectivity, DNS, DHCP, firewall, SD-WAN, Wi-Fi | P1-P3
Server/Compute | Server down, high utilisation, OS crash, service failure | P1-P3
Cloud Services | Azure/AWS outage, resource failures, config issues | P1-P3
Email & Collaboration | M365 issues, Exchange, Teams, SharePoint | P1-P3
Application | LOB app errors, performance, access issues | P2-P4
Desktop/Endpoint | Hardware failure, OS issues, software installation | P3-P4
Security | Malware detected, phishing, unauthorised access | P1-P2 (escalate to CSIRT)
Backup/DR | Backup failure, restore request, DR failover | P2-P3
Print/Peripheral | Printer issues, scanner, AV equipment | P3-P4
Access/Identity | Password reset, account lockout, MFA, permissions | P3-P4
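
This document does not define the Impact × Urgency matrix itself. As a sketch only, the Python below assumes a common ITIL-style 3×3 matrix (an assumption, not ASI's actual matrix) and checks a computed priority against the typical range for a category from the table above. `PRIORITY_MATRIX`, `TYPICAL_RANGE`, `prioritise`, and `is_typical` are hypothetical names.

```python
# ASSUMED 3x3 matrix: (impact, urgency) -> priority number. Not from the source.
PRIORITY_MATRIX = {
    ("high", "high"): 1, ("high", "medium"): 2, ("high", "low"): 3,
    ("medium", "high"): 2, ("medium", "medium"): 3, ("medium", "low"): 4,
    ("low", "high"): 3, ("low", "medium"): 4, ("low", "low"): 4,
}

# Typical priority ranges (lowest number, highest number) per category,
# copied from the Incident Categories table.
TYPICAL_RANGE = {
    "Network": (1, 3),
    "Server/Compute": (1, 3),
    "Cloud Services": (1, 3),
    "Email & Collaboration": (1, 3),
    "Application": (2, 4),
    "Desktop/Endpoint": (3, 4),
    "Security": (1, 2),
    "Backup/DR": (2, 3),
    "Print/Peripheral": (3, 4),
    "Access/Identity": (3, 4),
}


def prioritise(impact: str, urgency: str) -> int:
    """Look up the priority (1-4) for an impact/urgency pair."""
    return PRIORITY_MATRIX[(impact, urgency)]


def is_typical(category: str, priority: int) -> bool:
    """True if the priority falls inside the category's typical range."""
    low, high = TYPICAL_RANGE[category]
    return low <= priority <= high
```

A priority outside the typical range is not necessarily wrong. A check like `is_typical("Desktop/Endpoint", 1)` returning `False` would simply flag the ticket for a second look.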

AI-Assisted Triage

ASI AI Sentinel provides automated initial diagnosis for all incoming incidents:

  • Natural language classification: AI reads user-submitted descriptions and auto-categorises the incident with 92% accuracy
  • Suggested priority: AI recommends priority based on historical patterns, affected services, and user role (VIP flagging)
  • Knowledge match: AI searches the knowledge base and attaches the top 3 relevant articles to the ticket
  • Similar incident detection: AI identifies if the issue matches an active P1/P2 incident (potential child incident) and links them
  • Auto-remediation check: AI determines if the incident can be resolved automatically (e.g., service restart, DNS flush, certificate renewal) and executes if confidence > 95%
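
The auto-remediation check and the knowledge match can be sketched as follows. This is illustrative Python, not ASI AI Sentinel's implementation: `TriageResult`, `triage_action`, and `top_articles` are hypothetical names. The source says remediation runs when confidence is above 95%, so this sketch assumes that a confidence of exactly 95% is escalated rather than executed.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

AUTO_REMEDIATION_THRESHOLD = 0.95  # "executes if confidence > 95%"


@dataclass
class TriageResult:
    category: str
    remediation: Optional[str]  # e.g. "service restart"; None if no automatic fix applies
    confidence: float           # model confidence, 0.0 to 1.0


def triage_action(result: TriageResult) -> str:
    """Decide whether to run the automated fix or hand over to L2."""
    if result.remediation is not None and result.confidence > AUTO_REMEDIATION_THRESHOLD:
        return "auto_remediate"
    return "escalate_to_l2"


def top_articles(scored: List[Tuple[str, float]], limit: int = 3) -> List[str]:
    """Return the titles of the highest-scoring knowledge articles."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return [title for title, _score in ranked[:limit]]
```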

Escalation Matrix

L1: AI Auto-Resolve | Automated remediation | Resolution: 0-5 min
L2: Service Desk Engineer | General IT support | Resolution: 5 min - 4 hrs
L3: Specialist Engineer | Domain expert (Cloud, Security, Network) | Resolution: 1-8 hrs
L4: Vendor / Architect | Vendor support or Solutions Architect | Resolution: as per vendor SLA

Escalation Triggers

From | To | Trigger | Max Time Before Escalation
L1 AI | L2 Service Desk | Auto-remediation failed or confidence < 95% | 5 minutes
L2 Service Desk | L3 Specialist | Unable to resolve within timeframe, requires domain expertise | P1: 15 min, P2: 30 min, P3: 2 hrs
L3 Specialist | L4 Vendor | Vendor product defect, requires vendor intervention | P1: 1 hr, P2: 2 hrs
Any Level | Major Incident Manager | P1 incident or multiple related P2 incidents | Immediately upon identification
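
The time limits in the table above can be expressed as a lookup. The Python below is a sketch under stated assumptions: `MAX_MINUTES`, `NEXT_LEVEL`, and `escalation_target` are hypothetical names, and the 5-minute L1 limit is assumed to apply to every priority. Where the table gives no limit for a priority (for example P3 at L3), the function does not escalate on time alone. It covers only the time-based trigger, not the other triggers or the Major Incident Manager row.

```python
# Maximum minutes at a level before escalating, from the Escalation Triggers table.
# A missing priority means the table gives no time limit for it.
MAX_MINUTES = {
    "L1": {p: 5 for p in ("P1", "P2", "P3", "P4")},  # assumed to apply to all priorities
    "L2": {"P1": 15, "P2": 30, "P3": 120},
    "L3": {"P1": 60, "P2": 120},
}
NEXT_LEVEL = {"L1": "L2", "L2": "L3", "L3": "L4"}


def escalation_target(level: str, priority: str, minutes_at_level: float):
    """Return the level to escalate to, or None if the time limit is not reached."""
    limit = MAX_MINUTES.get(level, {}).get(priority)
    if limit is None or minutes_at_level < limit:
        return None
    return NEXT_LEVEL[level]
```

For example, a P1 incident that has spent 15 minutes at L2 yields `"L3"`, while a P2 incident after 20 minutes at L2 yields `None`.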

Resolution & Recovery

  1. Identify resolution: Determine the fix (workaround or permanent) based on the diagnosis
  2. Implement fix: Execute the resolution through the appropriate change process (emergency change for P1 in production)
  3. Validate resolution: Confirm the service is restored and functioning correctly
  4. User confirmation: Contact the affected user(s) to confirm the issue is resolved
  5. Document resolution: Record the root cause, resolution steps, and any knowledge article updates in ServiceNow
  6. Close incident: Set the status to "Resolved". The ticket auto-closes after 3 business days if the user raises no objection.
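
The auto-close rule in step 6 depends on counting business days. A minimal sketch in Python, assuming business days are Monday to Friday and ignoring public holidays (the source does not say how holidays are treated). `add_business_days` is a hypothetical helper, not a ServiceNow function.

```python
from datetime import date, timedelta

AUTO_CLOSE_BUSINESS_DAYS = 3  # from step 6 above


def add_business_days(start: date, days: int) -> date:
    """Return the date `days` business days after `start`.

    Assumption: Monday-Friday only; public holidays are not considered.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def auto_close_date(resolved_on: date) -> date:
    """Date on which a resolved ticket would auto-close."""
    return add_business_days(resolved_on, AUTO_CLOSE_BUSINESS_DAYS)
```

A ticket resolved on Thursday 5 February 2026 would auto-close on Tuesday 10 February 2026, because the weekend is skipped.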

Major Incident Process

A Major Incident is declared when a P1 incident affects multiple clients, causes complete loss of a critical business service, or has potential reputational/financial impact exceeding $50,000.
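
The declaration criteria can be written as a simple predicate. This Python sketch reads the sentence above as "a P1 incident that meets at least one of the three conditions"; that reading, and the names `IncidentImpact` and `is_major_incident`, are assumptions made for illustration.

```python
from dataclasses import dataclass

IMPACT_THRESHOLD = 50_000  # dollars, as stated in the criteria above


@dataclass
class IncidentImpact:
    priority: str                 # "P1" to "P4"
    clients_affected: int
    critical_service_lost: bool   # complete loss of a critical business service
    estimated_impact: float       # potential reputational/financial impact in dollars


def is_major_incident(impact: IncidentImpact) -> bool:
    """True if a P1 incident meets at least one Major Incident criterion."""
    if impact.priority != "P1":
        return False
    return (
        impact.clients_affected > 1
        or impact.critical_service_lost
        or impact.estimated_impact > IMPACT_THRESHOLD
    )
```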

1. Declaration (0-5 min)

Any L2+ engineer or SDM can declare a Major Incident. The Major Incident Manager (MIM) is paged immediately via PagerDuty. A bridge call/Teams channel is opened.

2. Assembly (5-15 min)

MIM assembles the response team: relevant L3 specialists, Solutions Architect (if needed), and communications lead. All parties join the bridge call.

3. Assessment & Communication (15-30 min)

MIM assesses impact, confirms priority, and issues the first client notification. Internal status page updated. All related tickets linked as child incidents.

4. Resolution Work (Ongoing)

Technical team works to resolve. MIM provides updates every 30 minutes to affected clients and internal stakeholders. Vendor engaged if needed.

5. Resolution & Confirmation

Service restored. MIM confirms with all affected parties. Resolution notification sent. Monitoring elevated for 24 hours post-resolution.

6. Post-Incident Review (within 5 BD)

PIR conducted within 5 business days. Report distributed to all stakeholders. Improvement actions tracked to completion.
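
The time windows on the first three phases above can be used to check whether a response is on schedule. The Python below is a sketch only: `PHASES` and `expected_phase` are hypothetical names, and any time past 30 minutes is treated as "Resolution Work" until the service is restored.

```python
# (start minute, end minute, phase) taken from the phase headings above.
PHASES = [
    (0, 5, "Declaration"),
    (5, 15, "Assembly"),
    (15, 30, "Assessment & Communication"),
]


def expected_phase(minutes_since_declaration: float) -> str:
    """Return the phase the response should be in at a given elapsed time."""
    if minutes_since_declaration < 0:
        raise ValueError("elapsed time cannot be negative")
    for start, end, name in PHASES:
        if start <= minutes_since_declaration < end:
            return name
    return "Resolution Work"  # ongoing until the service is restored
```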

Post-Incident Review (PIR) Template

📄 Post-Incident Review Report

Incident ID
[INC-YYYY-NNNNNN]
Incident Title
[Brief description of the incident]
Priority / Severity
[P1/P2] / [Major Incident: Yes/No]
Affected Client(s)
[List all affected clients and number of impacted users]
Timeline
Detection: [DD/MM/YYYY HH:MM]
First Response: [DD/MM/YYYY HH:MM] (TTA: XX min)
Major Incident Declared: [DD/MM/YYYY HH:MM] (if applicable)
Resolution: [DD/MM/YYYY HH:MM] (TTR: XX hrs XX min)
Full Recovery: [DD/MM/YYYY HH:MM]
Impact Summary
[What was affected, how many users/services impacted, business impact estimate ($)]
Root Cause
[Detailed root cause analysis. Use 5-Whys technique.]
Resolution Steps
[Step-by-step actions taken to resolve the incident]
What Went Well
[Positive aspects of the response]
What Could Be Improved
[Areas for improvement in detection, response, or communication]
Improvement Actions
1. [Action] - Owner: [Name] - Due: [Date]
2. [Action] - Owner: [Name] - Due: [Date]
3. [Action] - Owner: [Name] - Due: [Date]
PIR Author
[Name, role]
Reviewed By
[Head of Service Delivery sign-off]
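
The TTA and TTR figures in the Timeline can be derived from the timestamps rather than entered by hand. A minimal Python sketch, assuming all timestamps use the template's DD/MM/YYYY HH:MM format and the same time zone. `elapsed_minutes` and `format_ttr` are hypothetical helpers.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%d/%m/%Y %H:%M"  # matches DD/MM/YYYY HH:MM in the template


def elapsed_minutes(start: str, end: str) -> int:
    """Whole minutes between two template-formatted timestamps."""
    delta = datetime.strptime(end, TIMESTAMP_FORMAT) - datetime.strptime(start, TIMESTAMP_FORMAT)
    return int(delta.total_seconds() // 60)


def format_ttr(minutes: int) -> str:
    """Render minutes in the template's 'XX hrs XX min' style."""
    hours, mins = divmod(minutes, 60)
    return f"{hours} hrs {mins:02d} min"
```

With illustrative timestamps, detection at `03/02/2026 09:00` and resolution at `03/02/2026 11:24` give 144 minutes, which `format_ttr` renders as `2 hrs 24 min`.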

Communication Templates

Customer Notification — Incident Acknowledged

Subject: [P1/P2] Incident Notification — [Brief Description] — [INC-ID]

Dear [Client Name],

We are writing to inform you that we have identified an issue affecting [service/system name]. Our team is actively investigating.

Incident Details:

  • Incident ID: [INC-YYYY-NNNNNN]
  • Priority: [P1/P2]
  • Impact: [Description of what is affected]
  • Detected: [DD/MM/YYYY HH:MM AEST]

We will provide an update within [30 minutes for P1 / 1 hour for P2] or sooner if we have new information.

If you have any questions, please contact your Service Delivery Manager or call 1300-ASI-HELP.

Kind regards,
ASI AI Solutions Service Desk

Internal Escalation — Major Incident

Subject: MAJOR INCIDENT DECLARED — [Brief Description] — [INC-ID]

To: Major Incident Response Team, Head of Service Delivery, VP Operations

A Major Incident has been declared. Please join the bridge call immediately.

  • Bridge Call: [Teams meeting link]
  • Incident ID: [INC-YYYY-NNNNNN]
  • Impact: [X clients / Y users affected]
  • Service: [Affected service]
  • MIM: [Name]
  • Current Status: [Investigating / Containment]

Customer Notification — Incident Resolved

Subject: RESOLVED — [Brief Description] — [INC-ID]

Dear [Client Name],

We are pleased to confirm that the incident reported on [date] has been resolved.

  • Incident ID: [INC-YYYY-NNNNNN]
  • Resolution Time: [X hours Y minutes]
  • Root Cause: [Brief description]
  • Resolution: [Brief description of fix applied]
  • Preventive Actions: [What we are doing to prevent recurrence]

We are continuing to monitor the affected services closely. A full Post-Incident Review will be shared within 5 business days.

We apologise for any disruption caused. Please don't hesitate to reach out if you experience any further issues.

Kind regards,
[Service Delivery Manager Name]
ASI AI Solutions
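
The templates above mark variable fields with square brackets. One way to fill them, and to avoid sending a message with a field left blank, is sketched below in Python. `fill_template` is a hypothetical helper, and the assumption is that every bracketed span is a placeholder whose key is the exact text inside the brackets.

```python
import re

# Matches one bracketed placeholder such as [INC-ID] or [Client Name].
PLACEHOLDER = re.compile(r"\[([^\[\]]+)\]")


def fill_template(template: str, values: dict) -> str:
    """Replace each [placeholder] with its value; fail if one is missing."""
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder [{key}]")
        return values[key]

    return PLACEHOLDER.sub(substitute, template)
```

Raising an error on a missing value is a deliberate choice here: it stops a notification from going out with a literal `[Client Name]` in it.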

Incident Management Metrics

Mean Time to Acknowledge (MTTA): 8 min
Mean Time to Resolve (MTTR): 2.4 hrs
First Contact Resolution: 72%
Monthly Incident Volume: 1,842
AI Auto-Resolved: 38%
Major Incidents / Month (avg): 2.1

Metric Definitions & Targets

Metric | Definition | Target | Current (Feb 2026)
MTTA | Time from incident creation to first human response | ≤ 10 min (P1) | 8 min
MTTR | Time from incident creation to resolution | ≤ 4 hrs (P1) | 2.4 hrs
First Contact Resolution | % of incidents resolved at first interaction without escalation | ≥ 70% | 72%
SLA Compliance | % of incidents meeting response and resolution SLA targets | ≥ 95% | 96.2%
Reopened Rate | % of resolved incidents reopened within 7 days | ≤ 5% | 3.8%
AI Auto-Resolution Rate | % of incidents resolved by AI without human intervention | ≥ 40% | 38%
Customer Satisfaction | Post-resolution CSAT survey average | ≥ 4.5/5.0 | 4.6/5.0
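
The targets above can be checked mechanically against the current figures. The Python below is a sketch: `METRICS` and `metrics_off_target` are hypothetical names, the values are copied from the table, and the MTTA and MTTR figures are compared with the P1 targets as the table presents them.

```python
import operator

# (metric, comparison, target, current) from the Metric Definitions & Targets table.
# operator.le means "current must be <= target"; operator.ge means ">=".
METRICS = [
    ("MTTA (min)", operator.le, 10, 8),
    ("MTTR (hrs)", operator.le, 4, 2.4),
    ("First Contact Resolution (%)", operator.ge, 70, 72),
    ("SLA Compliance (%)", operator.ge, 95, 96.2),
    ("Reopened Rate (%)", operator.le, 5, 3.8),
    ("AI Auto-Resolution Rate (%)", operator.ge, 40, 38),
    ("Customer Satisfaction (/5.0)", operator.ge, 4.5, 4.6),
]


def metrics_off_target(metrics):
    """Return the names of metrics whose current value misses the target."""
    return [name for name, meets, target, current in metrics if not meets(current, target)]
```

With the February 2026 figures, `metrics_off_target(METRICS)` returns only the AI Auto-Resolution Rate (38% against a target of ≥ 40%).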