Service Delivery & SLA Management
Defines service tiers, SLA targets, escalation procedures, reporting cadences, and AI-powered monitoring thresholds for all managed service clients.
Service Tiers
ASI AI Solutions offers three managed service tiers. Each tier includes AI-powered monitoring via ASI AI Sentinel, with increasing levels of proactive support and dedicated resources.
| Feature | Platinum | Gold | Silver |
|---|---|---|---|
| Response Time (P1) | 15 minutes | 1 hour | 4 hours |
| Support Hours | 24/7/365 | 24/7 for P1-P2; Business Hours for P3-P4 | Business Hours (7am-7pm AEST) |
| Dedicated Service Delivery Manager (SDM) | Yes, named | Shared (1:5 ratio) | Shared (1:10 ratio) |
| AI Monitoring | Full suite + predictive | Full suite | Core monitoring |
| Service Reviews | Monthly | Quarterly | Quarterly |
| On-site Support | Included (scheduled) | Chargeable | Chargeable |
| Uptime SLA | 99.99% | 99.95% | 99.9% |
| Proactive Optimisation | Quarterly | Bi-annual | Annual |
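To make the uptime SLAs concrete, the sketch below converts each percentage into the downtime it permits over a reporting period. It assumes a 30-day month and counts all downtime equally; `UPTIME_SLA` and `allowed_downtime` are illustrative names, not part of any ASI tooling. Over 30 days (43,200 minutes), 99.99% allows about 4.3 minutes of downtime, 99.95% about 21.6 minutes and 99.9% about 43.2 minutes.

```python
from datetime import timedelta

# Uptime SLA per tier, as listed in the tier definitions (illustrative mapping).
UPTIME_SLA = {"Platinum": 99.99, "Gold": 99.95, "Silver": 99.9}


def allowed_downtime(uptime_pct: float, period: timedelta = timedelta(days=30)) -> timedelta:
    """Return the maximum downtime an uptime percentage permits over `period`."""
    if not 0.0 <= uptime_pct <= 100.0:
        raise ValueError("uptime_pct must be between 0 and 100")
    # The unavailable fraction of the period is (100 - uptime) / 100.
    return period * ((100.0 - uptime_pct) / 100.0)


if __name__ == "__main__":
    for tier, pct in UPTIME_SLA.items():
        minutes = allowed_downtime(pct).total_seconds() / 60
        print(f"{tier} ({pct}%): {minutes:.1f} minutes per 30-day month")
```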
Incident Priority Matrix
Incidents are classified using a combination of Impact (number of users/business functions affected) and Urgency (time sensitivity) to determine priority.
| Priority | Definition | Example | Platinum Response | Gold Response | Silver Response | Resolution Target |
|---|---|---|---|---|---|---|
| P1 - Critical | Complete service outage or critical business function unavailable. Affects all/most users. | Email system down, ERP unavailable, site-wide network outage | 15 min | 1 hr | 4 hr | 4 hours |
| P2 - High | Major degradation of a key service. Significant number of users impacted. Workaround may exist. | Shared drive slow, VPN dropping intermittently, backup failures | 30 min | 2 hr | 8 hr | 8 hours |
| P3 - Medium | Minor service impact. Single user or small group affected. Workaround available. | Single user can't print, Outlook add-in not loading, password reset | 1 hr | 4 hr | 1 BD | 24 hours |
| P4 - Low | Informational or cosmetic issue. No operational impact. | Feature request, how-to question, non-urgent change | 4 hr | 1 BD | 2 BD | 5 business days |
BD = Business Day (Mon-Fri, 07:00-19:00 AEST, excl. Australian public holidays)
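The response columns of the priority matrix can be held as a lookup table, as in the minimal sketch below. `RESPONSE_TARGETS`, the `BusinessDays` type and `response_met` are hypothetical names, and the sketch assumes `elapsed` is the time from ticket creation to first response. Business-day targets are deliberately left unevaluated (the function returns `None`) because converting them into elapsed time depends on the business calendar, and the sketch does not model whether the SLA clock pauses outside a tier's support hours.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional, Union


@dataclass(frozen=True)
class BusinessDays:
    """A target expressed in business days rather than elapsed time."""
    count: int


Target = Union[timedelta, BusinessDays]

# Response targets from the incident priority matrix, keyed by priority then tier.
RESPONSE_TARGETS: dict[str, dict[str, Target]] = {
    "P1": {"Platinum": timedelta(minutes=15), "Gold": timedelta(hours=1), "Silver": timedelta(hours=4)},
    "P2": {"Platinum": timedelta(minutes=30), "Gold": timedelta(hours=2), "Silver": timedelta(hours=8)},
    "P3": {"Platinum": timedelta(hours=1), "Gold": timedelta(hours=4), "Silver": BusinessDays(1)},
    "P4": {"Platinum": timedelta(hours=4), "Gold": BusinessDays(1), "Silver": BusinessDays(2)},
}


def response_met(priority: str, tier: str, elapsed: timedelta) -> Optional[bool]:
    """Check a first response against its target.

    Returns None for business-day targets, which need a business calendar
    (Mon-Fri, 07:00-19:00 AEST, excluding public holidays) to evaluate.
    """
    target = RESPONSE_TARGETS[priority][tier]
    if isinstance(target, BusinessDays):
        return None
    return elapsed <= target


if __name__ == "__main__":
    print(response_met("P1", "Platinum", timedelta(minutes=12)))  # within the 15 min target
    print(response_met("P2", "Gold", timedelta(hours=3)))         # over the 2 hr target
```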
Impact × Urgency Matrix
| Impact \ Urgency | High Urgency | Medium Urgency | Low Urgency |
|---|---|---|---|
| High Impact | P1 | P2 | P3 |
| Medium Impact | P2 | P3 | P4 |
| Low Impact | P3 | P4 | P4 |
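Service desk tooling could encode the Impact × Urgency matrix directly as data. The sketch below transcribes it cell by cell; the `Level` enum and `classify` function are hypothetical names chosen for illustration.

```python
from enum import Enum


class Level(Enum):
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


# (impact, urgency) -> priority, transcribed from the Impact x Urgency matrix.
PRIORITY_MATRIX = {
    (Level.HIGH, Level.HIGH): "P1",
    (Level.HIGH, Level.MEDIUM): "P2",
    (Level.HIGH, Level.LOW): "P3",
    (Level.MEDIUM, Level.HIGH): "P2",
    (Level.MEDIUM, Level.MEDIUM): "P3",
    (Level.MEDIUM, Level.LOW): "P4",
    (Level.LOW, Level.HIGH): "P3",
    (Level.LOW, Level.MEDIUM): "P4",
    (Level.LOW, Level.LOW): "P4",
}


def classify(impact: Level, urgency: Level) -> str:
    """Return the incident priority for a given impact and urgency."""
    return PRIORITY_MATRIX[(impact, urgency)]


if __name__ == "__main__":
    # A site-wide network outage: high impact, high urgency.
    print(classify(Level.HIGH, Level.HIGH))
    # A single user's how-to question: low impact, low urgency.
    print(classify(Level.LOW, Level.LOW))
```

Keeping the matrix as explicit data rather than a formula makes it easy to audit against the table and to change a single cell.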
Escalation Procedures
Functional Escalation
When the current support tier cannot resolve an incident within the allocated time, it is escalated to the next tier.
Functional escalation time bands: 0-15 min, 15-60 min, 1-4 hrs, 4+ hrs.
Hierarchical Escalation
Hierarchical escalation applies when management attention or authority is needed, for example to allocate additional resources or to manage client communication during a major incident.
| Time Elapsed | Escalation To | Action Required |
|---|---|---|
| P1 at 30 min | Service Delivery Manager | Notified and monitors. Ensures resources assigned. |
| P1 at 1 hr | Head of Service Delivery | Briefed. Authorises additional resources. Client executive notified. |
| P1 at 2 hrs | VP of Operations | Briefed. May invoke Major Incident Process. Executive bridge call. |
| P1 at 4 hrs | CEO | Briefed if business-critical client or reputational risk. |
| P2 at 4 hrs | Service Delivery Manager | Notified and ensures resolution path. |
| P2 at 8 hrs | Head of Service Delivery | Briefed. Additional resources allocated. |
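A scheduled job could check open incidents against the hierarchical escalation table. The sketch below is a minimal illustration, assuming elapsed time is measured from when the incident was logged; `ESCALATION_LADDER`, `escalations_due` and the `business_critical` flag (standing in for "business-critical client or reputational risk") are hypothetical. P3 and P4 incidents have no hierarchical thresholds in the table, so they return an empty list.

```python
from datetime import timedelta

# Hierarchical escalation thresholds from the table above: (elapsed, role).
ESCALATION_LADDER = {
    "P1": [
        (timedelta(minutes=30), "Service Delivery Manager"),
        (timedelta(hours=1), "Head of Service Delivery"),
        (timedelta(hours=2), "VP of Operations"),
        (timedelta(hours=4), "CEO"),
    ],
    "P2": [
        (timedelta(hours=4), "Service Delivery Manager"),
        (timedelta(hours=8), "Head of Service Delivery"),
    ],
}


def escalations_due(priority: str, elapsed: timedelta, business_critical: bool = False) -> list[str]:
    """List the roles that should have been engaged by `elapsed` for an open incident.

    The CEO step applies only when the client is business-critical or there is
    reputational risk, modelled here by the `business_critical` flag.
    """
    due = []
    for threshold, role in ESCALATION_LADDER.get(priority, []):
        if elapsed < threshold:
            break  # thresholds are listed in ascending order
        if role == "CEO" and not business_critical:
            continue
        due.append(role)
    return due


if __name__ == "__main__":
    print(escalations_due("P1", timedelta(minutes=75)))
    print(escalations_due("P1", timedelta(hours=5), business_critical=True))
```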
SLA Reporting & Review
Monthly SLA Report Contents
- Executive summary of service performance
- SLA compliance by priority level (response time and resolution time)
- Total incident count and trend analysis
- Top 10 incident categories
- First-contact resolution rate
- Customer satisfaction score (CSAT)
- AI Sentinel proactive detection rate (incidents caught before user impact)
- Service availability percentage vs SLA target
- Pending changes and upcoming maintenance
- Recommendations and improvement actions
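Two of the report's metrics, SLA compliance by priority and the first-contact resolution rate, are simple ratios over the month's tickets. A minimal sketch follows, assuming each ticket record already carries whether its response and resolution targets were met; the `Ticket` fields and function names are hypothetical, not ServiceNow field names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical per-ticket record exported from the service desk."""
    priority: str
    response_met: bool
    resolution_met: bool
    first_contact_resolved: bool


def sla_compliance(tickets: list[Ticket]) -> dict[str, tuple[float, float]]:
    """Return {priority: (response compliance %, resolution compliance %)}."""
    counts = defaultdict(lambda: [0, 0, 0])  # [tickets, responses met, resolutions met]
    for t in tickets:
        c = counts[t.priority]
        c[0] += 1
        c[1] += t.response_met
        c[2] += t.resolution_met
    return {
        p: (100 * met_resp / n, 100 * met_res / n)
        for p, (n, met_resp, met_res) in sorted(counts.items())
    }


def first_contact_resolution_rate(tickets: list[Ticket]) -> float:
    """Percentage of tickets resolved on first contact (0.0 when there are no tickets)."""
    if not tickets:
        return 0.0
    return 100 * sum(t.first_contact_resolved for t in tickets) / len(tickets)
```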
Review Cadence
| Review Type | Frequency | Attendees | Focus |
|---|---|---|---|
| Operational Review | Weekly | SDM, Team Leads | Ticket queue, SLA at risk, resource utilisation |
| Monthly Service Review | Monthly | SDM, Client Primary Contact | SLA performance, incidents, upcoming changes |
| Quarterly Business Review (QBR) | Quarterly | Head of SD, Client Exec Sponsor, SDM | Strategic alignment, service improvement, roadmap |
| Annual Service Review | Annually | VP Ops, Client CIO/CTO | Contract review, strategic planning, innovation |
Service Improvement Plans
A Service Improvement Plan (SIP) is initiated when:
- SLA compliance falls below 95% for any priority level in a calendar month
- Customer satisfaction (CSAT) drops below 4.0/5.0 for two consecutive months
- A Major Incident Post-Incident Review (PIR) identifies systemic process failures
- Client formally requests a SIP through their SDM
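The SIP triggers combine two numeric thresholds with two manual signals. The sketch below assumes a single compliance percentage per priority for the calendar month (for example, the lower of response and resolution compliance) and a month-ordered list of average CSAT scores; `sip_triggers` and its parameters are hypothetical.

```python
def sip_triggers(
    compliance_by_priority: dict[str, float],
    monthly_csat: list[float],
    systemic_pir_failure: bool = False,
    client_requested: bool = False,
) -> list[str]:
    """Return the reasons (if any) that a Service Improvement Plan is required.

    compliance_by_priority: this calendar month's SLA compliance % per priority.
    monthly_csat: average CSAT per month, oldest first.
    """
    reasons = []
    below = sorted(p for p, pct in compliance_by_priority.items() if pct < 95.0)
    if below:
        reasons.append("SLA compliance below 95% for " + ", ".join(below))
    # Two consecutive months means the two most recent entries.
    if len(monthly_csat) >= 2 and all(score < 4.0 for score in monthly_csat[-2:]):
        reasons.append("CSAT below 4.0/5.0 for two consecutive months")
    if systemic_pir_failure:
        reasons.append("Major Incident PIR found systemic process failures")
    if client_requested:
        reasons.append("Client requested a SIP")
    return reasons
```

An empty list means no SIP is required; otherwise the reasons can seed the Problem Statement.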
SIP Structure
- Problem Statement: Clear description of the service gap
- Root Cause Analysis: 5-Whys or fishbone analysis
- Improvement Actions: Specific, measurable actions with owners and due dates
- Success Criteria: How we will know the improvement worked
- Review Schedule: Weekly progress reviews until targets met
- Closure: Formal closure when success criteria achieved for 30 consecutive days
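The closure rule requires the success criteria to hold for 30 consecutive days. A minimal sketch, assuming success is recorded as one pass/fail result per calendar day; `ready_to_close` is a hypothetical helper, and a day with no recorded result is treated as not met.

```python
from datetime import date, timedelta


def ready_to_close(daily_results: dict[date, bool], as_of: date, required_days: int = 30) -> bool:
    """True if success criteria were met on each of the `required_days` days ending `as_of`.

    A missing day counts as not met, so gaps in the data break the streak.
    """
    return all(daily_results.get(as_of - timedelta(days=i), False) for i in range(required_days))
```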
Customer Satisfaction Measurement
CSAT (Per-Ticket)
Sent after every resolved ticket via an automated ServiceNow survey.
- Scale: 1-5 stars
- Target: ≥ 4.5/5.0 average
- Response rate target: ≥ 30%
- Scores of 1-2 trigger automatic SDM follow-up within 4 hours
- Results reviewed weekly in operational review
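The per-ticket CSAT rules reduce to an average, a response rate and a follow-up list. In this sketch the survey record shape, the `csat_summary` name and the choice to start the 4-hour follow-up clock when the survey response arrives are all assumptions, not the ServiceNow survey schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SurveyResponse:
    ticket_id: str
    score: int            # 1-5 stars
    received_at: datetime


def csat_summary(responses: list[SurveyResponse], surveys_sent: int) -> dict:
    """Summarise per-ticket CSAT against the targets above."""
    average = sum(r.score for r in responses) / len(responses) if responses else None
    response_rate = 100 * len(responses) / surveys_sent if surveys_sent else 0.0
    # Scores of 1-2 need SDM follow-up; here the deadline is 4 hours after the response arrives.
    follow_ups = [(r.ticket_id, r.received_at + timedelta(hours=4)) for r in responses if r.score <= 2]
    return {
        "average": average,
        "average_target_met": average is not None and average >= 4.5,
        "response_rate": response_rate,
        "response_rate_target_met": response_rate >= 30.0,
        "follow_ups": follow_ups,
    }
```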
NPS (Net Promoter Score)
Sent quarterly to all client stakeholders via an email survey.
- Scale: 0-10 ("How likely are you to recommend ASI?")
- Target: NPS ≥ 50
- Detractors (0-6) receive personal follow-up from Head of SD
- Results presented at QBR and internal leadership meeting
- Trend analysis tracks improvements quarter-over-quarter
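NPS is the percentage of promoters (scores of 9-10) minus the percentage of detractors (0-6); passives (7-8) count only in the total. A sketch of the calculation follows; the function names are illustrative. For example, five responses of 10, 10, 9, 8 and 5 give 3 promoters and 1 detractor, so NPS = 100 × (3 - 1) / 5 = 40, below the target of 50.

```python
def nps(scores: list[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6), range -100 to 100."""
    if not scores:
        raise ValueError("at least one score is required")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must be between 0 and 10")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


def detractors(responses: dict[str, int]) -> list[str]:
    """Respondents scoring 0-6, who receive personal follow-up."""
    return sorted(name for name, score in responses.items() if score <= 6)
```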
Template: Monthly Service Review Agenda
AI-Powered Monitoring & Alerting
ASI AI Sentinel is our proprietary AI monitoring platform deployed to all managed clients. It provides real-time infrastructure monitoring with machine learning-driven anomaly detection.
Monitoring Coverage
| Category | Metrics Monitored | Alert Threshold | AI Enhancement |
|---|---|---|---|
| Compute | CPU utilisation, memory usage, process count | CPU > 85% for 5 min, Memory > 90% | Predictive scaling recommendations based on historical patterns |
| Storage | Disk usage, IOPS, latency, disk health (SMART) | Disk > 85%, I/O latency > 20 ms | Capacity forecasting with 30/60/90-day projections |
| Network | Bandwidth, packet loss, latency, interface errors | Packet loss > 1%, Latency > 100ms | Anomaly detection on traffic patterns for DDoS/exfiltration |
| Application | Response time, error rate, availability, transaction volume | Error rate > 5%, Response > 3s | Correlation of application errors with infrastructure events |
| Security | Failed logins, privilege escalation, file integrity, SIEM events | Per security policy rules | Behavioural analysis for insider threat detection |
| Cloud | Azure/AWS/GCP resource health, cost anomalies, config drift | Cost > 20% above forecast, config drift detected | Cost optimisation recommendations, right-sizing suggestions |
| Backup | Backup success/failure, backup duration, RPO compliance | Any backup failure, RPO breach | Predictive backup window optimisation |
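Duration-qualified thresholds such as "CPU > 85% for 5 min" suppress alerts for brief spikes. The sketch below shows one way to evaluate such a rule over a series of samples; it is an illustration, not ASI AI Sentinel's own logic, and `sustained_breach` and its `(timestamp, value)` sample format are assumptions. The window runs from the first sample above the threshold to the latest one, so the sampling interval affects how soon an alert fires.

```python
from datetime import datetime, timedelta


def sustained_breach(
    samples: list[tuple[datetime, float]],
    threshold: float = 85.0,
    duration: timedelta = timedelta(minutes=5),
) -> bool:
    """True if the metric stays above `threshold` for at least `duration`.

    `samples` are (timestamp, value) pairs in ascending time order. Any sample
    at or below the threshold resets the window.
    """
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts  # first sample of a new breach window
            if ts - breach_start >= duration:
                return True
        else:
            breach_start = None
    return False
```

The same function covers the memory rule by passing `threshold=90.0`; rules without a duration would pass `duration=timedelta(0)`.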