BLOG! Goitil.co.uk Providing Affordable IT Management to SME's


The expanded incident lifecycle

Suppose one of your systems fails, causing disruption to your business and perhaps some loss of sales or profit. You may have been fortunate enough to have a support contract in place that allowed you to restore the service, or you may have had no contract and needed to source a supplier quickly. Either way, once the service is restored we need to work out the downtime.

Why? Well, for two main reasons:

1) You may have a support contract that has performance metrics in terms of response and fix times

2) We need to understand how the incident breaks down, so we can look at areas where we could shorten the duration

So how do we measure the downtime?

Well, every incident follows what ITIL refers to as “The expanded incident lifecycle”. Each incident goes through various stages, and key measurements are taken between them (these can then be used to reduce the downtime if future outages occur).

1) Incident – The point that the user notices the service failure

2) Detection – The point that the service failure is reported

3) Diagnosis – The point at which someone starts working on the issue

4) Repair – The point at which the cause of the incident has been fixed

5) Recovery – The point at which normal service is technically provided

6) Restoration – The point at which the users commence using the service

So how can we help ourselves shorten the downtime?

Well, these are the key measurements:

a) “Detection elapsed time” is the difference between 2 & 1. This is how long it takes the user to report the fault once they have noticed it

b) “Response time” is the difference between 3 & 2. This is how long it takes whoever is fixing it to start working on the problem. For some contracts this may be a contractual key performance indicator

c) “Repair time” is the difference between 5 & 3. This is how long it takes to get the service back into an operational state. Once again, this may be contractual

d) “Recovery time” is the difference between 6 & 5. This is how long it takes to get users back onto the system once it has been fixed
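
As a worked example, the four measurements can be calculated from six timestamps, one per lifecycle stage. This is only a minimal sketch: the timestamps are made up for illustration, and names such as `stages`, `minutes_between` and `measure` are our own choices rather than ITIL terms.

```python
from datetime import datetime

# Hypothetical timestamps for one incident, one per lifecycle stage (1-6).
stages = {
    "incident":    datetime(2009, 11, 2, 9, 0),    # 1) user notices the failure
    "detection":   datetime(2009, 11, 2, 9, 20),   # 2) failure is reported
    "diagnosis":   datetime(2009, 11, 2, 9, 50),   # 3) someone starts work
    "repair":      datetime(2009, 11, 2, 11, 0),   # 4) cause is fixed
    "recovery":    datetime(2009, 11, 2, 11, 30),  # 5) service technically available
    "restoration": datetime(2009, 11, 2, 12, 0),   # 6) users back on the service
}


def minutes_between(stages, start, end):
    """Whole minutes from one lifecycle stage to a later one."""
    return int((stages[end] - stages[start]).total_seconds() // 60)


def measure(stages):
    """Return the four key measurements (a-d) in minutes."""
    return {
        "detection_elapsed": minutes_between(stages, "incident", "detection"),    # a) 2 - 1
        "response":          minutes_between(stages, "detection", "diagnosis"),   # b) 3 - 2
        "repair":            minutes_between(stages, "diagnosis", "recovery"),    # c) 5 - 3
        "recovery":          minutes_between(stages, "recovery", "restoration"),  # d) 6 - 5
    }


measurements = measure(stages)
total_downtime = minutes_between(stages, "incident", "restoration")

for name, minutes in measurements.items():
    print(f"{name}: {minutes} min")
print(f"total downtime: {total_downtime} min")
```

Because the four measurements run back to back from stage 1 to stage 6, they add up to the total downtime, which makes a handy sanity check on the timestamps you have recorded.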

So really, when looking at the time a service is down, you can analyse the “incident lifecycle” in two ways:

A & D focus on your operation, as these are under your control. The more efficient you are at reporting the incident and communicating to your users that the service is back, the shorter your incident downtime.

B & C focus on the technical resource and how they work on your incident (including contractual obligations).

In the next chapter of this topic we will drill down into these points in a little more detail.
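
To make the split between your own operation (A & D) and the technical resource (B & C) concrete, here is a rough sketch. All of the figures are hypothetical, and the contract targets in particular are assumptions for illustration; real targets would come from your own support contract.

```python
# Hypothetical measurements for one incident, in minutes.
measurements = {
    "detection_elapsed": 20,  # a) user notices the fault -> fault reported
    "response": 30,           # b) fault reported -> work starts
    "repair": 100,            # c) work starts -> service operational
    "recovery": 30,           # d) service operational -> users back on
}

# Hypothetical contract targets in minutes (assumed values, not from any real contract).
contract_targets = {"response": 60, "repair": 90}


def split_downtime(m):
    """Split downtime into the part you control and the part the technical resource controls."""
    own = m["detection_elapsed"] + m["recovery"]  # A & D
    technical = m["response"] + m["repair"]       # B & C
    return own, technical


def missed_targets(m, targets):
    """Return the names of the measurements that exceeded their contract target."""
    return [name for name, limit in targets.items() if m[name] > limit]


own, technical = split_downtime(measurements)
total = own + technical

print(f"Your operation: {own} min of {total}")
print(f"Technical resource: {technical} min of {total}")
print("Contract targets missed:", missed_targets(measurements, contract_targets))
```

With these made-up figures, 50 of the 180 minutes sit with your own operation, and only the repair time exceeds its assumed target.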

3rd Nov 2009


Co Reg no 5808734 | ©2006 Nuts and May Ltd