Introducing Availability Management to small businesses
So what is availability management? In previous blogs we have talked about SLAs, or “Service Level Agreements”: documents agreed with the end users that outline the level of service they should expect. Those numbers have to come from somewhere, and it is the role of the Availability Manager to define what they should be. But where do they come from?
How can someone say “This service should be available 99.25% of the time”?
The first part is down to system design and review. When designing a system, the following aspects should be taken into consideration:
Availability – How the system is designed to keep it available (e.g. can it be remotely managed by the support teams, does it have two servers which are clustered)
Reliability – How much resilience is there in the components (e.g. does it run RAID 5 on the disks, does it have dual power supplies on different feeds, etc.)
Maintainability – How old are the components and, if they fail, can new ones be purchased easily? Also, is the service easy to access?
Serviceability – The type of support provided by the suppliers / support contracts
Security – The confidentiality, integrity and availability of the data
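To make the clustering example in the list above concrete, here is a rough sketch of how redundancy changes the arithmetic. It assumes the components fail independently of each other, which real servers sharing a rack, a power feed or a software bug often do not, and the 99% figure is purely illustrative. The function names are made up for this example.

```python
def serial(*parts):
    """Availability of components that ALL have to work (e.g. server + network)."""
    result = 1.0
    for availability in parts:
        result *= availability
    return result


def parallel(*parts):
    """Availability of redundant components where ANY one is enough (e.g. a cluster).

    Assumes failures are independent, which is an optimistic simplification.
    """
    all_down = 1.0
    for availability in parts:
        all_down *= 1.0 - availability
    return 1.0 - all_down


single_server = 0.99  # illustrative figure, not a measurement

print(f"One server:            {single_server:.2%}")
print(f"Two servers clustered: {parallel(single_server, single_server):.2%}")
print(f"Two servers in a chain: {serial(single_server, single_server):.2%}")
```

The point is simply that a second clustered server raises the theoretical figure, while every extra component the service depends on pulls it down again, and both come at a cost.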
Secondly comes historical information. ITIL talks about the “expanded incident life cycle”. This is about understanding, when a service fails, how the different elements of the downtime are made up. For a service that is already running, there will be a history of uptime that can be mapped against the service to give a true representation of what is expected. Prior to this, the Availability Manager would have been responsible for reviewing the system design to ensure the correct levels of availability have been built into the system. As always, this is a trade-off between cost and service (i.e. how much you want to spend to get to the level of availability you perceive to be important).
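As a rough illustration of how a figure like 99.25% can be checked against history, the sketch below adds up the downtime from a list of incidents and compares it with the agreed service time. The phase names (detect, diagnose, repair, recover, restore) are the ones commonly used when describing the expanded incident life cycle; the incident figures, the 30-day period and the 24x7 service hours are all invented for the example.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60  # assumes the service is agreed to run 24x7


@dataclass
class Incident:
    """Minutes spent in each phase of one outage (hypothetical figures)."""
    detect: float
    diagnose: float
    repair: float
    recover: float
    restore: float

    def downtime(self) -> float:
        return self.detect + self.diagnose + self.repair + self.recover + self.restore


def availability_percent(incidents, period_days):
    """Percentage of the agreed service time the service was actually up."""
    agreed = period_days * MINUTES_PER_DAY
    down = sum(incident.downtime() for incident in incidents)
    return 100.0 * (agreed - down) / agreed


def downtime_budget_minutes(target_percent, period_days):
    """Minutes of downtime a given availability target allows in the period."""
    return period_days * MINUTES_PER_DAY * (1 - target_percent / 100.0)


# Two made-up outages in a 30-day month.
incidents = [
    Incident(detect=10, diagnose=45, repair=60, recover=20, restore=15),
    Incident(detect=5, diagnose=30, repair=90, recover=30, restore=19),
]

print(f"Measured availability: {availability_percent(incidents, 30):.2f}%")
print(f"Budget at 99.25%: {downtime_budget_minutes(99.25, 30):.0f} minutes")
```

Breaking each outage into phases also shows where the time goes: if most of it is spent on diagnosis rather than repair, better monitoring may buy more availability than better hardware.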
Finally, the Availability Manager is the custodian of the “Availability Plan”. This document lists all of the possible improvements to system availability (normally proposed as a result of an outage and specified by the problem manager), along with the benefit and the indicative cost of each. The next article on this subject will try to put this in the context of a small to medium business.
