ity Risks and Countermeasures

All risks to the availability of each IT component need to be considered and appropriate countermeasures designed to mitigate them. The nature of the availability risks that an IT component faces varies according to the MOF IT domain the component resides in. Examples of availability risks by IT domain are:

• Application, middleware, and operating system domains
    • Single point of failure
    • Incorrect configuration option
    • Design flaw
    • Poor development methodology
    • Coding error
• Hardware and network domains
    • Single point of failure
    • Out-of-date firmware
    • Poor documentation
    • Vendor support quality
    • Lack of antistatic precautions
    • Lack of spares
    • Poorly labeled cabling
• Facilities domain
    • Insufficient air-conditioning capacity
    • Power outages
    • Power surges and spikes
    • Fire and flood
    • Physical security
• Egress domain
    • Single power feed from utility
    • Single communications feed from Telco
• Personnel
    • Poor quality procedures
    • Lack of discipline
    • Lack of skills

Availability management and IT service continuity management are closely related. Both processes strive to eliminate risks to the availability of IT services and employ the use of countermeasures to achieve this. The prime focus of availability management is in handling the routine risks to availability that can be reasonably expected to occur on a day-to-day basis. IT service continuity management caters to more extreme and relatively rare availability risks, such as fire and flood, and also acts as a catchall for any unanticipated availability risks.

Service level management affects both IT service continuity management and availability management. Service level management takes primary responsibility for interacting with customers and determining which IT services are most crucial to the survival of the company, and which alternate means of conducting business are employed if they fail for a prolonged period.
Availability management draws on this prioritization work and takes it a stage further by identifying the key IT infrastructure components that support these critical services and determining whether they contain any single points of failure or other risks to availability that can be cost-effectively addressed through appropriate countermeasures. Where no straightforward countermeasure is available, or where the countermeasure is prohibitively expensive or beyond the scope of a single IT service to justify in its own right, these availability risks are passed to the IT service continuity management SMF to handle.

Within each IT domain, there are specific risks to availability that are considered too unlikely to justify the cost of mitigating them, and there are risks that were not anticipated. For example, few data centers anticipate a meteor shower on the building. Of those that do, few spend the money to install anti-meteor shielding. In these situations, IT service continuity management outlines what must be done to restore service.

There need not be a separate IT service continuity plan for each risk. One plan can cover the risks of flood, fire, meteors, terrorist attack, and any other eventuality that might disable a complete data center. An IT service continuity plan always needs to be in place, even where there is also an availability plan in place to handle more routine issues.

As with an earlier example, an individual power supply within a server can be expected to fail at some stage. A very effective and inexpensive countermeasure that can be employed by availability management is the adoption of server technology incorporating hot-plug redundant power supplies. This technology allows a second power supply to take over seamlessly from the failing unit, and allows the failed unit to be replaced online without any interruption to the IT service.
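
As an illustration only, the division of risks described above can be sketched in Python. The names Risk, Owner, and triage, and every cost figure, are hypothetical and do not come from MOF. The sketch assumes a risk stays with availability management when it is expected and a countermeasure exists whose cost is within what can be justified, and that it passes to IT service continuity management otherwise.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Owner(Enum):
    AVAILABILITY_MANAGEMENT = "availability management"
    IT_SERVICE_CONTINUITY = "IT service continuity management"


@dataclass
class Risk:
    name: str
    expected: bool  # can the failure reasonably be expected to occur?
    # Cost of the countermeasure; None means no straightforward one exists.
    countermeasure_cost: Optional[float]


def triage(risk: Risk, justifiable_spend: float) -> Owner:
    """Decide which process handles a risk (hypothetical helper).

    justifiable_spend is an assumed figure: the most that can be
    cost-justified against the agreed cost of downtime.
    """
    affordable = (
        risk.countermeasure_cost is not None
        and risk.countermeasure_cost <= justifiable_spend
    )
    if risk.expected and affordable:
        return Owner.AVAILABILITY_MANAGEMENT
    return Owner.IT_SERVICE_CONTINUITY


if __name__ == "__main__":
    # Illustrative values only; not taken from the source document.
    psu = Risk("single power supply failure", True, 500.0)
    meteor = Risk("meteor shower on the building", False, 5_000_000.0)
    for risk in (psu, meteor):
        owner = triage(risk, justifiable_spend=2_000.0)
        print(f"{risk.name}: {owner.value}")
```

With these assumed figures, the power supply failure remains an availability management concern, while the meteor shower falls to the IT service continuity plan.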
IT service continuity management needs to plan for situations where both power supplies fail at the same time, or where the second fails while the first is being repaired. This scenario is far less likely, but it must still be planned for.

The following figure summarizes the relationship between availability management and IT service continuity management with regard to the identification and mitigation of availability risks.

[Figure 2: flowchart spanning the Service Level Management, Availability Management, and IT Service Continuity Management SMFs. Steps shown: identify critical IT services; identify minimum business requirements; identify key customer functions within each critical IT service; identify key IT components; identify availability risks and single points of failure; identify common infrastructure risks common to critical IT services; design and implement countermeasure; create contingency plan; incident occurs; invoke contingency plan; business as usual. Decision points (yes/no): Is failure expected and the countermeasure affordable? Is there a countermeasure that worked? Was service restored in agreed time frame?]

Figure 2. Relationship between Availability Management and IT Service Continuity Management SMFs

When an availability risk is identified and confirmed as falling within the scope of availability management, the next step is to identify an appropriate countermeasure that can be deployed to minimize the exposure to the IT service.

It is important to ensure that any countermeasures employed are affordable and can be cost-justified in relation to the cost of downtime that has already been agreed upon with the customer. Availability management strives to provide an optimum level of availability