Whether you call it business continuity planning (BCP) or business continuity management (BCM), it is about identifying the critical parts of an organisation that cannot afford to suffer a loss, namely its data and information. The critical operational services of any organisation, whether in banking, financial services or government, depend entirely on efficient and uninterrupted IT automation and IT systems. Their data protection infrastructure and recovery systems need to be up to date and state of the art, and the continuity of essential customer services powered by IT applications must be ensured at all times.
This does not mean that business interruptions will never happen; the real question is how much they affect the business. BCP is therefore a way to prevent disasters where possible and to manage their consequences when they do occur, and Disaster Recovery Management (DRM) becomes an integral part of the BCP that an organisation adopts.
The 2008 Symantec India Disaster Recovery survey found that only 22 per cent of Indian enterprises conduct full-scenario disaster recovery (DR) tests once a year or less, citing a perceived fear of business disruption and a lack of resources to conduct the tests. Reasons cited include disruption to employees (58 per cent), lack of staff availability (56 per cent), disruption to customers (46 per cent) and budgetary issues (44 per cent), while 32 per cent fear that DR testing could affect sales and revenue. Gartner, in its 2011 BCP/DR Hype Cycle report, likewise observes that DR testing is very labour intensive and can become a significant barrier to scaling a DR programme.
Downtime is on the rise globally. According to Symantec's 2012 State of the Data Center report, 70 per cent of organisations experience downtime from power failures (11.3 hours per outage), while 63 per cent suffer downtime from cyber attacks (52.7 hours per outage). Over the previous 12 months, organisations experienced an average of four downtime incidents, with downtime of 5 hours per outage. Symantec also found that 79 per cent of organisations report increasing complexity in their data centres. According to research firm Aberdeen Group, the worldwide cost per hour of downtime rose 65 per cent, from $100,000 in 2010 to $165,000 in 2012.
Surveys show that more than 90 per cent of outages are not caused by natural disasters. Most are related to power outages (affecting parts of a data centre), operator errors and change management issues. These outages are localised: rather than causing a site-level failure, they disrupt a few systems or applications. Alongside the rising frequency of outages, cyber attacks are expected to be one of the fastest-growing risks that businesses will have to deal with. All of these issues can be addressed effectively through DR solutions, which are becoming a key requirement for every business.
BCP and DR only really work when management is involved, because they are more about strategy and business processes than mere technology solutions. The organisation must also set recovery goals that are agreed upon across the board, a process that management must drive. The IT department then designs its solutions around these goals, which from a technology perspective translate into the recovery point and the recovery time. Building a culture of testing and readiness is integral to this. It is important to automate IT recovery rather than rely on the right person doing the right thing at the right moment. Automation enables organisation-wide DR that delivers predictable recovery, saves costs, reduces IT downtime and errors, and reduces dependence on experts being available when they are needed.
The recommended approach to getting the most out of a DR investment is to treat the entire process as a lifecycle rather than a point solution. A lifecycle approach to DR, combined with automation, is the only way to ensure a reliable and scalable DR programme; it gives customers higher operational efficiency and fewer outages, and helps them adopt industry best practices. The elements of an effective BCP include initiation; risk assessment; business impact analysis (recovery objectives and recovery requirements); a strategy for prevention, response, resumption, recovery and restoration; and goals quantified in terms of recovery time objective (RTO), the maximum acceptable time to restore a service, and recovery point objective (RPO), the maximum acceptable data loss measured as a window of time. Data replication and DRM software and solutions enable all of this, and organisations can deploy DR solutions that handle component failures as well as site-level failures.
As e-governance and e-services expand and deepen in India, huge investments will need to be made in core IT infrastructure, hardware, software and human resources. Soon every state will have its own data centre, and ensuring the uninterrupted availability of information for the delivery of citizen services will become critical. The National Informatics Centre (NIC) has set up three national data centres, at Delhi, Pune and Hyderabad, for disaster recovery, and a fourth is coming up in Bhubaneswar. The choice of replication and management technology, scalability and bandwidth are among the challenges that must be addressed when setting up these centres, and cloud services offer some answers. The Department of Electronics and Information Technology (DeitY) is preparing DR and strategy manuals and is now pushing states to prepare their own DR plans. Some progressive states, such as Karnataka, Gujarat and Rajasthan, already have them.
Financial institutions in India are mandated by regulatory authorities to have a business continuity and DR plan for their critical business processes. Besides meeting their regulatory obligations, financial institutions also have to reinforce their service commitment to their customers by demonstrating transparency, reliability and trust. Risk mitigation through DR planning should be a visible step for every financial institution.
The banking industry is at the forefront of IT use and an aggressive adopter of DR capabilities. Core banking enables a bank to offer its customers uninterrupted access to banking services from anywhere in the world, and no bank, big or small, wants to lose customers. Driven by business, regulatory and customer satisfaction goals, banks in India are adopting a culture of IT recovery readiness, best exemplified by their willingness to perform regular drills that ensure IT services are recovery-ready when unplanned outages occur. Indian Overseas Bank (IOB) has a data centre with a near-site DR facility in Chennai and a far-site DR centre in Hyderabad. Data for critical applications such as core banking is written to the production data centre and the near site almost simultaneously, to ensure zero data loss in the event of an outage at the production data centre. IOB realised early on the challenges inherent in a DR solution that depends on people, so it chose to manage, monitor and automate its DR process using DR management software. The result is a high degree of confidence that critical applications will be recovered when required.
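The principle behind a near-site arrangement like IOB's, where each write lands at production and at the near site before it is acknowledged, can be sketched as a toy Python model. This is an illustration, not IOB's implementation: the class and method names are hypothetical, and the far site is assumed here to be fed asynchronously, a common choice over long distances that the text does not state. What it shows is that a synchronously replicated copy loses no acknowledged writes when production fails, while an asynchronous copy can lag behind.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical storage site holding an ordered log of committed writes."""
    name: str
    log: list = field(default_factory=list)


class ReplicatedStore:
    """Toy model: synchronous copy to a near site, asynchronous copy to a far site."""

    def __init__(self, production: Site, near: Site, far: Site):
        self.production = production
        self.near = near
        self.far = far
        self.far_queue: list = []  # acknowledged writes not yet shipped to the far site

    def write(self, record: str) -> str:
        # Synchronous leg: the write is acknowledged only after both the
        # production and near-site copies hold it.
        self.production.log.append(record)
        self.near.log.append(record)
        # Asynchronous leg (assumed): the far site receives the write later.
        self.far_queue.append(record)
        return "ack"

    def ship_to_far(self, n: int) -> None:
        # Drain up to n queued writes to the far site (e.g. on a timer).
        batch, self.far_queue = self.far_queue[:n], self.far_queue[n:]
        self.far.log.extend(batch)

    def lost_on_failover(self, survivor: Site) -> int:
        # Acknowledged writes missing at the surviving site are lost data.
        # Logs are append-only in the same order, so a length difference suffices.
        return len(self.production.log) - len(survivor.log)


store = ReplicatedStore(Site("production"), Site("near-site"), Site("far-site"))
for i in range(10):
    store.write(f"txn-{i}")
store.ship_to_far(7)  # only 7 of the 10 writes have reached the far site

print(store.lost_on_failover(store.near))  # 0: no acknowledged write is lost
print(store.lost_on_failover(store.far))   # 3: writes still in the async queue
```

The cost of the synchronous leg is that every write waits for the near site, which is why it is normally kept close to production.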
HDFC Bank has reported a reduction of over 85 per cent in IT recovery times, resulting from better planning, implementation and coordination and the use of the right technology solution. Andhra Bank's critical applications have to meet an RPO of about 4-5 minutes and an RTO of under 4 hours, and the bank regularly tests its recovery capability, usually without prior notice.
Stock markets are not lagging behind. The market regulator, the Securities and Exchange Board of India (SEBI), requires depositories that participate in the market to demonstrate their risk management systems, including the DR capabilities of their IT applications. National Securities Depository Limited (NSDL) introduced DR about thirteen years ago and operates from its DR site at least twice a year for a minimum of a week each time, which lets it conduct all its tests and ensure recovery readiness. This policy has been followed strictly year after year. Rather than testing during off-peak hours, NSDL runs live test scenarios by switching to the DR site intra-day, while the stock market is in full operation.
IT, and hence IT DR, is equally vital for public sector giants such as Indian Oil and ONGC. Indian Oil was the first to go in for business continuity certification. The company clocks transactions worth approximately Rs 30 million every minute and has a four-stage DR and BCP to ensure the continuous supply of petrol, diesel, LPG and other petroleum products. ONGC has divided its data into three main streams: business data, real-time data and scientific data. The scientific data, gathered through surveys across the country, is used to prospect for and evaluate oil and gas. ONGC has deployed DR for its ERP applications, which are critical to the functioning of the business. Air India introduced its DR system as far back as 1996 and faced two disasters shortly afterwards, a minor fire and an incident caused by heavy rain. On both occasions DR was invoked and services remained available without interruption.
DR adoption across organisations is robust and encouraging. By adopting the latest trends in DRM, Indian organisations are leapfrogging other regions in their ability to reduce IT downtime and make the most of their investment in DR. Cloud technology enables newer DR models and makes DR less expensive. As awareness grows, more organisations, including smaller ones, are likely to adopt DR and DRM in the near future to create a fail-safe environment for their businesses.
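Recovery objectives like Andhra Bank's only mean something if each drill measures them. A minimal sketch, using a hypothetical `DrillResult` record and made-up drill timestamps, computes the data lost (compared against the RPO) and the time the service was down (compared against the RTO):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DrillResult:
    """Timestamps recorded during a hypothetical DR drill."""
    last_replicated_write: datetime  # newest data present at the DR site
    outage_start: datetime           # moment production became unavailable
    service_restored: datetime       # moment the application was back online


def evaluate(drill: DrillResult, rpo: timedelta, rto: timedelta) -> dict:
    # Achieved recovery point: how much data, measured in time, was lost.
    data_loss = drill.outage_start - drill.last_replicated_write
    # Achieved recovery time: how long the service was unavailable.
    downtime = drill.service_restored - drill.outage_start
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime < rto,  # "under 4 hours" read as strictly less than
    }


# Targets modelled on the figures quoted for Andhra Bank.
RPO = timedelta(minutes=5)
RTO = timedelta(hours=4)

# Illustrative drill: last write replicated 3 minutes before the outage,
# service restored two and a half hours later.
drill = DrillResult(
    last_replicated_write=datetime(2013, 1, 1, 9, 57),
    outage_start=datetime(2013, 1, 1, 10, 0),
    service_restored=datetime(2013, 1, 1, 12, 30),
)
report = evaluate(drill, RPO, RTO)
print(report)
```

Logging these two numbers for every drill, planned or unannounced, gives a trend that shows whether the objectives are actually being met.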
A Typical Lifecycle DR Process
- Deploy the best-practice DR solution recommended by the application vendors.
- Monitor DR metrics in real time to ensure that objectives are met and that DR systems are healthy and ready to go.
- Run daily or weekly configuration checks to ensure DR systems stay in step with production systems as change management updates are applied.
- Report DR status monthly to all stakeholders in the organisation so that everyone involved stays continuously informed about the DR programme.
- Conduct quarterly or half-yearly DR drills of the application at the DR site to validate DR readiness.
- Furnish annual audit and compliance reports on DR drills and other DR activities to meet regulatory requirements.
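The periodic parts of this lifecycle lend themselves to automated tracking. A minimal sketch, assuming a hypothetical cadence table (weekly configuration checks, monthly reports, half-yearly drills, an annual audit) and a record of when each activity last ran, flags what is overdue. Real-time monitoring is continuous rather than scheduled, so it is left out.

```python
from datetime import date, timedelta

# Hypothetical maximum interval, in days, between runs of each lifecycle activity.
CADENCE_DAYS = {
    "configuration_check": 7,
    "status_report": 30,
    "dr_drill": 182,
    "compliance_audit": 365,
}


def overdue_activities(last_done: dict, today: date) -> list:
    """Return activities whose last run is older than their cadence allows."""
    overdue = []
    for activity, days in CADENCE_DAYS.items():
        last = last_done.get(activity)
        # An activity that has never run is treated as overdue.
        if last is None or today - last > timedelta(days=days):
            overdue.append(activity)
    return overdue


today = date(2013, 6, 30)
last_done = {
    "configuration_check": date(2013, 6, 28),
    "status_report": date(2013, 5, 15),
    "dr_drill": date(2012, 11, 1),
}
print(overdue_activities(last_done, today))
```

In this made-up example the configuration check is current, while the monthly report, the half-yearly drill and the never-run audit would all be flagged.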