Disaster Recovery & High Availability


Does your company have a disaster recovery or high availability plan?  Is it tested regularly to make sure it is fully functional?  With over 30 years of experience in infrastructure, servers, and storage, the G/S team can develop a DR/HA plan, implement changes to your existing DR/HA strategy, or perform periodic health checks on your systems.

There are three layers to consider when developing a DR/HA plan:

Infrastructure Layer

The infrastructure layer needs to be architected in the most resilient fashion possible, since all recovery and resiliency capabilities will be limited by the weakest link. This layer represents the part of the data center that must be built for 24 x 7 x forever uptime. Redundant components, secondary routes, and redundant power are all characteristics of a resilient infrastructure. When considering a secondary site, best practice is to develop a site that is an extension of your primary site, not a mirror or isolated copy. This is generally one of the more difficult architectures to achieve.
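To see why redundant components matter so much at this layer, consider the standard availability arithmetic for components in parallel: the system fails only if every component fails at once. The figures below are purely hypothetical, for illustration:

```python
def parallel_availability(component_availability: float, n: int) -> float:
    """Availability of n identical components in parallel: the system
    is down only when all n components are down simultaneously."""
    return 1 - (1 - component_availability) ** n

# Hypothetical example: one power feed at 99.5% availability vs. two in parallel
single = parallel_availability(0.995, 1)     # 0.995    (~44 hours of downtime/year)
redundant = parallel_availability(0.995, 2)  # 0.999975 (~13 minutes of downtime/year)
print(f"single feed:    {single:.6f}")
print(f"redundant feed: {redundant:.6f}")
```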

Storage Layer

The storage layer contains the vital data of your business. Let’s face it: if a major disaster were to strike, you would eventually be able to replace your servers, applications, and network, but if your data is lost, it’s lost forever. In a resilient environment, there are copies of the data for local processing, copies for local restoration and local outages, and copies for disaster recovery. When considering how to architect the data storage for your business, plan for at least four aspects of data management, illustrated in the sketch that follows this list:

  • Disaster recovery – an offsite copy of the data that can be restored.
  • Operational recovery – replacement of the local copy of the data in the event it is erased or corrupted.
  • Archive – the long-term storage of data based upon your company’s retention and regulatory policies.
  • Replication – multiple or secondary copies of the data on which processes such as data mining, reporting, or data warehousing can be performed without affecting the transaction performance of the application.
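One way to make these four aspects concrete is to treat each copy of the data as a policy with its own location, recovery point objective (RPO), and retention. The sketch below is illustrative only; the locations, RPO figures, and retention periods are hypothetical placeholders, not recommendations:

```python
from dataclasses import dataclass

@dataclass
class DataCopyPolicy:
    purpose: str         # which of the four aspects this copy serves
    location: str        # where the copy lives (hypothetical labels)
    rpo_hours: float     # recovery point objective: max tolerable data loss
    retention_days: int  # how long the copy is kept

# Hypothetical policies, one per aspect; real values come from your BIA
policies = [
    DataCopyPolicy("disaster recovery",    "offsite vault",   24.0,   35),
    DataCopyPolicy("operational recovery", "local snapshots",  1.0,   14),
    DataCopyPolicy("archive",              "long-term store", 24.0, 2555),
    DataCopyPolicy("replication",          "reporting copy",   0.25,   1),
]

for p in policies:
    print(f"{p.purpose:>20}: {p.location}, RPO {p.rpo_hours}h, keep {p.retention_days}d")
```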

Server Layer

The server layer includes the file servers and the strategy for keeping them active and running, regardless of platform type (Mainframe, UNIX, Linux, Windows, etc.). Three common architectural approaches to resiliency and recovery are:

Virtualizing the servers so that each application runs in a virtual container or virtual machine on the server. The virtual environment can be built with enough capacity for an N+x relationship between the virtual environment and the virtual machines’ capacity needs.
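The N+x sizing question reduces to simple arithmetic: after x hosts fail, can the survivors still carry the full virtual machine demand? A minimal sketch, with hypothetical cluster figures:

```python
def survives_host_failures(host_capacity_gb: float, n_hosts: int,
                           total_vm_demand_gb: float, x: int) -> bool:
    """True if the hosts remaining after x failures can still carry all VMs."""
    return host_capacity_gb * (n_hosts - x) >= total_vm_demand_gb

# Hypothetical cluster: 5 hosts with 128 GB each; VMs need 400 GB in total
print(survives_host_failures(128, 5, 400, x=1))  # True:  4 * 128 = 512 >= 400
print(survives_host_failures(128, 5, 400, x=2))  # False: 3 * 128 = 384 <  400
```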

Clustering the servers so that two or more servers are working in unison supporting an application or process. Unlike virtualization, clustering typically focuses on a specific application or set of servers and is not necessarily intended to establish a ubiquitous processing container for the applications.
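Clustering products differ by vendor, but the underlying pattern is typically a heartbeat-and-failover loop: the standby watches the primary and takes over after several consecutive missed heartbeats. The sketch below is a simplified, hypothetical illustration of that pattern, not any specific product's implementation:

```python
import time
from typing import Callable

def monitor(primary: str, standby: str, is_alive: Callable[[str], bool],
            interval_s: float = 5.0, max_missed: int = 3) -> None:
    """Fail over to the standby after max_missed consecutive missed heartbeats."""
    missed = 0
    while True:
        if is_alive(primary):
            missed = 0
        else:
            missed += 1
            if missed >= max_missed:
                print(f"promoting {standby}: {primary} missed {missed} heartbeats")
                return  # a real cluster would now restart resources on the standby
        time.sleep(interval_s)

# Hypothetical probe that always reports the primary down, to show the failover path
monitor("node-a", "node-b", is_alive=lambda node: False, interval_s=0.1)
```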

Load balancing the servers to create two separate processing domains that can work in conjunction with each other and yet are maintained separately. To accomplish this, an appliance typically sits in front of the servers to direct where the transaction request will be processed.
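The appliance's core job reduces to two decisions: which servers are currently healthy, and which healthy server receives the next request. A toy round-robin sketch (the server names are hypothetical):

```python
import itertools

class RoundRobinBalancer:
    def __init__(self, servers: list[str], health: dict[str, bool]):
        self.servers = servers
        self.health = health            # in practice, fed by periodic health probes
        self._cycle = itertools.cycle(servers)

    def next_server(self) -> str:
        """Return the next healthy server in rotation."""
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            if self.health.get(server, False):
                return server
        raise RuntimeError("no healthy servers available")

# Hypothetical two-domain setup in which domain-b is down for maintenance
lb = RoundRobinBalancer(["domain-a", "domain-b"],
                        {"domain-a": True, "domain-b": False})
print([lb.next_server() for _ in range(3)])  # ['domain-a', 'domain-a', 'domain-a']
```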


Three important criteria need to be considered in order to define a good strategy:

  1. An understanding of the resources that you have to support the business.
  2. An understanding of the impact of an outage or a disruption of service.
  3. An understanding of the risk tolerance of your company’s culture.

A business impact analysis, whether it’s a formal study of the business process or an informal estimation, should define the impact a disruption of service has on the business and then quantify that outage in business terms. As an example: if you can’t process sales orders for a day, this may have a $1 million negative impact on your business. However, if you can’t process orders for four days, the negative impact could be as much as $7 million. And if the outage continues for more than two weeks, it could bankrupt you altogether. The principle is simple: the longer the outage, the more severe the impact to the business.
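Using the figures from the example above, the escalation can be sketched as a simple impact curve; the slopes between the stated points are hypothetical, since a real business impact analysis would supply the actual curve:

```python
def outage_impact_musd(days: float) -> float:
    """Hypothetical impact curve anchored to the example above:
    roughly $1M after one day and $7M after four days, then steeper."""
    if days <= 1:
        return 1.0 * days
    if days <= 4:
        return 1.0 + 2.0 * (days - 1)   # about $2M per day from day 1 to day 4
    return 7.0 + 3.0 * (days - 4)       # steeper still as insolvency approaches

for d in (1, 4, 14):
    print(f"{d:>2} days down: ~${outage_impact_musd(d):.0f}M")
```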

This information establishes the baseline for making a business case regarding the type and level of investment that the organization should make to prevent or plan for the recovery after an outage. The more likely the occurrence and the more costly the disruption of service, the more justifiable an investment in recovery or resiliency is.
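One common way to quantify "more likely and more costly" is the annualized loss expectancy used in risk analysis: expected annual loss equals the annual rate of occurrence times the cost per event. A minimal sketch with hypothetical numbers:

```python
def annualized_loss_expectancy(events_per_year: float, cost_per_event: float) -> float:
    """Expected annual loss: how often an outage occurs times what each one costs."""
    return events_per_year * cost_per_event

# Hypothetical: a one-day outage (about $1M) expected once every five years
ale = annualized_loss_expectancy(1 / 5, 1_000_000)
print(f"expected annual loss: ${ale:,.0f}")  # $200,000
# A resiliency investment costing less than this per year pays for itself.
```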

MIMIX Availability virtually eliminates planned and unplanned downtime with innovative features that minimize the administration of high availability and ensure data integrity.

iTera Availability for IBM i virtually eliminates planned and unplanned downtime by maintaining a real-time backup system that can quickly take over as the production system when required.

PowerHA addresses high availability, business continuity, and disaster recovery; IBM Power Systems is committed to investing in and bringing to market solutions designed to keep your IT environments resilient.  The objective behind implementing a high availability solution is to provide near-continuous application availability through both planned and unplanned outages.

VMware vCenter Site Recovery Manager is a disaster recovery solution that provides automated orchestration and non-disruptive testing of centralized recovery plans for all virtualized applications.

Tivoli System Automation is a very flexible product with great depth of functionality.  It supports many sophisticated means of disaster recovery preparedness, including electronic vaulting of off-site DR data.  Its core functionalities, however, are to support disaster recovery by producing off-site tape copies of backed-up and archived data and to define the processes for recovering the TSM server in the event of its total loss.


The G/S Disaster Recovery and High Availability Methodology

  1. Discovery: Develop a detailed understanding of your infrastructure, applications and business processes.
  2. Analysis: Evaluate the information and requirements defined in the Discovery phase and begin to formulate recommendations.
  3. Recommendations: Use the data and requirements compiled in the Analysis phase to develop recommendations for a robust, optimized, and supported DR/HA environment.


The challenge in developing a disaster recovery strategy is to determine the best fit of hardware and/or software resiliency and recovery for your environment, recognizing that although a particular solution may work well for a specific purpose, it may also complicate the overall recovery and resiliency architecture. The combination of hardware and software architectures should be evaluated against your overall resiliency and recovery strategy, not on how well it satisfies the requirements of a single application component.