Businesses and organizations should be able to achieve near-100% uptime at reasonable cost. In this article we discuss some technical procedures and alternatives for creating a high availability (HA) infrastructure that incorporates hardware, software and networking components.
Here is a short list of technical benefits of a typical HA infrastructure:
- High availability and failover, the first and foremost benefit: the system stays online despite any network or hardware failures that may occur.
- Improved scalability whereby growth and peaks in usage can be handled more gracefully and with more consistent response times to the end user.
- Improved performance and response times, because load balancing distributes work to the servers that can complete it in the shortest time.
- Simplified system maintenance since one is able to take down a node for maintenance and have all work automatically routed to the other available nodes or systems. When the node is brought back online, it will automatically begin to receive work.
Some business benefits may include:
- A competitive edge, since HA infrastructures deliver consistently smooth performance and uptime.
- Increased revenues and profits, due mainly to a base of satisfied, loyal customers.
- Improved customer satisfaction due mainly to the stability of the services offered.
- Improved customer retention due to faster performance and fewer customers leaving the site before it has a chance to load.
Components of HA Architectures
Most high-availability infrastructures exhibit the following characteristics:
- Redundancy - duplication of hardware, networks, and physical locations.
- Load distribution mechanisms - both globally across sites and locally within each site.
- Failover mechanisms - likewise both global and local.
- Data synchronization techniques such as database replication and static content synchronization.
Redundancy
Redundancy in all infrastructure components enables near-perfect uptime even when individual components fail or break down. Failures can include hardware faults, server lock-ups and network outages at your providers. A thorough infrastructure plan needs to account for the failure of every component, ideally handling each failure automatically or, failing that, with a very fast manual response.
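As a rough illustration of why redundancy pays off, the sketch below computes the availability of n redundant copies of a component, any one of which can carry the service. The service is down only when all n copies are down at once, which happens a fraction (1 - a)^n of the time if each copy is available a fraction a of the time. This assumes failures are independent, which real systems only approximate (shared power, networks or software bugs can take out several copies together), and the 99% figure is purely illustrative.

```python
def redundant_availability(a: float, n: int) -> float:
    """Availability of n redundant copies, any one of which suffices.

    Assumes failures are independent, which real systems only approximate.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if n < 1:
        raise ValueError("need at least one component")
    # The service is down only when every copy is down at the same time.
    return 1.0 - (1.0 - a) ** n


def downtime_minutes_per_year(availability: float) -> float:
    """Expected downtime over a 365-day year for a given availability."""
    return (1.0 - availability) * 365 * 24 * 60


for copies in (1, 2, 3):
    avail = redundant_availability(0.99, copies)
    print(f"{copies} copies: {avail:.6f} available, "
          f"{downtime_minutes_per_year(avail):.1f} min/year down")
```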
Load Balancing
Load balancing allows for scalability of your infrastructure, spreading load and requests across multiple servers, data centers and networks.
- Load balancing across a cluster of servers in a single location is best achieved with hardware load balancers made by Cisco, Nortel Networks and Foundry Networks, among others. Linux Virtual Server is a software alternative, though it does not compare well with the enterprise load balancing products from the major networking companies.
- Global load balancing across multiple datacenters is accomplished using DNS. Each location may in turn have a set of load balancers to distribute load across multiple local servers. Sophisticated DNS load balancing will automatically select the datacenter that will provide the best performance for each web customer.
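The two tiers described above can be sketched in a few lines of Python. This is a toy model, not how any vendor's load balancer or DNS product works internally: `LocalBalancer` rotates requests round-robin over the healthy servers at one site, skipping any that are marked down (for example for maintenance) and resuming once they are marked up, while `pick_datacenter` stands in for the DNS tier by choosing the live site with the lowest measured latency. The class and function names, server names and latency figures are all hypothetical.

```python
from dataclasses import dataclass
from itertools import cycle


class LocalBalancer:
    """Round-robin over the healthy servers at one site (hypothetical toy)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one server")
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        # failed health check, or node taken out for maintenance
        self.healthy.discard(server)

    def mark_up(self, server):
        # node back online: it starts receiving work again automatically
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Walk at most one lap of the ring looking for a healthy server.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers at this site")


@dataclass
class Datacenter:
    name: str
    balancer: LocalBalancer

    @property
    def up(self):
        return bool(self.balancer.healthy)


def pick_datacenter(sites, latency_ms):
    """DNS tier: answer with the live site closest to the client.

    latency_ms maps site name -> measured round-trip time for the client's
    region; how it is measured is an assumption of this sketch.
    """
    live = [dc for dc in sites if dc.up]
    if not live:
        raise RuntimeError("no datacenter available")
    return min(live, key=lambda dc: latency_ms.get(dc.name, float("inf")))


east = Datacenter("east", LocalBalancer(["e1", "e2"]))
west = Datacenter("west", LocalBalancer(["w1", "w2", "w3"]))
sites = [east, west]
latency = {"east": 80.0, "west": 25.0}   # client is nearer the west site

west.balancer.mark_down("w2")             # w2 out for maintenance
dc = pick_datacenter(sites, latency)
print(dc.name, [dc.balancer.next_server() for _ in range(4)])
# west ['w1', 'w3', 'w1', 'w3']
```

If every server at the west site were marked down, `pick_datacenter` would answer with the east site instead, which is the global failover case discussed next.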
Failover
Failover means that, in the event of a hardware or network outage, customer requests are automatically redirected to an alternate location or set of servers. When planning your infrastructure, you therefore need to ensure that if any single node, whether a network, a server or an entire datacenter, fails, the remaining working components can still handle 100% of the existing load. The cost of this redundancy decreases as the number of independent nodes increases. For example, with 2 nodes, each must be able to handle 100% of the workload in the event of a failure, so total capacity must be 200% of the load. In a 3-node system, a single node failure leaves the remaining two nodes to handle 100%, so each node needs capacity for 50% of the entire workload, for a total of 150% across the 3 nodes.
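The arithmetic generalizes: with n nodes sharing the load evenly, a single failure leaves n - 1 survivors, so each node needs capacity for 1/(n - 1) of the workload and the system as a whole needs n/(n - 1). The helper below is a small sketch of that calculation, assuming an even load split and at most one failed node at a time.

```python
from fractions import Fraction


def failover_capacity(nodes: int):
    """Return (per_node, total) capacity, as fractions of the full workload,
    needed so that the remaining nodes can carry 100% of the load after any
    single node fails. Assumes load is split evenly across surviving nodes.
    """
    if nodes < 2:
        raise ValueError("need at least 2 nodes to survive a node failure")
    per_node = Fraction(1, nodes - 1)   # survivors share the whole load
    return per_node, per_node * nodes


for n in (2, 3, 4, 5):
    per_node, total = failover_capacity(n)
    print(f"{n} nodes: each {float(per_node):.0%}, total {float(total):.0%}")
```

For 2 and 3 nodes this reproduces the 200% and 150% totals above; at 5 nodes the total falls to 125%.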
Data Synchronization Techniques
SAN (Storage Area Network)
Perhaps the most efficient approach to data replication and synchronization is the hardware-based, block-level replication available in many SAN product lines. Replication can be configured over TCP/IP to remote sites, or over local interconnects for systems within a single location.
Furthermore, block-level replication can be asynchronous, which suits remote replication and disaster recovery scenarios, or synchronous and transactional, meaning that data must be written to both systems before control is returned to the calling program.
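The difference between the two modes is when the write is acknowledged. In the toy model below, plain dictionaries stand in for disk volumes and `ReplicatedStore` is a hypothetical name rather than any SAN product's interface. A synchronous write is acknowledged only after both copies hold it, whereas an asynchronous write is acknowledged immediately and shipped to the replica later.

```python
from collections import deque


class ReplicatedStore:
    """Toy primary/replica pair; dicts stand in for disk volumes.

    Hypothetical model for illustration only, not a SAN product's interface.
    """

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = {}
        self.replica = {}
        self._pending = deque()   # writes acknowledged but not yet replicated

    def write(self, block_id, data):
        self.primary[block_id] = data
        if self.synchronous:
            # Synchronous: both copies hold the data before control returns.
            self.replica[block_id] = data
        else:
            # Asynchronous: acknowledge now, ship to the remote site later.
            self._pending.append((block_id, data))

    def drain(self):
        """Ship queued writes to the replica in the order they were made."""
        while self._pending:
            block_id, data = self._pending.popleft()
            self.replica[block_id] = data


sync_store = ReplicatedStore(synchronous=True)
sync_store.write(7, b"order row")
print(sync_store.replica.get(7))    # b'order row'

async_store = ReplicatedStore(synchronous=False)
async_store.write(7, b"order row")
print(async_store.replica.get(7))   # None: replica lags behind
async_store.drain()
print(async_store.replica.get(7))   # b'order row'
```

Writes still queued when the primary site fails are lost in asynchronous mode; synchronous mode closes that gap at the cost of waiting for the remote write on every operation.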
Many SAN products include a scheduled instant snapshot feature that efficiently stores a version, or state, of the data for backup purposes. With snapshots in place, block-level corruption that has already been replicated can be reversed by reverting to a snapshot taken before the corruption occurred.
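A minimal sketch of the snapshot-and-revert idea follows. It assumes a volume can be modeled as a dictionary of block numbers to bytes, and `SnapshotVolume` is a hypothetical name; real SAN snapshots are typically copy-on-write and far cheaper than the full copy taken here.

```python
class SnapshotVolume:
    """Toy volume: a dict of block number -> bytes, with named snapshots.

    Hypothetical model; real snapshots are usually copy-on-write.
    """

    def __init__(self):
        self.blocks = {}
        self.snapshots = {}

    def write(self, block_id, data: bytes):
        self.blocks[block_id] = data

    def snapshot(self, label):
        # bytes are immutable, so a shallow copy captures the full state
        self.snapshots[label] = dict(self.blocks)

    def revert(self, label):
        # Roll the volume back to the chosen point in time.
        self.blocks = dict(self.snapshots[label])


vol = SnapshotVolume()
vol.write(0, b"good data")
vol.snapshot("nightly")
vol.write(0, b"\x00\x00corrupt")    # corruption, already replicated elsewhere
vol.revert("nightly")
print(vol.blocks[0])                # b'good data'
```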
Database Replication
Native database replication is a very common technique for creating multiple copies of the same data for failover, redundancy, scalability and performance. All the major database products support various types of replication. For example, Oracle, Sybase, SQL Server, MySQL, PostgreSQL, and DB2 all support replication.
The mechanism works by writing binary logs of all transactions, transferring the atomic transaction information from the master to the slave servers and having the slaves apply the same transactions as the master. Note that this technique subjects every database server to the same insert/update load. The load generated by SQL SELECT statements, however, can differ from server to server and can be load balanced to improve performance under certain circumstances.
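The following sketch mimics this flow with in-memory SQLite databases. It is a simplified, statement-based model under stated assumptions: the `Master` and `Slave` classes are hypothetical, the "binary log" is just a Python list of SQL statements and parameters, and catching up is triggered by hand rather than streamed over the network. Every write is replayed on every slave, while SELECT queries are spread across the slaves round-robin.

```python
import itertools
import sqlite3


class Master:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.log = []   # ordered (sql, params) list: stand-in for the binary log

    def execute_write(self, sql, params=()):
        with self.db:                       # commit as one transaction
            self.db.execute(sql, params)
        self.log.append((sql, params))      # log only after a successful commit


class Slave:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.position = 0   # how far into the master's log we have applied

    def catch_up(self, master):
        # Replay every logged transaction not yet applied, in order.
        for sql, params in master.log[self.position:]:
            with self.db:
                self.db.execute(sql, params)
            self.position += 1


master = Master()
slaves = [Slave(), Slave()]
readers = itertools.cycle(slaves)   # spread SELECT load across the slaves

master.execute_write("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
master.execute_write("INSERT INTO customer (id, name) VALUES (?, ?)", (1, "Ada"))
for s in slaves:
    s.catch_up(master)              # each slave repeats the same writes

row = next(readers).db.execute("SELECT name FROM customer WHERE id = 1").fetchone()
print(row)   # ('Ada',)
```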
Master/Slave Relationships
One alternative is to have a single master in a single location that is accessed from all sites. This presents numerous issues, such as:
- Network latency from remote sites to the master location.
- Network downtime between sites may cause a site to fail over to the slave server when this is not appropriate.
- Logic for a site to determine when to make the slave the master is complex and error prone.
This type of replication is much better suited to creating redundancy within a single network than to distributing data globally.
Segment Primary Keys
A good alternative is to make each distributed database essentially independent, with its own range of primary keys. For example, your database on the east coast can create customer ids between 1 and 10,000,000, while the west coast database creates them from 10,000,001 upward.
In this way, the two databases can replicate data in a master-master relationship without conflict in the creation of new records.
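A minimal sketch of per-site key allocation, using the ranges from the example above, follows. `KeyRangeAllocator` and the site labels are hypothetical names; in a real deployment the ranges would normally be enforced by the database's own key generation (for example a sequence with start and maximum values), and this code only shows the logic.

```python
import itertools


class KeyRangeAllocator:
    """Hand out primary keys from a range reserved for one site (hypothetical)."""

    def __init__(self, site, first, last=None):
        self.site = site
        self.last = last              # None means "no upper bound"
        self._next = itertools.count(first)

    def next_id(self):
        new_id = next(self._next)
        if self.last is not None and new_id > self.last:
            # Fail loudly rather than spill into another site's range.
            raise RuntimeError(f"{self.site} has exhausted its key range")
        return new_id


east = KeyRangeAllocator("east", 1, 10_000_000)
west = KeyRangeAllocator("west", 10_000_001)
print(east.next_id(), west.next_id())   # 1 10000001
```

Because the two ranges are disjoint, rows created at either site can be replicated to the other without primary key collisions.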
Secure/Encrypted
Most database replication schemes can be configured to use secure, SSL-encrypted channels for transferring data between global locations. This is desirable whenever the data is sensitive, for example when it contains credit card numbers or customers' personal data.
Messaging (Message Queuing)
Message queuing is an age-old distributed computing technique in which disparate systems exchange messages via a third-party tool such as IBM WebSphere MQ or Java's JMS (Java Message Service). Message queuing provides what is called fire-and-forget functionality for asynchronous processing.
The creation of the message can be treated as part of a global transaction involving database writes. Once the message is successfully created, the process or thread does not need to wait until it is processed and can continue with its work without any delay to the end user.
Messages can be persisted (stored) to either the local file system or a database, so they survive a crash of the processes on the physical machine. As long as this storage is not corrupted, the messages are guaranteed to be delivered and processed eventually.
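The transactional and persistent behavior described above can be sketched with Python's built-in `sqlite3` module, using a database table as the persistent queue. This is a stand-in under stated assumptions, not the WebSphere MQ or JMS API: the `orders` and `outbox` tables and the `place_order` and `process_pending` functions are hypothetical. The order row and its message commit or roll back together, and the caller returns without waiting for the message to be processed.

```python
import sqlite3

db = sqlite3.connect(":memory:")   # a file path would let messages survive a restart
db.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         body TEXT, processed INTEGER DEFAULT 0);
""")


def place_order(order_id, item):
    # The order row and its message commit (or roll back) together.
    with db:
        db.execute("INSERT INTO orders (id, item) VALUES (?, ?)", (order_id, item))
        db.execute("INSERT INTO outbox (body) VALUES (?)", (f"ship {order_id}",))
    # Fire and forget: return to the user without waiting for processing.


def process_pending(handler):
    """Consumer: hand each unprocessed message to handler, then mark it done.

    A crash between handler() and the UPDATE means the message is handled
    again on restart, so handlers should tolerate duplicates.
    """
    rows = db.execute(
        "SELECT id, body FROM outbox WHERE processed = 0 ORDER BY id").fetchall()
    for msg_id, body in rows:
        handler(body)
        with db:
            db.execute("UPDATE outbox SET processed = 1 WHERE id = ?", (msg_id,))
    return len(rows)


place_order(1, "widget")      # returns immediately; nothing processed yet
delivered = []
process_pending(delivered.append)
print(delivered)              # ['ship 1']
```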
The messages can contain anything one desires. For example, fixed length headers defining the type of message, SQL commands, XML, or any binary or ASCII data are all typical.
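As one example of a fixed-length header, the sketch below packs a message type, a version and the body length into 8 bytes with Python's `struct` module, followed by an arbitrary body such as an SQL command or XML. The field layout and type codes are hypothetical; each messaging application defines its own format.

```python
import struct

# Hypothetical header layout: 2-byte message type, 2-byte version,
# 4-byte body length, all big-endian ("network order", no padding).
HEADER = struct.Struct("!HHI")
MSG_SQL = 1      # hypothetical type codes
MSG_XML = 2


def encode(msg_type, body: bytes, version=1) -> bytes:
    return HEADER.pack(msg_type, version, len(body)) + body


def decode(frame: bytes):
    msg_type, version, length = HEADER.unpack_from(frame)
    body = frame[HEADER.size:HEADER.size + length]
    if len(body) != length:
        raise ValueError("truncated message")
    return msg_type, version, body


frame = encode(MSG_SQL, b"UPDATE stock SET qty = qty - 1 WHERE sku = 42")
print(HEADER.size, decode(frame)[0])   # 8 1
```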
Products such as IBM's WebSphere MQ run on dozens of different platforms, from wireless devices to mainframes, PCs and many proprietary hardware systems. Similarly, JMS can run on any system that supports Java.
Static Content Publishing
Static content, meaning content other than software or scripts, such as HTML, image and movie files, can be synchronized using a variety of simple techniques. For example:
- rsync - A Linux/UNIX remote synchronization utility that can be configured to send changed files from a master server to the redundant server locations.
- scp - The Linux/UNIX secure copy command, which can be triggered to send file updates to remote systems.
- FTP and SFTP - File Transfer Protocol and the secure version can be used to transfer files between disparate architectures such as Unix and Windows servers.
- While the above techniques can be automated and run on specified schedules, one may simply choose to manually push changes from a workstation to all the servers in the global cluster.
- Reverse Proxy Caching - Install a caching appliance such as CacheFlow (Blue Coat) or configure Apache as a reverse proxy cache server. The cache obtains content from your web servers only the first time it is requested, then serves it from its memory and disk cache.
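The core idea behind pushing changed files from a master to a mirror can be sketched in plain Python. This is a local, simplified stand-in for rsync, not a reimplementation of it: it compares whole-file SHA-256 digests and copies any file that is new or different, whereas rsync works across the network and can transfer only the changed portions of a file. The function names and file layout are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def push_changed(master: Path, mirror: Path) -> list:
    """Copy files that are new or changed on the master to one mirror.

    Returns the relative paths copied. Files deleted on the master are
    not removed from the mirror in this sketch.
    """
    copied = []
    for src in sorted(p for p in master.rglob("*") if p.is_file()):
        rel = src.relative_to(master)
        dst = mirror / rel
        if dst.exists() and file_digest(dst) == file_digest(src):
            continue                      # unchanged: skip it
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)            # copy contents and timestamps
        copied.append(rel.as_posix())
    return copied


with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    master, mirror = Path(a), Path(b)
    (master / "img").mkdir()
    (master / "index.html").write_text("<h1>v1</h1>")
    (master / "img" / "logo.png").write_bytes(b"\x89PNG...")
    first = push_changed(master, mirror)    # both files are new
    (master / "index.html").write_text("<h1>v2</h1>")
    second = push_changed(master, mirror)   # only the edited page
    third = push_changed(master, mirror)    # nothing left to send
    print(first, second, third)
```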
Why HostVentures?
Responsiveness and individual attention throughout the implementation lifecycle result in more effective and cost-efficient solutions.
Experience and understanding of the full IT lifecycle results in relevant and timely infrastructure alternatives for clients.
IT Data Point
An IDC survey finds that "Infrastructure improvement, including data center consolidation and virtualization, application consolidation, and data consolidation, was most frequently mentioned as a priority aimed at achieving lower cost, higher performance IT."
Press Release
HostVentures As Seen On Extreme Makeover Home Edition Sundays, 8/7c on ABC
HostVentures is hosting the Clark Turner Signature Homes website for ABC's Extreme Makeover: Home Edition.
