Failover Clusters

Failover clusters consist of two or more servers, usually of identical configuration, that access common storage media. Typically, storage is in the form of a SAN.

Failover clustering is well suited to back-end database applications, file-sharing servers, messaging servers, and other applications that are both mission critical and handle dynamic read/write data.

  • Clustered application—An application or service that is installed on two or more servers that participate in a failover cluster. Also called a clustered service.
  • Cluster server—A Windows Server 2008 server that participates in a failover cluster. A cluster server is also referred to as a cluster node or cluster member.
  • Active node—A cluster member that is responding to client requests for a network application or service. Also referred to as an active server.
  • Passive node—A cluster member that is not currently responding to client requests for a clustered application but is in standby mode to do so if the active node fails. Also referred to as a passive server.
  • Standby mode—A cluster node that is not active is said to be in standby mode.
  • Quorum—A database containing the cluster configuration data, which specifies the status of each node (active or passive) for each of the clustered applications. The quorum is also used to determine, in the event of a server or communication failure, whether the cluster is to remain online and which servers should continue to participate in the cluster.
  • Cluster heartbeat—Communication between cluster nodes that provides the status of each of the cluster members. The cluster heartbeat, or lack of it, informs the cluster when a server is no longer communicating. The cluster heartbeat information communicates the state of each node to the cluster quorum.
  • Witness disk—Shared storage used to store the cluster configuration data and to help determine the cluster quorum. (Node state and quorum configuration can be inspected with PowerShell, as sketched after this list.)
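
On a running cluster, the node states and quorum configuration described above can be viewed with the FailoverClusters PowerShell module. This is a minimal sketch; the exact output depends on your cluster:

Import-Module FailoverClusters
# List each cluster node and whether it is Up (able to host clustered applications) or Down
Get-ClusterNode | Format-Table Name, State
# Show the quorum type and the witness resource (if any) used by the cluster
Get-ClusterQuorum | Format-List Cluster, QuorumType, QuorumResource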

In a failover cluster, one server is considered the active server and the other servers are considered passive. The active server handles all client requests for the clustered application, while the passive servers wait in a type of standby mode.
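
As a rough illustration (the server names and IP address here are placeholders, not taken from the text), a two-node failover cluster can be created with the FailoverClusters module once the Failover Clustering feature is installed on both servers:

Import-Module FailoverClusters
# Create a cluster named CLUSTER1 from two member servers; the static address becomes the cluster's administrative access point
New-Cluster -Name CLUSTER1 -Node SERVER1, SERVER2 -StaticAddress "192.168.1.50"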

Cluster Shared Volumes (CSV)
Benefits

CSV gives a virtual machine (VM) complete mobility throughout the cluster, because any node can access the VHD files on the shared volume. Cluster Shared Volumes simplifies storage management by allowing large numbers of VMs to be accessed from a common shared disk. CSV also increases the resiliency of the cluster by providing I/O fault detection and recovery over alternate communication paths between the nodes in the cluster, meaning that if one part of a network goes down, communication can continue through another part of the network.

In a cluster without CSV, only one node can access a disk LUN at a time, so multiple disks are required for migration. With Cluster Shared Volumes, storage is simplified because multiple nodes can access the same disk at once and fewer overall disks are needed. CSV can also reduce potential disconnection time when performing a live migration of VMs.

CSV requires NTFS, but there are no hardware requirements beyond what is needed for a failover cluster.

While CSV is not required for Live Migration of VMs, it reduces the potential disconnection period at the end of the migration, since the NTFS file system does not have to be unmounted and remounted as is the case with a traditional cluster disk. This helps ensure seamless live migration because the physical disk resource does not need to be moved between nodes. CSV increases the chance that a live migration will complete within the TCP reconnect window and ensures a seamless operation for clients.
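
As an illustrative sketch (the VM and node names are placeholders), a live migration of a clustered VM can be started from PowerShell; with the VM's VHDs on a CSV, only the running state has to move between nodes:

Import-Module FailoverClusters
# Move the clustered virtual machine "VM1" to the node NODE2 while it keeps running
Move-ClusterVirtualMachineRole -Name "VM1" -Node NODE2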

Requirements

To use CSV, a Hyper-V VM is configured and the associated virtual hard disk(s) are created on or copied to a CSV disk. Multiple VHDs can be placed on a CSV and associated with multiple VMs, which can be running on different nodes in the cluster.
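
For example (the disk name and paths below are assumptions for illustration), an existing cluster disk can be added to Cluster Shared Volumes and the VHD files then placed under the CSV namespace:

Import-Module FailoverClusters
# Convert an available cluster disk into a Cluster Shared Volume
Add-ClusterSharedVolume -Name "Cluster Disk 2"
# VHD files for the VMs are then created on or copied to the CSV path
Copy-Item D:\VHDs\VM1.vhd C:\ClusterStorage\Volume1\VM1.vhd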

Technical Details

Cluster Shared Volumes operates by orchestrating metadata I/O operations between the nodes in the cluster via the Server Message Block (SMB) protocol. The node that owns the LUN and orchestrates metadata updates to the NTFS volume is referred to as the Coordinator Node. Read/write operations are passed directly to the Serial Attached SCSI (SAS), iSCSI, Fibre Channel, or Fibre Channel over Ethernet shared storage via block-based protocols.

CSV builds a common global namespace across the cluster using NTFS reparse points. Volumes are accessible under the %SystemDrive%\ClusterStorage root directory from any node in the cluster.
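
A quick way to see the volumes behind this namespace (a sketch; volume names and owner nodes vary per cluster):

Import-Module FailoverClusters
# List each Cluster Shared Volume, its state, and the node that currently coordinates it
Get-ClusterSharedVolume | Format-Table Name, State, OwnerNode
# Browse the common namespace that every node sees
Get-ChildItem "$env:SystemDrive\ClusterStorage"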

The cluster automatically prioritizes the most favorable network for routing I/O operations by selecting the cluster-shared network with the lowest cluster network metric value; this can also be configured manually. Public networks (i.e., networks that connect to users) are assigned higher cluster network metric values by default, which discourages I/O operations from using a public network that may already be saturated with user requests.
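
To see or adjust which network CSV traffic prefers, the cluster network metric values can be examined and set from PowerShell (the network name and metric value below are examples, not recommendations):

Import-Module FailoverClusters
# Lower metric = more preferred for cluster/CSV traffic; AutoMetric shows whether the cluster assigned the value itself
Get-ClusterNetwork | Format-Table Name, Role, AutoMetric, Metric
# Manually favor a dedicated cluster network by giving it a low metric
(Get-ClusterNetwork "Cluster Network 1").Metric = 900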

CSV can be enabled in the Failover Cluster Manager MMC snap-in by selecting ‘Enable Shared Volumes’ from the information pane after creating a cluster. Additionally, CSV can be enabled using PowerShell:

Import-Module FailoverClusters
Get-Cluster [-Name <cluster name>] | ForEach-Object { $_.EnableSharedVolumes = "Enabled" }

Before configuring the failover cluster, make sure the Cluster Service is disabled; otherwise, you will get the error message “The computer xxxx is joined to a cluster”.
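
A quick check before configuration (ClusSvc is the service name of the Cluster Service; this sketch only reads and, if necessary, changes the startup type):

# Verify the Cluster Service state on the server
Get-Service -Name ClusSvc
# If necessary, disable it before configuring the failover cluster
Set-Service -Name ClusSvc -StartupType Disabled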

Best Practices

1. High availability software solutions are augmented by hardware fault tolerance; make sure your server hardware has fault-tolerant features such as hot-swappable RAID disk drives and redundant power supplies.
2. Use round-robin load balancing for an easy and quick load balancing solution, but be aware of its limitations: no recognition of a down server, cached client records, and lack of prioritization.
3. Use network load balancing for applications in which the data being accessed is easily replicated among servers and is not changed by users.
4. Before using any clustering solution, make sure your OS version and updates are consistent among all servers.
5. Before creating your NLB cluster, make sure that DNS is set up correctly; a zone for the FQDN of the cluster must exist, and A records for each server and the cluster name must exist.
6. It is recommended that you use multiple NICs on your server, whereby one NIC is dedicated to non-cluster-related communication.
7. Create port rules to ensure the cluster only accepts communication for services that are specifically offered by all cluster members.
8. Use the Multiple host filtering mode option on your NLB cluster to provide scalability; use the Single host filtering mode to provide fault tolerance without scalability.
9. Use failover clusters to provide the highest level of fault tolerance.
10. Be sure to choose the quorum model that best supports your failover cluster configuration.
11. Server components used in a failover cluster should meet the Certified for Windows Server 2008 requirements.
12. For best disk performance in your failover cluster, use SAS, Fibre Channel, or iSCSI storage technologies.
13. Run the cluster validation wizard before you create a new cluster and again periodically after your cluster is running to revalidate the configuration (a PowerShell sketch follows this list).
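
The same validation can also be run from PowerShell (the node names are placeholders); a minimal sketch:

Import-Module FailoverClusters
# Run the cluster validation tests against the intended (or existing) cluster nodes and produce a validation report
Test-Cluster -Node SERVER1, SERVER2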