
High Availability Clustering with Pacemaker and Corosync

Pacemaker and Corosync together provide high availability (HA) clustering on Linux. They can automatically restart or relocate services and VMs when a service or node fails.

What High Availability Clustering Does

A high availability cluster monitors services and nodes. When a service crashes or a node fails, the cluster automatically restarts the service or moves it to another node. The goal is minimizing downtime without manual intervention.

The Stack

  • Corosync: Handles cluster communication, membership, and quorum. Nodes use Corosync to know who is alive in the cluster.
  • Pacemaker: The cluster resource manager. It decides how to respond when a failure is detected, for example by starting a service on a particular node or moving an IP address to another node.
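To make the quorum idea concrete: Corosync's votequorum service treats the cluster as quorate when more than half of the expected votes are present. The sketch below only does that arithmetic in shell, assuming one vote per node. The function names are illustrative and not part of any cluster tool.

```shell
#!/usr/bin/env bash
# Votes needed for quorum among N expected votes: floor(N/2) + 1.
quorum_needed() {
  echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can fail while the remainder still holds quorum.
failures_tolerated() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 2 3 5; do
  echo "nodes=$n quorum=$(quorum_needed "$n") tolerated_failures=$(failures_tolerated "$n")"
done
```

A two-node cluster cannot lose a node and still hold a strict majority, which is why Corosync has a special `two_node` mode for that case and why fencing matters so much there. On a running cluster, `pcs quorum status` or `corosync-quorumtool -s` reports the live numbers.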

Installation (RHEL/Rocky Linux)

# Run on every node
dnf install pacemaker corosync pcs
systemctl enable --now pcsd  # Start pcsd now and on every boot
passwd hacluster  # Set the hacluster user password
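The nodes also need to reach each other over the network. On Rocky Linux the packages above live in a separate High Availability repository that may need enabling first. The sketch below covers only the firewall side, assuming firewalld is the active firewall on each node: it opens the predefined `high-availability` firewalld service. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences added here so the commands are only printed by default; set `DRY_RUN=0` and run as root to execute them.

```shell
#!/usr/bin/env bash
# Print each command instead of running it unless DRY_RUN=0 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Open the cluster ports on this node (assumes firewalld is in use).
open_cluster_ports() {
  run firewall-cmd --permanent --add-service=high-availability
  run firewall-cmd --reload
}

open_cluster_ports
```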

Creating a Cluster

# From one node, authenticate pcs against every node (prompts for the hacluster credentials)
pcs host auth node1 node2

# Create the cluster from node1
pcs cluster setup ha-cluster node1 node2
pcs cluster start --all
pcs cluster enable --all

Configuring Resources

# Create a floating IP resource
pcs resource create virtual-ip IPaddr2 ip=192.168.1.100 cidr_netmask=24 op monitor interval=30s

# Create a service resource
pcs resource create nginx systemd:nginx op monitor interval=30s

# Create a resource group (members run on the same node, start in order, stop in reverse)
pcs resource group add web-group virtual-ip nginx
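One way to check that the group actually fails over is to put a node into standby and watch the resources move. This is a sketch, assuming `web-group` currently runs on `node1` and that pcs 0.10 or later is installed. `run`, `DRY_RUN`, and `failover_drill` are hypothetical names, and by default the commands are only printed; set `DRY_RUN=0` on a cluster node to execute them.

```shell
#!/usr/bin/env bash
# Print each command instead of running it unless DRY_RUN=0 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Drain a node, show where resources ended up, then bring the node back.
failover_drill() {
  run pcs node standby "$1"
  run pcs status resources
  run pcs node unstandby "$1"
}

failover_drill node1
```

While the node is in standby, the resource status should show `virtual-ip` and `nginx` running together on the other node.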

Fencing

Fencing (STONITH, Shoot The Other Node In The Head) ensures that a failed node is truly offline before its resources are started elsewhere. Without fencing, two nodes might both believe they own the same resources (split-brain) and write to shared data at the same time, leading to data corruption. Configure IPMI-based fencing so the cluster can power-cycle a node it cannot reach.

pcs stonith create ipmi-node1 fence_ipmilan ip=192.168.10.101 username=admin password=secret pcmk_host_list=node1
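With IPMI, each node has its own BMC, so a two-node cluster needs a second fence device for node2. The sketch below prints one `pcs stonith create` command per node from a small node-to-BMC table rather than running anything. The BMC address for node2 (192.168.10.102) and the `fence_plan` function are assumptions for illustration, and the sketch uses the `ip` parameter name from current fence-agents releases (older releases call it `ipaddr`).

```shell
#!/usr/bin/env bash
# Read "node bmc_ip" pairs on stdin and print one fence-device command per node.
# Credentials are the placeholder values from the example above.
fence_plan() {
  while read -r node bmc_ip; do
    [ -n "$node" ] || continue   # skip blank lines
    printf 'pcs stonith create ipmi-%s fence_ipmilan ip=%s username=admin password=secret pcmk_host_list=%s\n' \
      "$node" "$bmc_ip" "$node"
  done
}

# node2's BMC address is a made-up example value.
fence_plan <<'EOF'
node1 192.168.10.101
node2 192.168.10.102
EOF
```

Review the printed commands, then run them on one cluster node. Afterwards `pcs stonith status` should list both devices. `pcs stonith fence node2` triggers a real fence of node2, so only try it on a node you can afford to reboot.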