4. Failover an application


In case of a disaster, where one of your Kubernetes clusters goes down and is inaccessible, you would want to failover the applications running on it to an operational Kubernetes cluster.

For this section, we will refer to:

  • Source Cluster as the Kubernetes cluster that is down and where your applications were originally running. (In this example: cluster_domain: us-east-1a)
  • Destination Cluster as the Kubernetes cluster to which the applications will be failed over. (In this example: cluster_domain: us-east-1b)

In order to fail over the application, you need to instruct Stork and Portworx that one of your Kubernetes clusters is down and inactive.

Deactivate failed Cluster Domain

In order to initiate a failover, we need to first mark the source cluster as inactive.

Using storkctl

Run the following storkctl command to deactivate the source cluster:

storkctl deactivate clusterdomain us-east-1a

Run the above command on the destination cluster, where Portworx is still running. To validate that the command has succeeded, you can check the status of all the cluster domains using storkctl:

storkctl get clusterdomainsstatus

When a domain is successfully deactivated, the above command should return something like this:

NAME            ACTIVE         INACTIVE       CREATED
px-dr-cluster   [us-east-1b]   [us-east-1a]   09 Apr 19 17:12 PDT

You can see that the cluster domain us-east-1a is now inactive.
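
If you prefer kubectl, you can also inspect the ClusterDomainsStatus object directly. This is a sketch that assumes the ClusterDomainsStatus CRD installed by Stork backs the storkctl output above and that the object is named px-dr-cluster, as in that output:

kubectl describe clusterdomainsstatus px-dr-cluster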

Using kubectl

If you wish to use kubectl instead of storkctl, you can create a ClusterDomainUpdate object as explained below. If you have already used storkctl you can skip this section.

Let’s create a new file named clusterdomainupdate.yaml that specifies an object called ClusterDomainUpdate and designates the cluster domain of the source cluster as inactive:

apiVersion: stork.libopenstorage.org/v1alpha1
kind: ClusterDomainUpdate
metadata:
  name: deactivate-us-east-1a
  namespace: kube-system
spec:
  # Name of the metro domain that needs to be activated/deactivated
  clusterdomain: us-east-1a
  # Set to true to activate cluster domain
  # Set to false to deactivate cluster domain
  active: false

To create the object from the command line, run the following:

kubectl create -f clusterdomainupdate.yaml
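
To verify that Stork acted on the request, you can inspect the object you just created. The exact status fields reported can vary by Stork version, so treat this as a sketch:

kubectl get clusterdomainupdate deactivate-us-east-1a -n kube-system -o yaml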

Stop the application on the source cluster (if accessible)

If your source Kubernetes cluster is still up and accessible, Portworx, Inc. recommends that you stop the applications before failing them over to the destination cluster.

You can stop the applications by setting the replica count of your deployments and statefulsets to 0. This way, your application resources persist in Kubernetes, but the application pods are not running.

kubectl scale --replicas 0 deployment/mysql -n migrationnamespace
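
If the namespace contains more applications than the single mysql deployment, you can scale everything down in one pass. This sketch assumes all workloads in migrationnamespace are deployments or statefulsets:

kubectl scale --replicas 0 deployment --all -n migrationnamespace
kubectl scale --replicas 0 statefulset --all -n migrationnamespace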

Since the replicas for the mysql deployment are now set to 0, we need to suspend the migration schedule on the source cluster so that the mysql deployment on the destination cluster doesn't also get updated to 0 replicas.

Apply the spec below, noting the suspend: true field.

apiVersion: stork.libopenstorage.org/v1alpha1
kind: MigrationSchedule
metadata:
  name: mysqlmigrationschedule
  namespace: migrationnamespace
spec:
  template:
    spec:
      # This should be the name of the cluster pair created above
      clusterPair: remotecluster
      # If set to false this will migrate only the Portworx volumes. No PVCs, apps, etc will be migrated
      includeResources: true
      # If set to false, the deployments and stateful set replicas will be set to 0 on the destination.
      # If set to true, the deployments and stateful sets will start running once the migration is done
      # There will be an annotation with "stork.openstorage.org/migrationReplicas" on the destination to store the replica count from the source.
      startApplications: false
      # If set to false, the volumes will not be migrated
      includeVolumes: false
      # List of namespaces to migrate
      namespaces:
      - migrationnamespace
  schedulePolicyName: testpolicy
  suspend: true
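
Save the spec to a file and apply it on the source cluster. The file name migrationschedule.yaml below is just an example:

kubectl apply -f migrationschedule.yaml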

Using storkctl, verify the schedule is suspended.

storkctl get migrationschedule -n migrationnamespace
NAME                        POLICYNAME   CLUSTERPAIR      SUSPEND   LAST-SUCCESS-TIME     LAST-SUCCESS-DURATION
mysqlmigrationschedule      testpolicy    remotecluster     true      17 Apr 19 15:18 PDT   2m0s
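
Depending on your storkctl version, you may also be able to suspend the schedule directly instead of re-applying the full spec; check storkctl suspend --help to confirm the subcommand is available in your build:

storkctl suspend migrationschedules mysqlmigrationschedule -n migrationnamespace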

Start the application on the destination cluster

In step 2, we migrated the applications to the destination cluster, but the replica count was set to 0 for all the deployments and statefulsets so that they would not run. You can now scale the applications by setting the replica counts to the desired value.

Each application spec will have the annotation stork.openstorage.org/migrationReplicas indicating its replica count on the source cluster.
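
For example, you can read that annotation from the migrated deployment; this is a sketch, with the backslashes escaping the dots in the annotation key for jsonpath:

kubectl get deployment mysql -n migrationnamespace -o jsonpath='{.metadata.annotations.stork\.openstorage\.org/migrationReplicas}'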

Once the replica count is updated, the application starts running and the failover is complete.

You can use the following command to scale the application:

kubectl scale --replicas 1 deployment/mysql -n migrationnamespace

You can also use:

storkctl activate migration -n migrationnamespace

which looks for that annotation and scales each application to its original replica count automatically.

Let’s make sure our application is up and running. List the pods with:

kubectl get pods -n migrationnamespace
NAME                     READY     STATUS    RESTARTS   AGE
mysql-5857989b5d-48mwf   1/1       Running   0          3m

