Disaster recovery using LifeCycle Manager

A "disaster" here typically means a fatal hardware failure, such as a failed disk, network adapter, or entire machine. Such failures can lead to the loss of major components of a grid.

Recovering from a missing registry host

If the host containing the registry is missing, you have a serious problem that must be resolved as soon as possible. Without the registry you cannot, for example, start new application nodes. However, existing application nodes can continue executing client requests as long as the clients can connect to the grid through a router on another host that belongs to the grid.

Note: One way of avoiding this situation is to cluster the registry host.

The administrative router is located on the same host as the registry, so that router is also missing. The administrative router is not critical in itself, but because you cannot reach it, you cannot use it to make the configuration changes that would resolve the problem.

How to recover from this situation depends on the topology of the grid. If the grid consisted of only one host, the entire grid is lost. The way forward is to install a new grid and reinstall the applications that were deployed in the lost grid. Consult the documentation of each application for backup and restore procedures. The only alternative to reinstalling the grid is to restore some form of machine backup, for example a captured image. The remaining procedures described below are not relevant in this case, since you simply recreate the entire grid.

If, however, the grid has several hosts, you need to move the registry to one of the remaining hosts. You can only move the registry to a non-transient host, so the procedure to move the registry includes an initial task to check whether all of the remaining hosts are transient. If all are transient, you need to reconfigure one of them as non-transient first. If at least one remaining host is non-transient, the recommendation is to move the registry there.
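The choice of recovery path described above can be summarized as a small decision sketch. This is only an illustration of the reasoning, not part of LifeCycle Manager or any grid API: the Host class, the plan_registry_recovery function, and the action names are hypothetical, and picking the first transient host for reconfiguration is an arbitrary assumption (the procedure only requires that one of them is reconfigured).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Host:
    """Hypothetical description of a remaining grid host."""
    name: str
    transient: bool


def plan_registry_recovery(remaining: List[Host]) -> Tuple[str, Optional[str]]:
    """Return (action, host name) for a grid whose registry host is missing.

    The action names are illustrative labels, not real commands.
    """
    # No hosts left: the grid had a single host, so it must be recreated
    # (or restored from a machine backup).
    if not remaining:
        return ("reinstall_grid", None)

    # Prefer a host that is already non-transient.
    for host in remaining:
        if not host.transient:
            return ("move_registry", host.name)

    # All remaining hosts are transient: one must be reconfigured as
    # non-transient before the registry can be moved to it.
    # Assumption: the first host is chosen arbitrarily.
    return ("reconfigure_as_non_transient_then_move", remaining[0].name)
```

For example, with remaining hosts A (transient) and B (non-transient), the sketch selects B as the new registry host without any reconfiguration.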

The procedure to recover from a missing registry host includes the following main tasks:

  1. Stopping the grid

  2. Moving the registry

  3. Removing the missing host

  4. Dealing with deployed applications or parts of applications that only existed on the removed host

  5. Replacing missing routers and connection dispatchers

  6. Removing a missing host from the configuration