We're updating some of our product names. K2 Five is now Nintex Automation. You may see both product names in our help pages while we make this change.

Disaster Recovery

When upgrading or performing maintenance on your K2 environment, first:

Ensure there are no dependency issues in your solutions. See Dependency Checking for information about dependencies and how to resolve dependency issues.
Back up your K2 database
Create a checkpoint (snapshot) if K2 is running in a virtual environment

The backup and checkpoint allow you to revert the environment in case of a failed upgrade.

When creating a new environment for testing, it is best practice to build a new environment rather than clone an existing one, as cloning an environment can cause unexpected behavior.

Disaster recovery comprises the processes, policies and procedures put in place to deal with potential disasters that result in a complete system outage, such as a natural disaster that takes the production data center offline. A disaster recovery plan forms part of a business continuity plan (BCP) and is essential to any organization that wants to either maintain or quickly resume mission-critical functions after such a disaster. The disaster recovery plan should typically include an analysis of business processes and continuity needs, especially planning for the resumption of applications, data, hardware, communications (such as networking) and other IT infrastructure. You must also give attention to disaster prevention. Because K2 interacts with external systems such as SharePoint or other line-of-business (LOB) systems, it is important to include all related systems in your disaster recovery planning for K2.

When developing a disaster recovery plan, two industry-standard measures help determine how extensive the procedures and underlying infrastructure need to be to support the recovery goals:

Recovery Time Objective (RTO)
The amount of time a system may be offline. Put another way, the maximum amount of time that it can take to bring K2 back to an operational state following a disaster recovery event. This will help assess the level of investment and rigor needed in creating and maintaining a parallel disaster recovery environment.
Recovery Point Objective (RPO)
The maximum amount of time for which data may be lost following an outage. This will influence the database backup and retention strategy.

Below is an example of RTO and RPO:

Assume a K2 platform supports a business unit where all of their solutions must be operational within five minutes of a disaster event. When the system comes back online, it must ensure that it has data consistent within 15 minutes of the disaster recovery event.

In this scenario RTO is five minutes and RPO is 15 minutes.

Determining RTO and RPO is important because it helps focus the level of effort and expense associated with building the disaster recovery processes and infrastructure needed to support the required objectives. Generally, the lower the objectives, the higher the level of effort and/or expense, because more aggressive service levels translate into more investment in hardware, software and automation.
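The example scenario above can be expressed as a simple check of a recovery event (or drill) against the two objectives. The following is a minimal sketch, not a K2 tool: the function name and the timestamps are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def meets_objectives(disaster_at, last_consistent_data_at, back_online_at, rto, rpo):
    """Return (rto_met, rpo_met) for a single recovery event or drill."""
    # Downtime is measured from the disaster until the system is operational again.
    downtime = back_online_at - disaster_at
    # The data loss window is the gap between the last consistent data and the disaster.
    data_loss_window = disaster_at - last_consistent_data_at
    return downtime <= rto, data_loss_window <= rpo


# Hypothetical drill: RTO of five minutes and RPO of 15 minutes, as in the example.
disaster = datetime(2024, 1, 1, 12, 0)
result = meets_objectives(
    disaster_at=disaster,
    last_consistent_data_at=disaster - timedelta(minutes=10),
    back_online_at=disaster + timedelta(minutes=7),
    rto=timedelta(minutes=5),
    rpo=timedelta(minutes=15),
)
print(result)  # (False, True)
```

In this hypothetical drill the system took seven minutes to come back online, so the RTO was missed, while the ten minutes of lost data fell within the RPO.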

Important Considerations

When evaluating how to factor K2 into a disaster recovery plan it is important to account for the following:

There is no true active/active K2 configuration
Any operational K2 server will attempt to execute workflows. This is important to understand in a disaster recovery scenario because it means that while the K2 server environments can be installed and configured in the disaster recovery environment, the K2 host services (the actual Windows service that runs K2) must not be running. Assume a scenario where there is some data synchronization between production and disaster recovery environments. If the disaster recovery K2 server is running, it will attempt to handle workflow events such as escalations (where it might look to send a reminder email to a person that has not completed his / her task). The user would then receive two email reminders, one each from the production and disaster recovery K2 environments. Thus all disaster recovery procedures should account for the starting of the K2 services in the disaster recovery environment when a disaster recovery event occurs.
DNS is an important part of disaster recovery planning and procedures
It is essential that all URLs and referenced server names still resolve correctly after a disaster recovery cut over. Failure to account for this may result in unexpected application and platform behavior since connections that assume specific URLs, URIs, server names, connection strings, etc., will no longer be valid. Using DNS aliases for all component interaction allows for building disaster recovery event procedures (that deal with the switching of the aliases within DNS) that will ultimately result in consistent name / URL resolution. Alternatively some disaster recovery strategies stipulate a topology change, which adds complexity and yields some benefits.
K2 platform configurations and customizations
Any disaster recovery environment should be running the same version of K2 (and ancillary software) as the production environment to ensure the K2 base platform functionality is identical. It is also important to understand all production configurations as well as platform customizations (such as custom ServiceBrokers, user managers, K2 custom controls, etc.) to ensure that the disaster recovery environment is configured identically.
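As an illustration of the DNS consideration above, a cut-over procedure could include a check that each alias resolves to the address expected after the switch. The following is a minimal sketch under stated assumptions: the alias names and addresses are hypothetical, and a stub resolver is used in the example so that it runs without network access. In practice the default resolver (`socket.gethostbyname`) would be used.

```python
import socket


def misdirected_aliases(expected, resolve=socket.gethostbyname):
    """Return {alias: resolved_address_or_None} for each alias that does not
    resolve to its expected address."""
    problems = {}
    for alias, expected_address in expected.items():
        try:
            actual = resolve(alias)
        except OSError:  # socket.gaierror is a subclass of OSError
            actual = None
        if actual != expected_address:
            problems[alias] = actual
    return problems


# Hypothetical stub standing in for DNS after a cut over.
def stub_resolver(name):
    records = {"k2.example.com": "10.1.0.5", "k2sql.example.com": "10.0.0.9"}
    if name not in records:
        raise socket.gaierror("name not found")
    return records[name]


# Hypothetical addresses the aliases should point to in the disaster recovery site.
expected_after_cutover = {
    "k2.example.com": "10.1.0.5",
    "k2sql.example.com": "10.1.0.9",
    "k2forms.example.com": "10.1.0.7",
}
print(misdirected_aliases(expected_after_cutover, resolve=stub_resolver))
```

Here the stub still returns the old address for `k2sql.example.com` and has no record for `k2forms.example.com`, so both are reported, the latter with `None`.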

K2 product database

The most significant portion of a K2 disaster recovery plan centers on the K2 product database. K2 keeps all workflow state and form, workflow and system configuration within its own SQL Server database. As such, it is imperative that this data be managed appropriately to support the desired disaster recovery goals.

SQL Server supports a number of methods of managing data within disaster recovery scenarios. The following outlines how K2 interacts with these approaches:

It is of the utmost importance to back up the K2 database, as it forms the core of K2 functionality and data. The following table briefly describes the options SQL Server provides for backing up the K2 database.

SQL Disaster Recovery Options
SQL Server Feature	Typical Use Case	Additional Details
Backup and Restore	Backup refers to the copying of data so that the additional copies may be restored after a data loss event. Backups differ from archives, and backup systems differ from fault-tolerant systems. Backups are useful primarily for two purposes: to restore a computer to an operational state following a disaster, and to restore small numbers of files after they have been accidentally deleted or corrupted. This option supports both full and incremental backups.	Backing up Keys and Certificates
Log Shipping	Log shipping allows you to automatically send transaction log backups from a primary database on a primary server instance to one or more secondary databases on separate secondary server instances. The transaction log backups are applied to each of the secondary databases individually. An optional third server instance, known as the monitor server, records the history and status of backup and restore operations and, optionally, raises alerts if these operations fail to occur as scheduled. However, there is no guarantee that the various databases will be in sync after restoration, because the K2 server is writing entries from the K2Server table to the K2ServerLog table while the log backups for each table occur. This method is therefore not the preferred disaster recovery option for K2.	Recommended Disaster Recovery Procedure
Database Mirroring	Database mirroring is primarily a software solution for increasing database availability. Mirroring is implemented on a per-database basis and works only with databases that use the full recovery model. The simple and bulk-logged recovery models do not support database mirroring. Database mirroring is supported in SQL Server Standard and Enterprise. Database mirroring offers substantial availability and provides an easy-to-manage alternative or supplement to failover clustering or log shipping. When a database mirroring session is synchronized, database mirroring provides a hot standby server that supports rapid failover with no loss of data from committed transactions. During a typical mirroring session, after a production server fails, client applications can recover quickly by reconnecting to the standby server. Because this method is similar to log shipping, described above, it could result in databases being slightly out of sync and is therefore not the preferred disaster recovery method for K2.	SQL Mirroring - manual failover
Database Clustering	A failover cluster is a combination of one or more physical disks in a Microsoft Cluster Service (MSCS) cluster group, known as a resource group, that are participating nodes of the cluster. The resource group is configured as a failover clustered instance that hosts an instance of SQL Server. A SQL Server failover clustered instance appears on the network as if it were a single computer, but has functionality that provides failover from one node to another if one node becomes unavailable. Failover clusters provide high-availability support for an entire Microsoft SQL Server instance, in contrast to database mirroring, which provides high-availability support for a single database. However, it is recommended not to change the cluster names during restoration, as the K2 worklist tables will then be out of sync, resulting in errors related to client events.
AlwaysOn	Used to maintain a parallel disaster recovery environment. Reasonably supports RPO goals of less than two hours. Provided for single database scenarios from SQL Server 2012 onwards (be sure to point to the correct SQL Server AlwaysOn listener / instance).	Configuring MS SQL for AlwaysOn Automated Failover of the K2 Database.

Please refer to SQL Server documentation to understand the licensing and general usage considerations for each of the above features. K2 does not provide general SQL Server instructions on setting up and configuring these features within SQL Server.

For in-depth technical information on SQL Server, visit http://msdn.microsoft.com/

Replication is another option available within SQL Server. However, K2 does not support this option.
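As a small illustration of the Backup and Restore option in the table above, the sketch below builds a T-SQL full-backup statement with a timestamped file name. It only constructs the statement text and does not connect to SQL Server. The database name `K2`, the backup folder and the helper name are assumptions for illustration; use the names and backup options that apply to your environment and consult the SQL Server documentation for the complete `BACKUP DATABASE` syntax.

```python
from datetime import datetime


def full_backup_statement(database, backup_dir, taken_at):
    """Build a T-SQL full-backup statement with a timestamped file name."""
    # Reject characters that would break the bracketed name or the quoted path.
    if "]" in database or "'" in backup_dir:
        raise ValueError("unexpected character in database name or path")
    stamp = taken_at.strftime("%Y%m%d_%H%M%S")
    path = f"{backup_dir}\\{database}_{stamp}.bak"
    # WITH CHECKSUM asks SQL Server to verify page checksums while backing up.
    return f"BACKUP DATABASE [{database}] TO DISK = N'{path}' WITH CHECKSUM;"


# Hypothetical database name and backup folder.
statement = full_backup_statement("K2", r"D:\Backups", datetime(2024, 1, 1, 2, 0, 0))
print(statement)
# BACKUP DATABASE [K2] TO DISK = N'D:\Backups\K2_20240101_020000.bak' WITH CHECKSUM;
```

How often such a statement is scheduled, and whether it is combined with other backup types, depends on the RPO agreed for the platform.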

Non-K2 products, services and line of business systems
Each line-of-business system that the K2 platform interacts with must be identified and then reviewed to understand and plan for the proper disaster recovery approach. Failure to include this level of planning and execution may leave the overall K2 landscape in a reduced state of disaster recovery preparedness. Dependent data sources must not be out of sync, or many false errors will be introduced in a disaster recovery test.

External LOB system interactivity should be captured for each K2 platform instance. This can be used to ensure all external systems are factored into the planning.
Test disaster recovery cut-over procedure
Once a disaster recovery cut-over procedure is created that includes K2, as well as ancillary non-K2 systems, be sure to test the procedure. Proper testing cycles help ensure the procedures result in a successful and consistent parallel system that meets the targeted RTO and RPO goals. Additionally, this will permit the operations staff to understand the steps before a disaster occurs. It is imperative to plan and drill for disasters before they occur and not execute procedures for the first time during a disaster.
Continually reassess disaster recovery procedures
BCP and disaster recovery planning should be a recurring activity, as the needs of the platform frequently change over time due to changes in business requirements / service levels as well as additional solutions brought into the stack. At a minimum, the disaster recovery plan should be reviewed any time a new solution is planned to be deployed to the environment.
Keep production and disaster recovery environments aligned
Make sure that any changes (upgrades, deployments, configuration changes, etc.) made to the production environment are fully propagated to the disaster recovery environment.
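One way to keep track of alignment is to record an inventory of versions and customizations for each environment and compare the two after every change. The following is a minimal sketch, assuming the inventories are maintained by hand or by your own tooling. The component names and version numbers are hypothetical and are not read from K2.

```python
def environment_drift(production, disaster_recovery):
    """Return {component: (production_value, dr_value)} for every mismatch.

    A component missing from one inventory is reported with None on that side.
    """
    drift = {}
    for component in sorted(set(production) | set(disaster_recovery)):
        prod_value = production.get(component)
        dr_value = disaster_recovery.get(component)
        if prod_value != dr_value:
            drift[component] = (prod_value, dr_value)
    return drift


# Hypothetical inventories for illustration only.
production = {
    "K2 version": "5.6",
    "Custom ServiceBroker": "1.2.0",
    "Custom control pack": "2.0.1",
}
disaster_recovery = {
    "K2 version": "5.6",
    "Custom ServiceBroker": "1.1.0",
}
print(environment_drift(production, disaster_recovery))
```

In this example the custom control pack is missing from the disaster recovery inventory and the ServiceBroker versions differ, so both are reported. An empty result indicates that the two inventories match.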

Disaster Recovery

K2 Components

Non K2-specific Components

Windows Server Machine