Recommended Disaster Recovery Procedure
This topic covers the K2-recommended procedures for enabling Disaster Recovery (DR) within an infrastructure that leverages physically separate data centers for production and disaster scenarios.
The following parameters are assumed:
- Accepted data loss of 4 hours
- K2 system data is consistent upon a cut over to the DR site
Should the window of accepted data loss need to be adjusted, the scheduled jobs laid out below should be recalculated accordingly.
The procedure below leverages an initial set of database backups followed by transaction log shipping with point-in-time recovery, allowing for consistent K2 data in the event of a disaster.
Prerequisite
The following should be done:
- Within the DR site:
  - Record the K2 licensing information from the following SQL tables:
    - K2Server._Server
    - K2HostServer.LicenceKeys

    Note: The data should be similar in both locations.
  - Create a SQL script that will reapply this information during an actual disaster recovery event.

    Note: This is required because the K2 licensing is stored within the database and is bound to a specific environment. As such, restoring databases across environments, as happens in a DR scenario, will replace the DR license with the PRODUCTION license. Creating a SQL script to restore the original DR license, while not always necessary, makes this a more repeatable procedure should it be desired.
- Keep the K2 host server nodes turned off until a DR cut over is required. This ensures that the DR site does not try to process transactions based upon its copy of the K2 database.
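The licensing information above can be captured with a short script ahead of time. The sketch below assumes the default `dbo` schema for the tables named in the list; the exact table and column layout varies by K2 version, so verify the schema in your environment before relying on it.

```sql
-- Capture the current DR-site license rows so they can be reapplied after a
-- PRODUCTION restore. Table names are taken from the list above; the column
-- layout varies by K2 version, so SELECT * is used here purely for illustration.
SELECT * FROM [K2Server].[dbo].[_Server];
SELECT * FROM [K2HostServer].[dbo].[LicenceKeys];

-- During an actual cutover, a script built from these saved rows would
-- UPDATE/INSERT the DR values back over the restored PRODUCTION data.
```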
Initial Setup
The following setup is required:
- Within the PRODUCTION site:
  - Turn off the K2 service on all nodes within the K2 farm. This ensures no K2 data is manipulated during the backup.
  - Create full database backups of all K2 product databases, as described in K2 blackpearl databases Backup and Restore. Alternatively, incremental backups may also be used. Either way, backup compression is advised for efficiency.
  - Script out all external database server artifacts, e.g. database logins, users, etc.
  - Back up any additional database(s) leveraged by the solution.
  - Turn on the K2 service on all nodes within the K2 farm.
  - Move the database backups to the DR site.
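As a sketch, the full, compressed backups described above can be produced with a standard BACKUP DATABASE statement; the database name and file path below are placeholders for your environment.

```sql
-- Full, compressed backup of one K2 product database (repeat per database).
-- [K2] and the target path are placeholders for your environment.
BACKUP DATABASE [K2]
TO DISK = N'D:\Backups\K2_Full.bak'
WITH COMPRESSION, CHECKSUM, INIT,
     NAME = N'K2 full backup for DR seeding';
```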
- Within the DR site:
  - Restore the PRODUCTION database backups.
  - Reapply the external artifacts (logins, etc.).
  - Reapply the K2 licensing, as identified in the Prerequisite step.
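For the restore at the DR site, note that the databases must be left in a restoring state so that the shipped transaction logs can later be applied on top of them. A minimal sketch, with placeholder database, file, and path names:

```sql
-- Restore the PRODUCTION full backup at the DR site.
-- WITH NORECOVERY leaves the database in a restoring state so that the
-- shipped transaction logs can still be applied on top of it.
-- Database name, logical file names, and paths are placeholders.
RESTORE DATABASE [K2]
FROM DISK = N'D:\Backups\K2_Full.bak'
WITH NORECOVERY, CHECKSUM,
     MOVE N'K2'     TO N'E:\Data\K2.mdf',
     MOVE N'K2_log' TO N'F:\Logs\K2_log.ldf';
```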
Ongoing
- Within the PRODUCTION site:
  - Create a scheduled SQL Agent job (recommended hourly) that backs up the transaction logs for the following:
    - All K2 product databases
    - Solution database(s)
  - Schedule a log shipping job to transmit these logs to the DR site.
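The hourly log backup step could be implemented as a SQL Agent job running a statement such as the following sketch; the database name, path, and file-naming convention are placeholders.

```sql
-- Hourly transaction log backup for one database (repeat per database, or
-- wrap in a loop over the K2 product and solution databases).
-- Schedule this as a SQL Agent job step; names and paths are placeholders.
DECLARE @file nvarchar(260) =
    N'D:\LogShip\K2_' + CONVERT(nvarchar(8), GETDATE(), 112) + N'_'
    + REPLACE(CONVERT(nvarchar(8), GETDATE(), 108), ':', '') + N'.trn';

BACKUP LOG [K2]
TO DISK = @file
WITH COMPRESSION, CHECKSUM;
```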
- Within the DR site:
  - Create a scheduled job (recommended hourly) that applies any transaction log backups for these databases that are more than 3 hours old.
  - Archive the just-applied transaction log files.
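Applying a shipped log at the DR site follows the same pattern as the initial restore. A sketch for one log file, with placeholder names:

```sql
-- Apply a shipped transaction log at the DR site.
-- WITH NORECOVERY keeps the database in a restoring state so the next hourly
-- log in the chain can still be applied. Logs must be applied in sequence;
-- the database name and file path are placeholders.
RESTORE LOG [K2]
FROM DISK = N'D:\LogShip\K2_20240101_130000.trn'
WITH NORECOVERY;
```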
Disaster Recovery Cutover Procedure
1. If possible, back up the ‘tail end’ of the log(s) from the primary site and transmit them to the secondary site.
2. Determine the time of the disaster at the production site and subtract 2 hours. Choose an exact time of recovery that makes sense, for instance 13:35:00.000.
3. Recover the queued transaction logs for all databases to that same specific point in time (the chosen time may or may not include the ‘tail end’ backups obtained in step 1). In this example, all databases should be recovered up to 13:35:00.000.
4. Reapply the appropriate K2 server licenses within the DR environment (preferably leveraging the script file recommended in the Prerequisite section, point 2).
Note: As a result of the time difference, there will always be a difference of one (or two or three, as needed) transaction logs between the sites. This allows a restore-to-a-point-in-time operation on the final log in the event of a disaster. After this point-in-time restore, all databases will be consistent.
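The point-in-time recovery on the final log can be sketched as follows; the database name, file path, and date portion of the STOPAT value are placeholders, with only the 13:35:00.000 time taken from the example above.

```sql
-- Final log restore during cutover: recover every database to the same
-- chosen instant (13:35:00.000 in the example above), then bring it online.
-- Database name, path, and the date portion of STOPAT are placeholders.
RESTORE LOG [K2]
FROM DISK = N'D:\LogShip\K2_20240101_140000.trn'
WITH STOPAT = N'2024-01-01T13:35:00.000', RECOVERY;
```

Repeat this per database with the same STOPAT value so that all databases come online consistent with one another.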
Testing
A manual failover process should be executed during development/QA to ensure that the procedure works as expected. All K2 operations should be tested in order to confirm that the primary data files and external artifacts (such as logins) are valid.