Backing Up and Restoring Platform Data

This section describes how to back up and restore the data in your cluster.

Backup requirements

For production deployments, there are four distinct types of data that you will want to back up:

Cluster data: Data about the operational state of your cluster.
Policy data: The policy you have defined for your cluster.
Job data: The app and package data that you deployed to your cluster.
Log data: Splunk and Redis data.

If Apcera is managing your cluster, Apcera can back up, archive, and otherwise process cluster and policy data within locations that meet your constraints. If you are managing your cluster, you will need to provide this backup functionality yourself.

Apcera does not back up job data because it is your proprietary data. It is your responsibility to back this data up.

Backing up and restoring cluster data

This section describes how to back up cluster data.

Installation files

You should back up and version control all platform installation files, including the cluster.conf file and Terraform files, using an external revision control system such as GitHub. Be sure to sync your local repository with master before making edits.
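
For example, a minimal sketch of version-controlling cluster.conf and Terraform files with Git (the repository URL, paths, and file names are placeholders):

$ git clone git@github.com:example-org/apcera-cluster-config.git   # placeholder repository
$ cd apcera-cluster-config
$ git pull origin master                   # sync with master before making edits
$ cp /path/to/cluster.conf .               # copy in the current cluster.conf
$ git add cluster.conf terraform/          # include your Terraform files if present
$ git commit -m "Back up cluster.conf and Terraform files"
$ git push origin master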

Database backups

A production deployment of Apcera includes several runtime PostgreSQL database instances that are installed when you set up the cluster, based on the parameters set in the cluster.conf file. These components include the auditlog-database, component-database, and monitoring (which includes the Zabbix DB).

By default, a cluster is configured to perform backups of these database components automatically, both daily and during the cluster deploy process. These backups can be found on the database server in /var/lib/postgresql/backups. If necessary, see also the Backup and Restore documentation for your PostgreSQL version.

The Apcera Auth Server is backed up as part of the component-database backup.
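
If you need to restore from one of the local backups, a minimal sketch follows (the backup file name is illustrative, and the exact restore steps depend on your PostgreSQL version and on whether the backup is a plain-SQL dump or a custom-format archive; use pg_restore for the latter):

$ ls /var/lib/postgresql/backups                         # list the automatic backups on the database server
$ sudo -u postgres psql -f /var/lib/postgresql/backups/component-database.sql   # restore a plain-SQL dump (file name is illustrative)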

In addition, if the database_backups section of cluster.conf is populated, database backups are also stored in S3:

chef: {
  "continuum": {
    "database_backups": {
      "access_key": "S3-ACCESS-KEY",
      "secret_key": "S3-SECRET-KEY",
      "endpoint": "s3-us-west-2.amazonaws.com", # S3 region
      "bucket": "S3-BUCKET-NAME"
    },
  }
}

If database backups are stored in S3, backups are kept on local disk only on the master database server for each server group; otherwise, backups are executed and stored on every database server.
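
If you need to retrieve a backup from S3, a sketch using the AWS CLI follows (the bucket name and object key are placeholders based on the database_backups settings above):

$ aws s3 ls s3://S3-BUCKET-NAME/ --region us-west-2        # list stored database backups
$ aws s3 cp s3://S3-BUCKET-NAME/path/to/backup.sql.gz .    # download a backup (object key is illustrative)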

Orchestrator backup

The Orchestrator DB is a PostgreSQL instance running on the Orchestrator host. The Orchestrator DB contains cluster and Chef data. It is recommended that you back up the Orchestrator DB in case of error, or if you need to shut down the cluster VMs and want to ensure that you preserve cluster state.

Automated Orchestrator backup procedure

The Orchestrator 0.5.3 release adds automated backup and restore to support Orchestrator VM migration. The use case for this feature is removing Orchestrator from one VM and restoring it on another. You should not use two Orchestrators to deploy and manage a cluster. Once you have restored a new Orchestrator VM, run a deploy from the restored Orchestrator (so that the Chef agents are updated), and then retire the old Orchestrator.

You can use the command orchestrator-cli backup --config FILE to back up the Orchestrator host.

The backup process outputs a backup file in the format orchestrator_backup_${TIMESTAMP}.tgz. The Orchestrator backup file contains the current release, the Orchestrator database, and the cluster configuration files and metadata.
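
For example (the cluster.conf path is whatever you normally pass to orchestrator-cli):

$ orchestrator-cli backup --config cluster.conf
$ ls orchestrator_backup_*.tgz        # the resulting backup archive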

You can restore the Orchestrator VM using the command orchestrator-cli restore --file FILE, where FILE is the path and file name of the Orchestrator backup file. Typically you will restore the file to a new Orchestrator VM that you have imported. Note that this machine must be able to ping all cluster hosts.

The orchestrator-cli restore command will only work with Apcera Platform release 2.4.0 and later. For releases before 2.4.0, you can restore the Orchestrator DB on the new machine, but if you choose not to upgrade before restoring, you will have to contact Apcera Support to help you manually reconfigure the Chef agents on cluster machines.
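
For example, on the new Orchestrator VM (the backup file name is whatever the backup step produced):

$ orchestrator-cli restore --file /path/to/orchestrator_backup_<TIMESTAMP>.tgz
$ orchestrator-cli deploy -c cluster.conf     # then run a deploy (add your usual release flags) so the Chef agents are updated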

Manual Orchestrator DB backup procedure

The following procedure describes how to back up and restore the Orchestrator DB hosted on vSphere. If you are using AWS or OpenStack, the process is generally the same but may differ slightly.

This procedure is deprecated in favor of the automated backup and restore procedure described above. It is provided here for reference.

1) Log into the Orchestrator VM using SSH.

See connecting to Orchestrator using SSH.

2) Back up the Orchestrator DB and cluster.conf file.

$ pg_dump -U orchestrator orchestrator -f /tmp/orchestrator.sql
$ cp cluster.conf /tmp/cluster.conf

Note: With this method, a full backup using pg_dumpall is not required.

3) From the machine where your APC client is installed, use scp to copy these two files to your local system.

$ scp orchestrator@10.1.1.101:/tmp/cluster.conf /tmp/
$ scp orchestrator@10.1.1.101:/tmp/orchestrator.sql /tmp/

Note: Replace the IP address 10.1.1.101 with the IP address of the Orchestrator VM you want to back up.

Orchestrator DB restore procedure

The following steps assume that the Orchestrator VM is removed or that the Orchestrator DB is corrupt and you need to create a new Orchestrator host.

1) From the vSphere File menu, select Deploy OVF Template and choose the orchestrator-(version).ova file as the source.

Alternatively, you can use ovftool. Here is a generic example; the individual options will need to be customized for your vSphere environment:

~/ovftool --noSSLVerify -vf=folder_name --name=name_of_orchestrator -dm=thin -ds="datastore_name" --net=network_name ~/path_to_orchestrator-(version).ova vi://user:password@vSpherehost/datacenter_name/host/IPAddress_of_esxi_host

2) Start the new Orchestrator VM in vSphere: right-click the VM and choose Power On.

Note: Wait until the new VM is started, then refresh the summary tab to display the IP address.

3) Copy the backup of cluster.conf and orchestrator.sql to the new Orchestrator VM.

$ scp /tmp/cluster.conf orchestrator@10.1.1.102:/tmp/
$ scp /tmp/orchestrator.sql orchestrator@10.1.1.102:/tmp/

Note: Replace the IP address 10.1.1.102 with the address of the new Orchestrator VM.

4) Log into the Orchestrator VM and copy the files from /tmp.

$ ssh orchestrator@10.1.1.102 (enter password)
$ cp /tmp/orchestrator.sql .
$ cp /tmp/cluster.conf .

Note: When you log in as the orchestrator user, your default directory should be /home/orchestrator.

5) Initialize the Orchestrator cluster configuration.

$ orchestrator-cli deploy --dry -c cluster.conf

Note: When the --dry deploy completes you will see a message like:

Generating dependency tree... done
Dependency graph for dry run written to graph.png.

6) Import the orchestrator.sql database and check the result.

$ psql -f orchestrator.sql
$ psql -l

Note: The output of the psql -l command should look like the following:

orchestrator | orchestrator | UTF8 | en_US.UTF8 | en_US.UTF8 |
orchestrator_archive | orchestrator | UTF8 | en_US.UTF8 | en_US.UTF8 |
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +...
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +...
(5 rows)

At this point the database should be restored. To verify:

7) Re-run the deploy with the --dry option.

$ orchestrator-cli deploy --dry -c cluster.conf

Note: Check that the new graph.png file has been created.

8) Run a quick sanity check with the orchestrator-cli ssh command by logging into one of the cluster nodes.

$ orchestrator-cli ssh

Starting up database... done
Multiple nodes matched your query, please select which:
1) IP: 10.1.1.154, Name: 3226f33d, Tags: monitored,router,job-manager,component-database,stagehand,api-server
2)...

Pick a host [1]:
Last login:...

Troubleshooting Orchestrator DB backup and restore

If you shut down the Orchestrator host VM, on reboot the Orchestrator VM cleans up assets written to the /tmp directory.

To refresh the assets in the /tmp directory, run the command orchestrator-cli deploy -c cluster.conf --dry --release <RELEASE> before running the deploy command. In addition, the IP address of the Orchestrator VM must be the same as it was before the host was shut down; on reboot, reset the Orchestrator IP address to its previous value.
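
For example (replace <RELEASE> with the release your cluster is currently running):

$ orchestrator-cli deploy -c cluster.conf --dry --release <RELEASE>   # regenerates the assets in /tmp
# then run your usual deploy command against the refreshed assets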

Cluster logs

In addition, consider backing up the following cluster components using the techniques appropriate to each technology involved.

redis-server: Redis maintains a buffer of recent job logs for all jobs. While Redis provides backup capabilities, this data is purged as needed to buffer new data, so Apcera does not back up Redis data. If you need to retain job log data, configure a job log drain that meets your backup requirements and send the job logs to a syslog server (see the sketch following this list).
splunk-server: Splunk is an optional, third-party component typically used for component logging and troubleshooting. Component log data is likely non-essential to back up, but if you have specific log retention compliance requirements, you should back up this data. See the Splunk backup documentation for details.
zabbix-server: See Database backups above.
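
As noted for redis-server, job logs that must be retained should be sent to an external syslog target via a log drain. A hedged sketch with APC follows (the drain URL and job name are placeholders, and the exact drain syntax may vary by APC release; check the APC help for the drain command):

apc drain add syslog://logs.example.com:514 --job my-job    # placeholder syslog endpoint and job name; syntax is illustrative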

Other cluster components

We consider the following components ephemeral and therefore do not back them up:

Graphite Server: Optionally, you may choose to back up this component.
NATS Server: Does not provide persistence.
Health Manager: Does not store state.

Backing up and restoring policy data

This section describes how to back up and restore policy documents. It is recommended that you version control policy documents using a revision control system such as Git (for example, hosted on GitHub).

Exporting policy

Use the 'policy export' command to export policy documents from your cluster.

apc policy export [<document-name>] [options]

NOTE: If no document name is provided, all policy documents are exported.

The following command options are supported:

  -d, --dir [DIR]       - Put exported documents in the given directory. Defaults to the current working directory.

  -f, --force           - Overwrite existing files without asking.

For example, to export all policy documents to the current directory:

apc policy export

Or, to export a specific policy document to a specific directory:

apc policy export authSettings --dir /path/to/dir

Importing policy

Use the command apc policy import to import policy documents into your cluster.

apc policy import <filename.pol> [<filename.pol>...]

For example, to import a policy document:

apc policy import /path/to/policydoc.pol

Policy import supports wildcards:

apc policy import *.pol

Backing up and restoring job data

This section describes how to back up job data.

Service provider data

Assuming you are running stateless microservices bound to database backends for storage, you will likely need to back up one or more databases. We provide documentation for backing up and restoring MySQL databases. If you are using PostgreSQL, refer to the Backup and Restore documentation for your database version.
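
For example, a minimal sketch of backing up and restoring a bound MySQL database with mysqldump (the host, user, and database name are placeholders you would take from the service binding):

$ mysqldump -h mysql-host.example.com -u app_user -p app_db > app_db_backup.sql   # back up
$ mysql -h mysql-host.example.com -u app_user -p app_db < app_db_backup.sql       # restore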

If you are using Apcera File Share (APCFS) services for NFS persistence with HA NFS configured, it is recommended that you enable snapshot backups.

Exporting and importing all cluster data

To export all cluster data, use the apc export all command. This command will export everything within a cluster into a single cntmp file, which can then be imported into another cluster using the apc package import command.
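
For example, a sketch of moving all cluster data to another cluster (the target URL and export file name are illustrative):

apc export all                                   # exports the cluster contents to a .cntmp file
apc target https://example.cluster.apcera.net    # point APC at the destination cluster
apc package import cluster-export.cntmp          # import the exported file on the destination cluster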

The 'export all' command cannot export multi-layered packages. If you receive an error, export package data and job data separately as described below.

Exporting packages

The 'package export' command exports one or many packages from a cluster into a cntmp file, which can then be imported into other clusters.

apc package export <package-name> [...]

For example, to export an individual package with the FQN package::/apcera/pkg/runtimes::go-1.5.1:

apc ns /apcera/pkg/runtimes
apc package export go-1.5.1.cntmp

To export all packages:

apc ns /
apc package export *.cntmp

Importing packages

The 'package import' command imports one or many space-separated .cntmp files into your cluster. By default, the namespaces of the contents are preserved.

apc package import </path/to/import-file-name> [...] [optional args]

Command options:

  -s, --skip-existing   - Warn, instead of error, if a package already exists.

  -o, --override        - Override cntmp contents namespaces with your local namespace.

Examples:

apc package import foo.cntmp bar.cntmp apcera.cntmp

To import all packages from the working directory:

apc package import *.cntmp --batch

If you receive an error that a package already exists, use the --skip-existing or -s flag.

apc package import *.cntmp --batch --skip-existing

By default a package is imported to the same namespace from where it was exported. To override the default import namespace, target the desired namespace and use the --override or -o flag.

For example, say you have exported the node-0.6.21 package from /apcera/pkg/runtimes and want to import it to the /acme/pkg/runtimes namespace:

apc namespace /acme/pkg/runtimes
apc package import node-0.6.21.cntmp -o

Exporting jobs

The 'job export' command exports one or many jobs from a cluster to a cntmp file, which can then be imported into other clusters.

To export one or more jobs:

apc job export <job-name> [...]

You must specify the job name or FQN. You may specify one or more job names in the command. You cannot export all jobs using a wildcard.

For example, export a job with the FQN job::/apcera::continuum-guide:

apc ns /apcera
apc job export continuum-guide

The file named "continuum-guide.cntmp" is downloaded to your local working directory.

To export multiple jobs in a namespace:

apc ns /apcera
apc job export job1 job2 jobN

Importing jobs

The 'job import' command imports one or many space-separated .cntmp files into your cluster. By default, the namespaces of the contents are preserved.

apc import </path/to/import-file-name> [...] [optional args]

Command options:

  -s, --skip-existing   - Warn, instead of error, if a package already exists.

  -o, --override        - Override cntmp contents namespaces with your local namespace.

For example:

apc import foo.cntmp bar.cntmp apcera.cntmp