Backing up and restoring AEN

Document purpose

This document lays out the steps to back up and restore Anaconda Enterprise Notebooks (AEN) for disaster recovery. It is not intended to provide high availability. Each component (Server, Gateway and Compute) has its own instructions, and each may be backed up or restored individually as needed. The steps primarily involve creating tar files of important configuration files and data.

This document is written for a system administrator who is comfortable with basic Linux command line navigation and usage.

To migrate to a new cluster, use these backup and restore instructions to back up the system from the old cluster and restore it to the new cluster.

Important notes

Review the Concepts page to become familiar with the different components and how they work together.

Root or sudo access is required for some commands.

CAUTION: All commands MUST be run by $AEN_SRVC_ACCT (the account used to run AEN) except for those commands explicitly indicated to run as root or sudo. If the commands are not run by the correct user, the installation will not work, and a full uninstallation and reinstallation will be required!

These instructions assume that the fully qualified domain name (FQDN) has not changed for any of the component nodes. If any of the FQDNs are not the same, additional steps will be needed.

Server component steps

Backup

Mongo database

This will create a single tar file called aen_mongo_backup.tar that includes only the database named “wakari” that is used by AEN. It also generates a log of the database backup.

NOTE: These commands must be run by $AEN_SRVC_ACCT.

mongodump --db wakari -o aen_main >> mongo_backup.log
tar -cvf aen_mongo_backup.tar aen_main
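
Before relying on the archive, it may be worth confirming that it actually contains the wakari dump. The following is a minimal sketch and not part of the AEN tooling: the function name check_mongo_archive is hypothetical, and it assumes mongodump wrote the database into a wakari subdirectory of aen_main, which is the tool's usual layout.

```shell
# Hypothetical helper: succeed only if the archive contains the wakari dump.
# Assumes mongodump's usual layout: <output dir>/<database name>/.
check_mongo_archive() {
    archive=${1:-aen_mongo_backup.tar}
    if tar -tf "$archive" | grep -q '^aen_main/wakari/'; then
        echo "OK: $archive contains aen_main/wakari/"
    else
        echo "WARNING: aen_main/wakari/ not found in $archive" >&2
        return 1
    fi
}
```

Calling check_mongo_archive with no argument checks aen_mongo_backup.tar in the current directory.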

AEN Server config files (including License file)

Create a tar file of all of the configuration files, including any license files.

NOTE: This command must be run by $AEN_SRVC_ACCT.

tar -cvf aen_server_config.tar -C /opt/wakari/ wakari-server/etc/wakari/

Nginx config (if needed)

Make a copy of the nginx configuration file if it has been customized.

NOTE: The copy must be made by $AEN_SRVC_ACCT.

The default configuration for the AEN server is a symlink:

/etc/nginx/conf.d/www.enterprise.conf -> /opt/wakari/wakari-server/etc/nginx/conf.d/www.enterprise.conf
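
One way to make that copy is sketched below. The helper name backup_nginx_conf and the destination argument are hypothetical. The helper reports whether the file is still a symlink and copies the resolved contents either way, so the copy is usable even if the customization was made in the symlink target.

```shell
# Hypothetical helper: report whether the nginx config is a symlink,
# then copy its contents into a destination directory.
backup_nginx_conf() {
    conf=${1:-/etc/nginx/conf.d/www.enterprise.conf}
    dest=${2:-.}
    [ -e "$conf" ] || { echo "not found: $conf" >&2; return 1; }
    if [ -L "$conf" ]; then
        echo "symlink -> $(readlink "$conf")"
    else
        echo "regular file (not the default symlink)"
    fi
    # -L follows the symlink so the copy holds the file contents;
    # -p keeps the mode and timestamps.
    cp -pL "$conf" "$dest/"
}
```

For example, backup_nginx_conf /etc/nginx/conf.d/www.enterprise.conf ./backup copies the file into an existing ./backup directory.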

SSL certificates (if needed)

Make a copy of the SSL certificate files (certfiles) for the server, including the key file, and a copy of the certfile for the gateway, which is needed for verification if using self-signed or private CA signed certs.
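
When the server backup archives are moved to another machine, for example during a migration to a new cluster, recording checksums can help detect a damaged transfer. This is a sketch under stated assumptions: the helper names write_checksums and verify_checksums and the SHA256SUMS file name are hypothetical, and the sha256sum utility from GNU coreutils is assumed to be installed.

```shell
# Hypothetical helpers for moving backups between machines.
# Record SHA-256 checksums for every .tar file in a directory.
# The function bodies run in subshells so the cd does not leak.
write_checksums() (
    cd "${1:-.}" && sha256sum -- *.tar > SHA256SUMS
)

# Verify the archives against the recorded checksums.
verify_checksums() (
    cd "${1:-.}" && sha256sum -c SHA256SUMS
)
```

Run write_checksums in the backup directory on the old machine, copy SHA256SUMS along with the archives, and run verify_checksums on the new machine before restoring.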

Restore

Reinstall AEN-Server

See the instructions for installing the current version of AEN-Server.

It is not necessary to upload the license, because it will be restored with the config files.

NOTE: The new installation will generate a new password for the local $AEN_SRVC_ACCT account.

Restore Mongo database

This assumes that mongo was reinstalled as part of the reinstallation of the server component. Untar the mongo database and restore it.

NOTE: These commands must be run by $AEN_SRVC_ACCT.

tar -xvf aen_mongo_backup.tar
mongorestore --drop aen_main

NOTE: The --drop option resets the $AEN_SRVC_ACCT user password and restores the database to the exact state it was in at the time of backup. Please see the MongoDB documentation for more information about mongorestore options for Mongo 2.6.

NOTE: AEN uses Mongo 2.6 by default. If you are using a different version, consult the documentation for your version.

AEN Server config files (including License file)

Untar the tar file of all of the configuration files, including any license files.

NOTE: This command must be run by $AEN_SRVC_ACCT.

tar -xvf aen_server_config.tar -C /opt/wakari/

Make sure the files are in /opt/wakari/wakari-server/etc/wakari/ and are owned by the $AEN_SRVC_ACCT.
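
The ownership check can be done with find. The sketch below uses a hypothetical helper name, list_not_owned_by. It prints every entry under a directory that is not owned by the given user and prints nothing when ownership is as expected.

```shell
# Hypothetical helper: print anything under a directory that is not
# owned by the given user (a user name or a numeric user ID).
list_not_owned_by() {
    dir=$1
    owner=$2
    find "$dir" ! -user "$owner" -print
}
```

For example, list_not_owned_by /opt/wakari/wakari-server/etc/wakari "$AEN_SRVC_ACCT" should produce no output after a correct restore. The same check applies to the gateway and compute configuration directories.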

Nginx config (if needed)

Make sure any modifications to the nginx configuration are either in /etc/nginx/conf.d or in /opt/wakari/wakari-server/etc/nginx/conf.d/ with a proper symlink. The default symlink is shown below.

NOTE: Any changes must be made by $AEN_SRVC_ACCT.

/etc/nginx/conf.d/www.enterprise.conf -> /opt/wakari/wakari-server/etc/nginx/conf.d/www.enterprise.conf
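
To confirm that the symlink is in place after the restore, a check along the following lines may help. The helper name check_nginx_symlink is hypothetical. It compares the stored link target with the expected path exactly as written, so a link that reaches the same file through a different path is reported as a mismatch.

```shell
# Hypothetical helper: succeed only if the link is a symlink whose
# stored target is exactly the expected path.
check_nginx_symlink() {
    link=${1:-/etc/nginx/conf.d/www.enterprise.conf}
    expected=${2:-/opt/wakari/wakari-server/etc/nginx/conf.d/www.enterprise.conf}
    if [ -L "$link" ] && [ "$(readlink "$link")" = "$expected" ]; then
        echo "OK: $link -> $expected"
    else
        echo "MISMATCH: $link is not a symlink to $expected" >&2
        return 1
    fi
}
```

A mismatch is expected if the customized configuration is kept as a regular file in /etc/nginx/conf.d.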

SSL certificates (if needed)

Move any SSL certificate files to the locations indicated in the config files.

Restart server

Restart the server application.

NOTE: This command must be run as root or with sudo.

service wakari-server restart

Gateway component steps

Backup

Config files

Create a tar file of all of the configuration files.

NOTE: This command must be run by $AEN_SRVC_ACCT.

tar -cvf aen_gateway_config.tar -C /opt/wakari/ wakari-gateway/etc/wakari/

Custom .condarc file (if needed)

Make a copy of /opt/wakari/miniconda/.condarc if it has been modified.

SSL certificates (if needed)

Make a copy of SSL certificate files for the gateway (including the key file) and the certfile for the server (needed for verification if using self-signed or private CA signed certs).

Restore

Reinstall AEN-Gateway

Setting variables and changing permissions

NOTE: These commands must be run by $AEN_SRVC_ACCT.

Run:

export AEN_SERVER=<FQDN HOSTNAME OR IP ADDRESS> # Use the real FQDN
export AEN_GATEWAY_PORT=8089
export AEN_GATEWAY=<FQDN HOSTNAME OR IP ADDRESS>  # will be needed shortly
chmod a+x aen-*.sh                # Set installer to be executable

NOTE: Change <FQDN HOSTNAME OR IP ADDRESS> to the actual fully qualified domain hostname or IP address.

NOTE: You must perform the entire procedure before closing the terminal to ensure the variable export persists. If the terminal is closed before the installation succeeds, export the variables again before continuing with the installation.
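
Because the installer is started with sudo -E, the variables must be exported and not just set. The following pre-flight check is a sketch with a hypothetical function name, check_gateway_env. It relies on printenv, which only sees exported variables.

```shell
# Hypothetical pre-flight check: fail early if a required variable
# is unset, empty, or set but not exported.
check_gateway_env() {
    for name in AEN_SERVER AEN_GATEWAY AEN_GATEWAY_PORT; do
        # printenv only sees exported variables, which is what
        # sudo -E passes on to the installer.
        if [ -z "$(printenv "$name")" ]; then
            echo "missing or not exported: $name" >&2
            return 1
        fi
    done
    echo "all required variables are exported"
}
```

Running check_gateway_env in a new terminal shows which variables need to be exported again.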

Running the AEN gateway installer

Run:

sudo -E ./aen-gateway-4.3.3-Linux-x86_64.sh -w $AEN_SERVER
<license text>
...
...

PREFIX=/opt/wakari/wakari-gateway
Logging to /tmp/wakari_gateway.log
...
...
Checking server name
Please restart the Gateway after running the following command
to connect this Gateway to the AEN Server
...

Config files

Untar the configuration files.

NOTE: This command must be run by $AEN_SRVC_ACCT.

tar -xvf aen_gateway_config.tar -C /opt/wakari

Verify that the files are in /opt/wakari/wakari-gateway/etc/wakari/ and are owned by the $AEN_SRVC_ACCT.

Custom .condarc file (if needed)

Move the custom .condarc file to /opt/wakari/miniconda/.condarc.

SSL certificates (if needed)

Move any SSL certificate files to the locations indicated in the config files.

Restart gateway

Restart the gateway application.

NOTE: This command must be run as root or with sudo.

service wakari-gateway restart

Compute component steps

Backup

Config files

Create a tar file of all of the configuration files.

NOTE: This command must be run by $AEN_SRVC_ACCT.

tar -cvf aen_compute_config.tar -C /opt/wakari/ wakari-compute/etc/wakari

Custom Changes (rare)

Manually back up any custom changes that were applied to the code. One such change might be additional files in the skeleton folder:

/opt/wakari/wakari-compute/lib/node_modules/wakari-compute-launcher/skeleton

Create user list

AEN uses POSIX access control lists (ACLs) for project sharing, so the backup must preserve the ACL information. This is done with a script that creates a file named users.lst containing a list of all users that have access to projects on a given compute node. Download and run the script.

NOTE: These commands must be run by $AEN_SRVC_ACCT.

wget https://s3.amazonaws.com/continuum-airgap/misc/wk-compute-get-acl-users.py
chmod 755 wk-compute-get-acl-users.py
./wk-compute-get-acl-users.py

Project files

Create a tar of the projects directory with ACLs enabled. The default projects base location is /projects.

NOTE: This command must be run as root or with sudo.

tar --acls -cpvf projects.tar -C <projects base location> .
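
Because the restore step extracts the archive into the projects base location, the archive members should be stored relative to that location and not under an extra leading directory. The sketch below, with the hypothetical helper name top_level_entries, prints the distinct top-level names in an archive so this can be checked before the old machine is retired.

```shell
# Hypothetical helper: print the distinct top-level entries stored
# in a tar archive, ignoring a leading "./".
top_level_entries() {
    tar -tf "$1" | sed -e 's|^\./||' -e 's|/.*||' | grep -v '^$' | sort -u
}
```

If top_level_entries projects.tar prints the same names that appear directly under the projects base location, the archive will restore into place. An unexpected single entry, such as the name of the base directory itself, suggests the paths were stored one level too high.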

Full Anaconda (option 1)

If any changes have been made to the default Anaconda installation (additional packages installed or packages removed), it is necessary to back up the entire Anaconda installation.

NOTE: This command must be run by $AEN_SRVC_ACCT.

tar -cvf aen_anaconda.tar -C /opt/wakari/ anaconda

If no changes have been made to the default installation of Anaconda, you may back up only the .condarc file and any custom environments.

Partial Anaconda (option 2)

Custom .condarc file

Make a copy of /opt/wakari/anaconda/.condarc.

Custom environments (if needed)

Create a tar file of any custom shared environments.

NOTE: This command must be run by $AEN_SRVC_ACCT.

tar -cvf aen_compute_envs.tar -C /opt/wakari/ anaconda/envs

NOTE: If no custom shared environments have been created, the envs folder will not be present.
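
Since the envs folder may be absent, the tar command can be guarded so that it does not fail on a node without custom environments. This is a minimal sketch with a hypothetical function name, backup_envs_if_present. The defaults mirror the paths used above.

```shell
# Hypothetical helper: archive the shared environments only if the
# envs directory exists under the given root.
backup_envs_if_present() {
    root=${1:-/opt/wakari}
    archive=${2:-aen_compute_envs.tar}
    if [ -d "$root/anaconda/envs" ]; then
        tar -cvf "$archive" -C "$root" anaconda/envs
    else
        echo "no custom shared environments under $root/anaconda/envs; nothing to back up"
    fi
}
```

Called with no arguments, it writes aen_compute_envs.tar in the current directory when /opt/wakari/anaconda/envs exists.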

Restore

Reinstall AEN-Compute

Setting variables and changing permissions

NOTE: These commands must be run by $AEN_SRVC_ACCT.

Run:

export AEN_SERVER=<FQDN HOSTNAME OR IP ADDRESS> # Use the real FQDN
chmod a+x aen-*.sh                # Set installer to be executable

NOTE: Change <FQDN HOSTNAME OR IP ADDRESS> to the actual fully qualified domain hostname or IP address.

NOTE: You must perform the entire procedure before closing the terminal to ensure the variable export persists.

Running the AEN compute installer

Run:

sudo -E ./aen-compute-4.3.3-Linux-x86_64.sh -w $AEN_SERVER
...
...
PREFIX=/opt/wakari/wakari-compute
Logging to /tmp/wakari_compute.log
Checking server name
...
...
Initial clone of root environment...
Starting Wakari daemons...
installation finished.
Do you wish the installer to prepend the wakari-compute install location
to PATH in your /root/.bashrc ? [yes|no]
[no] >>> yes

Config files

Untar the config files.

NOTE: This command must be run by $AEN_SRVC_ACCT.

tar -xvf aen_compute_config.tar -C /opt/wakari

NOTE: Verify that they are located in /opt/wakari/wakari-compute/etc/wakari and are owned by the $AEN_SRVC_ACCT.

Custom changes (rare)

Manually restore any custom changes you saved in the backup section. If there are changes in the skeleton directory, these files must be world readable or projects will refuse to start.
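
The world-readable requirement can be checked with find. In this sketch the helper name list_not_world_readable is hypothetical. It prints every entry under the skeleton directory whose read bit for other users is not set.

```shell
# Hypothetical helper: print entries that are not world readable.
list_not_world_readable() {
    dir=${1:-/opt/wakari/wakari-compute/lib/node_modules/wakari-compute-launcher/skeleton}
    # -perm -004 matches entries whose "other" read bit is set;
    # the ! inverts the match.
    find "$dir" ! -perm -004 -print
}
```

If anything is listed, chmod -R o+rX on the skeleton directory adds read permission for other users and adds search permission on directories.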

Create users

NOTE: Only create users with these instructions if your Linux machine is not bound to LDAP.

In order for the ACLs to be set properly on restore, all users that have permissions to the files must be available on the machine. Ask your system administrator for the proper way to do this for your system, such as using the “useradd” tool. A list of users that are needed was created in the backup process as a file named users.lst.

A process similar to the following useradd example will be suitable for most Linux systems.

NOTE: The useradd tool requires root privileges, so this command must be run as root or with sudo.

xargs -0 -n 1 useradd --user-group < users.lst
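
Before creating accounts, it can be useful to see which users from the list are actually missing on the machine. The dry run below is a sketch: the function name missing_users is hypothetical, and it assumes users.lst is NUL-separated, as the xargs -0 invocation above implies.

```shell
# Hypothetical dry run: print the users from a NUL-separated list
# that do not exist on this machine yet. Nothing is created.
missing_users() {
    tr '\0' '\n' < "${1:-users.lst}" |
    while IFS= read -r user || [ -n "$user" ]; do
        [ -n "$user" ] || continue
        # id fails for unknown users; print only those names.
        id -u "$user" >/dev/null 2>&1 || printf '%s\n' "$user"
    done
}
```

Users that already exist are skipped, so the output is the list of accounts that still need to be created.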

Project files

Create the projects directory in the location specified in projectRoot in wk-compute-launcher-config.json.

NOTE: By default this directory is /projects.

Then untar the projects directory with ACLs.

NOTE: This command must be run as root or with sudo:

tar --acls -xpvf projects.tar -C <projects base location>

Full Anaconda (option 1)

If you made a full backup of the Anaconda installation, untar it into /opt/wakari so that it is restored to /opt/wakari/anaconda.

NOTE: This command must be run by $AEN_SRVC_ACCT.

tar -xvf aen_anaconda.tar -C /opt/wakari

Partial Anaconda (option 2)

Restore the custom .condarc file.

If you did a partial backup of the Anaconda installation, move the copy of the .condarc file to /opt/wakari/anaconda/.condarc.

Custom environments (if needed)

Untar any custom environments that were created to /opt/wakari/anaconda/envs.

NOTE: This command must be run by $AEN_SRVC_ACCT.

tar -xvf aen_compute_envs.tar -C /opt/wakari

Restart compute node

Restart the compute-launcher application.

NOTE: This command must be run as root or with sudo.

service wakari-compute restart