Installation Runbook (AEN 4.0)¶
Overview¶
Anaconda Enterprise Notebooks (AEN) is a Python data analysis environment from Continuum Analytics. Accessed through a browser, Anaconda Enterprise Notebooks is a ready-to-use, powerful, fully-configured Python analytics environment.
We believe that programmers, scientists, and analysts should spend their time analyzing data, not working to set up a system. Data should be shareable, and analysis should be repeatable. Reproducibility should extend beyond just code to include the runtime environment, configuration, and input data.
Anaconda Enterprise Notebooks makes it easy to start your analysis immediately.
Audience¶
This runbook walks through the steps needed to install a basic Anaconda Enterprise Notebooks system composed of the front-end server, the gateway, and two compute machines. The runbook is designed for two audiences: those who have direct internet access during installation, and those whose access is unavailable or restricted for security reasons. For these restricted, a.k.a. “Air Gap”, environments, Continuum ships the entire Anaconda product suite on portable storage media or as a downloadable TAR archive. If you have any questions about the instructions, please contact your sales representative or, if applicable, your Priority Support team for additional assistance.
Components¶
AEN Server: The administrative front-end to the system. This is where users log in to the system, where user accounts are stored, and where admins manage the system.
AEN Gateway: The gateway is a reverse proxy that authenticates users and automatically directs them to the proper AEN Compute machine for their project. Users will not notice this component as it automatically routes them. One could put a gateway in each datacenter in a tiered scale-out fashion.
AEN Compute nodes: This is where projects are stored and run. AEN Compute machines only need to be reachable by the AEN Gateway, so they can be completely isolated by a firewall.
Table of Contents¶
- Installation Runbook (AEN 4.0)
  - Overview
  - Audience
  - Components
  - Table of Contents
  - Installation Requirements
    - Hardware Requirements
    - Software Requirements
    - Linux System Accounts Required
    - Software Prerequisites
    - Security Requirements
    - Network Requirements
    - Other Requirements
  - Installation Preparation
    - Download the Installers
    - Gather IP addresses or FQDNs
    - Setup Variables
    - Note on Post-Install Customization
  - Install AEN Server
    - AEN Server Preparation - Prerequisites
      - Download Prerequisite RPMs
      - Install Prerequisite RPMs
    - Run the AEN Server Installer
      - Set Variables and Change Permissions
      - Run AEN Server Installer
    - Start ElasticSearch
    - Test the AEN Server Install
    - Update the License
  - Install AEN Gateway
    - Set Variables and Change Permissions
    - Run Wakari Gateway Installer
    - Register the AEN Gateway
    - Ensure Proper Permissions
    - Start the Gateway
    - Verify the AEN Gateway has Registered
  - Install AEN Compute
    - Set Variables and Change Permissions
    - Run AEN Compute Installer
    - Configure AEN Compute Node
    - Configure conda to use local on-site Anaconda Enterprise Repo
      - Edit the condarc on the Compute Node
      - Configure Anaconda Client
  - Optional Configuration
    - Configure common AEN Compute options
      - Change the project directory
      - Create groups with the same id
      - Use numeric usernames
    - Verify and Tune Search Indexing
    - Setting up a Default Project Environment
    - Configure a remote mongodb
    - SELinux Enforcing Mode
  - Wrapping Up
Installation Requirements¶
Hardware Requirements¶
AEN Server
- 2+ GB RAM
- 2+ CPU cores
- 20 GB storage
AEN Gateway
- 2 GB RAM
- 2 CPU cores
AEN Compute (N-machines)
Size to meet the needs of the projects, with at least:
- 2 GB RAM
- 2 CPU cores
- 20 GB storage
NOTE: We recommend putting /opt/wakari and /projects on the same filesystem. If the project and conda env directories are on separate filesystems, compute nodes need more disk space and performance is worse.
Software Requirements¶
- RHEL/CentOS versions 6.5 to 6.8 on all nodes (other operating systems are supported, but this document assumes RHEL or CentOS)
- /opt/wakari: must be available for installation, with at least 5 GB of storage.
- /projects: size depends on the number and size of projects; at least 20 GB of storage.
- ACL: these directories need to be on a filesystem mounted with POSIX ACL support (POSIX.1e). Check with mount, and with tune2fs, which takes the block device that holds the filesystem:
mount
tune2fs -l /path/to/device | grep options
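The ACL check above can be partly scripted. The following is a minimal sketch built around a hypothetical helper, has_acl, that only inspects a text string: pass it a line of mount output or the "Default mount options" line printed by tune2fs. It assumes ACL support shows up as a standalone acl token in either format, so treat a negative result as a prompt to check manually rather than as proof.

```shell
# has_acl TEXT: succeed if TEXT contains a standalone "acl" option.
# Hypothetical helper. TEXT may be a mount line such as
#   "/dev/sdb1 on /projects type ext4 (rw,acl)"
# or a tune2fs line such as
#   "Default mount options:    user_xattr acl"
has_acl() {
  # Turn commas and parentheses into spaces so both formats become
  # space-separated tokens, then look for the exact token "acl".
  opts=" $(printf '%s' "$1" | tr ',()' '   ') "
  case "$opts" in
    *" acl "*) return 0 ;;   # "noacl" does not match this pattern
    *) return 1 ;;
  esac
}
```

For example: has_acl "$(mount | grep ' /projects ')" && echo "ACL enabled on /projects".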
Linux System Accounts Required¶
Some Linux system accounts (UIDs) are added to the system during installation. If your organization requires special actions for new accounts, here is the list:
- mongod (RHEL) or mongodb (Ubuntu/Debian): created by the RPM or deb package
- elasticsearch: created by RPM or deb package
- nginx: created by RPM or deb package
- wakari: created during installation of Anaconda Enterprise Notebooks
Software Prerequisites¶
AEN Server
- MongoDB version: >= 2.6.8 and < 3.0
- Nginx version: >= 1.4.0
- ElasticSearch version: >= 1.7.2
- Oracle JRE 7 or 8
AEN Compute
- git
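To check installed versions against the ranges above, a small comparison helper is convenient. This sketch assumes GNU sort with its -V (version sort) option, which coreutils on RHEL/CentOS 6 provides; version_ge and mongo_version_ok are hypothetical names.

```shell
# version_ge A B: succeed if version A >= version B.
# sort -V orders version strings numerically, so B sorts first (or ties)
# exactly when A >= B.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# mongo_version_ok V: succeed if 2.6.8 <= V < 3.0, the range listed above.
mongo_version_ok() {
  version_ge "$1" 2.6.8 && ! version_ge "$1" 3.0
}
```

For example, on the AEN Server: mongo_version_ok "$(mongod --version | sed -n 's/^db version v//p')" && echo "MongoDB OK", or version_ge "$(nginx -v 2>&1 | sed 's|.*nginx/||')" 1.4.0 && echo "nginx OK".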
Security Requirements¶
- root or sudo access
- SELinux in Permissive or Disabled mode - check with
getenforce
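The SELinux requirement can be checked the same way in a preflight script. A minimal sketch; selinux_ok and current_selinux_mode are hypothetical helper names, and treating a missing getenforce binary as Disabled is an assumption you may want to verify by hand.

```shell
# selinux_ok MODE: succeed if MODE (as printed by getenforce) allows
# installation, i.e. Permissive or Disabled.
selinux_ok() {
  case "$1" in
    Permissive|Disabled) return 0 ;;
    *) return 1 ;;
  esac
}

# current_selinux_mode: print the current mode. Assumption: if getenforce
# is missing or fails, SELinux tooling is absent and "Disabled" is reported.
current_selinux_mode() {
  getenforce 2>/dev/null || echo Disabled
}
```

For example: selinux_ok "$(current_selinux_mode)" || echo "SELinux is Enforcing; see Optional: SELinux Enforcing Mode".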
Network Requirements¶
- TCP Ports
Direction | Type | Port | Protocol | Optional | Configurable | Comments
---|---|---|---|---|---|---
inbound | TCP | 80 | HTTP | No | No | Server
in/out | TCP | 8089 | HTTP | No | No | Gateway
in/out | TCP | 5002 | HTTP | No | No | Compute
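Once the services are running, a quick way to confirm that these ports are reachable between the machines is to open a TCP connection to them. The sketch below needs no extra packages; it assumes bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are present, and port_open is a hypothetical helper name.

```shell
# port_open HOST PORT [SECONDS]: succeed if a TCP connection to HOST:PORT
# opens within SECONDS (default 5). bash is invoked explicitly because
# /dev/tcp is a bash feature; timeout comes from coreutils.
port_open() {
  timeout "${3:-5}" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For example, on the gateway: port_open "$AEN_COMPUTE" 5002 && echo "compute launcher reachable". A refused connection before the corresponding service is listening says nothing about the firewall, so run the check after each component is started.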
Other Requirements¶
Assuming the above requirements are met, there are no additional dependencies necessary for AEN.
Note: While not a requirement for running the software, these instructions use curl to download packages used in the install process. You may use other appropriate means to put the needed files into the installation directory.
Installation Preparation¶
Download the Installers¶
Download the installers and copy them to the corresponding servers.
Regular Installation:
curl -O $RPM_CDN/aen-server-4.0.0-Linux-x86_64.sh
curl -O $RPM_CDN/aen-gateway-4.0.0-Linux-x86_64.sh
curl -O $RPM_CDN/aen-compute-4.0.0-Linux-x86_64.sh
Note: the $RPM_CDN server will be provided by your sales rep.
Gather IP addresses or FQDNs¶
AEN is very sensitive to the IP address or domain name used to connect to the Server and Gateway components. If users will be using the domain name, you should install the components using the domain name instead of the IP addresses. The authentication system requires the proper hostnames when authenticating users between the services.
Fill in the domain names or IP addresses of the components in the table below. After installing the AEN Server component, also record the autogenerated wakari password.
Component | Name or IP address
---|---
AEN Server |
AEN Gateway |
AEN Compute |
Notes:
- We will refer to the values of these IP or DNS entries as, e.g., <AEN_SERVER_IP> or <AEN_SERVER_FQDN>, particularly in examples of shell commands. Consider assigning those values to environment variables with similar names.
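Following that suggestion, the values can be exported once per shell session and checked before running any installer. A minimal sketch: the example.com host names are placeholders for the values in your table, AEN_GATEWAY and AEN_COMPUTE are naming assumptions that match the variables used later in this runbook, and require_vars is a hypothetical helper.

```shell
# Placeholders: replace with the values recorded in the table above.
export AEN_SERVER="aen-server.example.com"
export AEN_GATEWAY="aen-gateway.example.com"
export AEN_COMPUTE="aen-compute1.example.com"

# require_vars NAME...: succeed only if every named variable is non-empty;
# report each missing one on stderr. Pass only trusted variable names,
# because they are expanded with eval.
require_vars() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Run require_vars AEN_SERVER AEN_GATEWAY AEN_COMPUTE before each installer; it prints the name of any variable you forgot to set.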
Setup Variables¶
AEN Server Address¶
Define an environment variable for the AEN Server address (FQDN or IP):
export AEN_SERVER=<AEN_SERVER_IP> # <from table above>
Note that the address (FQDN or IP) specified for the AEN Server must be resolvable by the web clients of your intended AEN users. You can check the value you have set with:
echo $AEN_SERVER
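echo only shows the value you set; it does not prove that the name resolves. A minimal sketch of a resolution check using getent, which ships with glibc; resolves is a hypothetical helper. Because the requirement concerns your users' web clients, run it from a machine on the users' network as well as on the AEN hosts.

```shell
# resolves NAME: print the address(es) NAME resolves to through the system
# resolver (/etc/hosts, DNS, and so on); fail if NAME does not resolve.
resolves() {
  getent hosts "$1"
}
```

For example: resolves "$AEN_SERVER" || echo "cannot resolve $AEN_SERVER". If AEN_SERVER holds an IP address, getent performs a reverse lookup, which can fail even though clients can reach the address, so this check is mainly useful for FQDNs.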
AEN Functional ID¶
AEN must be installed and executed by a Linux account called the AEN Service Account. The username of the AEN Service Account is called the AEN Functional ID (NFI). The AEN Service Account is created during AEN installation if it does not exist and is used to run all AEN services.
The default NFI username is wakari. To override that default and set the NFI to another name, set the environment variable AEN_SRVC_ACCT before installation:
export AEN_SRVC_ACCT="aen_admin"
This name will then be the username of the AEN Service Account and the username of the AEN Admin account. Later commands in this runbook reference $AEN_SRVC_ACCT, so if you keep the default, set it explicitly with export AEN_SRVC_ACCT="wakari".
AEN Functional Group¶
By default, AEN uses a Linux group (NFG) named wakari for all files and directories owned by the NFI. To specify a custom NFG whenever the NFI environment variable is set, set the environment variable AEN_SRVC_GRP *before* installation.
Example:
export AEN_SRVC_GRP="aen_admin"
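If your organization manages accounts centrally (for example via LDAP), you may want to know in advance whether the NFI account and NFG group already exist. The sketch below checks both with getent; db_has is a hypothetical helper, and the fallback to wakari mirrors the defaults described above.

```shell
# Names to check; the fallbacks mirror the documented default, wakari.
nfi="${AEN_SRVC_ACCT:-wakari}"
nfg="${AEN_SRVC_GRP:-wakari}"

# db_has DATABASE NAME: succeed if getent finds NAME in DATABASE
# ("passwd" for accounts, "group" for groups), including directory-backed
# entries when nsswitch is configured for them.
db_has() {
  getent "$1" "$2" >/dev/null 2>&1
}

if db_has passwd "$nfi"; then
  echo "account $nfi already exists"
else
  echo "account $nfi not found (the installer creates the service account)"
fi
if db_has group "$nfg"; then
  echo "group $nfg already exists"
else
  echo "group $nfg not found"
fi
```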
AEN Install sudo Command¶
During installation, the AEN installers perform various operations that require root-level privileges. By default, they use the sudo command for these operations. To override the default, set the environment variable AEN_SUDO_INSTALL_CMD *before* installation, either to another command that performs root-level operations or to an empty string when the user running the installers already has root privileges and sudo is not needed or not available.
Examples:
export AEN_SUDO_INSTALL_CMD=""
export AEN_SUDO_INSTALL_CMD="sudo2"
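The choice can also be automated in a wrapper script. A minimal sketch; pick_install_cmd is a hypothetical helper, not part of the installers: it leaves the variable empty when already running as root and otherwise keeps sudo if it exists.

```shell
# pick_install_cmd: print the value to use for AEN_SUDO_INSTALL_CMD.
# Empty when running as root, "sudo" when available, otherwise fail.
pick_install_cmd() {
  if [ "$(id -u)" -eq 0 ]; then
    echo ""
  elif command -v sudo >/dev/null 2>&1; then
    echo "sudo"
  else
    echo "error: not root and sudo is not available" >&2
    return 1
  fi
}

# Export the chosen value only if a usable option was found.
if cmd=$(pick_install_cmd); then
  export AEN_SUDO_INSTALL_CMD="$cmd"
fi
```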
AEN sudo Command¶
By default, the AEN services use sudo -u to perform operations on behalf of other users, such as mkdir, chmod, cp and mv. To override the default sudo command when sudo is not available on the system, set the environment variable AEN_SUDO_CMD before installation.
Note: AEN must be able to perform operations on behalf of other users, so this environment variable cannot be set to an empty string or null. The command in AEN_SUDO_CMD must support the -u command-line parameter, like sudo.
Example:
export AEN_SUDO_CMD="sudo2"
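A replacement for AEN_SUDO_CMD can be sanity-checked before installation. This hypothetical check_sudo_cmd helper only verifies that the value is non-empty and names a command on PATH; whether that command really accepts -u USER like sudo still has to be tried by hand.

```shell
# check_sudo_cmd CMD: basic checks for a replacement AEN_SUDO_CMD value.
check_sudo_cmd() {
  if [ -z "$1" ]; then
    echo "error: AEN_SUDO_CMD must not be empty" >&2
    return 1
  fi
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: $1 not found on PATH" >&2
    return 1
  fi
}
```

For example: check_sudo_cmd "$AEN_SUDO_CMD" && $AEN_SUDO_CMD -u nobody id -un, which should print nobody if the command handles -u the way sudo does.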
Note on Post-Install Customization¶
Please review the post-installation documentation for additional information on configuration options.
While root/sudo privileges are required during installation, they are not required during normal operation after installation if user accounts are managed outside the software (for example, via LDAP). However, root/sudo privileges are required to start the services, so the service configuration files may still need an AEN_SUDO_CMD entry.
Install AEN Server¶
The AEN server is the administrative frontend to the system. This is where users log in to the system, where user accounts are stored, and where admins manage the system.
AEN Server Preparation - Prerequisites¶
Download Prerequisite RPMs¶
- Regular Installation:
RPM_CDN="https://820451f3d8380952ce65-4cc6343b423784e82fd202bb87cf87cf.ssl.cf1.rackcdn.com"
curl -O $RPM_CDN/nginx-1.6.2-1.el6.ngx.x86_64.rpm
curl -O $RPM_CDN/mongodb-org-tools-2.6.8-1.x86_64.rpm
curl -O $RPM_CDN/mongodb-org-shell-2.6.8-1.x86_64.rpm
curl -O $RPM_CDN/mongodb-org-server-2.6.8-1.x86_64.rpm
curl -O $RPM_CDN/mongodb-org-mongos-2.6.8-1.x86_64.rpm
curl -O $RPM_CDN/mongodb-org-2.6.8-1.x86_64.rpm
curl -O $RPM_CDN/elasticsearch-1.7.2.noarch.rpm
curl -O $RPM_CDN/jre-8u65-linux-x64.rpm
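The eight downloads above can also be driven from a single list, which makes it easy to stop on the first failure or to reuse the list when copying files into an air-gapped environment. A minimal sketch; prereq_rpms and download_prereqs are hypothetical helpers, and RPM_CDN must already be set as shown above.

```shell
# prereq_rpms: print the prerequisite RPM file names listed above.
prereq_rpms() {
  cat <<'EOF'
nginx-1.6.2-1.el6.ngx.x86_64.rpm
mongodb-org-tools-2.6.8-1.x86_64.rpm
mongodb-org-shell-2.6.8-1.x86_64.rpm
mongodb-org-server-2.6.8-1.x86_64.rpm
mongodb-org-mongos-2.6.8-1.x86_64.rpm
mongodb-org-2.6.8-1.x86_64.rpm
elasticsearch-1.7.2.noarch.rpm
jre-8u65-linux-x64.rpm
EOF
}

# download_prereqs: fetch each RPM from $RPM_CDN, stopping at the first
# failure. -f makes curl fail on HTTP errors instead of saving the error
# page under the RPM's file name.
download_prereqs() {
  for rpm in $(prereq_rpms); do
    curl -fO "$RPM_CDN/$rpm" || { echo "failed: $rpm" >&2; return 1; }
  done
}
```

Run download_prereqs in the download directory, then continue with the yum install step.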
Install Prerequisite RPMs¶
sudo yum install -y *.rpm
sudo /etc/init.d/mongod start
sudo /etc/init.d/elasticsearch stop
sudo chkconfig --add elasticsearch
Run the AEN Server Installer¶
Set Variables and Change Permissions¶
export AEN_SERVER=<FQDN HOSTNAME> # Use the real FQDN
chmod a+x aen-*.sh # Set installer to be executable
Run AEN Server Installer¶
sudo -E ./aen-server-4.0.0-Linux-x86_64.sh -w $AEN_SERVER
<license text>
...
...
PREFIX=/opt/wakari/wakari-server
Logging to /tmp/wakari_server.log
Checking server name
Ready for pre-install steps
Installing miniconda
...
...
Checking server name
Loading config from /opt/wakari/wakari-server/etc/wakari/config.json
Loading config from /opt/wakari/wakari-server/etc/wakari/wk-server-config.json
===================================
Created password '<RANDOM_PASSWORD>' for user 'wakari'
===================================
Starting Wakari daemons...
installation finished.
When the installation script completes successfully, it creates the administrator account (the AEN_SRVC_ACCT user) and assigns it a password:
Created password '<RANDOM_PASSWORD>' for user 'wakari'
Record this password; it is needed in the following steps. It is also available in the installation log file, /tmp/wakari_server.log.
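If the installer output has scrolled away, the password can be pulled from the log. A minimal sketch, assuming the log line has the same Created password '...' for user '...' form shown above and that the password contains no single quote; extract_password is a hypothetical helper.

```shell
# extract_password [LOG]: print the password from the most recent
# "Created password '...' for user '...'" line in LOG (or stdin).
extract_password() {
  sed -n "s/.*Created password '\([^']*\)' for user .*/\1/p" ${1:+"$1"} | tail -n 1
}
```

For example: sudo cat /tmp/wakari_server.log | extract_password.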
Start ElasticSearch¶
Start ElasticSearch so that it reads the new configuration file:
sudo service elasticsearch start
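ElasticSearch can take a few seconds to come up. One way to wait for it is to poll its cluster health endpoint, sketched below under two assumptions: the node listens on the default HTTP port 9200 on localhost, and the response is the compact JSON that ElasticSearch 1.x returns. es_status and wait_for_es are hypothetical helpers.

```shell
# es_status: read a _cluster/health JSON document on stdin and print its
# "status" field (green, yellow or red). sed-based, so no JSON tool needed.
es_status() {
  sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p'
}

# wait_for_es: poll localhost:9200 up to 30 times, 2 seconds apart, and
# succeed (printing the status) once the cluster reports green or yellow.
wait_for_es() {
  i=0
  while [ "$i" -lt 30 ]; do
    s=$(curl -s http://localhost:9200/_cluster/health | es_status)
    case "$s" in green|yellow) echo "$s"; return 0 ;; esac
    i=$((i + 1))
    sleep 2
  done
  return 1
}
```

On a single-node install, yellow is common because replica shards have no second node to live on.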
Test the AEN Server install¶
Visit http://$AEN_SERVER. You should be shown the “license expired” page.
Update the License¶
From the “license expired” page, follow the onscreen instructions to upload your license file. After submitting, you should see the login page.
Install AEN Gateway¶
The gateway is a reverse proxy that authenticates users and automatically directs them to the proper AEN Compute machine for their project. Users will not notice this component as it automatically routes them.
Set Variables and Change Permissions¶
export AEN_SERVER=<FQDN HOSTNAME> # Use the real FQDN
export AEN_GATEWAY_PORT=8089
export AEN_GATEWAY=<FQDN HOSTNAME> # will be needed shortly
chmod a+x aen-*.sh # Set installer to be executable
Run Wakari Gateway Installer¶
sudo -E ./aen-gateway-4.0.0-Linux-x86_64.sh -w $AEN_SERVER
<license text>
...
...
PREFIX=/opt/wakari/wakari-gateway
Logging to /tmp/wakari_gateway.log
...
...
Checking server name
Please restart the Gateway after running the following command
to connect this Gateway to the AEN Server
...
Note: in the registration command below, replace <USE PASSWORD SET ABOVE> with the password generated for the wakari user during the server installation.
Register the AEN Gateway¶
The AEN Gateway must register with the AEN Server. Registration is authenticated, so use the credentials of the wakari user created during the AEN Server installation. Run the command as root or with sudo, because it writes the configuration file:
/opt/wakari/wakari-gateway/etc/wakari/wk-gateway-config.json
sudo /opt/wakari/wakari-gateway/bin/wk-gateway-configure \
--server http://$AEN_SERVER --host $AEN_GATEWAY \
--port $AEN_GATEWAY_PORT --name Gateway --protocol http \
--summary Gateway --username $AEN_SRVC_ACCT \
--password '<USE PASSWORD SET ABOVE>'
Ensure Proper Permissions¶
sudo chown $AEN_SRVC_ACCT /opt/wakari/wakari-gateway/etc/wakari/wk-gateway-config.json
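To confirm the ownership change took effect, compare the file owner with the service account. A minimal sketch using GNU stat; owned_by is a hypothetical helper.

```shell
# owned_by FILE USER: succeed if FILE is owned by USER.
owned_by() {
  [ "$(stat -c %U "$1")" = "$2" ]
}
```

For example: owned_by /opt/wakari/wakari-gateway/etc/wakari/wk-gateway-config.json "${AEN_SRVC_ACCT:-wakari}" || echo "ownership not fixed".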
Start the Gateway¶
sudo service wakari-gateway start
Verify the AEN Gateway has Registered¶
Install AEN Compute¶
This is where projects are stored and run. Adding AEN Compute machines lets you scale out horizontally to increase capacity, and projects can be created on individual compute nodes to spread the load.
Set Variables and Change Permissions¶
export AEN_SERVER=<FQDN HOSTNAME> # Use the real FQDN
chmod a+x aen-*.sh # Set installer to be executable
Run AEN Compute Installer¶
sudo -E ./aen-compute-4.0.0-Linux-x86_64.sh -w $AEN_SERVER
...
...
PREFIX=/opt/wakari/wakari-compute
Logging to /tmp/wakari_compute.log
Checking server name
...
...
Initial clone of root environment...
Starting Wakari daemons...
installation finished.
Do you wish the installer to prepend the wakari-compute install location
to PATH in your /root/.bashrc ? [yes|no]
[no] >>> yes
Configure AEN Compute Node¶
After installation, configure the Compute Launcher on the AEN Server:
- Point your browser at the AEN Server
- Log in as the AEN_SRVC_ACCT user
- Click on the Admin link in the top navbar
- Click on Enterprise Resources in the left navbar
- Click on Add Resource
- Select the correct (probably the only) Data Center to associate this Compute Node with
- For URL, enter http://$AEN_COMPUTE:5002, where $AEN_COMPUTE is the compute node's FQDN or IP address.
Note: If the Compute Launcher is located on the same box as the Gateway, we recommend using http://localhost:5002 for the URL value.
- Add a Name and Description for the compute node
- Click the Add Resource button to save the changes.
Configure conda to use local on-site Anaconda Enterprise Repo¶
This section configures Anaconda Enterprise Notebooks to use a local on-site Anaconda Enterprise Repository server instead of Anaconda.org.
Edit the condarc on the Compute Node¶
Note: If there are some channels below that you haven’t mirrored, you should remove them from the configuration.
# /opt/wakari/anaconda/.condarc
channels:
- defaults
create_default_packages:
- anaconda-client
- python
- ipython-we
- pip
# default_channels is needed for when users override the system .condarc
# with ~/.condarc. This ensures that "defaults" maps to your Anaconda Server and not
# repo.anaconda.com
default_channels:
- http://<your Anaconda Server name>:8080/conda/anaconda
- http://<your Anaconda Server name>:8080/conda/wakari
- http://<your Anaconda Server name>:8080/conda/anaconda-cluster
- http://<your Anaconda Server name>:8080/conda/r-channel
# Note: You must add the "conda" subdirectory to the end
channel_alias: http://<your Anaconda Server name>:8080/conda
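When several compute nodes need the same file, generating it from one host name avoids copy-and-paste slips in the URLs. The sketch below writes the configuration shown above; write_condarc is a hypothetical helper, it assumes your repository listens on port 8080 as in the example, and you should delete any channel you have not mirrored.

```shell
# write_condarc HOST DEST: write the .condarc above for repository HOST
# to DEST.
write_condarc() {
  base="http://$1:8080/conda"
  cat > "$2" <<EOF
channels:
  - defaults
create_default_packages:
  - anaconda-client
  - python
  - ipython-we
  - pip
default_channels:
  - $base/anaconda
  - $base/wakari
  - $base/anaconda-cluster
  - $base/r-channel
channel_alias: $base
EOF
}
```

For example, with a placeholder host name: write_condarc repo.example.com /tmp/condarc, review the file, then sudo cp /tmp/condarc /opt/wakari/anaconda/.condarc.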
Configure Anaconda Client¶
Anaconda Client lets users work with Anaconda Repository from the command line, for example to search for packages, log in, and upload packages.
Run the following command, filling in the proper value. It sets the default anaconda-client configuration for all users on the compute node, and it requires sudo because the configuration file is written to the root file system at /etc/xdg/binstar/config.yaml.
sudo /opt/wakari/anaconda/bin/anaconda config --set url http://<your Anaconda Server>:8080/api -s
Congratulations! You’ve now successfully installed and configured Anaconda Enterprise Notebooks.
Optional Configuration¶
Optional: Configure common AEN Compute options¶
To make any of the changes described below, please edit the following file:
/opt/wakari/wakari-compute/etc/wakari/wk-compute-launcher-config.json
Then restart the AEN Compute service:
sudo service wakari-compute restart
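Because a single misplaced comma makes the file invalid JSON, it is worth validating the file before restarting the service. A minimal sketch using Python's standard json.tool module; json_ok is a hypothetical helper, and defaulting AEN_PYTHON to the interpreter under /opt/wakari/anaconda is an assumption (any Python 2.7 or 3 interpreter works).

```shell
# json_ok FILE: succeed if FILE parses as JSON; on failure, json.tool
# prints the parse error (with line and column) on stderr.
json_ok() {
  "${AEN_PYTHON:-/opt/wakari/anaconda/bin/python}" -m json.tool "$1" >/dev/null
}
```

For example: json_ok /opt/wakari/wakari-compute/etc/wakari/wk-compute-launcher-config.json && sudo service wakari-compute restart.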
Change the project directory¶
NOTE: We recommend putting /opt/wakari and /projects on the same filesystem. If the project and conda env directories are on separate filesystems, compute nodes need more disk space and performance is worse.
To make the aen-compute service store projects in a directory other than /projects, modify the configuration file referenced above as follows:
"projectRoot" : "/nfs/storage/services/wakari/projects",
The directory /nfs/storage/services/wakari/projects specified as projectRoot above must exist for this to succeed.
Create groups with the same id¶
Additionally, if the /projects folder resides on an NFSv3 volume and your setup has several compute nodes, AEN will create local users with a different uid on each node.
To make the AEN Compute service create groups with the same id, edit the configuration file referenced above so that it contains the key identicalGID with the value true, as in the following example. If the identicalGID key is not present, add it. The example line begins with a comma so that it can follow the previous key; if you add it as the last key, make sure no comma follows it, because JSON does not allow trailing commas.
, "identicalGID": true
Use numeric usernames¶
To use numeric usernames, edit the configuration file referenced above so that it contains the key numericUsernames with the value true, as in the following example. If the numericUsernames key is not present, add it. The example line begins with a comma so that it can follow the previous key; if you add it as the last key, make sure no comma follows it.
, "numericUsernames": true
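As an alternative to hand-editing, both flags can be set through a JSON parser, which sidesteps the comma rules. A minimal sketch: set_json_true and AEN_PYTHON are hypothetical names, the helper keeps a .bak copy of the file, and it rewrites the file with two-space indentation, so the layout changes even though keys, values and key order are preserved.

```shell
# set_json_true FILE KEY: set KEY to true in the JSON object in FILE,
# after saving a copy as FILE.bak.
set_json_true() {
  cp -p "$1" "$1.bak" &&
    "${AEN_PYTHON:-/opt/wakari/anaconda/bin/python}" - "$1" "$2" <<'EOF'
import json
import sys
from collections import OrderedDict

path, key = sys.argv[1], sys.argv[2]
# OrderedDict keeps the existing key order on Python 2.7 as well.
with open(path) as f:
    config = json.load(f, object_pairs_hook=OrderedDict)
config[key] = True
with open(path, "w") as f:
    json.dump(config, f, indent=2)
    f.write("\n")
EOF
}
```

From a root shell, run set_json_true /opt/wakari/wakari-compute/etc/wakari/wk-compute-launcher-config.json identicalGID (or numericUsernames), then restart wakari-compute.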
Optional: Verify and Tune Search Indexing¶
Verify that the AEN Compute node can communicate with the AEN Server. This is required for search indexing to work correctly.
curl -m 5 $AEN_SERVER > /dev/null
Ensure that there are sufficient inotify watches available for the number of subdirectories within the project root filesystem. Some Linux distributions default to a low number of watches, which may prevent the search indexer from monitoring project directories for changes.
cat /proc/sys/fs/inotify/max_user_watches
If necessary, this can be increased with the following command:
echo fs.inotify.max_user_watches=100000 | sudo tee -a /etc/sysctl.conf && sudo sysctl -p
Ensure that there are sufficient inotify user instances available, at least one per project.
cat /proc/sys/fs/inotify/max_user_instances
If necessary, this can be increased with the following command:
echo fs.inotify.max_user_instances=1000 | sudo tee -a /etc/sysctl.conf && sudo sysctl -p
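To decide whether max_user_watches is high enough, you can count the directories under the project root and compare. A minimal sketch, assuming roughly one watch per directory the indexer monitors; watches_needed and enough_watches are hypothetical helpers, and the count is only an estimate.

```shell
# watches_needed ROOT: count directories under ROOT, including ROOT itself.
watches_needed() {
  find "$1" -type d 2>/dev/null | wc -l | tr -d ' '
}

# enough_watches ROOT [LIMIT]: succeed if the directory count is within
# LIMIT, which defaults to the kernel's current max_user_watches.
enough_watches() {
  limit=${2:-$(cat /proc/sys/fs/inotify/max_user_watches)}
  [ "$(watches_needed "$1")" -le "$limit" ]
}
```

For example: enough_watches /projects || echo "consider raising fs.inotify.max_user_watches".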
Optional: Setting up a Default Project Environment¶
Anaconda Enterprise Notebooks includes a full installation of the Anaconda Python distribution, along with several additional packages, located in the root conda environment at /opt/wakari/anaconda. A copy of this environment is created for each new AEN project.
To configure a different set of packages as the defaults, create a new conda environment in the directory /opt/wakari/anaconda/envs/default. For example, to use a Python 3.4 base environment, run the following command:
sudo -u $AEN_SRVC_ACCT /opt/wakari/anaconda/bin/conda create -p /opt/wakari/anaconda/envs/default python=3.4
Then use conda to install any additional packages into the environment as needed. After creating the environment, clone it once to ensure that it works correctly:
sudo -u $AEN_SRVC_ACCT /opt/wakari/anaconda/bin/conda create -p /opt/wakari/testenv --clone /opt/wakari/anaconda/envs/default
sudo -u $AEN_SRVC_ACCT rm -rf /opt/wakari/testenv
The default project environment is cloned into the project workspace the first time the project is started. To convert an existing project, run the following command to clone the environment, replacing /projects/owner/project/envs/<ENV_NAME> with the path of the new environment you would like to create within the project:
sudo -u $AEN_SRVC_ACCT /opt/wakari/anaconda/bin/conda create -p /projects/owner/project/envs/<ENV_NAME> --clone /opt/wakari/anaconda/envs/default
Then open the Compute Resource Config for the project and set the project environment path there.
Configure a remote mongodb¶
First, stop the AEN Server, AEN Gateway and AEN Compute services, running each command on the host where that service is installed:
sudo service wakari-server stop
sudo service wakari-gateway stop
sudo service wakari-compute stop
To configure a remote database for the AEN Server, edit /opt/wakari/wakari-server/etc/wakari/config.json, create a new key called MONGO_URL, and set its value to the database connection information. The final file should look like this:
{
"MONGO_URL": "mongodb://MONGO-USER:MONGO-PASSWORD@MONGO-URL:MONGO-PORT",
"WAKARI_SERVER": "http://YOUR-IP",
"USE_SES": false,
"CDN": "http://YOUR-IP/static/",
"ANON_USER": "anonymous"
}
You can migrate the data from the former database into the new one; the MongoDB documentation website includes a guide for this. Once the migration is complete, start the services again:
sudo service wakari-server start
sudo service wakari-gateway start
sudo service wakari-compute start
Optional: SELinux Enforcing Mode¶
To run SELinux in Enforcing mode, a few ports must be configured, which can be done with the semanage port command.
The semanage command relies on policycoreutils-python. To install it (if needed):
sudo yum -y install policycoreutils-python
Enable port 5000 for core aen-server:
sudo semanage port -m -t http_port_t -p tcp 5000
The -m flag modifies an existing definition of a port. If you get the error Port tcp/5000 is not defined, change the flag to -a to add the port.
Enable ports 9200 and 9300 for elasticsearch:
sudo semanage port -a -t http_port_t -p tcp 9200
sudo semanage port -a -t http_port_t -p tcp 9300
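The modify-then-add fallback described above can be wrapped in a small function so that the same command works for every port. A minimal sketch to run as root; allow_http_port is a hypothetical helper, and the SEMANAGE variable exists only so the command can be swapped out for testing.

```shell
# allow_http_port PORT: label TCP PORT as http_port_t, first trying to
# modify an existing definition (-m) and falling back to adding one (-a).
allow_http_port() {
  ${SEMANAGE:-semanage} port -m -t http_port_t -p tcp "$1" 2>/dev/null ||
    ${SEMANAGE:-semanage} port -a -t http_port_t -p tcp "$1"
}
```

For example, in a root shell: for port in 5000 9200 9300; do allow_http_port "$port"; done.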
Please see the Administrative documentation for additional information.
Wrapping Up¶
Congratulations. You now have a fully installed Anaconda Enterprise Notebooks system!
For additional documentation on topics such as creating user accounts, and for instructions for users who wish to use the system for collaborative analysis, please see the other documentation resources.
Should you encounter any issues while installing AEN or have additional questions, please do not hesitate to contact your enterprise support representative.