Concepts#
System overview#
The Anaconda Enterprise Notebooks platform consists of 3 main service groups: AEN server, AEN gateway and AEN compute, which are called “nodes”:
Server node: The administrative front-end to the system, where users log in, user accounts are stored, and administrators manage the system.
Gateway node(s): A reverse proxy that authenticates users and directs them to the proper compute node for their project. Users will not notice this node after installation as it automatically routes them.
Compute nodes: Where projects are stored and run.
These services can be run on a single machine or distributed across multiple servers.
Each AEN installation has exactly 1 server instance and 1 or more gateway instances. Each compute node can only be connected to a single gateway. The collection of compute nodes served by a single gateway is called a data center. You can add data centers to the AEN installation at any time.
EXAMPLE: An AEN deployment with 2 data centers, where 1 gateway has a cluster of 20 physical computers, and the second gateway has 30 virtual machines, must have the following services installed and running:
1 AEN server instance
2 AEN gateway instances
50 AEN compute instances (20 + 30)
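The counting rule in this example can be sketched in Python. The names DataCenter and required_services are hypothetical and used only for illustration; they are not part of AEN.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    """One gateway plus the compute nodes it serves (hypothetical model)."""
    name: str
    compute_nodes: int


def required_services(data_centers):
    """Count the AEN service instances a deployment needs."""
    if not data_centers:
        raise ValueError("an AEN installation needs at least one gateway")
    return {
        "server": 1,                   # exactly one server per installation
        "gateway": len(data_centers),  # one gateway per data center
        "compute": sum(dc.compute_nodes for dc in data_centers),
    }


deployment = [
    DataCenter("physical-cluster", 20),
    DataCenter("virtual-machines", 30),
]
print(required_services(deployment))
```

Adding a data center later only appends one entry to the list: the server count stays at 1 while the gateway and compute counts grow.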
Nodes must be configured and maintained separately.
Server node#
The server node controls login, accounts, admin, project creation and management as well as interfacing with the database. It is the main entry point to AEN for all users. The server node handles project setup, and ensures that users are sent to the correct project data center.
Since AEN is web-based, it uses the standard HTTP port 80 or HTTPS port 443 on the server.
AEN uses MongoDB for internal data persistence. MongoDB is typically run on the same host as the server, but can also be installed on a separate host.
Server nodes use NGINX to handle the user-facing AEN web interface. NGINX acts as a request proxy for the actual server web process, which runs on a high-numbered port that only listens on localhost. NGINX is also responsible for static content.
AEN server is installed in the /opt/wakari/wakari-server directory.
Server processes#
When you view the status of server processes, you may see the processes explained below.
supervisord | details |
---|---|
description | Manage |
user | |
configuration | |
log | |
control | |
ports | none |
wk-server | details |
---|---|
description | Handles user interaction and passes jobs on to the wakari gateway. Access to it is managed by NGINX. |
user | |
command | |
configuration | |
control | |
logs | |
ports | Not used in versions after 4.1.2 * |
* AEN 4.1.2 and earlier use port 5000. This port is used only on localhost. Later versions of AEN use Unix sockets instead. The Unix socket path is unix:/opt/wakari/wakari-server/var/run/wakari-server.sock.
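As a rough illustration of the version rule in this footnote, the sketch below picks the wk-server endpoint from an AEN version string. The function names are hypothetical, and the code assumes plain dotted numeric versions such as 4.1.2.

```python
def parse_version(text):
    """Turn a dotted version string such as '4.1.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def wk_server_endpoint(aen_version):
    """Return where wk-server listens for the given AEN version."""
    if parse_version(aen_version) <= (4, 1, 2):
        # 4.1.2 and earlier: TCP port 5000, reachable on localhost only.
        return "localhost:5000"
    # Later versions: a Unix socket instead of a TCP port.
    return "unix:/opt/wakari/wakari-server/var/run/wakari-server.sock"


for version in ("4.1.2", "4.1.3"):
    print(version, "->", wk_server_endpoint(version))
```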
wakari-worker | details |
---|---|
description | Asynchronously executes tasks from |
user | |
logs | |
control | |
nginx | details |
---|---|
description | Serves static files and acts as a proxy for all other requests, which it passes to the wk-server process. * |
user | nginx |
configuration | |
logs | |
control | |
port | 80 |
* In AEN 4.1.2 and earlier the wk-server process runs on port 5000 on localhost only. In later versions of AEN the wk-server process uses the Unix socket path unix:/opt/wakari/wakari-server/var/run/wakari-server.sock.
NGINX runs at least two processes:
Master process running as root user.
Worker processes running as nginx user.
Gateway node#
The gateway node serves as an access point for a given group of compute nodes. It acts as a proxy service, and manages the authorization and mapping of URLs and ports to services that are running on those nodes. Gateway nodes give the user a consistent, uniform interface.
NOTE: The gateway may also be referred to as a data center because it serves as the proxy for a collection of compute nodes.
You can put a gateway in each data center in a tiered scale-out fashion.
AEN gateway is installed in the /opt/wakari/wakari-gateway directory.
Gateway processes#
When you view the status of gateway processes, you may see the processes explained below.
supervisord | details |
---|---|
description | Manages the |
user | |
configuration | |
log | |
control | |
ports | none |
wakari-gateway | details |
---|---|
description | Passes requests from the AEN Server to the Compute nodes. |
user | |
configuration | |
logs | |
working dir | |
port | 8089 (webcache) |
Compute node(s)#
Compute nodes are where applications such as Jupyter Notebook and Workbench actually run. They are also the hosts that a user sees when using the Terminal app, or when using SSH to access a node. Compute nodes contain all user-visible programs.
Compute nodes only need to communicate with a gateway, so they can be completely isolated by a firewall.
Each project is associated with one or more compute nodes that are part of a single data center.
AEN compute nodes are installed in the /opt/wakari/wakari-compute directory.
Each compute node in the AEN system requires a compute launcher service to mediate access to the server and gateway.
Compute processes#
When you view the status of compute processes, you may see the processes explained below.
supervisord | details |
---|---|
description | Manages the |
user | |
configuration | |
log | |
control | |
working dir | |
ports | none |
wk-compute | details |
---|---|
description | Launches compute processes. |
user | |
configuration | |
logs | |
working dir | |
control | |
port | 5002 (rfe) |
wk-compute loads each of the following configuration files, in this order:
/etc/wakari/config.json
/etc/wakari/compute-launcher-config.json
./compute-launcher-config.json
Any configuration file specified by the -c option.
If an option is specified in multiple files, the last one encountered takes precedence.
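The precedence rule above can be sketched as a merge in load order. This is a minimal illustration, not the wk-compute implementation: it merges top-level keys only, since the source does not say how nested options are combined, and the logLevel option and all values shown are hypothetical.

```python
import json


def merge_configs(json_texts):
    """Merge JSON config documents in load order; later values win."""
    merged = {}
    for text in json_texts:
        # dict.update overwrites keys already present, so the last
        # document that sets an option determines its final value.
        merged.update(json.loads(text))
    return merged


# Hypothetical file contents, listed in load order.
system_config = '{"projectRoot": "/projects", "logLevel": "info"}'
launcher_config = '{"logLevel": "debug"}'
cli_config = '{"projectRoot": "/data/projects"}'

print(merge_configs([system_config, launcher_config, cli_config]))
```

Here projectRoot ends up with the value from the last file and logLevel with the value from the second, because no later file sets logLevel again.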
Supervisor and supervisord#
AEN uses a process control system called “Supervisor” to run its services. Supervisor is run by the AEN Service Account user, usually wakari or aen_admin.
The Supervisor daemon process is called supervisord. It runs in the background, and should rarely need to be restarted.
Service Account#
AEN must be installed and executed by a Linux account called the AEN Service Account. The username of the AEN Service Account is called the AEN Functional ID (NFI). The AEN Service Account is created during AEN installation—if it does not exist—and is used to run all AEN services.
The default NFI username is wakari. Another popular choice is aen_admin.
WARNING: The Service Account should be used for administrative tasks only, and should not be used for operating AEN the way an ordinary user would. If the Service Account creates or starts projects, the permissions on the AEN package cache will be reset to match the Service Account, which will interfere with the normal operation of AEN for all other users.
Anaconda environments#
Each project has an associated conda environment containing the packages needed for that project. When a project is first started, AEN clones a default environment with the name default into the project directory.
Each release of AEN 4 includes specific tested versions of conda and of the conda packages included with AEN. These tested packages include Python, R, and all of the packages in Anaconda, among others.
If you upgrade or install different versions of conda or different versions of any of these conda packages, the new packages will not have been tested as part of the AEN 4 release.
These different packages will usually work, especially if they are newer versions, but they are not tested or guaranteed to work, and in some cases they may break product functionality.
We recommend you use a new conda environment to test a new version of a package, before installing it in your existing environments.
If using conda to change the version of a package breaks product functionality, you can use conda to change the version of the package back to the version known to work.
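The test-then-roll-back workflow described above might look like the following sketch, which only builds the conda command lines and does not run them. The environment names, package, and versions are hypothetical. For an environment stored inside a project directory you may need --prefix with a path instead of --name.

```python
def trial_upgrade_commands(source_env, test_env, package, version):
    """Build (but do not run) conda commands for a trial upgrade."""
    return [
        # Clone the existing environment so it is left untouched.
        ["conda", "create", "--yes", "--name", test_env, "--clone", source_env],
        # Install the candidate version into the clone only.
        ["conda", "install", "--yes", "--name", test_env, f"{package}={version}"],
    ]


def rollback_command(env, package, known_good_version):
    """Build the command that pins a package back to a known-good version."""
    return ["conda", "install", "--yes", "--name", env,
            f"{package}={known_good_version}"]


# Hypothetical example values.
for argv in trial_upgrade_commands("default", "numpy-trial", "numpy", "1.11.3"):
    print(" ".join(argv))
print(" ".join(rollback_command("default", "numpy", "1.10.4")))
```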
For more information about environments, see Working with environments.
Projects and permissions#
AEN users interact with the system predominantly through projects.
Each project is associated with a single data center within the AEN environment. Each project also has a team of users, which includes one owner: the user who created the project.
Projects live in the projectRoot folder on the compute node (by default, /projects).
The project directory is created the first time a project is started. The start-project script clones it from /opt/wakari/wakari-compute/lib/node_modules/wakari-compute-launcher/skeleton.
Project directory permissions are:
owner: rwx, user who created the project
group: rwx, group of the owner
other: --x, to allow access to the Public folder
ACL: rwx for any other team members
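For reference, the owner, group, and other entries above correspond to the following mode bits, shown here with Python's standard stat module. ACL entries for other team members are stored separately from the mode bits and are not modeled in this sketch.

```python
import stat

# Owner rwx, group rwx, other --x, as listed above.
PROJECT_DIR_MODE = (
    stat.S_IRWXU      # owner: read, write, execute
    | stat.S_IRWXG    # group: read, write, execute
    | stat.S_IXOTH    # other: execute (traverse) only
)

# Octal form of the mode, as used with chmod.
print(oct(PROJECT_DIR_MODE))
# ls-style string for a directory with this mode.
print(stat.filemode(stat.S_IFDIR | PROJECT_DIR_MODE))
```

The execute-only bit for other lets users who are not on the team traverse the project directory to reach the public folder without being able to list or modify its contents.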
Files and subdirectories within the project directory have the same permissions as the project directory, except:
The public folder and everything in it are open to anyone.
Any files hardlinked into the root anaconda environment, /opt/wakari/anaconda, are owned by the root or wakari users.
Project file and directory permissions are maintained by the start-project script. All files and directories in the project will have their permissions set when the project is started, except for files owned by root or the AEN_SRVC_ACCT user (by default, wakari or aen_admin).
Permissions on files owned by root or the AEN_SRVC_ACCT user are left unchanged, so that the permission settings of any files linked from the /opt/wakari/anaconda directory are not altered.
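The ownership rule can be expressed as a small predicate. This is only an illustration of the rule as described, not the start-project script itself. The function name, file names, and the ordinary user alice are hypothetical.

```python
def should_reset_permissions(file_owner, service_account="wakari"):
    """Return True if the file's permissions would be reset at project start."""
    # Files owned by root or the service account are skipped, because they
    # may be hardlinks into the shared /opt/wakari/anaconda directory.
    protected_owners = {"root", service_account}
    return file_owner not in protected_owners


# Hypothetical (path, owner) pairs inside a project directory.
files = [
    ("notebook.ipynb", "alice"),
    ("envs/default/bin/python", "wakari"),
    ("envs/default/lib/libz.so", "root"),
]
to_reset = [path for path, owner in files if should_reset_permissions(owner)]
print(to_reset)
```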
CAUTION: Do not start a project as the AEN_SRVC_ACCT user. The permissions system does not correctly manage project files owned by this user.