Concepts#

System overview#

The Anaconda Enterprise Notebooks (AEN) platform consists of 3 main service groups, called “nodes”:

  • Server node: The administrative front end to the system, where users log in, user accounts are stored, and administrators manage the system.

  • Gateway node(s): A reverse proxy that authenticates users and directs them to the proper compute node for their project. After installation, users do not notice this node because it routes them automatically.

  • Compute nodes: Where projects are stored and run.

These services can be run on a single machine or distributed across multiple servers.

Organizationally, each AEN installation has exactly 1 server instance and 1 or more gateway instances. Each compute node can only be connected to a single gateway. The collection of compute nodes served by a single gateway is called a data center. You can add data centers to the AEN installation at any time.

EXAMPLE: An AEN deployment with 2 data centers, where the first gateway serves a cluster of 20 physical computers and the second gateway serves 30 virtual machines, must have the following services installed and running:

  • 1 AEN server instance

  • 2 AEN gateway instances

  • 50 AEN compute instances (20 + 30)
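The count in this example can be reproduced with a short sketch. The `DataCenter` class and `required_services` function are hypothetical names used only for illustration; they are not part of AEN.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    """One gateway plus the compute nodes it serves (illustrative model)."""
    name: str
    compute_nodes: int


def required_services(data_centers):
    """Count the service instances a deployment needs under the rules above."""
    return {
        "server": 1,                    # exactly 1 server per installation
        "gateway": len(data_centers),   # 1 gateway per data center
        "compute": sum(dc.compute_nodes for dc in data_centers),
    }


deployment = [DataCenter("physical", 20), DataCenter("virtual", 30)]
print(required_services(deployment))
```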

Nodes must be configured and maintained separately.

Server node#

The server node handles login, accounts, administration, and project creation and management, and it interfaces with the database. It is the main entry point to AEN for all users. The server node handles project setup and ensures that users are sent to the correct project data center.

Since AEN is web-based, it uses the standard HTTP port 80 or HTTPS port 443 on the server.

AEN uses MongoDB for its internal data persistence. MongoDB typically runs on the same host as the server but can also be installed on a separate host.

Server nodes use NGINX to handle the user-facing AEN web interface. NGINX acts as a request proxy for the actual server web process, which runs on a high-numbered port and listens only on localhost. NGINX also serves static content.

The AEN server is installed in the /opt/wakari/wakari-server directory.

Server processes#

When you view the status of server processes, you may see the processes explained below.

supervisord

description: Manages wakari-worker and multiple wk-server processes.
user: wakari
configuration: /opt/wakari/wakari-server/etc/supervisord.conf
log: /opt/wakari/wakari-server/var/log/supervisord.log
control: service wakari-server
ports: none

wk-server

description: Handles user interaction and passing jobs on to the wakari gateway. Access to it is managed by NGINX.
user: wakari
command: /opt/wakari/wakari-server/bin/wk-server
configuration: /opt/wakari/wakari-server/etc/wakari/
control: service wakari-server
logs: /opt/wakari/wakari-server/var/log/wakari/server.log
ports: Not used in versions after 4.1.2 *

* AEN 4.1.2 and earlier use port 5000. This port is used only on localhost. Later versions of AEN use Unix sockets instead. The Unix socket path is: unix:/opt/wakari/wakari-server/var/run/wakari-server.sock

wakari-worker

description: Asynchronously executes tasks from wk-server.
user: wakari
logs: /opt/wakari/wakari-server/var/log/wakari/worker.log
control: service wakari-server

nginx

description: Serves static files and acts as a proxy for all other requests, which it passes to the wk-server process. *
user: nginx
configuration: /etc/nginx/nginx.conf, /opt/wakari/wakari-server/etc/conf.d/www.enterprise.conf
logs: /var/log/nginx/woc.log, /var/log/nginx/woc-error.log
control: service nginx status
port: 80

* In AEN 4.1.2 and earlier the wk-server process runs on port 5000 on localhost only. In later versions of AEN the wk-server process uses the Unix socket path unix:/opt/wakari/wakari-server/var/run/wakari-server.sock.
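As a rough way to see which transport a given host uses, the sketch below checks whether the socket path from the note above exists and is a Unix domain socket. The helper name `is_unix_socket` is an assumption for illustration and is not an AEN tool. A missing socket does not by itself prove that port 5000 is in use, because the server may simply be stopped.

```python
import os
import stat

# Socket path quoted in the note above (AEN versions after 4.1.2).
WK_SERVER_SOCKET = "/opt/wakari/wakari-server/var/run/wakari-server.sock"


def is_unix_socket(path):
    """Return True if path exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing path or insufficient permissions: treat as "no socket".
        return False


if is_unix_socket(WK_SERVER_SOCKET):
    print("wk-server socket found:", WK_SERVER_SOCKET)
else:
    print("no wk-server socket at", WK_SERVER_SOCKET)
```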

NGINX runs at least two processes:

  • Master process running as root user.

  • Worker processes running as nginx user.

Gateway node#

The gateway node serves as an access point for a given group of compute nodes. It acts as a proxy service and manages the authorization and mapping of URLs and ports to services that are running on those nodes. Gateway nodes give the user a consistent, uniform interface.

NOTE: The gateway may also be referred to as a data center because it serves as the proxy for a collection of compute nodes.

You can put a gateway in each data center in a tiered scale-out fashion.

AEN gateway is installed in the /opt/wakari/wakari-gateway directory.

Gateway processes#

When you view the status of gateway processes, you may see the processes explained below.

supervisord

description: Manages the wk-gateway process.
user: wakari
configuration: /opt/wakari/wakari-gateway/etc/supervisord.conf
log: /opt/wakari/wakari-gateway/var/log/supervisord.log
control: service wakari-gateway
ports: none

wakari-gateway

description: Passes requests from the AEN Server to the Compute nodes.
user: wakari
configuration: /opt/wakari/wakari-gateway/etc/wakari/wk-gateway-config.json
logs: /opt/wakari/wakari-gateway/var/log/wakari/gateway.application.log, /opt/wakari/wakari-gateway/var/log/wakari/gateway.log
working dir: / (root)
port: 8089 (webcache)

Compute node(s)#

Compute nodes are where applications such as Jupyter Notebook and Workbench actually run. They are also the hosts that a user sees when using the Terminal app or when using SSH to access a node. Compute nodes contain all user-visible programs.

Compute nodes only need to communicate with a gateway, so they can be completely isolated by a firewall.

Each project is associated with one or more compute nodes that are part of a single data center.

AEN compute nodes are installed in the /opt/wakari/wakari-compute directory.

Each compute node in the AEN system requires a compute launcher service to mediate access to the server and gateway.

Compute processes#

When you view the status of compute processes, you may see the processes explained below.

supervisord

description: Manages the wk-compute process.
user: wakari
configuration: /opt/wakari/wakari-compute/etc/supervisord.conf
log: /opt/wakari/wakari-compute/var/log/supervisord.log
control: service wakari-compute
working dir: /opt/wakari/wakari-compute/etc
ports: none

wk-compute

description: Launches compute processes.
user: wakari
configuration: /opt/wakari/wakari-compute/etc/wakari/wk-compute-launcher-config.json, /opt/wakari/wakari-compute/etc/wakari/scripts/config.json
logs: /opt/wakari/wakari-compute/var/log/wakari/compute-launcher.application.log, /opt/wakari/wakari-compute/var/log/wakari/compute-launcher.log
working dir: / (root)
control: service wakari-compute
port: 5002 (rfe)

wk-compute loads each of the following configuration files, in this order:

  • /etc/wakari/config.json

  • /etc/wakari/compute-launcher-config.json

  • ./compute-launcher-config.json

  • Any configuration file specified by the -c option

If an option is specified in multiple files, the last one encountered takes precedence.
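The "last one wins" rule can be illustrated with a minimal sketch. This is not wk-compute's own loader: the function name `load_config`, the shallow merge of top-level keys, and the silent skipping of missing files are all assumptions made for the example.

```python
import json
from pathlib import Path

# Load order listed above; files named with -c would come last.
DEFAULT_FILES = [
    "/etc/wakari/config.json",
    "/etc/wakari/compute-launcher-config.json",
    "./compute-launcher-config.json",
]


def load_config(extra_files=(), default_files=DEFAULT_FILES):
    """Merge JSON config files in order; later files override earlier ones."""
    merged = {}
    for name in list(default_files) + list(extra_files):
        path = Path(name)
        if not path.is_file():
            continue  # assumption: a missing file is skipped, not an error
        with path.open() as handle:
            # Shallow merge (assumption): a later top-level key replaces
            # an earlier one with the same name.
            merged.update(json.load(handle))
    return merged


print(load_config())
```

On a host without any of these files the sketch prints an empty dictionary. Whether the real loader merges nested objects key by key is not stated here, so treat the shallow merge as a simplification.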

Supervisor and supervisord#

AEN uses a process control system called “Supervisor” to run its services. Supervisor is run by the AEN Service Account user, usually wakari or aen_admin.

The Supervisor daemon process is called “supervisord”. It runs in the background and should rarely need to be restarted.

Service Account#

AEN must be installed and executed by a Linux account called the AEN Service Account. The username of the AEN Service Account is called the AEN Functional ID (NFI). If the AEN Service Account does not exist, it is created during AEN installation. It is used to run all AEN services.

The default NFI username is wakari. Another popular choice is aen_admin.

WARNING: The Service Account should only be used for administrative tasks, and should not be used for operating AEN the way an ordinary user would. If the Service Account creates or starts projects, the permissions on the AEN package cache will be reset to match the Service Account, which will interfere with the normal operation of AEN for all other users.

Anaconda environments#

Each project has an associated conda environment containing the packages needed for that project. When a project is first started, AEN clones a default environment with the name “default” into the project directory.

Each release of AEN 4 includes specific tested versions of conda and of the conda packages included with AEN. These tested packages include Python, R, and all of the packages in Anaconda.

If you upgrade or install different versions of conda or different versions of any of these conda packages, the new packages will not have been tested as part of the AEN 4 release.

These different packages will usually work, especially if they are newer versions, but they are not tested or guaranteed to work, and in some cases they may break product functionality.

You can use a new conda environment to test a new version of a package before installing it in your existing environments.

If using conda to change the version of a package breaks product functionality, you can use conda to change the version of the package back to the version known to work.

For more information about environments, see Working with environments.

Projects and permissions#

AEN users interact with the system predominantly through projects.

Each project is associated with a single data center within the AEN environment. Each project's team of users includes one owner, the user who created the project.

Projects live in the projectRoot folder on the compute node, which is /projects by default.

The project directory is created the first time a project is started. The start-project script clones it from /opt/wakari/wakari-compute/lib/node_modules/wakari-compute-launcher/skeleton.

Project directory permissions are:

owner: rwx, user who created the project
group: rwx, group of the owner
other: --x, to allow access to the Public folder
ACL: rwx for any other team members
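For reference, the owner, group, and other bits listed above correspond to the octal mode 771. The sketch below derives that value with Python's standard `stat` module. It covers only the classic mode bits; the ACL entries for other team members are managed separately and are not modeled here.

```python
import stat

# owner rwx | group rwx | other --x
PROJECT_DIR_MODE = stat.S_IRWXU | stat.S_IRWXG | stat.S_IXOTH

print(oct(PROJECT_DIR_MODE))                           # 0o771
print(stat.filemode(stat.S_IFDIR | PROJECT_DIR_MODE))  # drwxrwx--x
```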

Files and subdirectories within the project directory have the same permissions as the project directory, except:

  • The public folder and everything in it are open to anyone.

  • Any files hardlinked into the root anaconda environment—/opt/wakari/anaconda—are owned by the root or wakari users.

Project file and directory permissions are maintained by the start-project script. All files and directories in the project will have their permissions set when the project is started, except for files owned by root or the AEN_SRVC_ACCT user—by default, wakari or aen_admin.

Permissions on files owned by root or the AEN_SRVC_ACCT user are left unchanged so that any linked files in the /opt/wakari/anaconda directory keep their permission settings.
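The ownership rule can be sketched as a simple filter. This is not the start-project script itself: the names `is_protected_owner` and `paths_to_reset` are hypothetical, and the sketch only lists paths without changing any permissions.

```python
import os

ROOT_UID = 0


def is_protected_owner(owner_uid, service_uid):
    """True for files owned by root or by the service account."""
    return owner_uid in (ROOT_UID, service_uid)


def paths_to_reset(project_dir, service_uid):
    """Yield paths whose permissions would be reset under the rule above."""
    for dirpath, dirnames, filenames in os.walk(project_dir):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects the entry itself rather than a symlink target.
            if not is_protected_owner(os.lstat(path).st_uid, service_uid):
                yield path
```

A caller would pass the numeric UID of the service account, for example the value returned by `pwd.getpwnam("wakari").pw_uid` on a host where that account exists.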

CAUTION: Do not start a project as the AEN_SRVC_ACCT user. The permissions system does not correctly manage project files owned by this user.