Troubleshooting (AEN 4.2.0)

This guide describes how to diagnose and resolve issues that may occur with your AEN installation.

General troubleshooting steps

  1. Clear browser cookies. When you change the AEN configuration or upgrade AEN, cookies remaining in the browser can cause issues. Clearing cookies and logging in again can help to resolve problems.
  2. Make sure NGINX and MongoDB are running.
  3. Make sure that AEN services are set to start at boot, on all nodes.
  4. Make sure that services are running as expected. If any services are not running or are missing, restart them.
  5. Check for and remove extraneous processes.
  6. Check the connectivity between nodes.
  7. Check the configuration file syntax.
  8. Check file ownership.
  9. Verify that POSIX ACLs are enabled.
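
Steps 2 and 4 can be partly scripted. The following is a minimal sketch rather than an AEN tool: it assumes the default process names nginx and mongod, assumes that pgrep is installed on the node, and uses a hypothetical helper named report_process.

```shell
#!/bin/sh
# Sketch: report whether a process with an exact name is running.
# Assumes pgrep (procps) is installed; report_process is an illustrative name.
report_process() {
    if pgrep -x "$1" >/dev/null 2>&1; then
        echo "$1: running"
    else
        echo "$1: NOT running"
        return 1
    fi
}

# Check the two services named in step 2 (default process names assumed).
for proc in nginx mongod; do
    report_process "$proc" || true
done
```

Run the check on every node that hosts the service in question. A "NOT running" line only tells you where to look next; it does not restart anything.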

Browser error: too many redirects

Cause

Browser cookies are out of date.

Solution

  1. Log out.
  2. Clear the browser’s cookies.
  3. Clear the browser cache.
  4. Log in.

Error: unix:////opt/wakari/wakari-server/etc/supervisor.sock no such file

This is a supervisorctl error.

Cause

supervisord is not running on the Server.

Solution

Ensure that supervisord is included in the crontab. Then restart supervisord manually.
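
One way to confirm this diagnosis is sketched below. It assumes the socket path shown in the error message and assumes that supervisord is started from a crontab entry; the helper names socket_present and mentions_supervisord are illustrative, and the crontab entry on your system may look different.

```shell
#!/bin/sh
# Socket path taken from the error message above.
SOCK=/opt/wakari/wakari-server/etc/supervisor.sock

# Succeeds only if the given path exists and is a UNIX socket.
socket_present() {
    [ -S "$1" ]
}

# Reads crontab text on stdin; succeeds if a non-comment line mentions supervisord.
mentions_supervisord() {
    grep -v '^[[:space:]]*#' | grep -q 'supervisord'
}

if socket_present "$SOCK"; then
    echo "supervisor socket found"
else
    echo "supervisor socket missing: supervisord is probably not running"
fi

# To inspect a crontab, run as the user that owns the entry:
# crontab -l | mentions_supervisord && echo "crontab entry found"
```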

Error: “Data Center Not Found” when deleting a project

Cause

The data center has been removed.

Solution

As root, run:

/opt/wakari/wakari-server/bin/wk-server-admin remove-project --db-only <user> <project>

Forgotten administrator password

  1. Use ssh to log in to the server as root.

  2. Run:

    /opt/wakari/wakari-server/bin/wk-server-admin reset-password -u SOME_USER -p SOME_PASSWORD
    

    NOTE: Replace SOME_USER with the administrator username and SOME_PASSWORD with the password.

  3. Log in to AEN as the administrator user with the new password.

Alternatively, you can add a new administrator user:

  1. Use ssh to log in to the server as root.

  2. Run:

    /opt/wakari/wakari-server/bin/wk-server-admin add-user SOME_USER --admin -p SOME_PASSWORD -e YOUR_EMAIL
    

    NOTE: Replace SOME_USER with the username, replace SOME_PASSWORD with the password, and replace YOUR_EMAIL with your email address.

  3. Log in to AEN as the administrator user with the new password.

Log files being deleted

Log files are being deleted.

NOTE: Locations of AEN log files for each process and application are shown in the node sections in Concepts.

Cause

AEN installers write their logs to /tmp/wakari_{server,gateway,compute}.log. If the log files grow too large, they might be deleted.

Solution

Jupyter Notebook uses the Application.log_level setting to make its logs more or less verbose.

To make the logs less verbose than the default, but still informative, set Application.log_level to ERROR.
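
As a sketch, the setting can be appended to a Jupyter configuration file in the usual `c.Application.log_level` form. The configuration file path depends on your installation and is an assumption here, as is the helper name set_log_level.

```shell
#!/bin/sh
# Append a log-level setting to a Jupyter config file (a Python file).
# Usage: set_log_level CONFIG_FILE LEVEL
set_log_level() {
    config_file=$1
    level=$2
    printf "c.Application.log_level = '%s'\n" "$level" >> "$config_file"
}

# Example (the path is an assumption; adjust it to where your config lives):
# set_log_level "$HOME/.jupyter/jupyter_notebook_config.py" ERROR
```

Because the line is appended, it takes precedence over an earlier log_level assignment in the same file. Restart the affected notebook application for the change to take effect.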

Error: This socket is closed

You receive the “This socket is closed” error message when you try to start an application.

Cause

When the supervisord process is killed, information that its child processes send to standard output (stdout) and standard error (stderr) is held in a pipe that will eventually fill up.

Once full, attempting to start any application will cause the “This socket is closed” error.

Solution

To prevent this issue:

  • Follow the instructions in Managing services to stop and restart processes.
  • Do not stop or kill supervisord without first stopping wk-compute and any other processes that use it.

To resolve the “This socket is closed” error:

  1. Stop wk-compute by running sudo kill -9 with the process ID of wk-compute.

  2. Restart the supervisord and wk-compute processes:

    sudo /etc/init.d/wakari-compute stop
    sudo /etc/init.d/wakari-compute start
    

Service error 502: Cannot connect to the application manager

The gateway node displays “Service Error 502: Can not connect to the application manager.”

Cause

A compute node is not responding because the wk-compute process has stopped.

Solution

Stop and then restart the supervisord and wk-compute processes:

sudo /etc/init.d/wakari-compute stop
sudo /etc/init.d/wakari-compute start

502 communication error on Amazon Web Services (AWS)

You receive the “502 Communication Error: This gateway could not communicate with the Wakari server” error message.

Cause

An AEN gateway cannot communicate with the Wakari server on AWS. There may be an issue with the IP address of the Wakari server.

Solution

Configure your AEN gateway to use the DNS hostname of the server. On AWS this is the DNS hostname of the Amazon Elastic Compute Cloud (EC2) instance.
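
Before changing the configuration, it can help to check whether the address currently in use is a raw IP address rather than a hostname. The check below is a rough sketch: looks_like_ipv4 is a hypothetical helper, and the sample hostname is a placeholder, not a real AEN setting.

```shell
#!/bin/sh
# Succeeds if the argument has the shape of a dotted-quad IPv4 address.
# This is a shape check only; it does not validate octet ranges.
looks_like_ipv4() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}$'
}

# Placeholder value; substitute the server address your gateway is configured with.
server_address="ec2-203-0-113-10.compute-1.amazonaws.com"

if looks_like_ipv4 "$server_address"; then
    echo "Configured with an IP address; consider the EC2 DNS hostname instead"
else
    echo "Configured with a hostname"
fi
```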

Invalid username

Cause

The username breaks one or more of these rules:

  • Must be at least 3 characters and no more than 25 characters.
  • The first character must be a letter (A-Z) or a digit (0-9).
  • Other characters can be a letter, digit, period (.), underscore (_) or hyphen (-).

NOTE: The POSIX standard defines these characters as the portable filename character set and specifies the same character set for portable usernames.

Solution

Follow the above rules for usernames.
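
The rules above can be expressed as a single pattern. This is a minimal sketch for checking a candidate username before creating the account, not the validation code that AEN itself uses; it assumes that both uppercase and lowercase ASCII letters are accepted, and is_valid_username is an illustrative name.

```shell
#!/bin/sh
# Succeeds if the username satisfies the rules listed above.
# Assumption: both uppercase and lowercase ASCII letters are accepted.
is_valid_username() {
    # One leading letter or digit plus 2 to 24 further allowed characters
    # gives a total length of 3 to 25.
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z0-9][A-Za-z0-9._-]{2,24}$'
}

for name in alice ab _admin data.team-01; do
    if is_valid_username "$name"; then
        echo "$name: valid"
    else
        echo "$name: invalid"
    fi
done
```

In this sample, "ab" fails the length rule and "_admin" fails the first-character rule; the other two names pass.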

Notebook Error: Cannot download notebook as PDF via LaTeX

Cause

LaTeX is not properly installed.

CentOS 6 solution

  1. Install TeXLive from the TUG site. Follow the described steps. The installation may take some time.

  2. Add the installation to the PATH in the file /etc/profile.d/latex.sh. Add the following, replacing the year and architecture as needed:

    PATH=/usr/local/texlive/2017/bin/x86_64-linux:$PATH
    
  3. Restart the compute node.
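
Since the year and architecture in step 2 vary between installations, a small helper can generate the line. This is a sketch; texlive_path_line is a hypothetical name, and the values 2017 and x86_64-linux are simply the ones used in the example above.

```shell
#!/bin/sh
# Print the PATH line for a given TeX Live year and architecture.
# Usage: texlive_path_line YEAR ARCH
texlive_path_line() {
    # Single quotes keep $PATH literal so it is expanded when the profile script runs.
    printf 'PATH=/usr/local/texlive/%s/bin/%s:$PATH\n' "$1" "$2"
}

# Preview the line first.
texlive_path_line 2017 x86_64-linux

# Once it looks right, write it as root:
# texlive_path_line 2017 x86_64-linux > /etc/profile.d/latex.sh
```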

CentOS 7 solution

  1. Install the missing packages by running:

    yum install texlive texlive-xetex texlive-xetexconfig texlive-xetex-def texlive-adjustbox texlive-upquote texlive-ulem
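
After installation, you may want to confirm that the LaTeX engine is visible on the PATH. The sketch below assumes that PDF export relies on the xelatex command, which the texlive-xetex package provides; have_command is an illustrative helper name.

```shell
#!/bin/sh
# Succeeds if the named command can be found on the current PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command xelatex; then
    echo "xelatex found at: $(command -v xelatex)"
else
    echo "xelatex not found on PATH"
fi
```

Run the check as the user who will export notebooks to PDF, because PATH can differ between users.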