Troubleshooting#
This troubleshooting guide provides you with ways to deal with issues that may occur with your AEN installation.
General troubleshooting steps#
Clear browser cookies. When you change the AEN configuration or upgrade AEN, cookies remaining in the browser can cause issues. Clearing cookies and logging in again can help to resolve problems.
Make sure that AEN services are set to start at boot, on all nodes.
Make sure that services are running as expected. If any services are not running or are missing, restart them.
Browser error: too many redirects#
Cause#
Browser cookies are out of date.
Solution#
Log out.
Clear the browser’s cookies.
Clear the browser cache.
Log in.
Browser error: too many redirects when starting project apps#
Browser shows “Too many redirects” when the user tries to start an application.
Cause#
The project’s Compute Resource is invalid or was deleted.
Solution#
Exception: exceptions.TypeError: 'NoneType' object has no attribute '__getitem__'#
This exception appears on the Admin > Exceptions page when a project does not have a Compute Resource assigned.
Cause#
The project’s Compute Resource is invalid or was deleted.
Solution#
Error: unix:////opt/wakari/wakari-server/etc/supervisor.sock no such file#
This is a supervisorctl error.
Cause#
supervisord is not running on the Server.
Solution#
Ensure that supervisord is included in the crontab. Then restart supervisord manually.
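The crontab check above can be sketched as a small shell helper. This is a minimal sketch, not an AEN tool: the function name `has_supervisord_entry` is hypothetical, and it assumes the relevant crontab is the one printed by `crontab -l` for the account that runs the AEN services.

```shell
# Succeed if the crontab text on stdin contains a non-comment line
# that mentions supervisord. (Hypothetical helper, not part of AEN.)
has_supervisord_entry() {
    grep -v '^[[:space:]]*#' | grep -q 'supervisord'
}

# Usage on the server node, as the account that owns the crontab:
#   crontab -l | has_supervisord_entry && echo "supervisord is in the crontab"
```

If the check fails, add the entry back before restarting supervisord manually.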
Error: “Data Center Not Found” when deleting a project#
Cause#
The data center has been removed.
Solution#
As root, run:
/opt/wakari/wakari-server/bin/wk-server-admin remove-project --db-only <user> <project>
Forgotten administrator password#
Use ssh to log in to the server as root.
Run:
/opt/wakari/wakari-server/bin/wk-server-admin reset-password -u SOME_USER -p SOME_PASSWORD
NOTE: Replace SOME_USER with the administrator username and SOME_PASSWORD with the password.
Log in to AEN as the administrator user with the new password.
Alternatively, you can add a new administrator user:
Use ssh to log in to the server as root.
Run:
/opt/wakari/wakari-server/bin/wk-server-admin add-user SOME_USER --admin -p SOME_PASSWORD -e YOUR_EMAIL
NOTE: Replace SOME_USER with the username, replace SOME_PASSWORD with the password, and replace YOUR_EMAIL with your email address.
Log in to AEN as the administrator user with the new password.
Log files being deleted#
Log files are being deleted.
NOTE: Locations of AEN log files for each process and application are shown in the node sections in Concepts.
Cause#
AEN installers write their logs to /tmp/wakari_{server,gateway,compute}.log. If the log files grow too large, they might be deleted.
Solution#
To set the logs to be more or less verbose, Jupyter Notebooks uses Application.log_level.
To make the logs less verbose than the default, but still informative, set Application.log_level to ERROR.
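One way to apply this setting is through a Jupyter configuration file, where it is written as `c.Application.log_level = 'ERROR'`. The sketch below appends that line with a shell helper. The helper name `set_jupyter_log_level` is hypothetical, and the path of the configuration file that your AEN deployment actually reads is an assumption you need to confirm. The path is passed as an argument for that reason.

```shell
# Append the log-level setting to a Jupyter config file unless the file
# already sets it. (Hypothetical helper; the config file path is an
# assumption that depends on your deployment.)
#   $1: path to the Jupyter config file
#   $2: level name, for example ERROR
set_jupyter_log_level() {
    config_file=$1
    level=$2
    if grep -q '^c\.Application\.log_level' "$config_file" 2>/dev/null; then
        echo "log_level is already set in $config_file; edit it by hand" >&2
        return 1
    fi
    printf "c.Application.log_level = '%s'\n" "$level" >> "$config_file"
}
```

The helper refuses to add a second setting so that an existing value is never silently overridden.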
Error: This socket is closed#
You receive the “This socket is closed” error message when you try to start an application.
Cause#
When the supervisord process is killed, information sent to the standard output (stdout) and the standard error (stderr) is held in a pipe that will eventually fill up.
Once the pipe is full, attempting to start any application causes the “This socket is closed” error.
Solution#
To prevent this issue:
Follow the instructions in Managing services to stop and restart processes.
Do not stop or kill supervisord without first stopping wk-compute and any other processes that use it.
To resolve the “This socket is closed” error:
Stop wk-compute by running sudo kill -9.
Restart the supervisord and wk-compute processes:
sudo /etc/init.d/wakari-compute stop
sudo /etc/init.d/wakari-compute start
Service error 502: Cannot connect to the application manager#
The gateway node displays “Service Error 502: Can not connect to the application manager.”
Cause#
A compute node is not responding because the wk-compute process has stopped.
Solution#
Stop and then restart the supervisord and wk-compute processes:
sudo /etc/init.d/wakari-compute stop
sudo /etc/init.d/wakari-compute start
502 communication error on Amazon Web Services (AWS)#
You receive the “502 Communication Error: This gateway could not communicate with the Wakari server” error message.
Cause#
An AEN gateway cannot communicate with the Wakari server on AWS. There may be an issue with the IP address of the Wakari server.
Solution#
Configure your AEN gateway to use the DNS hostname of the server. On AWS this is the DNS hostname of the Amazon Elastic Compute Cloud (EC2) instance.
Invalid username#
Cause#
The username violates one or more of these rules:
Must be at least 3 characters and no more than 25 characters.
The first character must be a letter (A-Z) or a digit (0-9).
Other characters can be a letter, digit, period (.), underscore (_) or hyphen (-).
The POSIX standard specifies that these characters are the portable filename character set, and that portable usernames have the same character set.
Solution#
Follow the above rules for usernames.
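The rules above can be checked before you create a user. The following is a minimal sketch that encodes them as a regular expression. The function name `is_valid_username` is hypothetical and is not part of AEN, and AEN may apply further checks that are not listed here.

```shell
# Succeed if $1 satisfies the username rules listed above:
#   - 3 to 25 characters long
#   - first character is a letter or a digit
#   - remaining characters are letters, digits, '.', '_' or '-'
# (Hypothetical helper; expects a single-line name.)
is_valid_username() {
    printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[A-Za-z0-9][A-Za-z0-9._-]{2,24}$'
}

# Usage:
#   is_valid_username "jane.doe" && echo "ok" || echo "invalid username"
```

The first bracket expression enforces the leading letter or digit, and the `{2,24}` repetition brings the total length to between 3 and 25 characters.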
Notebook Error: Cannot download notebook as PDF via LaTeX#
Cause#
LaTeX is not properly installed.
CentOS/6 Solution#
Install TeXLive from the TUG site. Follow the described steps. The installation may take some time.
Add the installation to the PATH in the file /etc/profile.d/latex.sh. Add the following line, replacing the year and architecture as needed:
PATH=/usr/local/texlive/2017/bin/x86_64-linux:$PATH
Restart the compute node.
CentOS/7 Solution#
Install the missing packages by running:
yum install texlive texlive-xetex texlive-xetexconfig texlive-xetex-def texlive-adjustbox texlive-upquote texlive-ulem
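After either solution, you can confirm that a TeX engine is visible on the PATH of the compute node. The sketch below assumes that `xelatex` is the engine the PDF export needs, which the XeTeX packages in the CentOS 7 list suggest but which this guide does not state. The helper name `on_path` is hypothetical.

```shell
# Succeed if the command named in $1 can be found on the current PATH.
# (Hypothetical helper, not part of AEN.)
on_path() {
    command -v "$1" >/dev/null 2>&1
}

# Assumption: xelatex is the engine used for the PDF export.
if on_path xelatex; then
    echo "xelatex found at: $(command -v xelatex)"
else
    echo "xelatex not found on PATH; recheck the TeX Live installation" >&2
fi
```

Run the check in a new login shell so that changes under /etc/profile.d are picked up.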
Unresponsive wk-server thread without error messages#
Cause#
Two things can cause the wk-server thread to freeze without error messages:
LDAP freezing
MongoDB freezing
If LDAP or MongoDB is configured with a long timeout, Gunicorn can time out first and kill the LDAP or MongoDB process. The LDAP or MongoDB process then dies without logging a timeout error.
Solution#
Check for frozen LDAP or MongoDB server processes.
You may also wish to configure the Gunicorn timeout to more than 30 seconds.
Unresponsive wk-gateway thread without error messages#
Cause#
If TLS is configured with a passphrase-protected private key, wk-gateway will freeze without any error messages.
Solution#
Update the TLS configuration so that it does not use a passphrase-protected private key.
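To find out whether a key file is passphrase protected, you can inspect its PEM header. The following is a minimal sketch, assuming the key is stored in PEM format. The function name `key_is_encrypted` and the file names in the comments are hypothetical.

```shell
# Succeed if the PEM file at $1 is marked as encrypted.
# Traditional PEM keys carry a "Proc-Type: 4,ENCRYPTED" header and PKCS#8
# keys begin with "-----BEGIN ENCRYPTED PRIVATE KEY-----", so both contain
# the word ENCRYPTED. (Hypothetical helper, not part of AEN.)
key_is_encrypted() {
    grep -q 'ENCRYPTED' "$1"
}

# Usage (file names are placeholders):
#   key_is_encrypted server.key && echo "key is passphrase protected"
#
# For an RSA key, one option is to write an unprotected copy with OpenSSL,
# which prompts for the passphrase:
#   openssl rsa -in server.key -out server-nopass.key
```

Restrict the permissions on any unprotected key file, because it no longer has a passphrase guarding it.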
Error starting projects#
The project’s status page shows “There was an error starting this project”.
Cause#
Lack of disk space in compute nodes prevents projects from starting.
Solution#
Verify that the project node meets the system requirements.
Check whether there is enough free space on the compute node’s partition where /projects lives:
df -h /projects
Free up some disk space to meet the system requirements.
Restart the project.
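The free-space check in the steps above can also be scripted, for example for periodic monitoring. This is a minimal sketch built on the POSIX output format of `df`. The function names are hypothetical, and the threshold in the usage comment is a placeholder, not a documented AEN requirement.

```shell
# Print the free space, in KiB, on the filesystem that contains path $1.
# -P forces the single-line POSIX format; column 4 is the available space.
free_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed if at least $2 KiB are free on the filesystem that contains $1.
# (Hypothetical helper, not part of AEN.)
has_free_space() {
    [ "$(free_kib "$1")" -ge "$2" ]
}

# Usage on a compute node (1048576 KiB = 1 GiB, a placeholder threshold):
#   has_free_space /projects 1048576 || echo "low disk space on /projects" >&2
```

Take the real threshold from the system requirements for your installation.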
Changes in .condarc file are ignored#
Changes applied to .condarc are ignored by conda.
Cause#
Conda loads its configuration by merging multiple files together.
Solution#
Check if you are applying the changes to the correct file.
To show the merged state that conda is currently using:
conda config --show
To show all config files that conda is currently reading:
conda config --show-sources