Quickstart¶
This quickstart provides a walkthrough of Anaconda for cluster management using Amazon Web Services (AWS) Elastic Compute Cloud (EC2). The steps covered in this quickstart include defining and launching a cloud-based cluster on Amazon EC2, managing conda packages on the cluster nodes, and installing plugins.
Installation¶
Install Anaconda for cluster management on your local machine following the instructions on the installation page.
Create a provider¶
A sample providers file (shown below) is included with a new installation of Anaconda for cluster management and is located at ~/.acluster/providers.yaml.
aws_east:
  cloud_provider: ec2
  keyname: my-private-key
  location: us-east-1
  private_key: ~/.ssh/my-private-key.pem
  secret_id: AKIAXXXXXX
  secret_key: XXXXXXXXXX
Edit this file and replace the settings and credentials with your information.
Refer to the Provider settings page for more details about provider settings, including security groups.
You can list the providers with the command:
$ acluster list providers
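Because the providers file stores AWS credentials in plain text, it is worth limiting who can read it, along with the private key it points to. The following is a minimal sketch, assuming the default paths from the sample provider above. The helper name restrict_file is hypothetical and is not part of acluster, and this quickstart does not say whether acluster itself checks file permissions.

```shell
# Restrict a file to owner read/write (mode 600); skip it if it does not exist.
restrict_file() {
    if [ -f "$1" ]; then
        chmod 600 "$1"
        echo "restricted: $1"
    else
        echo "skipped (not found): $1" >&2
    fi
}

# Paths from the sample provider above; replace them with your own.
restrict_file "$HOME/.acluster/providers.yaml"
restrict_file "$HOME/.ssh/my-private-key.pem"
```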
Create a profile¶
A sample profile is included with a new installation of Anaconda for cluster management and is located in the ~/.acluster/profiles.d/ directory. The sample profile, named aws_profile_sample, is shown below:
name: aws_profile_sample
provider: aws_east
num_nodes: 4
node_id: ami-d05e75b8 # Ubuntu 14.04, us-east-1 region
node_type: m3.large
user: ubuntu
You can use this profile to create a 4-node cluster based on Ubuntu 14.04.
Refer to the Profile settings page for more details about profile settings.
You can list the profiles with the command:
$ acluster list profiles
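The provider value in a profile appears to refer to an entry name in the providers file (aws_east in both samples above). The sketch below checks that link before you create a cluster. It assumes the flat layout of the sample files and uses plain text matching rather than a real YAML parser. The function name check_profile_provider is hypothetical and is not an acluster command.

```shell
# Check that the "provider:" named in a profile is defined in the providers file.
# Usage: check_profile_provider PROFILE_FILE PROVIDERS_FILE
check_profile_provider() {
    profile_file=$1
    providers_file=$2

    # Take the value of the unindented "provider:" key, ignoring any comment.
    provider=$(sed -n 's/^provider:[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$profile_file" | head -n 1)
    if [ -z "$provider" ]; then
        echo "no provider key in $profile_file" >&2
        return 1
    fi

    # Providers are defined as unindented "name:" lines in the providers file.
    if grep -q "^${provider}:" "$providers_file"; then
        echo "ok: provider '$provider' is defined"
    else
        echo "missing: provider '$provider' not found in $providers_file" >&2
        return 1
    fi
}
```

To use it, pass the path of a profile file in ~/.acluster/profiles.d/ followed by ~/.acluster/providers.yaml. The profile's file name depends on your installation and is not given in this quickstart.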
Create the cluster¶
After the provider and profile files are defined, you can create a cluster using the command:
$ acluster create demo_cluster --profile aws_profile_sample
This command creates your new cluster on Amazon EC2 and provisions the cluster nodes, which typically takes between 5 and 10 minutes. Status updates are displayed as the tasks and initialization steps complete.
Install conda packages¶
Now that you have a cluster running, you can install conda packages using the acluster conda command. Most conda commands can be run on the cluster by prefixing them with acluster.
To install numpy, scipy, and pandas on all of the cluster nodes, use the following command:
$ acluster conda install numpy scipy pandas
Note: Refer to the Conda management page for a full list of remote conda commands.
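If you keep the package names in a text file, the same command can be driven from that file. The following is a sketch only: the function name install_from_list and the one-name-per-line file format are assumptions made for illustration, and the only acluster call used is the acluster conda install command shown above.

```shell
# Install every package named in a text file (one name per line) on all nodes.
# Blank lines and lines starting with '#' are ignored.
install_from_list() {
    pkgs=$(grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | tr '\n' ' ')
    if [ -z "$pkgs" ]; then
        echo "no packages listed in $1" >&2
        return 1
    fi
    # Word splitting is intentional: each package becomes its own argument.
    acluster conda install $pkgs
}
```

For example, with a file containing numpy, scipy, and pandas on separate lines, calling install_from_list on that file runs the same command as the one shown above.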
Install plugins¶
Anaconda for cluster management supports multiple plugins such as Apache Spark, Hadoop Distributed File System (HDFS), the Jupyter Notebook, and more. These plugins can be installed on the cluster by using the acluster install command.
For example, the following command installs the Jupyter Notebook (formerly IPython Notebook) on the cluster:
$ acluster install notebook
The notebook will be available at http://{{ HEAD_NODE_IP }}:8888, where {{ HEAD_NODE_IP }} is the IP address of the cluster's head node. You can open the URLs for many of these applications in your browser using the acluster open command:
$ acluster open notebook
Run the acluster open command to view a complete list of supported applications.
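If you prefer to check the notebook from a script rather than a browser, the URL pattern above can be built and probed with curl. This is a rough sketch: both function names are hypothetical, the head node's IP address is passed in by you, and the check assumes that port 8888 is reachable from your machine (for example, that your security group allows it).

```shell
# Build the notebook URL from the head node's IP address (port 8888, as above).
notebook_url() {
    echo "http://$1:8888"
}

# Succeed if the notebook answers HTTP requests within 5 seconds.
notebook_is_up() {
    curl --silent --fail --max-time 5 --output /dev/null "$(notebook_url "$1")"
}
```

Call notebook_is_up with the head node's IP address and test its exit status to decide whether the notebook is ready.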
Destroy the cluster¶
When you are finished, the following command can be used to destroy the cluster and terminate all instances in it. It will prompt for confirmation before destroying the cluster.
$ acluster destroy demo_cluster
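The commands from this quickstart can be chained into one script. The following is a minimal sketch, assuming the sample provider and profile above are already configured. The function name run_quickstart is hypothetical, and only acluster commands shown on this page are used.

```shell
# Run the quickstart steps in order, stopping at the first command that fails.
# Arguments: cluster name and profile name (defaults match this quickstart).
run_quickstart() {
    cluster=${1:-demo_cluster}
    profile=${2:-aws_profile_sample}

    acluster create "$cluster" --profile "$profile" || return 1
    acluster conda install numpy scipy pandas || return 1
    acluster install notebook || return 1

    # Destroying still asks for confirmation, as described above.
    acluster destroy "$cluster"
}
```

If an intermediate step fails, the function returns before the destroy step, so the instances keep running until you destroy the cluster yourself.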
Further information¶
Refer to the Python with Spark How-tos page for more use cases and example scripts.