Using Anaconda with Cloudera CDH
NOTE: This page is superseded. Please see https://docs.continuum.io/anaconda-scale/cloudera-cdh
There are two methods of using Anaconda on an existing cluster with Cloudera CDH, Cloudera’s Distribution Including Apache Hadoop: 1) the Anaconda parcel for Cloudera CDH, and 2) Anaconda for cluster management. The instructions below describe how to uninstall the Anaconda parcel on a CDH cluster and transition to Anaconda for cluster management.
If the Anaconda parcel is installed on the CDH cluster, use the following steps to uninstall the parcel. Otherwise, you can skip to the next section.
- From the Cloudera Manager Admin Console, click the Parcels indicator in the top navigation bar.
- Click the Deactivate button to the right of the Anaconda parcel listing.
- Click OK on the Deactivate prompt to deactivate the Anaconda parcel and restart Spark and related services.
- Click the arrow to the right of the Anaconda parcel listing and choose Remove From Hosts, which will prompt with a confirmation dialog.
- After you confirm, the Anaconda parcel is removed from the cluster nodes.
For more information about managing Cloudera parcels, refer to the Cloudera documentation.
Anaconda for cluster management provides additional functionality, including the ability to manage multiple conda environments and packages (including Python and R) alongside an existing CDH cluster.
Configure the nodes with Anaconda for cluster management using the Bare-metal Cluster Setup instructions.
During this process, you will create a profile and a provider that describe the cluster.
Provision the cluster using the following command, replacing cluster-cdh with the name of your cluster and profile-cdh with the name of your profile:
$ acluster create cluster-cdh -p profile-cdh
You can submit Spark jobs with the PYSPARK_PYTHON environment variable set to the location of the Anaconda Python interpreter, for example:
$ PYSPARK_PYTHON=/opt/anaconda/bin/python spark-submit pyspark_script.py
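If you submit jobs often, it may help to wrap the command above in a small shell function that checks that the interpreter exists before calling spark-submit. The following is a minimal sketch, not part of Anaconda for cluster management: the names ANACONDA_PYTHON and submit_with_anaconda are hypothetical, and the default path assumes Anaconda is installed under /opt/anaconda as in the example above.

```shell
# Hypothetical helper: ANACONDA_PYTHON and submit_with_anaconda are
# illustrative names, not part of Anaconda or Spark.
# Default path assumes Anaconda lives under /opt/anaconda on each node.
ANACONDA_PYTHON="${ANACONDA_PYTHON:-/opt/anaconda/bin/python}"

submit_with_anaconda() {
    # Refuse to submit if the interpreter is missing or not executable
    # on this machine.
    if [ ! -x "$ANACONDA_PYTHON" ]; then
        echo "Anaconda Python not found at $ANACONDA_PYTHON" >&2
        return 1
    fi
    # Set PYSPARK_PYTHON only for this one spark-submit invocation and
    # pass all arguments through unchanged.
    PYSPARK_PYTHON="$ANACONDA_PYTHON" spark-submit "$@"
}

# Usage: submit_with_anaconda pyspark_script.py
```

Note that the check only inspects the machine you submit from. The same path is assumed to exist on every cluster node that runs Python workers.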