Anaconda
A free, downloadable, open source, high-performance, optimized Python and R distribution. It ships with 100+ packages and makes it easy to install an additional 620+ popular open source packages for data science, including advanced and scientific analytics. It also includes conda, an open source package, dependency, and environment manager. Thousands more open source packages can be installed with the conda command. Anaconda is available for Windows, OS X, and Linux, and all versions are supported by the community.
Bare-metal cluster
A cluster built from on-site or in-house machines, collections of virtual machines (Vagrant, Docker, etc.), or previously instantiated cloud nodes.
Cloud-based cluster
A cluster that consists of machines in a cloud provider such as Amazon EC2.
Cluster
A group of computers that work in parallel to perform a single task. Working this way is also called “parallel computing”, since the compute nodes can perform their operations in parallel.
Client machine
The laptop or workstation on which Anaconda for cluster management is installed and from which the cluster nodes are managed.
Cluster file
The file that defines a cluster's configuration, including the location of the head node and compute nodes and the authentication and configuration information.
Compute node
A machine managed by the head node. The compute nodes all work together to complete a single task.
Conda
An open source package management and environment management system for installing multiple versions of software packages and their dependencies and switching easily between them. It works on Windows, OS X, and Linux. It was created for Python packages but can package and distribute any software. Conda is included in all versions of Anaconda, Anaconda Server, and Miniconda.
Head node
A system configured to act as the intermediary between the cluster and the outside network. Can also be referred to as the master or edge node.
Miniconda
A minimal or “bootstrap” version of Anaconda. It installs only what you need to get conda running: Python, conda, and its dependencies.
PEM key
Privacy-Enhanced Mail file. The format was originally designed for email and is now a general file format for cryptographic keys. In Anaconda for cluster management, a PEM key is used for cloud-based clusters and can be obtained from the cloud provider.
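As an illustration of the file format only, a PEM file wraps base64-encoded data between a BEGIN line and a matching END line. The sketch below builds a block from made-up bytes (not a real key) and decodes it again. The label `EXAMPLE DATA` and the helper `parse_pem` are hypothetical names chosen for this example and are not part of Anaconda for cluster management.

```python
import base64
import re

# Made-up payload for illustration only; this is NOT a real key.
payload = b"not a real key, just example bytes"

# A PEM block is base64 text framed by BEGIN/END lines with the same label.
pem_text = (
    "-----BEGIN EXAMPLE DATA-----\n"
    + base64.b64encode(payload).decode("ascii")
    + "\n-----END EXAMPLE DATA-----\n"
)

# Matches one PEM block and captures its label and base64 body.
PEM_BLOCK = re.compile(
    r"-----BEGIN (?P<label>[A-Z0-9 ]+)-----\s*"
    r"(?P<body>[A-Za-z0-9+/=\s]+?)"
    r"-----END (?P=label)-----"
)


def parse_pem(text):
    """Return (label, decoded_bytes) for the first PEM block in text."""
    match = PEM_BLOCK.search(text)
    if match is None:
        raise ValueError("no PEM block found")
    # Drop line breaks inside the body before decoding.
    raw = "".join(match.group("body").split())
    return match.group("label"), base64.b64decode(raw)


label, decoded = parse_pem(pem_text)
```

A key file downloaded from a cloud provider has the same outer framing, with a label that names the kind of key it contains.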
Plugins
In Anaconda for cluster management, plugins are analytics engines and management services that can be installed on a cluster.
Profile
A configuration file that defines how a cluster should be configured. It contains information about the number and types of cluster nodes, plugins, and other settings.
Provider
A configuration file that defines settings for cloud or bare-metal providers. A provider is referenced by a profile and used to provision resources.
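The relationship between these terms can be sketched in a few lines of Python: a profile names a provider, and the provider's settings are looked up when nodes are planned. This is a conceptual model only, not the actual file format or API of Anaconda for cluster management. Every class, field, and function name here (`Provider`, `Profile`, `resolve_provider`, `plan_nodes`) is hypothetical, and the choice to count the head node as one of `num_nodes` is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Provider:
    """Hypothetical stand-in for a provider definition."""
    name: str
    kind: str  # "cloud" or "bare-metal"
    settings: Dict[str, str] = field(default_factory=dict)


@dataclass
class Profile:
    """Hypothetical stand-in for a profile definition."""
    name: str
    provider: str  # name of the provider this profile references
    num_nodes: int
    plugins: List[str] = field(default_factory=list)


def resolve_provider(profile: Profile, providers: Dict[str, Provider]) -> Provider:
    """Look up the provider that a profile references by name."""
    if profile.provider not in providers:
        raise KeyError(
            f"profile {profile.name!r} references unknown provider {profile.provider!r}"
        )
    return providers[profile.provider]


def plan_nodes(profile: Profile) -> List[str]:
    """Assumption: one head node, and the remaining nodes are compute nodes."""
    if profile.num_nodes < 1:
        raise ValueError("a cluster needs at least a head node")
    return ["head"] + [f"compute-{i}" for i in range(1, profile.num_nodes)]


providers = {"example_cloud": Provider("example_cloud", "cloud", {"region": "example-region"})}
profile = Profile("demo", provider="example_cloud", num_nodes=3, plugins=["example_plugin"])

provider = resolve_provider(profile, providers)
nodes = plan_nodes(profile)
```

Keeping the provider separate from the profile means several profiles can reuse the same provider settings while differing in node count or plugins.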