Data catalogs
When first approaching data analysis, a blank notebook can be extremely daunting—especially if you’ve never worked with notebooks or created one yourself.
Anaconda provides a catalog of sample data sets so you can practice loading and analyzing data in a notebook.
Accessing data catalogs
To open Anaconda Notebooks, click Notebooks at the top of Anaconda Cloud.
Once Notebooks opens, open a new Launcher by clicking the blue plus + in the top-left corner.
In the Launcher, under Anaconda Data Catalogs, click Explore Catalogs.
The Explore Catalogs page provides pre-populated data sets for you to familiarize yourself with data analysis in a notebook.
Using data catalogs in Anaconda Notebooks
If you’re new to using notebooks, open the README.ipynb on Anaconda Notebooks for a walkthrough on Anaconda Notebooks, working in a notebook, creating conda environments, and answers to frequently asked questions.
There are a few methods for running the cells in your data catalog:
To run a single cell, click the cell to select it, then press the play button at the top of the notebook.
Alternatively, select the cell and press Shift + Enter (Shift + Return on a Mac).
A variety of methods for running cells can be found by clicking Run in the menu bar and selecting an option from the dropdown.
Using data catalogs on your local system
To access the data catalogs on your local system instead of in Anaconda Notebooks, complete the following steps:
Download Anaconda if you have not done so already.
Note
If you are using Miniconda, run the following command after the following step to install the necessary dependencies (the quotes keep shells such as zsh from interpreting the square brackets):
pip install "anaconda-catalogs[examples]"
To install the packages necessary to operate Anaconda’s data catalogs, open Anaconda Prompt (Terminal on macOS/Linux) and run the following command:
conda install anaconda-cloud::anaconda-catalogs
Import Intake by running the following command (and subsequent steps) in a Jupyter Notebook or other Python environment:
import intake
To view a list of available example catalogs, run the following commands:
examples = intake.open_anaconda_catalog("examples")
list(examples)
Select a particular catalog and see what data sets it contains:
# Replace <CATALOG> with the catalog name
cat = examples.<CATALOG>
list(cat)
To retrieve the data in a specific data set from the list generated in the previous step, run the following command:
# Replace <DATASET> with the dataset name
df = cat.<DATASET>.read()
To display the first five rows of the data set, which is loaded as a pandas DataFrame, run the following command:
df.head()
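The local steps above can be gathered into a small helper. The following is only a sketch, not part of Anaconda's tooling. The function names missing_modules and preview_dataset, the opener parameter, and the import name anaconda_catalogs are assumptions made for illustration. It also assumes that bracket access (examples["name"]) behaves the same as the attribute access shown above.

```python
import importlib.util


def missing_modules(names=("intake", "anaconda_catalogs")):
    """Return the module names from `names` that cannot be found.

    Assumption: the anaconda-catalogs package is importable as
    `anaconda_catalogs`; adjust the default if your install differs.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


def preview_dataset(catalog_name, dataset_name, rows=5, opener=None):
    """Open the "examples" catalog, read one data set, return its first rows.

    `opener` is a hypothetical hook added for this sketch. When it is
    omitted, intake.open_anaconda_catalog is used, as in the steps above.
    """
    if opener is None:
        import intake  # requires the anaconda-catalogs package

        opener = intake.open_anaconda_catalog

    examples = opener("examples")
    # Assumption: examples[catalog_name] is equivalent to examples.<CATALOG>.
    catalog = examples[catalog_name]
    # read() loads the data set; head() keeps only the first `rows` rows.
    df = catalog[dataset_name].read()
    return df.head(rows)
```

Calling missing_modules() first shows whether an installation step was skipped. After that, preview_dataset("<CATALOG>", "<DATASET>") returns the same preview as df.head(), with the placeholders replaced by names taken from list(examples) and list(cat).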