Data catalogs

When first approaching data analysis, a blank notebook can be extremely daunting—especially if you’ve never worked with notebooks or created one yourself.

Anaconda provides a catalog of sample data sets that you can use to become familiar with running and analyzing data in a notebook.

Accessing data catalogs

  1. To open Anaconda Notebooks, click Notebooks at the top of Anaconda Cloud.

  2. Once Notebooks opens, open a new Launcher by clicking the blue plus + in the top-left corner.

  3. In the Launcher, under Anaconda Data Catalogs, click Explore Catalogs.

The Explore Catalogs page provides pre-populated data sets for you to familiarize yourself with data analysis in a notebook.

Using data catalogs in Anaconda Notebooks

If you’re new to using notebooks, open README.ipynb in Anaconda Notebooks. It provides a walkthrough of Anaconda Notebooks, working in a notebook, and creating conda environments, along with answers to frequently asked questions.

There are a few ways to run the cells in a data catalog notebook:

  • To run a single cell, click the cell to select it, then press the play button at the top of the notebook.

  • Alternatively, select the cell and press Shift + Enter (Shift + Return on a Mac).

  • For more ways to run cells, click Run in the menu bar and select an option from the dropdown.

Using data catalogs on your local system

To access the data catalogs on your local system instead of in Anaconda Notebooks, complete the following steps:

  1. Download Anaconda if you have not done so already.

    Note

    If you are using Miniconda, run pip install anaconda-catalogs[examples] after completing step 2 to install the necessary dependencies.

  2. To install the packages required to use Anaconda’s data catalogs, open Anaconda Prompt (Terminal on macOS/Linux) and run the following command:

    conda install anaconda-cloud::anaconda-catalogs
    
  3. In a Jupyter Notebook or other Python environment, import Intake by running the following command. Run the commands in the remaining steps in the same environment:

    import intake
    
  4. To view a list of available example catalogs, run the following commands:

    examples = intake.open_anaconda_catalog("examples")
    list(examples)
    
  5. To select a particular catalog and list the data sets it contains, run the following commands:

    # Replace <CATALOG> with the catalog name
    cat = examples.<CATALOG>
    list(cat)
    
  6. To retrieve the data in a specific data set from the list generated in the previous step, run the following command:

    # Replace <DATASET> with the dataset name
    df = cat.<DATASET>.read()
    
  7. To display the first five rows of the data set, which is loaded as a pandas DataFrame, run the following command:

    df.head()
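
Steps 5 through 7 can be combined into one small helper, sketched below. This is an illustrative sketch, not part of Anaconda's tooling: the function name `preview_dataset` and its parameters are hypothetical, and it assumes only what the steps above show, namely that `list(cat)` returns the data set names and that each data set has a `.read()` method returning an object with `.head()`.

```python
def preview_dataset(catalog, dataset_name, n=5):
    """Read one data set from a catalog and return its first n rows.

    `catalog` is assumed to behave like `cat` in the steps above:
    iterating it yields data set names, and each data set is reachable
    as an attribute that offers a `.read()` method.
    """
    available = list(catalog)
    if dataset_name not in available:
        # Fail early with the valid names instead of a bare AttributeError.
        raise KeyError(
            f"{dataset_name!r} not found; available data sets: {available}"
        )
    # Equivalent to `cat.<DATASET>.read()` with the name given as a string.
    df = getattr(catalog, dataset_name).read()
    return df.head(n)


# Example usage (hypothetical; requires the packages installed in step 2):
#
#   import intake
#   examples = intake.open_anaconda_catalog("examples")
#   cat = getattr(examples, list(examples)[0])  # first catalog in the list
#   preview_dataset(cat, list(cat)[0])          # first data set, five rows
```

Passing the names as strings avoids editing the `<CATALOG>` and `<DATASET>` placeholders by hand each time, which can be convenient when browsing several data sets in a row.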