Getting Started with the Anaconda Python Environment for Machine Learning

Anaconda is a popular open-source Python distribution designed for data science and machine learning. It comes with a range of useful features and tools, including Jupyter Notebooks, pre-installed packages, and a powerful package manager. It is one of the most widely used Python environments among data scientists and machine learning practitioners.

In this article, we will introduce some of the key features of Anaconda and show you how to set up the Anaconda Python environment for machine learning on Microsoft Windows. We will also cover some essential commands for managing virtual environments and installing packages, which will help you get started with your machine learning projects. Whether you are new to machine learning or an experienced practitioner, Anaconda can help you streamline your workflow, so it is worth learning how to use it effectively.

Setting up the Anaconda Python environment for data science and machine learning

What is the Anaconda Distribution Platform?

For various reasons, Anaconda has become one of the most popular Python environments for machine learning. First, Anaconda includes a Python distribution, so there is no need for a separate Python installation. In addition, Anaconda has an integrated package manager and provides access to several tools and frameworks used in data science and software engineering, including Spyder, RStudio, Visual Studio Code, and Jupyter Notebooks. Below is a brief description of these tools:

  • Jupyter Notebooks: Open-source web applications that support creating and sharing documents containing code, equations, visualizations, and narrative text
  • PyCharm: A fully integrated Python development environment for professional purposes
  • Qt Console: A lightweight terminal-style application that can display visualizations inline
  • Spyder: A Python environment specifically designed for scientific purposes
  • RStudio: An environment for the programming language R
  • Visual Studio (VS) Code: A code editor for professional purposes by Microsoft
  • Orange: A Python environment for data mining and visualization
  • Glueviz: An open-source Python library for exploring data relationships

Many essential data science libraries are already preinstalled, including NumPy, pandas, and Matplotlib. Furthermore, Anaconda comes with a desktop GUI called Anaconda Navigator (see below), which makes it easy to launch applications and manage packages and environments without using the command line.

About Jupyter Notebooks

Most data scientists who use Anaconda also work with Jupyter notebooks. Jupyter notebooks are often used in the field of data science because they provide a convenient and interactive way to work with data, and they make it easy to share your work with others. They are also widely used in education, allowing you to create interactive lectures and exercises.

Jupyter notebooks are interactive documents that contain a mix of code, text, and other media, such as images, equations, and charts. They are commonly used for data exploration, visualization, and machine learning tasks. Jupyter notebooks support more than 40 programming languages, including R and Python, and can run in different environments, thus making them very flexible. Furthermore, they are web-based and easy to set up. They also make it easy to version your code and share it with others.

Jupyter notebooks are composed of cells, which can contain either code or text (using the Markdown formatting language). The code in a cell can be executed by pressing Shift+Enter, and the output of the code will be displayed below the cell. This allows you to develop and test your code iteratively, and to document your work by including explanations and visualizations alongside the code.
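To make the cell structure more concrete, the following sketch builds a minimal notebook with one Markdown cell and one code cell, using only Python's standard library. It assumes the nbformat version 4 JSON layout that notebook files use; the helper name `build_notebook` and the cell contents are made up for illustration.

```python
import json


def build_notebook(cells):
    """Wrap (cell_type, source) pairs in a minimal nbformat-4 notebook dict."""
    nb_cells = []
    for cell_type, source in cells:
        cell = {"cell_type": cell_type, "metadata": {}, "source": source}
        if cell_type == "code":
            # Code cells additionally store their outputs and an execution counter.
            cell["outputs"] = []
            cell["execution_count"] = None
        nb_cells.append(cell)
    return {"cells": nb_cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


notebook = build_notebook([
    ("markdown", "# My first notebook\nSome explanatory text."),
    ("code", "print('Hello, Jupyter!')"),
])

# A notebook file is plain JSON; print it to inspect the structure.
print(json.dumps(notebook, indent=1))
```

Saving the printed JSON to a file with the extension .ipynb should give you a notebook that Jupyter can open, with the text cell rendered above the code cell.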

Anaconda includes Jupyter as part of the installation. Once you have Anaconda installed, you can launch Jupyter by running the command “jupyter lab” or “jupyter notebook” in the terminal. This will open the Jupyter web interface in your web browser, from which you can create and open notebooks.

Jupyter notebooks are interactive documents that allow you to mix code, text, and media in a single document, making them a powerful tool for data exploration, visualization, and analysis.

Set Up the Anaconda Python Environment for Machine Learning

In the following, we will set up Anaconda to work with Python and Jupyter notebooks in five steps:

  1. Download Anaconda
  2. Install Anaconda
  3. Start Anaconda and install additional packages
  4. Create and manage environments
  5. Create a Jupyter notebook

Each of the steps will be discussed in more detail in the following. Let’s get things started!

Step #1 Choose and Download the Right Anaconda Version

First, download the latest version of the Anaconda individual edition from the Anaconda website. The full Anaconda version comes with a large set of packages preinstalled. If disk space is an issue, you can also use Miniconda, a minimal installer that includes conda and Python but not the preinstalled packages.

You will need to select the version of Anaconda that is appropriate for your operating system. The Anaconda download page lets you choose between Anaconda for Python 2.x and 3.x. Today, most machine learning libraries support Python 3. Not long ago, many people still debated whether version 2 or 3 was the better choice, but Python 3 has won this battle and is the preferred version in the data science community.

At the time of writing, the latest version of the Anaconda individual edition was 4.3.1. After the download, you can launch the Anaconda installer, which guides you through the installation process.

Step #2 Install the Anaconda Python Environment

During the installation, you can choose whether to add Anaconda to your PATH environment variable. You can leave this option unchecked. The installer also asks whether to register Anaconda as the default Python environment, which I recommend because it enables other tools to find the Anaconda Python distribution.
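If you are unsure what the PATH option means in practice, a small check with Python's standard library shows which executables a command prompt can resolve by name. This is only a sketch; the function name `locate_on_path` and the list of commands checked are my own choice. If Anaconda was not added to PATH, a regular command prompt will typically not find “conda”, while the Anaconda Prompt will.

```python
import shutil
import sys


def locate_on_path(commands=("python", "conda", "jupyter")):
    """Map each command name to its resolved location, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in commands}


# The interpreter that runs this script, independent of PATH.
print("Running interpreter:", sys.executable)

for name, path in locate_on_path().items():
    print(f"{name:8s} -> {path or 'not found on PATH'}")
```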

Install Screen During Anaconda Setup
Advanced Installation Options of Anaconda
Anaconda Installation Completed

Step #3 Starting the Anaconda Python Environment

Once the installation process is complete, you can launch the Anaconda Navigator, which provides access to all the tools and CLIs you will be working with in your data science projects.

Anaconda Navigator of the Anaconda Distribution Platform

Anaconda comes with several Python packages preinstalled. The Anaconda website provides an overview of these packages. To display a list of the packages in your Anaconda Python environment, run the following command from the CMD prompt:

conda list
# or: pip list
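The same information is also available from within Python, which can be handy inside a notebook. The sketch below uses the standard library module importlib.metadata; the function name `installed_packages` is hypothetical. Note that this only covers Python packages, so non-Python packages managed by conda will not show up.

```python
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) tuples for the active environment."""
    packages = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installations without a readable name
            packages.add((name, dist.version))
    return sorted(packages, key=lambda item: item[0].lower())


# Print the first ten entries as a quick preview.
for name, version in installed_packages()[:10]:
    print(f"{name}=={version}")
```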

Before starting your machine learning projects, you should ensure that you have the essential packages installed. The Anaconda installation includes many packages, but some packages commonly used in machine learning may still require a manual installation. In the relataly articles, we will be working with the following packages, which you may need to install yourself:

  • GeoPandas: An open-source project that makes working with geospatial data in Python easier.
  • TensorFlow (Keras): Deep learning library used for neural networks.
  • Seaborn: A package for creating nice visualizations with lots of customization options.
  • Scikit-learn: Different tools and algorithms for predictive data analysis.

You can add these packages to your Anaconda environment by running the following conda or pip install commands from the CMD prompt:

# TensorFlow
conda install tensorflow
# or: pip install tensorflow

# Scikit-learn
conda install scikit-learn
# or: pip install scikit-learn

# Seaborn
conda install seaborn
# or: pip install seaborn

# GeoPandas
conda install geopandas
# or: pip install geopandas

# Pandas DataReader
conda install pandas-datareader
# or: pip install pandas-datareader

# Keras
pip install keras

With the conda install command, you can access a cloud-based repository to find and install over 7,500 data science and machine learning packages. To download additional packages from the conda repository, use the command: “conda install <package name>”
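After installing, you may want to confirm that the packages can actually be imported. The following sketch checks this with the standard library and prints an install hint for anything that is missing. The mapping and the function name `missing_packages` are illustrative assumptions; note that the import name can differ from the install name, as with scikit-learn, which is imported as “sklearn”.

```python
from importlib.util import find_spec

# Import name -> name used with conda/pip (the two differ for some packages).
REQUIRED = {
    "tensorflow": "tensorflow",
    "sklearn": "scikit-learn",
    "seaborn": "seaborn",
    "geopandas": "geopandas",
    "pandas_datareader": "pandas-datareader",
}


def missing_packages(required=REQUIRED):
    """Return the install names of all packages that cannot be found for import."""
    return [install_name
            for import_name, install_name in required.items()
            if find_spec(import_name) is None]


missing = missing_packages()
if missing:
    print("Still missing, install with: conda install " + " ".join(missing))
else:
    print("All listed packages are available.")
```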

Step #4 Create a New Python Environment

A key feature of Anaconda is its support for multiple isolated virtual environments. Virtual environments allow you to work with specific versions of libraries or of Python itself. This is helpful because, in my experience, putting everything into a single environment leads to compatibility issues sooner or later.

Virtual environments have their own packages and paths. Therefore, you don’t have to worry about the effect of packages on other Python environments. The best way to avoid compatibility issues is to create a new environment in which you install only the specific libraries you need for your current project.
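Because each environment has its own interpreter and paths, it helps to know which one your code is running in. A minimal sketch, assuming the environment was activated through conda, which sets the variables CONDA_DEFAULT_ENV and CONDA_PREFIX (they are simply None otherwise); the function name `describe_environment` is my own:

```python
import os
import sys


def describe_environment():
    """Collect details that identify the environment of the running interpreter."""
    return {
        # Set by conda when an environment is activated.
        "conda_env_name": os.environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        # Root folder of the running interpreter; each environment has its own.
        "interpreter_prefix": sys.prefix,
        "python_version": ".".join(map(str, sys.version_info[:3])),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this in two different environments should print two different prefixes, which is a quick way to confirm that they are isolated from each other.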

The preferred way to create and manage environments in Anaconda is by using CMD terminal commands. You can launch the CMD prompt from the Anaconda Navigator, as shown below. There is also a graphical interface for managing environments, but I find its use rather tedious.

Anaconda Welcome Screen
The Anaconda Navigator

From the Anaconda Navigator, you can create new environments and install packages. To create a new environment, click on the “Environments” tab, then click the “Create” button. Give your environment a name and select the version of Python that you want to use. You can also select any additional packages that you want to be installed in the environment.

Below is a list of essential CMD commands for creating and managing environments in Anaconda:

# Display a list of all environments
conda env list

# Create a new environment with a specific Python version
conda create -n yourenvname python=x.x anaconda

# Create an exact copy of an existing environment
conda create --clone py35 --name py35-2

# Update the conda package manager
conda update conda

# Activate the environment, so that all subsequent commands affect this environment
conda activate yourenvname

# Install a new package into a specific environment
conda install -n yourenvname [package]

# Deactivate the active environment
conda deactivate

# Remove an environment including all packages
conda remove -n yourenvname --all
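If you want to work with your environments from a script, conda can print the environment list as JSON. The sketch below assumes that “conda env list --json” returns an object with an “envs” list of folder paths; the helper names are hypothetical. The folder name is used as the environment name, so the base environment appears under the name of the Anaconda installation folder.

```python
import json
import shutil
import subprocess
from pathlib import PureWindowsPath


def parse_env_list(json_text):
    """Turn the JSON printed by `conda env list --json` into {folder name: path}."""
    paths = json.loads(json_text).get("envs", [])
    # PureWindowsPath accepts both backslashes and forward slashes as separators.
    return {PureWindowsPath(p).name: p for p in paths}


def list_conda_envs():
    """Ask conda for its environments; return None if conda is unavailable or fails."""
    conda = shutil.which("conda")
    if conda is None:
        return None
    result = subprocess.run([conda, "env", "list", "--json"],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_env_list(result.stdout)


print(list_conda_envs())
```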

For additional commands, you can look at this Conda cheat sheet.

Step #5 Create a Jupyter Notebook

Next, we create a new Python Jupyter notebook. You can launch Jupyter Notebook from the Anaconda Navigator. The Jupyter interface will open in a new browser window. Be aware that the notebook will use the Anaconda environment that is currently active. The default environment is the “base” environment.

If you want to create a new environment, you can do this by launching the command prompt and typing the following command:

conda create --name <env name> <possible packages, e.g., keras, numpy, etc.>

Once you have created a new environment, you can activate it with the following command:

conda activate <env name>
Anaconda Navigator: Launch JupyterNotebook

Once you have launched the Jupyter notebook environment, you should see the standard folder path. In the folder path, you can choose a workspace folder that will contain all the Python code and the resources of your Python projects. I have located my workspace at C:\Users\Username\My_Jupyter_Workspace.

To create a new Python notebook, click the “New” button and select Python. A new window will open, and you can start to code.

File Management of the Jupyter Python Environment

That’s it. You now have your Python infrastructure in place and can start coding.

Summary

This article has provided a comprehensive guide on setting up the Anaconda Python Environment for machine learning projects. By following the steps outlined in this article, you have successfully installed and configured the Anaconda Python environment, which is an essential tool for any data scientist or machine learning engineer.

One of the key takeaways from this article is how to manage virtual environments. By creating separate virtual environments for different projects, you can ensure that each project has the dependencies and libraries it needs without interfering with other projects. This also helps to avoid version conflicts and supports reproducibility.

Another important aspect covered in this article is package installation. By using Anaconda’s built-in package manager, Conda, you can easily install and manage the necessary packages and libraries for your machine learning projects. Conda also makes it easy to switch between different versions of packages and manage dependencies.

Now that you have your Anaconda Python environment in place, you are ready to tackle exciting machine learning projects.

Image created with Midjourney.

Sources and Further Reading

If you still need ideas for your first projects, the following tutorials may offer some inspiration:

Author

  • Florian Follonier

    Hi, I am Florian, a Zurich-based Cloud Solution Architect for AI and Data. Since the completion of my Ph.D. in 2017, I have been working on the design and implementation of ML use cases in the Swiss financial sector. I started this blog in 2020 with the goal in mind to share my experiences and create a place where you can find key concepts of machine learning and materials that will allow you to kick-start your own Python projects.
