Tutorial: Getting Started with the Anaconda Python Environment for Machine Learning

About the Anaconda Distribution Platform

This tutorial guides you through setting up Anaconda for Python machine learning in a Windows environment. If you are new to Anaconda: it is an open-source Python distribution that comes out of the box with many useful tools for data science and machine learning. First, it includes Python itself, so there is no need to install Python separately. In addition, Anaconda has an integrated package manager that provides access to several tools and frameworks used in data science and software engineering, including Spyder, RStudio, Visual Studio Code, and Jupyter Notebooks. Many packages related to data science are also preinstalled. Furthermore, Anaconda comes with a desktop GUI (see below), which makes it easy to launch applications and manage packages and environments without using the command line.

Home Screen of the Anaconda Distribution Platform

Many data scientists who use Anaconda also work a lot with Jupyter notebooks. One reason for this is that Jupyter notebooks are very flexible. For example, they support more than 40 programming languages, including R and Python, and can run in different environments. Furthermore, Jupyter notebooks are web-based and can be set up quickly. They also make it easy to version your code and share it with others. So in general, Jupyter notebooks are a good choice, whether you have experience with machine learning or are just getting started.

Anaconda provides support for several programming languages and tools. In the following, this tutorial will show you how to set up Anaconda to work with Python and Jupyter notebooks.

Choose and Download the Right Anaconda Version

You can download the latest version of the Anaconda individual edition from the Anaconda website.

If disk space is an issue, there is also a minimal version of Anaconda called Miniconda. Miniconda is essentially Anaconda without the additional packages. However, working with Python without packages can be tedious, so I prefer the full Anaconda version, which comes with many packages preinstalled.

The Anaconda download page will give you the choice between Anaconda for Python 2.7 and 3.7. A quick note on why I focus on Python 3: not long ago, many people debated whether version 2 or 3 was the better choice. Today, most will agree that Python 3 has won this battle and is the preferred version in the data science community. This is also reflected in the fact that many companies have switched to Python 3 in recent years. In the corporate context I have worked exclusively with Python 3 so far, which is why my blog posts focus exclusively on Python 3.

So on the Anaconda website, choose the Anaconda version for Python 3.7. When I wrote this post, the latest version of the Anaconda individual edition was 4.3.1. After the download has completed, run the Anaconda installer; it will guide you through the installation process.

Install the Anaconda Python Environment

During the installation, the installer asks whether you want to add Anaconda to your PATH environment variable. Leave this option unchecked. The installer also offers to register Anaconda as the default Python environment, which I would recommend, because it saves you time later should you want to work with other tools that also need access to Python.

Advanced Installation Options of Anaconda
Anaconda Installation Completed
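
If you want to confirm which Python interpreter is actually in use after the installation, a short check like the following can help. This is only a minimal sketch: the function name describe_interpreter is made up for illustration, and the exact path you see depends on where you installed Anaconda.

```python
import sys


def describe_interpreter():
    """Return basic facts about the Python interpreter running this code."""
    return {
        "executable": sys.executable,        # full path of the interpreter in use
        "version": sys.version.split()[0],   # version string without build details
        "is_python3": sys.version_info.major == 3,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If the printed executable path points into your Anaconda installation folder, the code is running on the Anaconda Python.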

Using the Anaconda Python Environment

After the installation, you can start Anaconda. You will then see the Home screen, which provides access to all the different tools and CLIs you will be working with in your data science projects. However, before you can start with your Machine Learning projects, you should make sure that you have the most important packages installed.

Anaconda Home Screen

As mentioned before, Anaconda comes with several Python packages preinstalled. A list of these packages can be found on the Anaconda website. Alternatively, you can open the Anaconda Prompt (available from the Start menu after installation) and run “conda list” or “pip list”, which shows the packages installed in the current environment. However, some packages for machine learning may not be included, and you will have to add them manually. I have put together a short list of packages that I use in my series of Python tutorials; depending on your Anaconda version, some of them (for example pandas and scikit-learn) may already be installed, while others are not:

  • Pandas: a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool, built on top of the Python programming language.
  • Pandas_datareader: remote data access for pandas via APIs.
  • Geopandas: an open source project that makes working with geospatial data in Python easier.
  • Tensorflow (Keras): a deep learning library, used for neural networks.
  • Scikit-learn: various tools and algorithms for predictive data analysis.

I frequently use these packages in my Python tutorials. You can add them to your Anaconda environment by running the following install commands from the Anaconda Prompt (the regular CMD will not find conda if you left the PATH option unchecked during installation):

# TensorFlow
conda install tensorflow
# or: pip install tensorflow

# Pandas
conda install pandas
# or: pip install pandas

# Scikit-learn
conda install scikit-learn
# or: pip install scikit-learn

# GeoPandas
conda install geopandas
# or: pip install geopandas

# Pandas-datareader (installed with pip here)
pip install pandas-datareader

With the conda install command, you can access a cloud-based repository to find and install over 7,500 data science and machine learning packages. To install further packages from the conda repository, simply use “conda install packagename”.
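
After running the install commands, you may want to check from Python whether the packages can actually be found. The following is a minimal sketch using only the standard library; the names TUTORIAL_PACKAGES and check_packages are made up for illustration. Note that it uses import names, which can differ from the install names (scikit-learn is imported as sklearn).

```python
import importlib.util

# Import names of the packages from the list above (an assumption for this
# sketch). scikit-learn is imported under the name "sklearn".
TUTORIAL_PACKAGES = ["pandas", "pandas_datareader", "geopandas", "tensorflow", "sklearn"]


def check_packages(names):
    """Map each top-level import name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(TUTORIAL_PACKAGES).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

The check only looks the packages up without importing them, so it runs quickly even for large libraries such as TensorFlow.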

Create a new Jupyter Python Notebook

Next, you will create a new Python notebook. First, launch JupyterLab from the Anaconda Home screen. The Jupyter environment will open in a new browser window, showing the default folder path. There, choose a workspace folder, which will contain all the Python code and resources of your Python projects. I personally keep my workspace at C:\Users\Username\My_Jupyter_Workspace.

Anaconda Home: Launch JupyterLab

Finally, you can create the new Python notebook. Simply click the “New” tab and select Python. And voilà, a new window will pop up and you can start to code.
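
As a first cell in the new notebook, you could check that it runs inside the workspace folder you chose. The following is just a sketch with a hypothetical helper named list_notebooks; it prints the current working directory and the notebooks stored in it.

```python
from pathlib import Path


def list_notebooks(folder):
    """Return the sorted file names of all notebooks (*.ipynb) directly inside folder."""
    return sorted(p.name for p in Path(folder).glob("*.ipynb"))


if __name__ == "__main__":
    workspace = Path.cwd()  # in a notebook, usually the folder that contains the notebook
    print("Working directory:", workspace)
    print("Notebooks here:", list_notebooks(workspace))
```

If the working directory is not your workspace folder, navigate to the workspace in the JupyterLab file browser and create the notebook there.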

Summary

In this tutorial, you have learned how to set up the Anaconda Python environment for machine learning. Now that you have installed and configured your Python environment, you are ready to start your projects.

Follow Florian Müller:

Data Scientist & Machine Learning Consultant

Hi, my name is Florian! I am a Zurich-based Data Scientist with a passion for Artificial Intelligence and Machine Learning. After completing my PhD in Business Informatics at the University of Bremen, I started working as a Machine Learning Consultant for the Swiss consulting firm ipt. When I'm not working on use cases for our clients, I work on my own analytics projects and report on them in this blog.
