Anaconda is an open-source Python distribution that ships with many useful tools for data science and machine learning, including Jupyter Notebooks, preinstalled packages, and a helpful package manager. It is one of the most widely used Python environments among data scientists. This article describes how to set up the Anaconda Python environment for machine learning on Windows. We will also go through some essential tasks, such as managing virtual environments and installing packages.
About the Anaconda Distribution Platform
Anaconda has become the most popular Python environment for machine learning for a variety of reasons. First of all, Anaconda includes a Python distribution, so there is no need for a separate Python installation. In addition, Anaconda has an integrated package manager that provides access to several tools and frameworks used in data science and software engineering, including Spyder, RStudio, Visual Studio Code, and Jupyter Notebooks. Below is a brief description of these tools:
- Jupyter Notebooks: An open-source web application that offers support for creating and sharing code, equations, visualizations and narrative text.
- PyCharm: A fully integrated Python development environment for professional use
- Qt Console: A lightweight, terminal-like application that supports inline figures
- Spyder: A Python environment specifically designed for scientific work
- RStudio: An environment for the programming language R
- Visual Studio (VS) Code: A code editor for professional use by Microsoft
- Orange: A Python-based toolkit for data mining and visualization
- Glueviz: An open-source Python library for exploring data relationships
Many essential data science libraries are already preinstalled, including NumPy, Pandas, and matplotlib. Furthermore, Anaconda comes with a desktop GUI called Anaconda Navigator, which makes it easy to launch applications and manage packages and environments without using the command line.
About Jupyter Notebooks
Most data scientists who use Anaconda also work with Jupyter notebooks. Jupyter notebooks are both powerful and easy to use, which makes them a good choice regardless of whether you have experience with machine learning or are just getting started. They support more than 40 programming languages, including R and Python, and can run in different environments, which makes them very flexible. Furthermore, they are web-based and easy to set up. They also make it easy to share your code with others.
Set Up the Anaconda Python Environment for Machine Learning
In the following, we will set up Anaconda to work with Python and Jupyter notebooks. Let’s get things started!
Step #1 Choose and Download the Right Anaconda Version
First, you need to download the latest version of the Anaconda individual edition from the Anaconda website. The full Anaconda version comes with a large collection of packages preinstalled. If disk space is an issue, you can use Miniconda instead, a minimal installer that contains only conda, Python, and a small number of dependencies, so you install the packages you need yourself.
The Anaconda download page may give you a choice between Anaconda for Python 2.7 and Python 3.x (3.7 at the time of writing). Choose Python 3: Python 2 reached its end of life in January 2020, and current machine learning libraries target Python 3. It wasn’t long ago that many people debated whether version 2 or 3 was the better Python version, but Python 3 has won this battle and is the preferred choice in the data science community.
At the time of writing this article, the latest version of the Anaconda individual edition is 4.3.1. After the download, you can launch the Anaconda installer, which guides you through the installation process.
Step #2 Install the Anaconda Python Environment
During the installation, you can choose whether you want to add Anaconda to your PATH environment variable. You can leave this option unchecked, because adding Anaconda to PATH can interfere with other Python installations on your system. The installer also asks whether you want to register Anaconda as the default Python environment, which I would recommend because it enables other tools to detect and use the Anaconda Python distribution.
Step #3 Using the Anaconda Python Environment
Once the installation process is complete, you can launch the Anaconda Navigator, which provides access to all the different tools and CLIs you will be working with in your data science projects.
Anaconda comes with several Python packages preinstalled. The Anaconda website provides an overview of these packages. To display a list of the packages in your Anaconda Python environment, run the command conda list from the CMD prompt.
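The installed packages can also be inspected from inside Python. The following is a minimal sketch that uses only the standard library module importlib.metadata (Python 3.8 or later); the function name installed_packages is made up for this illustration. Note that it only sees Python distributions, so non-Python packages managed by conda will not show up here.

```python
from importlib import metadata


def installed_packages():
    """Return a sorted list of (name, version) pairs visible to this interpreter."""
    pairs = set()
    for dist in metadata.distributions():
        try:
            name = dist.metadata["Name"]
            version = dist.version
        except Exception:
            # Skip installs with missing or unreadable metadata.
            continue
        if name:
            pairs.add((name, version))
    # Sort case-insensitively by package name.
    return sorted(pairs, key=lambda pair: pair[0].lower())


if __name__ == "__main__":
    for name, version in installed_packages():
        print(f"{name}=={version}")
```

Run inside a notebook cell or a script, this prints one line per package, which makes it easy to confirm that the notebook sees the same packages as the command line.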
The Anaconda installation includes many packages, but some packages commonly used in machine learning may still require a manual installation, depending on your Anaconda version. Before starting your machine learning projects, you should therefore make sure that the essential packages are installed. In the relataly articles, we will be working with the following packages, which may not be part of your installation:
- GeoPandas: An open-source project that makes working with geospatial data in Python easier.
- TensorFlow (Keras): Deep learning library, used for neural networks.
- Seaborn: A package for creating nice visualizations with lots of customization options.
- Scikit-learn: Different tools and algorithms for predictive data analysis.
You can add these packages to your Anaconda environment by running the following conda or pip commands from the CMD prompt:

    # TensorFlow
    conda install tensorflow
    # or: pip install tensorflow

    # Scikit-learn (the PyPI package is named scikit-learn, not sklearn)
    pip install scikit-learn

    # GeoPandas
    conda install geopandas
    # or: pip install geopandas

    # pandas-datareader
    conda install pandas-datareader
    # or: pip install pandas-datareader

    # Keras
    pip install keras
With the conda install command, you can access a cloud-based repository to find and install over 7,500 data science and machine learning packages. To download additional packages from the conda repository, use conda install followed by the package name.
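Before starting a project, it can help to check which of the packages listed above are actually available. The sketch below uses importlib.util.find_spec from the standard library; the function name find_missing and the REQUIRED_MODULES list are assumptions made for this illustration. Keep in mind that import names can differ from install names: scikit-learn is imported as sklearn, and pandas-datareader as pandas_datareader.

```python
from importlib.util import find_spec

# Import names (not install names) of the packages used in this article.
REQUIRED_MODULES = [
    "geopandas",
    "tensorflow",
    "keras",
    "seaborn",
    "sklearn",            # installed as scikit-learn
    "pandas_datareader",  # installed as pandas-datareader
]


def find_missing(modules):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_MODULES)
    if missing:
        print("Missing packages (by import name):", ", ".join(missing))
    else:
        print("All required packages are available.")
```

The check only looks the modules up without importing them, so it runs quickly even for heavy libraries such as TensorFlow.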
Step #4 Create a New Python Environment
A key feature of Anaconda is its support for multiple virtual isolated programming environments. Virtual environments allow you to work with specific versions of libraries or Python. This is helpful because, from my experience, putting everything into a single environment leads to compatibility issues sooner or later.
Often the best way to solve compatibility issues is to create a new environment and install only the specific libraries that you need for your current project. Virtual environments have their own packages and paths. Therefore, you don’t have to worry about the effect of packages on other Python environments.
The preferred way to create and manage environments in Anaconda is by using CMD terminal commands. You can launch the CMD prompt from the Anaconda Navigator. Alternatively, there is also a graphical interface for managing environments, but I find its use rather tedious.
Below is a list of essential CMD commands for creating and managing environments in Anaconda:
    # Display a list of all environments
    conda env list

    # Create a new environment with a specific Python version
    conda create -n yourenvname python=x.x anaconda

    # Create an exact copy of an existing environment
    conda create --clone py35 --name py35-2

    # Update the conda package manager
    conda update conda

    # Activate the environment so that all subsequent commands affect it
    # (on older conda versions on Windows: activate yourenvname)
    conda activate yourenvname

    # Install a new package into a specific environment
    conda install -n yourenvname [package]

    # Deactivate the current environment
    conda deactivate

    # Remove an environment including all packages
    conda remove -n yourenvname --all
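After activating an environment, you may want to confirm from within Python which environment is actually in use. The following is a small sketch under the assumption that conda sets the CONDA_DEFAULT_ENV variable on activation; the function name describe_environment is made up for this illustration.

```python
import os
import sys


def describe_environment(environ=None):
    """Return basic facts about the environment this interpreter runs in."""
    environ = os.environ if environ is None else environ
    return {
        # Name of the active conda environment, if conda has set one.
        "conda_env": environ.get("CONDA_DEFAULT_ENV", "(none active)"),
        # Folder of the environment that provides this interpreter.
        "prefix": sys.prefix,
        # Python version, e.g. "3.7.4".
        "python": sys.version.split()[0],
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If the printed prefix does not point to the folder of the environment you expected, the script or notebook was probably started from a different environment.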
For additional commands, you can take a look at this Conda cheat sheet.
Step #5 Create a Jupyter Notebook
Next, we create a new Python Jupyter notebook. You can launch Jupyter Notebook from the Anaconda Navigator. This will open the Jupyter Python environment in a new browser window. Be aware that the notebook will use the virtual Anaconda environment that is currently active. The standard virtual environment is the “base” environment.
Once you have launched the Jupyter notebook environment, you should see the standard folder path. In the folder path, you can choose a workspace folder that will contain all the Python code and resources of your Python projects. I have located my workspace at C:\Users\Username\My_Jupyter_Workspace.
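If the workspace folder does not exist yet, you can create it from Python. This is a minimal sketch using pathlib; the function name ensure_workspace is hypothetical, and the default folder name simply mirrors the example path above.

```python
from pathlib import Path


def ensure_workspace(base=None, name="My_Jupyter_Workspace"):
    """Create the workspace folder if needed and return its path.

    base defaults to the current user's home directory.
    """
    base = Path.home() if base is None else Path(base)
    workspace = base / name
    # parents=True creates missing parent folders; exist_ok=True makes
    # the call safe to repeat when the folder already exists.
    workspace.mkdir(parents=True, exist_ok=True)
    return workspace
```

Calling ensure_workspace() without arguments creates the folder in your home directory and returns its path, which you can then select in the Jupyter file browser.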
To create a new Python notebook, click the “New” button and select Python. The notebook will open in a new browser tab, and you can start to code.
That’s it. You have put your Python infrastructure in place and can start coding.
This article has shown how to set up the Anaconda Python Environment for machine learning. We have installed and configured the Anaconda Python environment. You have also learned how to manage virtual environments, install packages, and create new Jupyter notebooks. Now you should have the necessary software in place to start your machine learning projects.
If you still need ideas for your first projects, the following tutorials may offer some inspiration:
- Simple Cluster Analysis using K-Means with Python
- Simple Sentiment Analysis using Naive Bayes and Logistic Regression
- Building a Movie Recommender using Collaborative Filtering in Python
- Getting Started with Image Recognition: Classifying Cats and Dogs using Neural Networks with Python