A Python's Environment

2022-05-06

The python is a snake that lives in rainforests and other tropical environments. But that is not the sort of python, or the sort of environment, that we will be talking about today.

The Python which will be the subject of our conversation is the computer language which is in fact not named after the snake, but after the British comedy group responsible for the spam song which gave us the word we now use for unwanted email. And the environment in which that Python lives is 'on a computer'.

So we need to talk about your computer's $PATH, which is basically the list of places your computer looks for a program it has been told to run. Let's see what's in it:

> echo $PATH

/sbin:/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin

Depending on how you've set up your shell config, you might have more than this, but your system default should include these six folders, separated by colons. Look closer and you'll see that they are all bin (for 'binaries') or sbin ('system binaries') folders: in the root / directory, the /usr directory, and the /usr/local directory. For more about the logic that decides which directory a file belongs in, read the Linux manual entry on the file hierarchy: man hier. Or, if you just want to see the lines that mention bin, pipe it through grep: man hier | grep 'bin'.
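If you'd rather poke at this from Python, here is a minimal sketch. The helper name path_entries is my own invention, not anything standard: it just splits the PATH environment variable on the separator (a colon on Linux) and reports whether each folder actually exists.

```python
import os


def path_entries(path_value=None):
    """Split a PATH-style string into its individual folders."""
    if path_value is None:
        # Fall back to the current process's PATH (empty string if unset).
        path_value = os.environ.get("PATH", "")
    # os.pathsep is ':' on Linux and macOS, ';' on Windows.
    return [entry for entry in path_value.split(os.pathsep) if entry]


if __name__ == "__main__":
    for folder in path_entries():
        status = "exists" if os.path.isdir(folder) else "missing"
        print(f"{folder} ({status})")
```

The folders are printed in the order the shell searches them, which matters later on.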

Now we come to Python. If you can run Python from your command-line, it is because the system can find the executable file in its $PATH. In my case, running python returns an error:

> python

zsh: command not found: python

I have to run python3 to run the system version of Python. If we run which python3 we can see that it is there in the /bin directory.

> which python3

/bin/python3
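The same lookup can be done from inside Python with shutil.which from the standard library. This is only a sketch, and locate is a hypothetical helper name I've made up for the purpose; what it prints depends entirely on your own machine.

```python
import shutil
import sys


def locate(names):
    """Map each command name to the first match on PATH, or None if there is none."""
    return {name: shutil.which(name) for name in names}


for name, location in locate(["python", "python3"]).items():
    print(name, "->", location or "command not found")

# The interpreter running this code, which need not be the one found on PATH.
print("running under:", sys.executable)
```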

And just as the Linux shell has a $PATH that defines where it looks for programs, so Python has a sys.path that defines where it looks for packages you can import. Let's open the Python REPL and see what's included:

> python3

Python 3.7.2 (default, Dec 31 2018, 14:25:33) [GCC 8.2.0] on linux 
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> print(sys.path)

['', '/usr/lib/python37.zip', '/usr/lib/python3.7', '/usr/lib/python3.7/lib-dynload', '/usr/local/lib/python3.7/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.7/dist-packages']

We can use the functionality provided by Python's os module to see what the other folders include:

>>> import os
>>> os.listdir('/usr/lib/python3.7')

['site.py', 'smtpd.py', '_collections_abc.py', 'tarfile.py', ...]

>>> os.listdir('/usr/lib/python3/dist-packages')

['numpy']

We can import modules that are in the folders in Python's sys.path:

>>> import numpy
>>> print(numpy)

<module 'numpy' from '/usr/lib/python3/dist-packages/numpy/__init__.py'>

But if they're not there, we'll get an error:

>>> import pandas

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pandas'
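If you want to know where a module would come from without actually importing it (or triggering a traceback), the standard library's importlib.util.find_spec can tell you. A minimal sketch, with where_is being a name I've made up; it assumes you are asking about top-level names like numpy rather than dotted ones:

```python
import importlib.util


def where_is(module_name):
    """Return where a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None
    # Usually a file path; some core modules report 'built-in' or 'frozen' instead.
    return spec.origin


for name in ("json", "numpy", "pandas"):
    location = where_is(name)
    print(name, "->", location if location else "not found on sys.path")
```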

We'll talk more another time about NumPy and Pandas and the rest of Python's data analysis toolkit, but for now we can just say that they are examples of 'libraries' of code that are not part of the core 'standard library' installed by default. To make new code available to Python, we just need to put it in one of the folders in sys.path. The empty string '' at the start of the list stands for the current directory, which means that any modules in the directory from which you run Python are available too. If the library you want to add depends on other libraries that also need installing, though, it gets slightly more complicated.
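To see that there really is nothing more to it than folders, here is a small sketch that writes a throwaway module into a temporary folder, adds that folder to sys.path, and imports it. The module name hello_env and its contents are invented purely for illustration.

```python
import importlib
import pathlib
import sys
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Write a tiny module into a folder Python does not know about yet.
    module_file = pathlib.Path(folder) / "hello_env.py"
    module_file.write_text("GREETING = 'hello from a new folder'\n")

    # Adding the folder to sys.path makes the module importable.
    sys.path.append(folder)
    importlib.invalidate_caches()
    hello_env = importlib.import_module("hello_env")
    print(hello_env.GREETING)

    # Tidy up so the temporary folder does not linger on sys.path.
    sys.path.remove(folder)
```

Editing sys.path by hand like this is fine for an experiment, but it is not how you would normally install anything.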

The simplest way to install Python packages properly is to use the purpose-made tool pip ('Pip Installs Packages'). On my Debian OS this doesn't come installed by default, but has to be installed separately: sudo apt install python3-pip. I would actually recommend that you don't bother, though, and instead use conda to create and manage self-contained Python environments that won't interfere with your operating system's version of Python.

Conda can either be installed as 'Anaconda', which comes with the conda command-line tool and 1000+ packages for scientific computing; or as 'Miniconda', which just comes with the conda tool and a minimal list of required dependencies. The official website suggests that if you are new to Python you are better off with 'Anaconda' -- but I completely disagree. If you are new to Python why do you need 1000+ packages for scientific computing? So stick to Miniconda, and then you can always download whatever other packages you want when you actually need them.

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/install_miniconda.sh
bash ~/install_miniconda.sh -b -u -p ~/.miniconda3
rm ~/install_miniconda.sh

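You will probably need to reload your shell, and you should then be able to run conda from the command line. If the command isn't found, it may be because the -b (batch) flag skips the step that sets up your shell; running ~/.miniconda3/bin/conda init zsh (or bash, if that is your shell) and reloading again should fix it. You may also find that your shell automatically loads Conda's base environment. I find this annoying, so I disable it by adding auto_activate_base: false to my ~/.condarc configuration file. Now you can create a new Python environment, with say version 3.9: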

conda create -n my_env python=3.9

Then activate it like this:

conda activate my_env

And now we can see that conda works its magic by adding the folders of the activated environment, and of conda, to the system $PATH:

> echo $PATH

/home/peterprescott/.miniconda3/envs/my_env/bin:/home/peterprescott/.miniconda3/condabin:/sbin:/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/usr/sbin
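The reason this works is that the shell runs the first match it finds, so putting the environment's bin folder at the front of $PATH hides any Python further along. Here is a rough sketch that lists every match in search order; all_matches is a hypothetical helper of my own, and it deliberately ignores shell aliases and functions.

```python
import os


def all_matches(command, path_value=None):
    """List every executable called `command` on a PATH-style string, in search order."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    matches = []
    for folder in path_value.split(os.pathsep):
        candidate = os.path.join(folder, command)
        # Skip empty entries, and anything that is not an executable file.
        if folder and os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            matches.append(candidate)
    return matches


# The first entry is the one the shell actually runs; later ones are shadowed.
for position, match in enumerate(all_matches("python3")):
    print("runs" if position == 0 else "shadowed", match)
```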

Likewise, if we open Python and look at sys.path, we'll see that the folders Python is looking in are inside the my_env directory. That means we can now pip install whatever strange and wonderful new Python packages we like, safe in the knowledge that we can remove that Conda environment without interfering with the system's version of Python.
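If you ever lose track of which Python you are in, a quick check like the following can help. It is a sketch rather than anything official: describe_environment is a made-up name, and it relies on the CONDA_DEFAULT_ENV and CONDA_PREFIX variables that conda sets on activation, so both will be missing if no conda environment is active.

```python
import os
import sys


def describe_environment(environ=None):
    """Collect a few facts about the running interpreter and any active conda env."""
    if environ is None:
        environ = os.environ
    return {
        "executable": sys.executable,
        # sys.prefix is the root folder of the running Python installation.
        "prefix": sys.prefix,
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": environ.get("CONDA_PREFIX"),
        # Entries of sys.path that live inside the running installation.
        "paths_under_prefix": [p for p in sys.path if p.startswith(sys.prefix)],
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

With my_env activated, I would expect the executable and prefix to sit under the envs/my_env folder shown in $PATH above.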

conda deactivate
conda remove -n my_env --all

So now perhaps you'll be able to avoid the situation described here.