OpenSesame
Rapunzel Code Editor
DataMatrix
Support forum
Python Tutorials
MindProbe
Python videos

Files and folders

Files and folders are the building blocks of computer data! Knowing how to programmatically navigate the file system and how to read from and write to files is a crucial skill. So let's learn how to do this!

In this chapter, you will learn

  • Key terms and concepts related to files and folders
  • How to use the Path object
  • How to write to and read from files

Test yourself

  • One interactive mini exercise
  • One review exercise

What are files, folders, and paths?

Computer data is generally organized into files and folders. Files contain the actual data: images, text files, etc. Folders (or directories) are containers that contain files and other (sub)folders.

For example, my home folder (/home/sebastiaan) contains a subfolder called coding, which in turn contains two subfolders called python and r, which in turn contain a number of Python (.py) and R (.R) scripts with exercises. Like this:

/home/sebastiaan/coding/
    python/
        exercice1.py
        exercise2.py
    R/
        exercise1.R
        exercise2.R

Important terms

Before we start, let's define the most important terms related to files and folders. You don't have to remember them right now, but you can refer back to this list when you encounter an unknown term later on.

  • A path is the location of a file or folder.
  • The root is the top-level folder of the path. On Mac OS and Linux, the root is simply /. On Windows, the root is the drive letter (e.g. c:\). (We will return to differences between operating systems below.)
  • The working directory is the currently active folder. When executing a Python script from a code editor, the working directory is usally the folder that contains the script.
  • The home folder is the folder that contains user-specific files. My home folder is /home/sebastiaan/. The home folder is often abbreviated as ~/.
  • An absolute path specifies the location of a file or folder relative to the root. In the example above, the absolute path to exercise1.py is /home/sebastiaan/coding/python/exercise1.py.
  • A relative path specifies the location of a file or folder relative to the working directory. In the example above, and assuming that I'm working in the folder /home/sebastiaan/coding, the relative path to exercise1.py is coding/python/exercise1.py. Note that a relative path doesn't start with a / (i.e. doesn't contain the root).
  • An extension (or suffix) is the part of the path that follows (and includes) the final .. The extension of exercise1.py is .py. Extensions usually indicate the file type; for example, you can recognize Python scripts by the .py extension. Not all files have an extension.
  • A wildcard is a special character that acts as a placeholder in a path. The most common wildcard is the *, which matches anything. For example, the wildcard string *.py matches any path that ends with .py.
  • A glob is a collection of paths that matches a wildcard string. For example, the glob for coding/python/*.py consists of exercise1.py and exercise2.py.

Differences between operating systems

Mac OS and Linux use the (forward) slash (/) as a path separator; that is, the / separates individual folders and files in a path. In contrast, Windows uses the backslash (\) as a path separator; absolute paths in windows also include a drive letter, such as C:\.

Different operating systems also use different naming conventions for home folders: Windows: C:\Users\[name], Linux: /home/[name], Mac OS: /user/[name]. (As you can tell from the name of my home folder, I use Linux.)

Luckily, Python avoids you from having to worry about these differences: no matter with operating system you are using, you can use the \ path separator and ~ (or Path.home()) to refer to the home folder. Python will make sure that these paths are correctly interpreted on all operating systems.

Pro-tip: As you've learned in the Syntax chapter, the backslash also serves as an escape character in Python; that is, the \ allows you to indicate non-printable characters, such as a tab stop (\t) or line break (\n). Therefore, it's easiest to avoid the backslash altogether when working with paths, even on Windows.

The pathlib module and the Path class

The Path class from the pathlib module provides an intuitive way work with files and folders. Let's create a Path object that corresponds to the coding/python/ folder. We use a relative path and assume that my home folder is the current working directory.

from pathlib import Path
py_exercises = Path('coding/python/')
print(py_exercises)

Output:

coding/python

Path objects can be combined using the / operator. Let's use this to create a new Path object that corresponds to exercise1.py:

py_exercise1 = py_exercises / Path('exercise1.py')
print(py_exercise1)

Output:

coding/python/exercise1.py

Path objects have a number of convenient functions and properties:

  • Path.exists() returns True if the path corresponds to an existing file or folder
  • Path.parent returns the parent folder of a file or folder, such as coding/python for for coding/python/exercise1.py
  • Path.is_file() returns True if the path corresponds to an existing file
  • Path.is_dir() returns True if the path corresponds to an existing folder
  • Path.name returns the name without the parent folder, such as exercise1.py for coding/python/exercise1.py
  • Path.stem returns the name without the file extension, such as exercise1 for exercise1.py
  • Path.suffix returns the file extension, such as .py for exercise1.py

Mini exercise

Create a path object that corresponds to the absolute path to exercise1.py and print it out. (Except for the most basic functionality, the Path class does not work in a browser. That's why this chapter only contains this one interactive mini exercise.)

Listing files and folders

The Path.glob() function returns a list of files and subfolders inside a folder. You can specify a wildcard string to indicate which files you want. To list all files, specify the * wildcard string (which matches any path).

print(f'All files in {py_exercises}:')
for path in py_exercises.glob('*'):
    print(f'- {path}')

Output:

All files in coding/python:
- coding/python/exercise2.py
- coding/python/exercise1.py

Creating and removing files and folders

So far, we've worked with Path objects that correspond to files and folders that already exist. However, you can also create Path objects for files and folders that do not (yet) exist. The Path.exists() function allows you to tell whether or not a path exists.

Let's say that we want to create a folder for exercises in the Julia programming language. We first create a Path object for this non-existing folder:

jl_exercises = Path('coding/julia')
print(jl_exercises.exists())

Output:

False

Next, we use the Path.mkdir() function to create an empty folder at this location:

jl_exercises.mkdir()
print(jl_exercises.exists())

Output:

True

Pro-tip: Path.mkdir() by default requires that the parent folder (coding in the example above) already exists. When this is not the case, you can use Path.mkdir(parents=True) to also create all non-existing parent folders.

To create an empty file inside this folder, we use the Path.touch():

jl_exercise1 = jl_exercises / Path('exercise1.jl')
jl_exercise1.touch()
print(jl_exercise1.exists())

Output:

True

Most common operations related to files and folders can be handled by the Path class. Let's consider the most important functions:

  • Path.mkdir() creates a folder (see above)
  • Path.rename() renames a file or folder
  • Path.rmdir() deletes a folder (which cannot be empty, see below)
  • Path.touch() creates an empty file
  • Path.unlink() deletes a file

However, there is one common scenario that the Path class cannot handle: deleting folders that are not empty. For example, if we would try to use Path.rmdir() to delete the coding/julia folder that we created above, then we would get an OSError because the directory isn't empty (it contains exercise1.jl):

jl_exercises.rmdir()

Output:

OSError: [Errno 39] Directory not empty: 'coding/julia'

To recursively delete the folder, that is, to delete the folder and everything inside it, you can use rmtree() from the shutil module instead.

import shutil
shutil.rmtree(jl_exercises)
print(jl_exercises.exists())

Output:

False

Pro-tip: There is a lot of overlap between the functionality of the (newer) Path class and the (older) os, os.path, and shutil modules. For most purposes, using the Path class is recommended because it results in code that is easier to read and write.

Reading and writing files

Text files

py_exercise1 is a Path object that corresponds to the file exercise1.py (see above). To read the contents of this file, we can use Path.read_text():

print(f'The contents of {py_exercise1} are:')
contents = py_exercise1.read_text()
print(contents)

Output:

The contents of coding/python/exercise1.py are:
"""
![](https://upload.wikimedia.org/wikipedia/commons/thumb/d/d2/Pythagorean.svg/390px-Pythagorean.svg.png)

Imagine a right triangle like the one above and:

- Read a number from the standard input and assign it to `a`
- Read another number from the standard input and assign it to `b`
- Use Pythagoras theorem to determine the value of the long side `c`
- Use string formatting to print out the length of the long side
- If `c` is larger than `PI` (a constant), also print out: *And this is longer than PI*
"""

PI = 3.14

a = input('Length of a? ')
b = input('Length of b? ')
a = float(a)
b = float(b)
c = (a ** 2 + b ** 2) ** .5
print('C has length {0}'.format(c))
if c > PI:
  print('And this is longer than PI')

Writing to text files is almost as easy as reading from them, using Path.write_text():

contents = 'Define a factorial function using recursion!'
py_exercise2 = Path('coding/python/exercise2.py')
py_exercise2.write_text(contents)

Pro-tip: Calling Path.write_text() creates a new file and opens it for writing. This means that the file will be overwritten if it already existed. If you want to add text to an existing file without erasing its contents, you need to use a slightly more verbose approach, using a with context and Path.open('a') to indicate that you want to open the file in 'append' mode:

with py_exercise2.open('a') as fd:
    fd.write('\nThis line will be appended to the file')

For the purpose of being able to append text to a file, it is not crucial to understand what a with context is exactly. However, if you want to fully understand the code above, take a look at this page of the Python docs.

Binary (non-text) files

Python makes a distinction between text (str objects) and binary (bytes objects) data. Of course, text also consists of bytes, but there is an additional layer on top of it, namely the character encoding that specifies how bytes should be translated into human-readable text. Binary data does not have this and often does not correspond to human-readable text.

Images are binary files. Say that we want to read a photo of De Boef, which you may meet later on in the deep-learning course chapter on classifying images. We can do this as follows:

img = Path('data/boef.jpg')
contents = img.read_bytes()
print(f'File contents are of type {type(contents)}')

Output:

File contents are of type <class 'bytes'>

To write binary data to a file, simply call Path.write_bytes():

img_copy = Path('copy-of-boef.jpg')
img_copy.write_bytes(contents)

Review exercise

An interactive file browser

You're going to build an interactive file browser! The program starts by listing all files and folders in the current working directory. The first entry in the list is a special entry that corresponds to the parent folder, called '..' (two dots is the standard way to indicate the parent folder); however, this entry is only available if the working directory isn't the root already! Each entry in the list is numbered, starting from 0.

By entering a number, the user can select a file or folder. Depending on what kind of file or folder it is, one of three things can happen:

  • If the user selects a folder or the '..' entry, the working directory is changed to that folder.
  • If the user selects a text file, the content of that file is printed out.
  • If the user selects a binary file, ·the message '[file name] is not a text file' is printed out

Next, the contents of the working directory (which may have changed) are shown again, and the user is prompted to provide new input, and so on, until the user enters quit. If the user enters something other than quit or a number that corresponds to an entry in the list, an informative warning message (such as 'Invalid user input') is printed out.

An interaction could look like this:

Listing contents of /home/sebastiaan/coding
0: ..
1: python

>>> 1
Listing contents of /home/sebastiaan/coding/python
0: ..
1: exercise1.py
2: exercise2.py

>>> quit

Tips:

  • Path.cwd() returns the working directory
  • os.chdir() changes the working directory
  • Calling Path.read_text() on a binary file results in a UnicodeDecodeError

This exercise is not checked automatically, because there are several possible solutions. Click here to see one solution!


This concludes the Python Basics course. Congratulations—you made it to the finish line!