Bash Tips #2 – Splitting Shell Scripts to Improve Readability

Codebases grow over time, and so do shell scripts. They usually start small and serve a single purpose, but new features get added, unforeseen situations are handled, and suddenly the scripts have hundreds of lines and have lost their readability in the process. From what I have observed when collaborating with other developers, it is in our nature to copy existing code and modify it where necessary. Shell scripts are no exception. If a script is written in a way that encourages adding new lines to it, instead of splitting it into multiple files, a developer will do exactly that.

In this article, I would like to show you how splitting shell scripts into smaller pieces improves readability and encourage you to do the same in your scripts.

Let’s define our goals:

Readability first. A script should not be longer than 100 lines, unless absolutely necessary
A script should be written in such a way that encourages using multiple files instead of adding code and complexity to an existing one
As a byproduct of script creation, we would like to have a collection of reusable parts we can use in other projects

As you can see, these are very soft requirements that mainly concern conventions.

Improving readability by sourcing scripts

We can place a part of our script in a separate file and then source it from the main script. These pieces can be made reusable and, thanks to that, used in multiple scripts. Let’s take a look at the following example, where we use our logger setup from the previous article in the series (Bash Tips #1 – Logging in Shell Scripts):

.
├── includes
│   └── logging.sh
└── script-2.1.sh

logging.sh:

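#!/bin/bash
# Log file path: the first argument passed to source, or log.log by default
LOGFILE="${1:-log.log}"
# Keep the original stdout as fd 3, then send stdout and stderr to the log file
exec 3>&1 1>"$LOGFILE" 2>&1
# When a command fails, notify the user on the original stdout
trap "echo 'ERROR: An error occurred during execution, check log $LOGFILE for details.' >&3" ERR
# Before each command, write a timestamp to the log and re-enable tracing
trap '{ set +x; } 2>/dev/null; echo -n "[$(date -Is)]  "; set -x' DEBUG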

(This is the logger code from the previous article, extended with an option to specify the log file path, which makes it more reusable.)

script-2.1.sh:

#!/bin/bash

source "includes/logging.sh" "script2.log"

echo "creating a temporary directory and some files" >&3
TEMPDIR=$(mktemp -d)
touch "$TEMPDIR"/testfile{00..09}
touch /not-existing-directory/testfile

This saves us from copy-pasting the same four lines and lets us modify our logging setup in a single place. The only requirement is that we distribute our main script along with all its includes.

Improving readability by executing utility scripts

It is important to understand the difference between sourcing a file and executing a file. When a file is sourced, all of its code runs in the current shell, in the same environment (context) as the main script. This means that all variables set or read and all functions defined or called by the sourced file are shared with our main script. In the case of our logging script, it makes perfect sense to use source, as we want to set up the logging for our main script. Utility scripts that perform certain operations are different: we would like to avoid having them affect our environment. It is better to run them in their own separate shell, and we achieve that by executing them. I will present it with an example.
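The difference can also be seen in isolation. The following is a minimal sketch, not part of the article's scripts: child.sh and the variable greeting are hypothetical names, and the file is created in a temporary directory only for the demonstration. The dot command used here is the portable spelling of source.

```shell
#!/bin/bash
# Hypothetical demo: child.sh is a throwaway file created only for this illustration.
demo_dir=$(mktemp -d)
cat > "$demo_dir/child.sh" <<'EOF'
greeting="set by child"
EOF

greeting="set by main"

# Executing: child.sh runs in its own process, so our variable is untouched.
bash "$demo_dir/child.sh"
after_execute="$greeting"

# Sourcing ("." is equivalent to "source"): child.sh runs in the current shell
# and overwrites our variable.
. "$demo_dir/child.sh"
after_source="$greeting"

echo "after executing: $after_execute"
echo "after sourcing:  $after_source"

rm -rf "$demo_dir"
```

After the executed run the variable still holds the value set by the main script; only the sourced run changes it.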

Let’s assume that we would like to extend our main script with a simple feature that would print our IP address:

#!/bin/bash
 
source "includes/logging.sh" "script2.log"

interface=$(ip route get 8.8.8.8 | grep -Po '(?<=dev )\w+(?= )')
address=$(ip addr show dev "$interface" | grep -Po -m 1 '(?<=inet )\d+\.\d+\.\d+\.\d+(?=/)')
 
echo "my IP is: $address" >&3
echo "creating a temporary directory and some files" >&3
TEMPDIR=$(mktemp -d)
touch "$TEMPDIR"/testfile{00..09}
touch /not-existing-directory/testfile

We should consider printing some kind of error message if determining the IP address fails. We should also add a comment explaining what the code does, as it is not obvious at first glance. Adding these would at least double the size of the script. Let’s move this feature into a dedicated script and see how our main script changes.

A new file, getIP.sh:

#!/bin/bash
usage() {
    echo "Prints (first) IP address associated with network interface via which internet traffic to ip 8.8.8.8 is routed."
}

interface=$(ip route get 8.8.8.8 | grep -Po '(?<=dev )\w+(?= )')
address=$(ip addr show dev "$interface" | grep -Po -m 1 '(?<=inet )\d+\.\d+\.\d+\.\d+(?=/)')

if [[ -z "$address" ]]; then
    echo "ERROR: Failed to get the IP" >&2
    exit 1
fi

echo -n "$address"

This script defines a function and two variables. If we were to source it, its usage function and its variables could shadow those of our main script, it would affect our logging setup, and its exit 1 would terminate the main script itself. This is a perfect example of a case where executing makes more sense. Let’s put it in a separate directory dedicated to executable utility scripts. We can even safely put that directory in our $PATH, which gives the following structure:

.
├── includes
│   └── logging.sh
├── script-2.1.sh
└── utils
    └── getIP.sh

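Now we can execute the script where necessary (for the lookup via $PATH to work, getIP.sh needs the executable bit, e.g. chmod +x utils/getIP.sh):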

#!/bin/bash
# script-2.1.sh

source "includes/logging.sh" "script2.log"
PATH=$PATH:$(pwd)/utils


echo "my IP is: $(getIP.sh)" >&3
echo "creating a temporary directory and some files" >&3
TEMPDIR=$(mktemp -d)
touch "$TEMPDIR"/testfile{00..09}
touch /not-existing-directory/testfile

The readability of the main script improved, despite adding a simple error message and an explanation of how the IP determination procedure works.
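One detail worth knowing when calling a utility through command substitution: if the substitution is an argument to echo, the utility's exit status is discarded, because echo itself succeeds. The sketch below illustrates this with a stand-in function; failing_get_ip is a hypothetical name that only imitates a failing utils/getIP.sh, and it is not part of the article's scripts.

```shell
#!/bin/bash
# Stand-in for utils/getIP.sh that always fails; hypothetical, for demonstration only.
failing_get_ip() {
    echo "ERROR: Failed to get the IP" >&2
    return 1
}

# Inside an argument to echo, the failure is invisible: echo itself succeeds.
echo "my IP is: $(failing_get_ip 2>/dev/null)"
echo_status=$?

# Assigning the output first keeps the utility's exit status.
assign_status=0
address=$(failing_get_ip 2>/dev/null) || assign_status=$?

if [ "$assign_status" -ne 0 ]; then
    echo "could not determine the IP (exit status $assign_status)"
else
    echo "my IP is: $address"
fi
```

Whether the main script should stop at this point or continue without the address is a design choice. The sketch only shows where the status can be observed.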

Downsides of such an approach

However, there are some downsides to this approach:

Because the includes and utilities are referenced by relative paths, we expect the user to change to the script's directory before running it
We have to distribute multiple files with a certain directory structure

There are ways to improve the user experience on both points, which I describe in other articles:

We can check if the current working directory is correct and exit the script while printing a precise message for the user.
The final distributable script can be an archive or a single big script with all files embedded into it.
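The first of these ideas can be sketched as follows. This is only one possible shape for such a check, not the version from the other articles: the function name require_include and the wording of the message are assumptions, and the temporary directory exists only to demonstrate both outcomes.

```shell
#!/bin/bash
# Hypothetical helper: the name require_include and the message text are illustrative.
require_include() {
    if [ ! -f "$1" ]; then
        echo "ERROR: $1 not found in $(pwd)." >&2
        echo "Change to the directory that contains the script before running it." >&2
        return 1
    fi
}

# In the main script this would run before the first source line:
#   require_include "includes/logging.sh" || exit 1

# Throwaway demonstration: a temporary directory that mimics the project layout.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/includes"
touch "$demo_dir/includes/logging.sh"

status_inside=0
(cd "$demo_dir" && require_include "includes/logging.sh") || status_inside=$?

status_outside=0
(cd / && require_include "includes/logging.sh" 2>/dev/null) || status_outside=$?

echo "inside the project directory:  $status_inside"
echo "outside the project directory: $status_outside"

rm -rf "$demo_dir"
```

Checking for a file the script is about to source anyway keeps the check tied to the actual requirement, instead of comparing directory names.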

Inherently, these problems remain, but in my opinion, the advantages outweigh the disadvantages.

Summary

The example above should not be treated as a go-to solution for every script, but rather an encouragement to split the shell script into smaller, reusable parts. We achieved the goals stated at the beginning:

Our main script remains readable and all logic related to determining the IP address and setting up logging has been extracted to separate scripts. The net delta for the main script after the changes is two lines (or three if we count setting the $PATH)
Sticking to the principle of having separate directories and sourcing or executing other files, even for a small feature that could easily have been integrated into the main script, encourages others to follow an already established convention of splitting the logic.
Both getIP.sh and logging.sh can be reused in other scripts with little to no modification, potentially saving us time in the future.

I hope that this simple example has shown you that such an approach can improve our experience when developing and especially when maintaining or working with scripts created by other developers.
