Bash Tips #3 – Templating in Bash Scripts

Software deployment systems such as Ansible or Puppet come with a templating engine that helps us create and manage configuration files. The same can also be achieved in bash. It may not be as convenient or feature-rich, but it is a useful mechanism when you cannot, or choose not to, use other tools.

Let’s try to achieve the following:

We would like to create a configuration file with values that depend on the state of the server, so we cannot simply use a file prepared in advance.
We want to avoid building it line by line; instead, we should use a template file that is easy to modify and maintain in the future.

Templating using envsubst

We will use the following example configuration file:

server:
    address: HERE GOES OUR IP ADDRESS
    port: 8080

identifier: SERVER HOSTNAME WILL BE USED HERE
cpuid: WE PUT CPU MANUFACTURER ID FROM /PROC/CPUINFO HERE

It is simplified to the point where templating is arguably unnecessary, but imagine that the file has a hundred lines and we want to fill in a dozen values.

We start off with gathering the required information:

export SERVER_IP=$(getIp.sh) # Assume that this script outputs our IP address
export SERVER_IDENTIFIER=$(hostname)
export SERVER_CPU_MANUFACTURER_ID=$(grep -m 1 vendor_id /proc/cpuinfo | cut -d : -f 2 | xargs)
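The third command packs a small pipeline into one line. To see what each stage does, here it is run on a made-up two-line sample in the same "key<TAB>: value" layout that /proc/cpuinfo uses (a real file has many more fields and one block per logical CPU):

```bash
#!/bin/bash
# Made-up sample lines in the /proc/cpuinfo "key<TAB>: value" layout
sample=$'vendor_id\t: GenuineIntel\nvendor_id\t: GenuineIntel'

# grep -m 1      stops after the first matching line (the file repeats it per CPU)
# cut -d : -f 2  keeps the text after the first colon, i.e. " GenuineIntel"
# xargs          with no command just echoes its input, trimming surrounding whitespace
vendor=$(printf '%s\n' "$sample" | grep -m 1 vendor_id | cut -d : -f 2 | xargs)
echo "$vendor"   # GenuineIntel
```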

We export these variables so that they become environment variables. This lets us use the envsubst program (part of GNU gettext), which replaces references to environment variables, written as $VAR or ${VAR}, with their values.

Let’s create a template of our config, configTemplate.yml:

server:
    address: $SERVER_IP
    port: 8080

identifier: $SERVER_IDENTIFIER
cpuid: $SERVER_CPU_MANUFACTURER_ID

We can then use the following script to produce a final configuration file:

#!/bin/bash
PATH=$PATH:$(pwd)/utils
export SERVER_IP=$(getIp.sh)
export SERVER_IDENTIFIER=$(hostname)
export SERVER_CPU_MANUFACTURER_ID=$(grep -m 1 vendor_id /proc/cpuinfo | cut -d : -f 2 | xargs)
cat templates/configTemplate.yml | envsubst > /tmp/finalConfig.yml

Our finalConfig.yml looks good:

server:
    address: 172.30.48.27
    port: 8080

identifier: mz
cpuid: GenuineIntel

Limiting variables being expanded

The previous solution has some limitations:

We have to be very careful about using the $ sign in our templates: any $ followed by a valid variable name (e.g. $PRICE or ${PRICE}) is replaced with that variable’s value, or with an empty string if the variable is not set.
Some frameworks (e.g. Spring) use the same ${NAME} syntax to reference environment variables in their own configuration files. We cannot use that mechanism in a template, because envsubst would replace such references before the framework ever sees them.

We have to improve this by making envsubst work selectively, only on explicitly listed variables. Fortunately, this comes out of the box: envsubst accepts an optional argument, a string of variable references such as '$VAR1 $VAR2', and then substitutes only the variables referenced in it.

Here is our template, extended with a property that references an environment variable (${USER}) which we do not want envsubst to expand. This property should remain unchanged in the output.

configTemplate.yml:

server:
    address: $SERVER_IP
    port: 8080


identifier: $SERVER_IDENTIFIER
cpuid: $SERVER_CPU_MANUFACTURER_ID
user: ${USER}

And the script, modified to expand only the selected environment variables:

#!/bin/bash
PATH=$PATH:$(pwd)/utils
export SERVER_IP=$(getIp.sh)
export SERVER_IDENTIFIER=$(hostname)
export SERVER_CPU_MANUFACTURER_ID=$(grep -m 1 vendor_id /proc/cpuinfo | cut -d : -f 2 | xargs)
cat templates/configTemplate.yml | envsubst '$SERVER_IP $SERVER_IDENTIFIER $SERVER_CPU_MANUFACTURER_ID' > /tmp/finalConfig.yml

The produced finalConfig.yml looks as intended:

server:
    address: 172.30.48.27
    port: 8080

identifier: mz
cpuid: GenuineIntel
user: ${USER}

Storing required values in a collection

The problem is solved, but listing every variable in the envsubst call hurts readability and requires unnecessary repetition. This can be improved.

Let’s start by placing all variables to be used in templates in a single collection:

declare -A FACTS
FACTS['SERVER_IP']=$(getIp.sh)
FACTS['SERVER_IDENTIFIER']=$(hostname)
FACTS['SERVER_CPU_MANUFACTURER_ID']=$(grep -m 1 vendor_id /proc/cpuinfo | cut -d : -f 2 | xargs)

FACTS is an associative array, bash’s equivalent of a dictionary (a string-to-string map). We can now iterate over all entries stored in the array and export them:

    # Quoting both expansions keeps values that contain spaces intact
    for key in "${!FACTS[@]}"; do
        export "$key=${FACTS[$key]}"
    done

This is functionally equivalent to the previous script, but it also lets us build the argument for envsubst dynamically, using printf:

printf '$%s ' "${!FACTS[@]}"

prints:

$SERVER_IP $SERVER_IDENTIFIER $SERVER_CPU_MANUFACTURER_ID

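Exactly what we need. Note that bash does not guarantee any particular order for the keys of an associative array, so the variables may appear in a different order; envsubst does not care about the order.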

Let’s put it all together:

    for key in "${!FACTS[@]}"; do
        export "$key=${FACTS[$key]}"
    done


    cat "$TEMPLATE_PATH" | envsubst "$(printf '$%s ' "${!FACTS[@]}")"

This prints the contents of the file at TEMPLATE_PATH, with all references to variables stored in the FACTS array expanded.

We can wrap this in a function, placed in a file that our scripts can later source:

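template() {
    # Locals keep these names from leaking into (or clobbering) the caller's scope
    local array_name="$1" template_path="$2" output_path="$3" key

    # Nameref (requires bash 4.3+): facts_ref refers to the array whose name was passed
    local -n facts_ref="$array_name"
    for key in "${!facts_ref[@]}"; do
        export "$key=${facts_ref[$key]}"
    done

    cat "$template_path" | envsubst "$(printf '$%s ' "${!facts_ref[@]}")" | tee "$output_path"
}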

Here we have a bash function that accepts three arguments. The first is the name of an associative array variable (a string); the entries of that array are used for variable expansion. The second is the path to a template file, and the third is the path where the rendered file should be saved. Using tee is optional: the only difference is that with tee the contents are printed to stdout as well as written to the output file. Our logging setup logs the stdout of each command, so the final contents of the file end up in the log as well.

The final file structure looks like this:

├── includes
│   ├── gatheringFacts.sh
│   ├── logging.sh
│   └── templating.sh
├── script-3.1.sh
├── templates
│   └── configTemplate.yml
└── utils
    └── getIp.sh

The main script is short because most of the details are hidden in supplementary scripts. The whole templating operation is a single line (template ...), with no visual clutter.

script-3.1.sh

#!/bin/bash

PATH=$PATH:$(pwd)/utils
source includes/logging.sh "script3.log"
source includes/gatheringFacts.sh
source includes/templating.sh

template "FACTS" templates/configTemplate.yml /tmp/myconfig.yml

Expanding it further

This approach can be taken further. If we wanted the structure of the file itself to vary with conditions, we could store whole sections of the file in conditionally defined variables:

server:
    address: $SERVER_IP
    port: 8080

identifier: $SERVER_IDENTIFIER
cpuid: $SERVER_CPU_MANUFACTURER_ID
user: ${USER}

$FEATURE_X_SECTION

if [[ $FEATURE_X_ENABLED == "true" ]]; then
    export FEATURE_X_SECTION="XXX"
else
    export FEATURE_X_SECTION="YYY"
fi

This may make sense for some simple templates, but I advise against this approach. In my opinion, at this point we are reinventing the wheel. For complex scenarios, I recommend a full-fledged templating engine such as Jinja2 (used by Ansible), which offers much better support for templating and more complex logic.

Summary

I have shown how a few simple shell tools let us generate configuration files from shell scripts conveniently. This is no replacement for full-fledged server provisioning tools such as Ansible. In some situations, however, a simple shell script that any Linux user can run is preferable.
