Bash Tips #5 – Parallelism using xargs

Running things in parallel in bash scripts may seem difficult, but thanks to great utility programs that are available out of the box on most GNU/Linux distributions, it is not. In this brief article, I would like to focus on using xargs to run operations in parallel in a convenient way, which is the approach I prefer.

What is xargs?

Let’s start by looking at a single line from the xargs manual that describes what it does:

xargs reads items from the standard input, delimited by blanks or
newlines, and executes the command one or more times with any
initial-arguments followed by items read from standard input.

It might not be obvious at first glance, so here is an example:

xargs echo args: <<EOC
arg1
arg2
arg3
EOC

There are three newline-separated items passed via stdin: arg1, arg2 and arg3. xargs appends all of them as arguments to echo, so this is equivalent to the following line:

echo args: arg1 arg2 arg3

Note that the echo command runs only once.
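The batching is easy to observe directly. The sketch below captures the output of the example above and compares it with a run using -n 1, a standard xargs flag that limits each command line to a single item; printf simply stands in for the here-document, and the variable names are arbitrary.

```shell
#!/usr/bin/env bash
# Default: xargs packs all three items into a single echo invocation.
batched=$(printf '%s\n' arg1 arg2 arg3 | xargs echo args:)
printf '%s\n' "$batched"

# -n 1: at most one item per command line, so echo runs three times.
per_item=$(printf '%s\n' arg1 arg2 arg3 | xargs -n 1 echo args:)
printf '%s\n' "$per_item"
```

The first command prints one line, the second prints three.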

xargs flags

Next, we need to introduce the two flags we will use:

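  • -i runs the command once for every input item (one item, one command) and lets us choose where that item goes among the other arguments. We do that by placing the placeholder {} wherever we want the item to appear. GNU xargs marks -i as deprecated; the equivalent, portable spelling is -I {}, and with either form each input line (rather than each blank-separated word) becomes one item: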
xargs -i echo args: {} <<EOC
arg1
arg2
arg3
EOC

is equivalent to:

echo args: arg1
echo args: arg2
echo args: arg3

Three commands are run in total.

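  • -P n makes xargs run up to n commands at the same time (the default is 1). Commands built from the items are started in order, each as soon as a slot is free, and the xargs call exits only when the last command has finished. If n is 0, xargs runs as many commands at once as possible, which for short inputs like the ones below means all of them. -P is not part of POSIX, but both GNU and BSD xargs provide it.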
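Going back to the placeholder flag for a moment, here is a minimal sketch of the same mechanism written with -I {} instead of -i. It also shows that {} may appear more than once, even inside a larger argument, and that an input line containing a blank stays a single item. The items "one" and "two words" are arbitrary sample values.

```shell
#!/usr/bin/env bash
# -I {} is the non-deprecated equivalent of -i.
# Each input line is one item, so "two words" is passed as a single item,
# and every occurrence of {} is replaced, even inside a larger argument.
out=$(printf '%s\n' 'one' 'two words' | xargs -I {} echo "[{}] is item {}")
printf '%s\n' "$out"
```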

Let’s try both flags together:

seq 1 10 | xargs -i -P 0 bash -c 'sleep {} && echo done sleeping for {} seconds'

By using bash -c, we run a separate bash process for each item. The -c flag tells bash to treat the next argument as a command string (a script) to execute. This lets us use bash constructs such as && with xargs: xargs itself runs commands directly rather than through a shell, and because the && sits inside a quoted string, the shell that launches xargs does not interpret it either. It is simply part of the string handed to bash -c.
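One caveat with this pattern: because {} is pasted into the script text, bash parses each item as code, so an item containing $, quotes or ; changes what actually runs. A commonly used alternative, sketched below, keeps the script text fixed and passes the item as a positional parameter instead. The sample items ("plain" and a literal "$HOME") are made up to show the difference.

```shell
#!/usr/bin/env bash
# Safer: the script text is fixed and each item arrives as $1.
# The "_" fills $0 (the script name), so {} lands in $1.
safe=$(printf '%s\n' 'plain' '$HOME' |
  xargs -I {} bash -c 'echo "item: $1"' _ {})
printf '%s\n' "$safe"

# Spliced: {} becomes part of the script, so bash expands $HOME.
spliced=$(printf '%s\n' '$HOME' | xargs -I {} bash -c 'echo "item: {}"')
printf '%s\n' "$spliced"
```

The safe form prints the literal text $HOME, while the spliced form prints the value of the variable. The positional form combines with -P in exactly the same way.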

This line finishes in about 10 seconds, the duration of the slowest command (the one for item 10), because all ten commands run at the same time. Run one after another, the same commands would take 1 + 2 + ... + 10 = 55 seconds.
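To see how the value of n affects the total time, the sketch below times the same four 1-second jobs with -P 1, -P 2 and -P 0. It relies on the bash built-in SECONDS counter, which has whole-second resolution, so the numbers are approximate; the function name time_jobs is just for illustration.

```shell
#!/usr/bin/env bash
# Run four "sleep 1" jobs with the given -P value and print the elapsed time.
# SECONDS is a bash built-in counter with whole-second resolution.
time_jobs() {
  SECONDS=0
  printf '%s\n' 1 1 1 1 | xargs -n 1 -P "$1" sleep
  printf '%s\n' "$SECONDS"
}

sequential=$(time_jobs 1)   # one job at a time: about 4 s
two_at_once=$(time_jobs 2)  # at most two at a time: about 2 s
all_at_once=$(time_jobs 0)  # as many as possible: about 1 s
printf 'P=1: %ss  P=2: %ss  P=0: %ss\n' "$sequential" "$two_at_once" "$all_at_once"
```

With -P 2, the third and fourth jobs wait until the first two have finished, which is why that run takes roughly half as long as the sequential one.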

More examples

Here is another example:

xargs -i -P 0 scp testfile {}: <<EOC
user@remote-1.example.com
user@remote-2.example.com
user@remote-3.example.com
user@remote-4.example.com
EOC

This uploads testfile to the home directory of each of the four servers in parallel, without writing a loop or managing background jobs by hand.
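Before pointing a command like this at real servers, it can help to preview what xargs will run. In the sketch below, putting echo in front of scp prints each command instead of executing it, and -P 2 caps the number of simultaneous transfers at two. The host names are the first two placeholder hosts from the example, and testfile is never actually copied.

```shell
#!/usr/bin/env bash
# Dry run: "echo" in front of scp prints the commands instead of running them.
# -P 2 caps the number of simultaneous transfers at two.
preview=$(xargs -I {} -P 2 echo scp testfile {}: <<EOC
user@remote-1.example.com
user@remote-2.example.com
EOC
)
printf '%s\n' "$preview"
```

Because the commands run in parallel, the printed lines may appear in either order. Removing echo performs the real uploads.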

The same mechanism can be used to run multiple scripts in parallel from a single “wrapper” script:

xargs -i -P 0 bash {} <<EOC
./1-script.sh
./2-script.sh
./3-script.sh
EOC

This runs the three scripts in parallel, and the wrapper script continues only after all of them have finished.
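A wrapper usually also needs to know whether every script succeeded. GNU xargs reports this through its exit status: it returns 123 if any command exited with a status between 1 and 125. The sketch below demonstrates this with three throwaway scripts created in a temporary directory; their names mirror the example, while their contents (the second one deliberately fails) are made up.

```shell
#!/usr/bin/env bash
# Create three throwaway scripts in a temporary directory; the second fails.
dir=$(mktemp -d)
printf 'exit 0\n' > "$dir/1-script.sh"
printf 'exit 3\n' > "$dir/2-script.sh"
printf 'exit 0\n' > "$dir/3-script.sh"

# Run them in parallel. GNU xargs exits with 123 if any command
# exited with a status between 1 and 125.
printf '%s\n' "$dir"/*-script.sh | xargs -I {} -P 0 bash {}
status=$?
echo "xargs exit status: $status"
if [ "$status" -ne 0 ]; then
  echo "at least one script failed" >&2
fi
rm -r "$dir"
```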

Summary

I have shown how xargs can improve the performance of scripts by running commands in parallel. When a command needs more complex bash syntax, bash -c runs it in a separate shell in which every bash construct is available.
