Bash Tips #5 – Parallelism using xargs
Running things in parallel in bash scripts may seem difficult, but the utility programs available out of the box on most GNU/Linux distributions make it straightforward. In this brief article, I focus on using xargs to run operations in parallel in a convenient manner. This is the approach I prefer.
What is xargs?
Let’s start by looking at a single sentence from the xargs manual that describes what it does:
xargs reads items from the standard input, delimited by blanks or
new‐lines, and executes the command one or more times with any
initial-arguments followed by items read from standard input.
That might not be obvious at first glance, so here is an example:
xargs echo args: <<EOC
arg1
arg2
arg3
EOC
There are 3 newline-separated items passed via stdin: arg1, arg2 and arg3. They are passed as arguments to echo. This is equivalent to the following line:
echo args: arg1 arg2 arg3
Note that the echo command runs only once.
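To make the "one or more times" part of the manual's description concrete, here is a small sketch. It uses the -n flag, which this article does not otherwise rely on, to cap the number of items per invocation; printf stands in for the here-document.

```shell
# Default: all items are appended to a single echo invocation.
printf '%s\n' arg1 arg2 arg3 | xargs echo args:
# prints: args: arg1 arg2 arg3

# -n 2: at most two items per invocation, so echo runs twice.
printf '%s\n' arg1 arg2 arg3 | xargs -n 2 echo args:
# prints:
# args: arg1 arg2
# args: arg3
```

How xargs groups items into invocations matters later, because parallelism only helps when there is more than one invocation to run.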
xargs flags
Next, we have to introduce 2 flags that we will use:
-i allows us to run a command for every item (one item, one command) and to control where the item appears among the other arguments. The GNU manual marks -i as deprecated in favor of -I {}, which behaves the same way here. We place the placeholder {} wherever we want the item to be:
xargs -i echo args: {} <<EOC
arg1
arg2
arg3
EOC
is equivalent to:
echo args: arg1
echo args: arg2
echo args: arg3
3 commands are run in total.
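The GNU xargs manual lists -i as deprecated and recommends -I with an explicit placeholder instead. A minimal sketch of the same idea with -I {}, using made-up items that contain spaces to show that in this mode each input line is treated as one item:

```shell
# Each input line becomes one item, even though it contains a blank.
printf '%s\n' 'first item' 'second item' | xargs -I {} echo 'args: [{}]'
# prints:
# args: [first item]
# args: [second item]
```

Quotes and backslashes in the input are still interpreted by xargs, so this sketch assumes plain items without them.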
-P n makes xargs use up to n processes at the same time if possible. Commands built from items are assigned in order to free processes. The xargs call exits when the last process finishes execution. If n is 0, xargs runs as many processes as possible at a time.
Let’s try it out:
seq 1 10 | xargs -i -P 0 bash -c 'sleep {} && echo done sleeping for {} seconds'
Thanks to using bash -c we are running a separate bash process for each item. The -c flag passed to bash specifies that the next argument should be treated as a command (script) to be executed. This allows us to use bash constructs such as && with xargs, as they are just a part of a string passed to bash -c and are not evaluated by the shell running xargs.
This line takes about 10 seconds to run, which is the duration of the slowest operation (the item 10), because all processes run in parallel. Run one after another, the same commands would take 55 seconds.
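One caveat with placing {} inside the bash -c string: the item text becomes part of the script, so an item containing quotes or a semicolon would be parsed as shell code. A common alternative, sketched here with shorter sleep times so it finishes quickly, passes the item as a positional parameter instead. The underscore is only a filler for $0, and fractional sleep values assume a sleep implementation that accepts them (GNU sleep does).

```shell
# The script string is fixed; each item arrives as "$1" and is never
# re-parsed as shell code.
printf '%s\n' 0.6 0.2 0.4 |
  xargs -I {} -P 0 bash -c 'sleep "$1" && echo "done sleeping for $1 seconds"' _ {}
```

The lines appear in the order the processes finish, not in input order, so the shortest sleep typically reports first.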
More examples
Here is another example:
xargs -i -P 0 scp testfile {}: <<EOC
user@remote-1.example.com
user@remote-2.example.com
user@remote-3.example.com
user@remote-4.example.com
EOC
This allows us to upload the file to multiple remote servers in parallel without using multiple shells.
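With -P 0, one scp process per host starts at once, which may be more than the network or the local machine handles well when the host list is long. A sketch with a bounded pool of two processes follows. As an assumption for illustration, echo is prefixed to the command so that it only prints what would be run (a dry run) and makes no connection.

```shell
# Dry run: prints the scp commands instead of executing them.
# Remove "echo" to perform the copies, at most two at a time.
printf '%s\n' \
  user@remote-1.example.com \
  user@remote-2.example.com \
  user@remote-3.example.com \
  user@remote-4.example.com |
  xargs -I {} -P 2 echo scp testfile {}:
```

A suitable value for -P depends on bandwidth and on how many connections the remote side accepts, so treat 2 as a placeholder.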
The same mechanism can be used to run multiple scripts in parallel from a single “wrapper” script:
xargs -i -P 0 bash {} <<EOC
./1-script.sh
./2-script.sh
./3-script.sh
EOC
This runs the three scripts in parallel.
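A wrapper script usually needs to know whether every job succeeded. The sketch below relies on the exit status of xargs itself: it is non-zero if any invocation failed, and GNU xargs documents the value 123 for invocations that exit with a status from 1 to 125. The items here are hypothetical stand-ins for real scripts; each one is simply the exit code its job should return.

```shell
# Hypothetical jobs: the second one fails with exit code 1.
status=0
printf '%s\n' 0 1 0 |
  xargs -I {} -P 0 bash -c 'exit "$1"' _ {} || status=$?

if [ "$status" -eq 0 ]; then
  echo "all jobs succeeded"
else
  echo "at least one job failed (xargs exit status: $status)"
fi
```

xargs does not say which job failed, so each script should write its own error messages or logs.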
Summary
I have shown how xargs can improve the performance of scripts by running commands in parallel. When a command needs more complex bash syntax, bash -c can run each command in a separate shell with access to all bash constructs.