Some notes on writing shell scripts

Published on September 12, 2021

I use shell scripts for a number of things:

Automating common tasks, and sets of commands
Wrapping other tools, such as setting up their environment
Collecting information
Creating “checklists” ¹

I follow the commonly touted advice of not using shell scripts for anything too complicated, logic-heavy or where strict error handling matters. Sometimes shell scripts do need to be used in these areas, and I shudder thinking about how poor the error handling in shell scripts is. But for cases where you’re just gluing other tools together, and can handle the occasional edge case not working properly, they’re very useful.

My advice isn’t born from a long career writing shell scripts, and certainly isn’t going to be relevant everywhere. If you find a mistake I can be reached at eric AT ericproberts DOT dev.

What shell to use

Almost every script I write starts out with the following line,

#!/usr/bin/env zsh

Yes, in the shell wars I’m a zsh fan. But it’s not just that I prefer using zsh to other shells (frankly, I haven’t tried many), but using something specific allows me to take full advantage of my shell. If I were to start with #!/bin/sh I would be forced to write in a maximally portable way. For instance, I could no longer rely on [[ "$A" = "$B" ]], instead being forced to write the antiquated [ "x$A" = "x$B" ].

Worse, #!/bin/sh doesn’t usually mean what you think it does: that you’re running the original Bourne Shell that will behave the same everywhere. Often this is just bash or zsh in some emulation mode, meaning you are unlikely to get the same behaviour everywhere – hence you need to be careful with portability.

Differences between shells can be subtle, and choosing a single one makes life much easier. I’ll illustrate this by going through examples that you might not catch testing your script.

First, let’s try fixing some permissions. Guess the ... output (and how it differs between shells).

$ ls -al
total 8
drwxr-xr-x 2 eric eric 4096 May 30 15:22 .
d-wx--x--x 4 eric eric 4096 May 30 15:22 ..
-rw-r--r-- 1 eric eric    0 May 30 15:22 a
-rw-r--r-- 1 eric eric    0 May 30 15:22 b
--w------- 1 eric eric    0 May 30 15:22 .c
--w------- 1 eric eric    0 May 30 15:22 .d

$ cat a b .c .d
cat: .c: Permission denied
cat: .d: Permission denied

$ chmod +r -- .*

$ ls -al
...

On zsh this works as I would expect, with the last output being

total 8
drwxr-xr-x 2 eric eric 4096 May 30 15:22 .
d-wx--x--x 4 eric eric 4096 May 30 15:22 ..
-rw-r--r-- 1 eric eric    0 May 30 15:22 a
-rw-r--r-- 1 eric eric    0 May 30 15:22 b
-rw-r--r-- 1 eric eric    0 May 30 15:22 .c
-rw-r--r-- 1 eric eric    0 May 30 15:22 .d

Compare to bash,

total 8
drwxr-xr-x 2 eric eric 4096 May 30 15:22 .
drwxr-xr-x 4 eric eric 4096 May 30 15:22 ..
-rw-r--r-- 1 eric eric    0 May 30 15:22 a
-rw-r--r-- 1 eric eric    0 May 30 15:22 b
-rw-r--r-- 1 eric eric    0 May 30 15:22 .c
-rw-r--r-- 1 eric eric    0 May 30 15:22 .d

If you lack keen eyes, look at the parent directory’s permissions. In bash the glob .* expands to . .. .c .d, but zsh leaves out the first two items. We probably meant the zsh behaviour, and I know this difference because it’s bitten me before. (From testing with other shells: ksh, on which zsh is based, follows zsh, while sh, which on my Debian machine uses zsh’s emulation mode, follows bash’s behaviour.)

To be clear, I’m not saying the bash behaviour is wrong, it’s actually more consistent and maybe less surprising than zsh (although I find the zsh behaviour much more useful). But if you’re trying to be portable you might not realize you have to account for this.
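
If you do need the same result everywhere, one option is to stop relying on what .* expands to and skip the two special entries by hand. The following is a minimal sketch, not something from my own scripts: list_dotfiles and the temporary directory are made-up names for illustration. It is written to run under both bash and zsh. This matters beyond the bash/zsh split too: as far as I know, bash 5.2 added a globskipdots option that changes the default.

```shell
# Print the dotfiles in a directory, skipping "." and ".." explicitly so the
# result does not depend on how the running shell expands ".*".
list_dotfiles() {
    (
        cd "$1" || exit 1
        for f in .*; do
            case "$f" in
                .|..) continue ;;
            esac
            # In bash and sh an unmatched glob stays literal; skip it.
            # (zsh reports an error instead unless NULL_GLOB is set.)
            [ -e "$f" ] || continue
            printf '%s\n' "$f"
        done
    )
}

# Demo on a throwaway directory that mirrors the example above.
demo_dir=$(mktemp -d)
touch "$demo_dir/a" "$demo_dir/b" "$demo_dir/.c" "$demo_dir/.d"
list_dotfiles "$demo_dir"
rm -rf "$demo_dir"
```

The loop is clunkier than a bare glob, which is rather the point: portability costs boilerplate.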

Another difference that I find annoying has to do with variable expansion. Like before, try to guess the output,

$ cat 1.sh
./2.sh $1 $2
$ cat 2.sh
echo 1 $1
echo 2 $2
$ $SHELL ./1.sh '/path to file' '/path to file'
...

If $SHELL is bash this will actually print 1 /path and 2 to, because bash splits the unquoted $1 on spaces, while zsh prints what you’d expect (i.e., the two paths). I think in both these cases zsh wins out; however, in either case with some testing you’d be able to suss out these differences. The point is they’re there, and if you insist on portability you’d need to account for them.
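
The usual way to make this behave the same in every shell is to quote the parameters. Here is a small sketch using functions as stand-ins for the two scripts; show_args, forward_unquoted and forward_quoted are names invented for this example.

```shell
# Stand-in for 2.sh: report how many arguments arrived and what they were.
show_args() {
    echo "count: $#"
    for arg in "$@"; do
        echo "arg: $arg"
    done
}

# Stand-in for 1.sh as written above: unquoted, so bash and sh split on spaces
# while zsh does not.
forward_unquoted() { show_args $1 $2; }

# Quoted: each parameter is passed through as exactly one argument.
forward_quoted() { show_args "$1" "$2"; }

forward_quoted '/path to file' '/path to file'
```

The quoted version reports two arguments regardless of the shell. The unquoted one is where bash and zsh disagree.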

I’ll also throw in here that zsh has a bunch of powerful glob patterns that, as far as I know, don’t have an easy equivalent in bash. For instance,

$ echo *(Lm+1)
...list files over 1MB
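
For comparison, the closest thing I know of outside zsh is to call out to find. This is a rough sketch rather than an exact equivalent: files_over_1mb is a made-up helper, it assumes a find that supports -maxdepth (GNU, BSD and busybox do), and unlike the glob it includes dotfiles and only looks at regular files.

```shell
# List regular files larger than 1 MiB directly inside a directory.
# -maxdepth 1 keeps find from recursing; -size +1024k means "more than 1024 KiB".
files_over_1mb() {
    find "${1:-.}" -maxdepth 1 -type f -size +1024k
}

# Demo: one 2 MiB file and one empty file in a throwaway directory.
demo_dir=$(mktemp -d)
dd if=/dev/zero of="$demo_dir/big" bs=1024 count=2048 2>/dev/null
: > "$demo_dir/small"
files_over_1mb "$demo_dir"
rm -rf "$demo_dir"
```

It works, but it is an external program with its own flags to remember, where the zsh version is a few characters in the glob itself.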

Error Handling

One of the hardest parts of programming, including shell scripting, is dealing with weird cases and error states. My approach to shell scripting: don’t try too hard. But there are certainly some low-effort things that can make a big difference.

(If you Google around there’s plenty of good boilerplate scripts. A lot of the stuff is overkill I think for the things I use shell scripting for, but it’s often good to know they exist).

The lowest effort error handling is to set some options that give more reasonable behaviour for scripts.

# These lines do the same thing, but both forms are common
set -o nounset -o errexit -o pipefail
set -euo pipefail

Now commands like false, false | true and echo $MISPELLING cause the script to exit.
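
A quick way to see the options at work is to run each of those commands in a child shell and look at its exit status. This is a sketch under assumptions: strict_status is a name I made up, and it uses bash for the child shell only so that it runs on machines without zsh (swap in zsh -c to check your own shell).

```shell
# Run one command line in a fresh child shell with the strict options on and
# print the child's exit status. Using bash here is an assumption of this sketch.
strict_status() {
    rc=0
    bash -c 'set -euo pipefail; eval "$1"; echo "still running"' _ "$1" \
        >/dev/null 2>&1 || rc=$?
    echo "$rc"
}

for snippet in 'true' 'false' 'false | true' 'echo $MISPELLING'; do
    echo "$snippet -> exit status $(strict_status "$snippet")"
done
```

Only the first line should report 0. The other three stop the child shell before it reaches the final echo.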

But this isn’t foolproof. Take the script below,

#!/usr/bin/env zsh
set -euo pipefail
f() { false; echo 1 }
[[ 1 -eq "$(f)" ]] && echo True

The output? True

Generally errexit means false should fail, but because of the context it’s being called in we just move past the false entirely. One thing you can do to beef up commands you don’t want to fail is to put || exit 1 after each of them.
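
Inside a function the same idea can be written as || return 1, which stops the function without depending on errexit at all. A minimal sketch, with f_unsafe and f_safe as illustrative names, written to run under both bash and zsh:

```shell
# As in the script above: nothing reacts to the failing command.
f_unsafe() { false; echo 1; }

# Guarded: the function stops at the failure no matter how it was called.
f_safe() { false || return 1; echo 1; }

(
    set -e
    if [ "$(f_unsafe)" = 1 ]; then echo "unsafe: True"; fi
    if [ "$(f_safe)" = 1 ]; then
        echo "safe: True"
    else
        echo "safe: failure noticed"
    fi
)
```

Whatever the shell does with the unguarded function, the guarded one prints nothing and returns a failure status, so the comparison cannot come out true by accident.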

There’s also other situations that are more difficult to catch,

git ref-parse --git-dir || (
    echo "Not in git repo... creating"
    git init
)

In this case rev-parse is misspelled. A more careful scripter might catch this case.

git ref-parse --git-dir || { [[ $? == 128 ]] && (
    echo "Not in git repo... creating"
    git init
) }

You’d probably catch it if you misspelled a git subcommand, so handling this case seems extraneous. But the idea you might catch one error thinking it’s another is not uncommon, and consider that some git subcommands like git-switch are still not supported by many widely used git versions.

We catch this by exploiting that git returns a different error code for using the wrong command than not being in a git repository, but again I think that’s overkill.
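
If you did want to go down this road, a case statement on the exit status keeps the three outcomes apart and avoids having to think about how || and && group. This is only a sketch: git_repo_action and ensure_git_repo are invented names, and the only git behaviour it relies on is the one described above (128 when not in a repository, a different non-zero status for an unknown subcommand).

```shell
# Map the exit status of `git rev-parse --git-dir` to an action name.
git_repo_action() {
    case "$1" in
        0)   echo "in-repo" ;;
        128) echo "init" ;;
        *)   echo "unexpected" ;;
    esac
}

# Create a repository only when git says we are not in one; treat any other
# failure (a misspelled subcommand, git missing entirely) as an error.
ensure_git_repo() {
    rc=0
    git rev-parse --git-dir >/dev/null 2>&1 || rc=$?
    case "$(git_repo_action "$rc")" in
        in-repo) return 0 ;;
        init)
            echo "Not in git repo... creating"
            git init
            ;;
        *)
            echo "git failed unexpectedly (exit status $rc)" >&2
            return 1
            ;;
    esac
}
```

Calling ensure_git_repo at the top of a script would then do nothing inside a repository, create one outside, and stop loudly on anything else.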

Should you take tons of strict precautionary measures when writing shell scripts? No, unless you really have to. The number of corner cases and things that can go wrong is high. If that level of error handling becomes necessary, usually python’s an option.

What adds another layer of impossibility to shell error handling is you’re stuck in an environment you can’t control. If you’re writing a zsh script then you can assume the user has zsh. But do they have GNU awk? coreutils? gcc? Are they the right version? These are things you can check, but it’s probably not worth the hassle. You should generally assume there’s a reasonable environment, and handle the cases where you need something special they don’t have. In something like python this isn’t (as much of) a problem: you’re mostly working within python itself rather than calling out to an external program every other line.

You also shouldn’t be controlling the user’s environment to fix this. If they have a non-standard awk in their $PATH that doesn’t follow the rules, that’s on them (though if you need GNU awk, it’s up to you to call gawk explicitly).
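
For the few tools you really do need, a cheap check at the top of the script is usually enough. A minimal sketch, where require is a helper name I made up and command -v is the standard way to ask whether a command exists:

```shell
# Fail early, with a readable message, if any of the named tools is missing.
require() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing required tool: $tool" >&2
            return 1
        fi
    done
}

require sh && echo "sh is available"
```

In a real script this would be something like require gawk sqlite3 || exit 1. It checks presence only, not versions, which fits the "don’t try too hard" approach.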

Options Parsing

I like zparseopts, and I recommend you use it. It supports long options and requires less boilerplate than getopts. It’s zsh-specific and IMO the killer feature of writing shell scripts in zsh.

#!/usr/bin/env zsh
set -euo pipefail
zparseopts -D -E -F -a ARGV -help
[[ -n "${ARGV[1]:-}" ]] && cat <<EOF && exit
Some command

Options:
    --help  Show this message
EOF

echo Hello, World!

For basic argument structures, any option parsing might be overkill (although including --help is always nice). But zparseopts can give you pretty good gnu style options for very little cost.

For instance, let’s say we want to parse -i input_file -o output_file positional_arg,

#!/usr/bin/env zsh
set -euo pipefail

die() { echo "$1" >&2; exit 1 }

zparseopts -D -E -F i:=INPUT_FILE o:=OUTPUT_FILE -help=HELP
[[ -n "$HELP" ]] && echo help text && exit 0
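# zparseopts stores each option as an array like (-i value); keep only the value
INPUT_FILE="${INPUT_FILE:1}"
OUTPUT_FILE="${OUTPUT_FILE:1}"
[[ -z "$INPUT_FILE" ]] && die 'Input file required'
[[ -z "${1:-}" ]] && die 'Positional required'
: "${OUTPUT_FILE:=/dev/stdout}"

echo "Input File: $INPUT_FILE"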
echo "Output File: $OUTPUT_FILE"
echo "Positional Arg: $1"

With a little work we get pretty robust error handling, including options, overrides and positional elements. You could imagine if you don’t care about mandatory options and defaults you would only need the one zparseopts line.

$ ./command
Input file required
$ ./command -a -b -c
./command:zparseopts:6: bad option: a
$ ./command -ifirst -i override positional
Input File: override
Output File: /dev/stdout
Positional Arg: positional
$ ./command front_positional -i first
Input File: first
Output File: /dev/stdout
Positional Arg: front_positional
$ ./command --help
help text

To understand more about what’s happening above look at the docs.

I’d also recommend searching through the interwebs for how to use getopts. It’s still the standard for parsing but it’s very clunky.
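
For comparison, here is roughly what the -i / -o example looks like with getopts. It is a sketch rather than a drop-in replacement: parse_args and its variable names are made up, there is no long --help (getopts only handles single-letter options, so I used -h), and it runs under both bash and zsh.

```shell
# Parse -i input_file -o output_file positional_arg with POSIX getopts.
parse_args() {
    input_file=""
    output_file="/dev/stdout"
    OPTIND=1    # reset so the function can be called more than once
    while getopts ":i:o:h" opt; do
        case "$opt" in
            i) input_file=$OPTARG ;;
            o) output_file=$OPTARG ;;
            h) echo "help text"; return 0 ;;
            :) echo "Option -$OPTARG needs a value" >&2; return 1 ;;
            *) echo "bad option: -$OPTARG" >&2; return 1 ;;
        esac
    done
    shift $((OPTIND - 1))    # drop the options that were parsed
    [ -n "$input_file" ] || { echo "Input file required" >&2; return 1; }
    [ -n "${1:-}" ] || { echo "Positional required" >&2; return 1; }
    echo "Input File: $input_file"
    echo "Output File: $output_file"
    echo "Positional Arg: $1"
}

parse_args -i first -i override positional
```

Note that getopts stops at the first non-option argument, so the front_positional -i first form from the zparseopts example would fail here with "Input file required".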

In Conclusion

You can think of a computer program as glue that holds together the different libraries it uses. Recently I wrote one that did the following: (1) download the HTML from a webpage, (2) locate an m3u8 stream url and (3) download that stream (itself a series of steps that involves parsing out urls, downloading and stitching them together). The program kept track of its progress and stored the result in an sqlite3 file.

You can imagine the program looks something like,

html = httplib.get("https://example.com/stream.html")
stream_url = htmllib.find_selector("video", html).get_attr("src")
downloaded_file = m3u8lib.download_stream(stream_url, to_file=True)
sqlite3lib.run("INSERT INTO downloads(url,filename) VALUES (?,?);", stream_url, downloaded_file)
print_line("DONE")

As it turns out, all the functionality I needed is available as a cli. I could have used curl to download the html, grabbed the m3u8 stream with a regex (since the html is simple to parse in this case), given that to youtube-dl, which supports downloading m3u8 streams, and then given that information to sqlite3 to store.

Basically, just like I used a program, written in python in this case, to stitch a bunch of libraries together, I could have used zsh or bash or another shell to stitch the different programs together.

Why didn’t I? First, I wasn’t confident enough in the environment this was going to run on. Could I guarantee youtube-dl would be installed? Not without some difficulty. And I wanted to ensure weird states didn’t lead to weird error conditions. For instance,

sqlite3 file.db "INSERT INTO downloads(url,filename) VALUES ('$A', '$B');"

is subject to SQL injection.² This will only happen in an “error” condition. If everything works as expected the A and B variables should not contain characters that would cause an injection. But if this invariant doesn’t hold true, because of a mistake on my part or a weird environment or anything else, I still want to end up with reasonable behaviour. Saving with the weird characters is reasonable behaviour. SQL injections are not.
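
If I had stayed in the shell, the least I could have done is escape the values before interpolating them. The sketch below doubles single quotes, which is how string literals are escaped in SQL. sql_quote is a made-up helper, the table and column names come from the pseudocode above, and it does not cover everything (command substitution drops trailing newlines, for one), so real parameter binding, as in the python version, is still the more robust choice.

```shell
# Wrap a value in single quotes for use as an SQL string literal, doubling
# any single quotes it contains.
sql_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

A="it's a test"
B="x'); DROP TABLE downloads; --"
printf 'INSERT INTO downloads(url, filename) VALUES (%s, %s);\n' \
    "$(sql_quote "$A")" "$(sql_quote "$B")"
```

The statement this prints stores both values verbatim, weird characters included, instead of letting them end the string early.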

There was likely a time when a program this simple wouldn’t have been worth the effort of compiling. A shell script would’ve been fine in this case. But the python script’s better. My opinion is that with the existence of scripting languages like python, anything that requires even moderate error handling or needs to act “somewhat reasonably” when unreasonable things happen should avoid shell scripts.

¹ This refers to a technique I initially saw in a blog post titled Do-nothing scripting.
² SQL injections are generally defined in terms of being an attack, synonymous with SQL injection attack. In this case there’s no untrusted party and it would likely just create a syntax error.