Saturday, May 24, 2014

Bash regular expression idiosyncrasies

As a scripting language, bash can certainly be considered a bit unusual, but it's got the power, and later versions keep adding functionality that makes it easier to get things done. Even with the latest additions, though, it still has its quirks.

Consider the regular expression matching operator, =~, added in bash 3.0.
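For reference, here is a minimal sketch of the operator's basic form, using a made-up sample value. The regex sits unquoted on the right-hand side of =~ inside [[ ]]. The test succeeds if the pattern matches anywhere in the string, and bash stores the matched text and any parenthesized groups in the BASH_REMATCH array.

```shell
#!/bin/bash
# Basic =~ usage: the right-hand side is a POSIX extended regex, left unquoted.
# The sample value below is only an illustration.
value='version 4.3.11'

if [[ $value =~ ([0-9]+)\.([0-9]+) ]]; then
  # BASH_REMATCH[0] is the whole match; [1] and [2] are the capture groups.
  echo "matched: ${BASH_REMATCH[0]}"
  echo "major: ${BASH_REMATCH[1]}, minor: ${BASH_REMATCH[2]}"
fi
```

The match is unanchored, so it picks up the first "digits.digits" run, which is 4.3.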

Given the value

MY_STRING='/usr/local/heroku/bin:/Users/jseidel/bin'

let's say you want to check whether it contains the string 'jseidel/bin'. You might try the following script to test a few variations in advance.

#!/bin/bash

export MY_STRING=/usr/local/heroku/bin:/Users/jseidel/bin

# 1: Perl-style slashes around the pattern, ending in .*
if [[ "$MY_STRING" =~ /.*jseidel\/bin.*/ ]]; then
  echo "Found .* version"
fi
# 2: the same, ending in .+
if [[ "$MY_STRING" =~ /.*jseidel\/bin.+/ ]]; then
  echo "Found .+ version"
fi
# 3: single-quoted pattern
if [[ "$MY_STRING" =~ 'jseidel\/bin' ]]; then
  echo "Found single-quoted version"
fi
# 4: double-quoted pattern
if [[ "$MY_STRING" =~ "jseidel\/bin" ]]; then
  echo "Found double-quoted version"
fi
# 5: unquoted pattern
if [[ "$MY_STRING" =~ jseidel\/bin ]]; then
  echo "Found un-quoted version"
fi

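What you would find (even in bash 4.3.11) is that only the last one works. The reasons are less mysterious than they first appear:
  1. The first one fails because bash, unlike Perl, Ruby or JavaScript, doesn't use slashes to delimit a regex. The leading and trailing '/' are ordinary characters that must appear in the string, and since nothing follows the final 'bin', the trailing '/' can never match. This one is pilot error.
  2. The second one fails for exactly the same reason, not because of the '+'. The right-hand side of =~ is a POSIX extended regular expression, and '+' (one or more of the preceding item) is fully supported; it's the literal trailing '/' that sinks it.
  3. The third and fourth ones fail because quoting changes what the pattern means. Since bash 3.2, any quoted part of the pattern is matched as a literal string rather than as a regular expression. Neither single nor double quotes remove the backslash in '\/', so bash searches for the literal text jseidel\/bin, backslash included, and that isn't in the value. So the quotes are not optional: a quoted 'jseidel/bin' (no backslash) would match, but only as a plain substring, and quoted regex characters such as '.' or '*' lose their special meaning. The unquoted fifth version works because it is treated as a regex, and the backslash there just escapes a '/' that wasn't special anyway.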
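Putting that together, here is a sketch of the same tests rewritten so that each one matches, assuming bash 3.2 or later for the quoting rules. It also shows a quoted '.' acting as a literal dot, plus the common trick of keeping the regex in a variable and expanding it unquoted; the variable name re is just an illustration.

```shell
#!/bin/bash
# The same checks, rewritten so each one matches (assumes bash 3.2+).

MY_STRING=/usr/local/heroku/bin:/Users/jseidel/bin

# 1 and 2: no surrounding slashes, since bash treats them as literal
# characters. '+' works fine in a POSIX extended regex.
if [[ $MY_STRING =~ .*jseidel/bin.* ]]; then
  echo "Found .* version"
fi
if [[ $MY_STRING =~ .+jseidel/bin ]]; then
  echo "Found .+ version"
fi

# 3 and 4: a quoted pattern is matched as a literal substring, so drop
# the backslash. Either quote style works once it is gone.
if [[ $MY_STRING =~ 'jseidel/bin' ]]; then
  echo "Found single-quoted version"
fi
if [[ $MY_STRING =~ "jseidel/bin" ]]; then
  echo "Found double-quoted version"
fi

# Quoting also disables regex characters: a quoted '.' is a literal dot,
# so this pattern is expected NOT to match.
if ! [[ $MY_STRING =~ "jseidel.bin" ]]; then
  echo "Quoted '.' is literal"
fi

# For regex semantics without quoting trouble, keep the pattern in a
# variable and expand it unquoted ('re' is an illustrative name).
re='jseidel/bin$'
if [[ $MY_STRING =~ $re ]]; then
  echo "Found regex-in-a-variable version: ${BASH_REMATCH[0]}"
fi
```

On bash 3.2 or later every echo in this version should fire.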

Took me quite a bit to figure this one out; hope it helps someone else.