Bash Shell: Arrays, Pro and Con

1. Introduction
2. Arrays
3. Function Alternative
4. Pros and Cons
5. References

See the References below

1 Introduction

This article on bash arrays provides an opportunity to compare use of bash arrays with one alternative, factoring the interface into functions.

While many will be encouraged to add the array to their practice, my recommendation is to take the time to appreciate, if not adopt, the functional alternative.

2 Arrays

Array syntax is used:

to assign values to the arrays
to count the size of an array
to fetch members of the parallel arrays

And the programming is quite straightforward:

# List of logs and who should be notified of issues
logPaths=("api.log" "auth.log" "jenkins.log" "data.log")
logEmails=("jay@email" "emma@email" "jon@email" "sophia@email")

# Look for signs of trouble in each log
for i in "${!logPaths[@]}";
do
  log=${logPaths[$i]}
  stakeholder=${logEmails[$i]}
  numErrors=$( tail -n 100 "$log" | grep "ERROR" | wc -l )

  # Warn stakeholders if recently saw > 5 errors
  if [[ "$numErrors" -gt 5 ]];
  then
    emailRecipient="$stakeholder"
    emailSubject="WARNING: ${log} showing unusual levels of errors"
    emailBody="${numErrors} errors found in log ${log}"
    echo "$emailBody" | mailx -s "$emailSubject" "$emailRecipient"
  fi
done
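
The three uses of array syntax listed above can be seen in isolation in a smaller sketch. The two-element arrays are cut down from the script purely for illustration.

```bash
#!/usr/bin/env bash
# Sketch only: the three array operations the script above relies on.
logPaths=("api.log" "auth.log")          # assign values to the arrays
logEmails=("jay@email" "emma@email")

echo "count:   ${#logPaths[@]}"          # count the size of an array
echo "indices: ${!logPaths[*]}"          # the indices the for loop walks

for i in "${!logPaths[@]}"; do           # fetch members of the parallel arrays
  echo "${logPaths[$i]} -> ${logEmails[$i]}"
done
```

Note that the pairing of a log with its stakeholder exists only through the shared index `i`.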

3 Function Alternative

The functional approach buries the syntactic noise. The email recipient is held in a variable whose name is keyed on the distinct part of the logfile name, e.g.:

$ auth_email=emma@email

and retrieved by replacing the .log suffix of the log name with the remainder of the label, _email. The eval echo idiom in emaRecepient is necessary to defer the leading (escaped) dollar sign, so that it fetches the value stored under the constructed name.

With a bit more work the coupling of the log file to an email address is therefore made explicit.
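
As a minimal sketch of that lookup, the following builds the variable name from a log name and dereferences it two ways. The variables `log`, `name`, `viaEval` and `viaBang` are illustrative only. The second form, bash's `${!name}` indirect expansion, is not used in this article and is shown here only for comparison.

```bash
#!/usr/bin/env bash
auth_email=emma@email

# Build the variable name from the log name, then dereference it.
log=auth.log
name=${log%.log}_email            # strips .log, appends _email: auth_email

viaEval=$(eval echo "\$$name")    # eval is handed the text: echo $auth_email
viaBang=${!name}                  # bash indirect expansion, no eval involved

echo "$name holds: $viaEval / $viaBang"
```

The escaped dollar sign survives the first round of expansion, so only the second round, inside eval, fetches the address.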

This approach also takes advantage of bash brace expansion, which generates alternate names from a general pattern, in this case names sharing the common .log suffix. The brace lists may be nested and may appear anywhere in the pattern. For example:

echo {a,b}.{x,y}    # produces a.x a.y b.x b.y
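
A few more cases, as a sketch. The `auth-admin` name and the `/var/log` prefix are made up for illustration and do not appear in the script.

```bash
#!/usr/bin/env bash
# Brace expansion is purely textual: none of these files has to exist.
flat=$(echo {api,auth,jenkins,data}.log)      # list at the front of the pattern
nested=$(echo {api,auth{,-admin}}.log)        # a list nested inside a list
prefixed=$(echo /var/log/{api,auth}.log)      # list in the middle of the pattern

echo "$flat"
echo "$nested"
echo "$prefixed"
```

The empty first member of `{,-admin}` is what yields the plain `auth.log` alongside `auth-admin.log`.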

emaRecepient () { eval echo \$${1%.log}_email; }
pair_log-ema () { eval ${1}_email=$2; }
list_of      () { eval "$1 () { \${*:-echo} ${*:2}; }"; }

list_of logPaths {api,auth,jenkins,data}.log 

pair_log-ema api     jay@email      # make these explicit
pair_log-ema auth    emma@email
pair_log-ema jenkins jon@email
pair_log-ema data    sophia@email

errorThreshold () { echo 5; }
numErrors () { tail -n 100 $1 | grep ERROR | wc -l; }

stakeHolderWarning () 
{
   : args: Error Threshold, a logPath member
   :
   local erTh=$1; shift
   local nErr=$(numErrors $1)
   :
   [[ $nErr -gt $erTh ]] && {
       :
       : compose and send the error email to 
       : . . . . . . the appropriate mailbox
       :
       echo "$nErr errors found in log: $1" |

       mailx -s "WARNING: Unusual Error Level, $1" $(emaRecepient $1)
    }
}

foreachi stakeHolderWarning $(errorThreshold) $(logPaths)
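
To try the functional version without sending any mail, one option is a dry run in which a shell function named mailx shadows the real command. The sketch below is an assumption-laden reduction of the script above, not a replacement for it. `dryRun` and the generated log contents are hypothetical. The addresses are assigned directly rather than through pair_log-ema, a plain for loop stands in for foreachi, and numErrors uses `grep -c` instead of `wc -l`.

```bash
#!/usr/bin/env bash
# Stand-in for the real mailx: called as  mailx -s SUBJECT RECIPIENT,
# it prints what would have been sent (the body arrives on stdin).
mailx () { echo "to=$3 subject=$2 body=$(cat)"; }

emaRecepient () { eval echo "\$${1%.log}_email"; }
numErrors    () { tail -n 100 "$1" | grep -c ERROR || true; }

stakeHolderWarning () {
    local erTh=$1; shift
    local nErr; nErr=$(numErrors "$1")
    if [[ $nErr -gt $erTh ]]; then
        echo "$nErr errors found in log: $1" |
            mailx -s "WARNING: Unusual Error Level, $1" "$(emaRecepient "$1")"
    fi
}

dryRun () {
    # Throwaway directory with one noisy log and one quiet log.
    local dir; dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        for n in 1 2 3 4 5 6 7; do echo "ERROR $n"; done > api.log
        echo "INFO ok" > auth.log
        api_email=jay@email
        auth_email=emma@email
        for log in api.log auth.log; do stakeHolderWarning 5 "$log"; done
    )
    rm -rf "$dir"
}

dryRun
```

Only api.log exceeds the threshold of 5, so only one warning is printed. The logs are addressed by bare name because emaRecepient derives a variable name from its argument, and a directory prefix would not form a valid one.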

4 Pros and Cons

I prefer the Functional approach over the Array. While the array approach favors conventional programming wisdom, I defy convention by claiming less syntax is better.

4.1 Array

While the array approach is quite straightforward, here are some liabilities:

using parallel arrays is a dangerous technique, especially when the lists get long.
while it's nice to have the array size available in the syntax, if its only use is to sequence through the array, the shell provides a ready alternative.
the functional approach, which requires economizing on arguments, focuses on the primary iterator, in this case logPaths, and recognizes the email address as a function of the log name.
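
The first two points can be illustrated with a short sketch. The missing address and the `pairs` array are contrived for the demonstration.

```bash
#!/usr/bin/env bash
# One address was dropped by mistake, so the arrays no longer line up.
logPaths=("api.log" "auth.log" "jenkins.log")
logEmails=("jay@email" "jon@email")           # emma@email is missing

pairs=()
for i in "${!logPaths[@]}"; do
  pairs+=("${logPaths[$i]}=${logEmails[$i]}")
done
printf '%s\n' "${pairs[@]}"
# auth.log is now paired with jon@email and jenkins.log with nothing at all,
# and by default bash reports no error.

# When the index is only used to step through the members,
# a plain word list does the same job without any array syntax.
for log in api.log auth.log jenkins.log; do
  echo "checking $log"
done
```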

4.2 Functional

In the functional approach there are a few instances of what I call "more syntax", i.e. syntax beyond the conventional wisdom:

the flavors of eval
the bash "repeated name" convention (brace expansion), which could (and should) be used in the array script as well.
the "foreachi" function belongs to a "foreach" family:
- foreach – takes a function and a list of arguments,
- foreachi – the same, with a function, one repeating argument, and an argument list,
- foreachij – a function, two repeating arguments, and an argument list.

Each of these bits of enhanced syntax is meant to make the code cleaner, reducing syntactic noise.
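
Only foreachi is shown in this article. The other two members of the family might look like the following, which is a guess based solely on the one-line descriptions above and not the author's actual definitions.

```bash
#!/usr/bin/env bash
# Hypothetical sketches of the foreach family, inferred from the descriptions.

foreach () {
    # function, then the argument list: call the function once per argument
    local fn=$1; shift
    local a
    for a in "$@"; do "$fn" "$a"; done
}

foreachij () {
    # function, two repeating arguments, then the argument list
    local fn=$1 i=$2 j=$3; shift 3
    local a
    for a in "$@"; do "$fn" "$i" "$j" "$a"; done
}

foreach   echo one two
foreachij echo i j x y
```

In each case the leading arguments are held fixed and only the trailing argument varies from call to call.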

Also, the list_of function is so general and powerful that it causes me to wonder if I'll ever need to use a bash array.

list_of
The first use of list_of is to return its names. Here's a demonstration of its power:
```
$ declare -f list_of
list_of () 
{ 
    eval "$1 () { \${*:-echo} ${*:2}; }"
}
$ list_of logPaths {api,auth,jenkins,data}.log
$ logPaths
api.log auth.log jenkins.log data.log
$ logPaths ls -l
ls: api.log: No such file or directory
-rw-r--r--@ 1 applemcg  staff   96 Jun 10 14:37 auth.log
-rw-r--r--@ 1 applemcg  staff  140 Jun 10 14:38 data.log
-rw-r--r--@ 1 applemcg  staff    0 Jun 10 14:47 jenkins.log
```
The file jenkins.log contains the text of the commands and the resulting standard output, and shows up as empty in the last command. The session shows:
- first, the body of the list_of function,
- next, the creation of logPaths,
- then the default, routine use: echo the names,
- and, as an alternative, a use with arguments, e.g. ls -l: instead of being echoed, the names become the arguments to the long-list request.
This latter feature is what causes me to doubt the need for arrays.
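
How list_of manages this is easier to see by inspecting the function it generates. In the sketch below, `declare -f` prints the generated body. The `printf` call is just an arbitrary second command chosen for illustration.

```bash
#!/usr/bin/env bash
list_of () { eval "$1 () { \${*:-echo} ${*:2}; }"; }

list_of logPaths {api,auth,jenkins,data}.log

# The member names were substituted when list_of ran (the unescaped ${*:2});
# the escaped \${*:-echo} is left behind to be expanded on every call.
declare -f logPaths

logPaths                   # no arguments: the default, echo, is the command
logPaths printf '%s\n'     # arguments given: they become the command instead
```

Because the members are pasted into the function body as plain words, this works for names without whitespace, such as the log names here.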
foreachi
```
foreachi () 
{ 
    : date: 2017-05-11;
    report_notargcount 3 $# && return 1;
    for a in ${*:3};
    do
        $1 $2 $a;
    done
}
```
Notice the shell parameter substitution: ${*:3} says, in effect, return the remainder of the arguments, from the third through the end.

The report_notargcount function is left as an exercise; as a hint, note that foreachi returns early when it succeeds.
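
One possible reading of report_notargcount, inferred only from the way foreachi calls it, is sketched below. The body and its message are assumptions, not the author's code. The loop also uses the quoted `"${@:3}"` rather than `${*:3}`, which selects the same arguments but keeps each one intact.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in: succeeds (after complaining on stderr) when fewer
# than $1 arguments were supplied, so that "&& return 1" leaves the caller.
report_notargcount () {
    [[ $2 -ge $1 ]] && return 1
    echo "${FUNCNAME[1]}: expected at least $1 arguments, got $2" >&2
}

foreachi () {
    report_notargcount 3 $# && return 1
    local a
    for a in "${@:3}"; do      # third argument through the last
        "$1" "$2" "$a"
    done
}

foreachi echo fixed a b c      # calls: echo fixed a, echo fixed b, echo fixed c
```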

Maintenance

Notice that, with a few appropriate functions:

toStake () 
{
    foreachi stakeHolderWarning $(errorThreshold) ${*:-$(logPaths)}
} 
setget () 
{ 
    : ~ name value -- defines NAME function returning VALUE;
    : ~ name -- defines NAME function with no value, but now settable;
    set $1 $(UC $1) $2;
    eval "$1 () { [[ \$# -ge 1 ]] && { setenv $2 \"\$1\"; }; echo \$$2; }";
    [[ $# -gt 2 ]] && { 
	$1 $3
    }
}
setget errorThreshold 5

it now becomes possible to use separate error thresholds for different logs:

$ toStake data           # uses the default 5, while
$ ...                  
$ setget errorThreshold 12
$ toStake auth jenkins   # uses another

Always build so the "constants" are easily converted to variables.
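
setget depends on two helpers, UC and setenv, that are not defined in this article. To experiment with it, the sketch below supplies hypothetical stand-ins for both: an upper-casing filter and an exporting assignment. The real helpers may well differ.

```bash
#!/usr/bin/env bash
# Hypothetical stand-ins for the two helpers setget relies on.
UC     () { echo "$*" | tr '[:lower:]' '[:upper:]'; }   # upper-case the words
setenv () { export "$1=$2"; }                           # NAME VALUE

setget () {
    : ~ name value -- defines NAME function returning VALUE
    set $1 $(UC $1) $2                  # now: $1=name  $2=NAME  $3=value
    eval "$1 () { [[ \$# -ge 1 ]] && { setenv $2 \"\$1\"; }; echo \$$2; }"
    [[ $# -gt 2 ]] && { $1 $3; }        # an initial value was given: store it
}

setget errorThreshold 5 > /dev/null
echo "default:  $(errorThreshold)"

errorThreshold 12 > /dev/null           # called with an argument, it is a setter
echo "override: $(errorThreshold)"
```

The generated function is both getter and setter: with no argument it echoes the stored value, with one argument it stores the value first.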

4.3 Conclusion

Functions lift "scripting" to a discipline of programming, and make the application malleable by design.

5 References

http://mcgowans.org/pubs/marty3/commonplace/software/arraysProCon.html
from my online commonplace book,
its Red Chapter