All about shell scripting here.

Alias in C-shell:

Aliases let you replace a long shell command with a short string. On all binf machines, user-defined aliases live in the file '.alias' in your home directory. The alias syntax differs between the C shell (csh; used on all our binf machines) and the Bourne-again shell (bash; e.g., on Ubuntu and Fedora). The word following alias is the new short command. Here are some useful examples:

Purpose of alias: Disk usage of all the folders in a directory
  For csh:  alias du 'du -h --max-depth=1'
  For bash: alias du='du -h --max-depth=1'

Purpose of alias: Logging into mutant, e.g., m 10
  For csh:  alias m 'ssh -X mutant\!*'
  For bash: m () { ssh -X mutant"$@"; }

Purpose of alias: Copy files between machines using tar and ssh, e.g., shcp 10 test
(copies folder "/linuxhome/tmp/user/test" from mutant10 to the current directory)
  For csh:  alias shcp 'ssh mutant\!:1 "cd /linuxhome/tmp/user/\!:2 ; tar cf - ./" | tar xvf -'
  For bash: shcp () { ssh "mutant$1" "cd /linuxhome/tmp/user/$2 ; tar cf - ./" | tar xvf -; }
  (bash aliases cannot use $1 and $2, so in bash this has to be a function, just like m)
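
As a rough sketch of why the bash version of m is a function: a bash alias is plain text substitution and does not see positional parameters, whereas a function receives its arguments as $1, $2, and so on. The helper names mutant_host and shcp_dryrun below are made up for illustration, and the function only prints the command it would run instead of calling ssh.

```bash
#!/bin/bash
# Hypothetical helper: builds the host name the way "m 10" would, without connecting.
mutant_host () {
    # "$1" is the first argument passed to the function, e.g. 10
    printf 'mutant%s\n' "$1"
}

# Hypothetical dry-run variant of shcp: prints the command instead of running it.
shcp_dryrun () {
    local host
    host=$(mutant_host "$1")
    # "$2" is the second argument, e.g. the folder name "test"
    printf 'ssh %s "cd /linuxhome/tmp/user/%s ; tar cf - ./" | tar xvf -\n' "$host" "$2"
}

shcp_dryrun 10 test
```

Printing the command first is a cheap way to check that the arguments end up where you expect before anything is copied.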

Bash scripts for repetitive tasks:

Disclaimer: for any specific task you can of course search Google, and you will find ten ways of doing it, many of them much better than what I put here.
But this little piece could serve as an intermediate between "how to use an array in bash" and "crazy complicated sysadmin stuff".

Bash scripts are a great way to start many simulations in one go or to do simple tasks on files, but sometimes it takes a bit of fiddling to get things to work. Basically, you put the commands you would normally type in the terminal into a file, put the line #!/bin/bash at the top, make the file executable with chmod +x, and you are good to go. Beyond that, loops and variables save you a lot of copy-pasting, and you can have Bash read files and decide by itself what should be done.
Be careful though: misplaced rm -f commands are also dangerous here, especially when paired with a loop!
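
One common way to make deletions in a loop less risky is a dry-run switch that prints what would be removed before anything is actually deleted. The sketch below is only an illustration: the function name remove_tmp_files, the run_*.tmp pattern and the file names are all made up, and everything happens inside a throwaway directory created with mktemp.

```bash
#!/bin/bash
# remove_tmp_files DIR DRYRUN   (hypothetical helper)
# DRYRUN=1 only prints what would be deleted, DRYRUN=0 really deletes.
remove_tmp_files () {
    local dir=$1 dryrun=$2 f
    for f in "$dir"/run_*.tmp; do
        [ -e "$f" ] || continue        # skip if the pattern matched nothing
        if [ "$dryrun" -eq 1 ]; then
            echo "would remove: $f"
        else
            rm -f -- "$f"              # quoted, and -- guards against odd file names
        fi
    done
}

# Try it in a scratch directory so nothing real can be lost.
workdir=$(mktemp -d)
touch "$workdir/run_1.tmp" "$workdir/run_2.tmp" "$workdir/keep.dat"

remove_tmp_files "$workdir" 1   # first look at what would happen
remove_tmp_files "$workdir" 0   # then do it for real
ls "$workdir"                   # only keep.dat is left

rm -rf -- "$workdir"
```

Quoting every variable that goes into rm matters: an unquoted, empty variable can turn a harmless path into a much broader one.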

Simple simulation start script:

#!/bin/bash
# a very simple runscript. I provide different seeds to my program which I store in the arrays STR2 and STR3
STR2=("963" "39" "244" "398" "62" "517" "887" "611" "166" "138")
STR3=("523" "32" "1346" "23476" "446" "2341" "3434" "61342" "5234" "9754")
 
#I only want to start 10 simulations at a time. there are different ways of splitting your runs up, but here I took the lazy route.
for i in "${STR2[@]}"; do
   ./my_program -d "directory_$i" -s "$i" parfile.cfg &
done
wait # makes sure all the simulations are done before starting the new set.
 
for i in "${STR3[@]}"; do
   ./my_program -d "directory_$i" -s "$i" parfile.cfg &
done
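
One of the other ways of splitting up the runs is to keep all seeds in a single array and call wait after every N started jobs. The following is a minimal sketch under stated assumptions: fake_program is a stand-in for ./my_program, MAXJOBS and LOGFILE are names invented here, and the batch size of 4 is arbitrary.

```bash
#!/bin/bash
SEEDS=(963 39 244 398 62 517 887 611 166 138 523 32)
MAXJOBS=4            # assumed batch size; pick what your machine can handle
LOGFILE=$(mktemp)    # scratch file that collects one line per finished run

# Stand-in for ./my_program: only records that the run happened.
fake_program () {
    echo "finished seed $1" >> "$LOGFILE"
}

count=0
for seed in "${SEEDS[@]}"; do
    fake_program "$seed" &          # start the run in the background
    count=$((count + 1))
    if (( count % MAXJOBS == 0 )); then
        wait                        # a full batch is running: wait for all of it
    fi
done
wait                                # wait for the last, possibly incomplete, batch

echo "runs completed: $(wc -l < "$LOGFILE")"
```

This keeps at most MAXJOBS runs going at once, but a new batch only starts when the slowest run of the previous batch has finished.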

More complex analysis code with data reading, formatting and plotting:

The script below is modified from one I use to analyse many aspects of my simulations, so it does a lot of different tasks. The comments (starting with #) explain what each part does.
Note that the script no longer runs as is! I stripped a lot of redundant content, so not all variables are initialised.
In short you will find code here for:
  • using variables in a bash script
  • allowing the script user to type in what the value of a variable should be (interactively)
  • reading a file and storing (some of) its content into variables and arrays
  • selecting files and storing (part of) their names
  • looping through arrays with for loops
  • looping with while loops
  • selectively executing commands with if statements
  • parsing file contents and printing data with awk (good for combining files)
  • plotting from the command line and with batchfiles with gnuplot
  • combining pictures into one with montage (an imagemagick tool)
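
A few of the techniques in this list can be tried in isolation. The following is a minimal sketch, not the author's script: it invents a small input file with one "seed bandnr" pair per line and uses made-up names (ROWS, SEEDS, SUCCESSFUL). It assumes bash 4 or newer, because mapfile is not available in older versions.

```bash
#!/bin/bash
# Hypothetical input file: one "seed bandnr" pair per line.
seedfile=$(mktemp)
printf '%s\n' "963 3" "39 1" "244 5" > "$seedfile"

declare -a SEEDS SUCCESSFUL

# Read the file into an array, one element per line.
mapfile -t ROWS < "$seedfile"

for row in "${ROWS[@]}"; do
    # Split the line on whitespace: first field to seed, second to bandnr.
    read -r seed bandnr <<< "$row"
    SEEDS+=("$seed")
    # Numerical comparison; note the spaces around [ and ].
    if [ "$bandnr" -gt 1 ]; then
        SUCCESSFUL+=("$seed")
    fi
done

echo "all seeds:        ${SEEDS[*]}"
echo "successful seeds: ${SUCCESSFUL[*]}"

rm -f -- "$seedfile"
```

With the three invented lines above, the seeds 963 and 244 end up in SUCCESSFUL, because only their second field is greater than 1.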
#!/bin/bash
##the top line should always be there. Make script executable with chmod +x
 
# Declare arrays
declare -a DIRARRAY
declare -a AGENTARRAY
declare -a SUCCESS
 
##########################
#### Data collection #####
##########################
 
## with read, you can ask the user to type in information that ends up in a variable.##
echo "please enter the directory general name and the parfile"
read direcname parfile
echo "Analysing directories $direcname, with parfile $parfile"
 
#read the name of the file that lists the seeds to analyse
echo "Now, please enter the file with seeds"
read agfile
#print the contents of the file for checking
echo "The contents of the file: "
cat "$agfile"
 
## extract the contents of the agfile into the appropriate arrays in Bash ##
mapfile -t FILEARRAY < "$agfile"     #FILEARRAY now holds the lines of the file as its elements
 
COUNTER=0
#for loop!
for el in "${FILEARRAY[@]}";do
    IFS=' ' read -r seed _ <<< "$el" #extract the seed (the first field) from each line; the rest of the line is discarded
    DIRARRAY[$COUNTER]=$seed
 
    ((COUNTER++))
done #end of for loop
 
 
#note how, when you set a variable, you just give the variable name.
#However, when you read out its value, you add a $.
COUNTER=0
COUNTER2=0
echo $COUNTER
 
#some normal commands as you'd usually type them.
rm -rf $direcname\_datafiles/
mkdir $direcname\_datafiles/
 
#printf: formatted printing: does not automatically append a newline
printf "">$direcname\_datafiles/nrbands.dat
 
########################
# How I do my analysis #
########################
 
#another loop, looping over every element in DIRARRAY.
#the "all" is denoted by the @.
# here I loop through the directories containing my simulation data
for seed in ${DIRARRAY[@]};do
 
  #This directory contains a number of files starting with "Coded". ls lists them all, head then selects the first of those
  #the $(...) allows me to store this in the variable named FILE.
  FILE=$(ls $direcname\_$seed/FittestGeneration0000009900/Coded* | head -1)
  AGENTID=${FILE:(-10)} #extract the number (which is the last 10 characters of the file name)
  AGENTARRAY[$COUNTER]=$AGENTID #store in this array.
  echo "agent: $AGENTID"
 
  #read some data from a file: was this run a success?
  #first get the filename again
  FILE2=$(ls $direcname\_$seed/FittestGeneration0000009900/FitnessDetails* | head -1)
  # read data from the file into a variable. awk is a mighty handy tool for all kinds of file reading and manipulation
  # the piece outside the braces is the condition and the piece between the braces is what is executed if the condition is satisfied.
  # NR is the current line number, so awk prints the second field of the second line in this case.
  read -r bandnr < <(awk 'NR==2 {print $2}' "$FILE2")
  SUCCESS[$COUNTER]=$bandnr
 
  #append data to a file with >> because > overwrites the file.
  echo $bandnr >>$direcname\_datafiles/nrbands.dat
 
 
  # I only run my analysis program if the run was successful.
  # if statements! They are very picky about syntax.
  # note the spaces around [ and ]? Don't forget those.
  # for more info on if statements: http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_07_01.html
  if [ "${SUCCESS[$COUNTER]}" -gt "1" ]; then
    ((COUNTER2++))
    rm -rf $direcname\_$seed/analyse$AGENTID
    ## run analysis program ##
    ./shortanalyse $AGENTID $direcname\_$seed/analyse$AGENTID -d $direcname\_$seed/FittestGeneration0000009900/ -s $seed $parfile
 
  fi #end of if statement.
 
  ((COUNTER++))
 
done
 
######################################
#### Data collection and plotting ####
######################################
COUNTER=0
tel=0 #tel is the counter of the while loop below; it has to start at a defined value
 
# while statement!!
# note how you can either compare vars as "$var" -lt "16" or as $(($var)) -lt 16.
# Both are weird. take your pick, they should both compare the numerical value.
while [ $(($tel)) -lt 16 ]; do
  printf "$tel 0.0 0.0 0.0\n" >>$direcname\_datafiles/degreedistr_original.dat #print some data
  if [ $(($tel)) -lt 7 ]
  then
    printf "$tel 0\n" >>$direcname\_datafiles/loopfreqs_segm.dat
  fi
  ((tel++))
done
 
#some more awk because it is cool
awk '$1=="0" { printf $3 " " }' $some_file >>$some_otherfile  #printf does not append newline at the end
awk '$1=="5" { print $2, $3, $4 }' $some_3rdfile >>$some_otherfile  #print does end with newline
 
: '
more elaborate command line awk usage: f is a variable which can be set in the {} part and checked in the condition.
the command below checks whether the 8th field on the line equals 2. if so, and nothing has been done yet (f=0), it prints the first field of that line. now something has been done, so f=1.
if the 8th field is >2 and nothing has been done yet (f=0), it prints first a 0, then field 1 and field 8 of that line, and finishes (f=2).
if instead something was already printed (f=1), it just prints the first field and the 8th, and finishes (f=2).
'
awk '$8==2 && f==0 {printf $1 " "; f=1} $8>2 && f==1 {print $1 " " $8; f=2} $8>2 && f==0 {print "0 " $1 " " $8; f=2}' ${direcname}_$seed/PopBandDynamics >> $direcname\_datafiles/firstband_time.dat
 
#collect data from two files into one, using conditionals
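#NR counts lines over all input files, while FNR restarts at 1 for every file, so NR==FNR only holds while awk reads the first file.
#for the first file we store multiple fields of each line in arrays (a[NR]=$2) and skip the remaining blocks with next
#for the second file only the next {} block runs: there we sum and print data from the second file with the data from the first file (in the arrays)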
awk 'NR==FNR {a[NR]=$2;b[NR]=$3;c[NR]=$4;next} {print $1, $2+a[FNR],$3+b[FNR], $4+c[FNR]}' $firstfile $secondfile > $endfile
 
#when the file contains a single element, you can easily read it into a variable:
genetoplot=$(cat "filename.dat")
 
#gnuplot is great for quick plotting from the command line. you can give as many consecutive commands as you want with -e and " ..;.."
#this one will make a 2d plot with the xcoordinates from col 4, the ycoordinates from col 5 and the points colored according to col 2.
gnuplot -e "unset key; set term svg; set output 'bands_size.svg'; plot 'FullAncestry' u 4:5:2 w linespoints pt 7 palette"
 
#if you want to make things pretty or have a lot of commands to give, a batchfile collecting these may be handier.
# with -e you can also pass an argument to the batchfile: neat!
gnuplot -e "gene=$genetoplot" gnubatchfile
 
#or we just call python to do something for us.
python somescript.py filename.dat $genetoplot
 
 
#below I collect the filenames of many pictures in an array to paste them together in one figure.
declare -a FOUR
## picture collection ##
# collect the file names
COUNTER=0
for seed in ${DIRARRAY[@]};do
 
 FOUR+=( "${direcname}_$seed/somesubdir${AGENTARRAY[$COUNTER]}/thispic.png" )
 
 ((COUNTER++))
done
 
#an imagemagick command combining the pictures in a certain way. tile specifies the number of columns (-tile nr x) or rows (-tile x nr) of pictures
#geometry here specifies the number of pixels between each picture. here I added 10 for both the x and y direction.
montage "${FOUR[@]}" -geometry +10+10 -tile 4x $direcname\_datafiles/fourierdata.png
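
The two-file awk idiom used in the script (NR==FNR ... next) is easy to get wrong, so it can help to try it on tiny files first. Below is a minimal sketch: the function name sum_columns and the file contents are invented for illustration, and both files are assumed to have the same number of lines with four whitespace-separated columns.

```bash
#!/bin/bash
# sum_columns FILE1 FILE2   (hypothetical helper)
# Prints column 1 of FILE2, followed by columns 2-4 of FILE2 plus the
# matching columns of the same line in FILE1.
sum_columns () {
    awk 'NR==FNR {a[FNR]=$2; b[FNR]=$3; c[FNR]=$4; next}      # first file: remember the values
         {print $1, $2+a[FNR], $3+b[FNR], $4+c[FNR]}' "$1" "$2"   # second file: add and print
}

first=$(mktemp)
second=$(mktemp)
printf '%s\n' "0 1 2 3"    "1 4 5 6"    > "$first"
printf '%s\n' "0 10 20 30" "1 40 50 60" > "$second"

sum_columns "$first" "$second"
# prints:
# 0 11 22 33
# 1 44 55 66

rm -f -- "$first" "$second"
```

One caveat of this idiom: if the first file is empty, NR==FNR also holds for the second file, so nothing is printed at all.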