Unix help - description of the @stim_analyze script
from a Unix perspective
#!/bin/tcsh
Use the /bin/tcsh program to interpret this file (as a T-shell script).
#======================================================================
# variable initializations
#======================================================================
# ----------------------------------------------------------------------
# experiment parameters
set ts = 300 # length of time series
set stim = 3 # number of input stimuli
set num_on = 50 # time points per stimulus (between 0 and $ts/3)
These are variables that relate directly to the user's experiment. If the
user would like to alter the number of time points (ts) from 300, then it
is very likely that the number of points for each stimulus (num_on) should
be varied accordingly. In this case, with 50 trials of each stimulus type,
150 time points are left for a "rest" condition.
If the user were to alter the number of stimuli, each of the commands 'RSFgen',
'waver' and '3dDeconvolve' would have to be changed accordingly. It is
likely that the three contrast files would also have to be adjusted, along
with the number of General Linear Tests ('-glt') in '3dDeconvolve'.
(see also: AFNI howto)
# ----------------------------------------------------------------------
# execution parameters
set iterations = 100 # number of iterations
set seed = 1234567 # initial random seed
set outdir = stim_results
set LCfile = $outdir/LC_sums
These variables are specifically for the shell script. The user can
adjust any of these without having to alter the rest of the script.
However, if 'iterations' is set to a value greater than 999, the
number of digits given to the 'count' command in the 'foreach' loop
should be increased accordingly.
'iterations' : This is the number of times that the body of this script
will execute (in the 'foreach' loop). It is the number of
times that a random timing sequence will be generated and
tested.
'seed' : This is passed to RSFgen to generate each random stimulus
sequence. Passing the same number to RSFgen will result in
the same stimulus sequence.
A pseudo-random number (i.e., one that is generated with
an algorithm, but is meant "to seem" random) is usually
generated from a seed, or a previous random number. Each
new "random" number is simply computed from the previous
one, such that if the same seed is passed twice, the same
number will be generated twice.
As a result, if we know the seed passed to 'RSFgen', then
we can regenerate the corresponding stimulus timing sequence
(a short demonstration of this appears just after this list of variables).
'outdir' : This is the directory into which the results will be placed,
including output files from 'RSFgen' (e.g. RSF.stim.017.2.1D,
the output file for stimulus 2, at iteration 17), output
files from 'waver' (e.g. wav.hrf.017.2.1D, the output file
for stimulus 2, at iteration 17), output files from
'3dDeconvolve' (e.g. 3dD.nodata.017, for iteration 17), and
finally 'LC_sums', containing the sum of the 3 resulting
standard deviations (scaled to integers) from 3dD.nodata.017,
along with the iteration number and random number seed.
Since this directory will contain 7 (= 3 + 3 + 1) files for
each iteration, plus the 'LC_sums' file, there will be many
files in it (in this case, 701).
'LCfile' : This variable is for the file containing the main results.
It has one line per iteration, consisting of the following:
o the three standard deviations and their sum (each standard
deviation comes from the corresponding general linear test
done by '3dDeconvolve')
o the iteration number
o the seed which was passed to RSFgen to generate the
stimulus timing sequence
Note that each line begins with the aforementioned sum,
allowing a sorting of the file to locate the smallest sum of
standard deviations, e.g. 33, from the results of iteration 039:
33 = 11 + 11 + 11 : iteration 039, seed 1234606
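As promised above under 'seed', here is a small demonstration of the idea
that the same seed produces the same stimulus sequence. The particular
option values (20 time points, 1 stimulus, 5 "on" points) and the prefixes
'one.' and 'two.' are made up just for this illustration; only '-seed'
matters here.
    RSFgen -nt 20 -num_stimts 1 -nreps 1 5 -seed 123456 -prefix one.
    RSFgen -nt 20 -num_stimts 1 -nreps 1 5 -seed 123456 -prefix two.
    diff one.1.1D two.1.1D
Since both commands use the same seed, 'diff' should report no differences:
the files 'one.1.1D' and 'two.1.1D' contain identical stimulus sequences.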
#======================================================================
# final setup and test
#======================================================================
# ------------------------------------------------------------
# make sure $outdir exists
if ( ! -d $outdir ) then
    mkdir $outdir
    if ( $status ) then
        echo "failure, cannot create output directory, $outdir"
        exit
    endif
endif
Make sure that the output directory exists. If it does not exist yet,
create it (if it does already exist, it is probably left over from a
previous execution). All of the output files will be put into this
directory.
If the attempt to create it fails, the user probably does not have write
permission in the current directory. Whine and exit.
# Unix
The "-d $outdir" test is true when '$outdir' exists, and is a directory.
Recall that the "!" character acts as a logical negation. So the first
"if statement" reads, "if $outdir does not exist as a directory". In this
case, use 'mkdir' to create the directory.
The '$status' variable shows the success of the previous statement. So in
the second "if statement", '$status' will be 0 if the 'mkdir' command was
successful. Otherwise, it failed.
The test "if ( $status )" is logically equivalent to "if ( $status != 0 )".
So it is saying, "if the 'mkdir' command failed, then...". In this case,
tell the user that there was a problem, and 'exit' the script.
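As a small illustration of '$status' (these particular commands are not
part of the script), one can watch it change at the tcsh prompt:
    ls /no/such/directory       # this 'ls' should fail
    echo $status                # so a non-zero status is displayed
    ls /tmp                     # this 'ls' should succeed
    echo $status                # so 0 is displayed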
# ------------------------------
# create empty LC file
echo -n "" > $LCfile
Create an empty output file, to contain the results of each iteration.
# Unix
The "-n" option to echo says not to echo a newline character. So this
command will echo absolutely nothing, sending all data to '$LCfile'.
This has 2 effects. The first is, if '$LCfile' did not yet exist, it is
created. The second effect is, either way, make sure '$LCfile' is empty.
# ------------------------------
echo -n "iteration: 000"
Keep the user informed of what iteration the script is on. Since no
iterations have been performed yet, start with 000.
#======================================================================
# the main loop
#======================================================================
During a single iteration of the loop, go through the following steps:
o note the current random number seed
o run 'RSFgen' to generate three random stimulus files, based on
the input seed
o use 'waver' to convert the three "binary" stimulus files into
files representing "ideal" hemodynamic response functions
o use '3dDeconvolve' to compute the amount of underlying variance,
assuming no noise, and "perfect" activation
# ------------------------------------------------------------
# run the test '$iterations' times
foreach iter (`count -digits 3 1 $iterations`)
This "foreach" statement controls the looping process. The loop will
execute '$iterations' times (100 times, in this example).
# Unix
Recall that 'count' is an AFNI program that outputs the numbers from
the bottom value to the top value. With the '-digits' option present,
each number will have 3 digits (unless they exceed 999).
Here, the command "count -digits 3 1 100" will result in the output of:
001 002 003 004 005 006 ... 096 097 098 099 100
Recall also that the backward quotes around the command (``) tell the
shell to first execute the command, and then substitute the results for
the expression in backward quotes. So in this case, the 'foreach' command
that gets evaluated will look like:
foreach iter ( 001 002 003 004 ... 097 098 099 100 )
Thus, the variable '$iter' will take on those strings, one by one, as the
loop gets executed those 100 times.
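For another small example of backward quotes (not part of the script, and
assuming AFNI's 'count' program is in the path):
    echo today is `date`
    set mylist = ( `count -digits 2 1 5` )
    echo $mylist
The second 'echo' should display "01 02 03 04 05", the output of 'count',
captured into the array '$mylist'.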
# ------------------------------
# make some other random seed
@ seed = $seed + 1
Here the seed is incremented during each iteration of the loop.
Since this number will be passed to 'RSFgen' to create the random
stimulus timing sequences (one for each of the three conditions),
a new set of sequences will be generated each time. In this
example, 'seed' will be 1234568 for the first execution of RSFgen.
# Unix
The T-shell (tcsh) is designed only for simple, integral numerical
computations. That is to say, it only works with integers, and
the basic +,-,*,/ operations.
The '@' command is similar to the 'set' command of the shell. It
is used to assign some string value to a variable. What makes '@'
special is that it tells the shell to evaluate the right hand side
numerically, before assigning the resulting string to the variable.
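A tiny example of the difference (not part of the script):
    set a = 2
    @ a = $a + 3        # 'a' now holds the string "5"
    @ a = $a - 1        # and now "4"
    echo $a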
# ------------------------------------------------------------
# create random order stim files
RSFgen -nt ${ts}                        \
       -num_stimts ${stim}              \
       -nreps 1 ${num_on}               \
       -nreps 2 ${num_on}               \
       -nreps 3 ${num_on}               \
       -seed ${seed}                    \
       -prefix RSF.stim.${iter}. >& /dev/null
This 'RSFgen' command is used to actually create a random stimulus
time series for each of the 3 stimuli. Note the options:
'-nt 300' : the overall time course should have 300 points
The variable ${ts} was set to 300 at the top of the script.
It relates to ${num_on}, below.
'-num_stimts 3' : there are 3 stimuli
The variable ${stim} was set to 3 at the top of the script.
It specifies the number of stimuli.
'-nreps 1 50' : stimulus 1 should occur 50 times (out of 300)
The variable ${num_on} was set to 50 at the top of the script.
It specifies the number of "on" points for each stimulus.
'-nreps 2 50' : stimulus 2 should occur 50 times (out of 300)
'-nreps 3 50' : stimulus 3 should occur 50 times (out of 300)
'-seed ${seed}' : use ${seed} to generate random numbers
(starting at 1234568, since the seed is incremented just
before each 'RSFgen' call, and increasing by 1 each iteration)
'-prefix RSF.stim.${iter}.'
: the prefix for the resulting stimulus files
(note that the current iteration number is
part of the file name)
# Unix
Note the use of the '\' character (for line continuation). When
it is the last character on the line, the shell reads the next
line as if it were on the current one. This is basically to make
long commands easier for us humans to read.
The actual command that gets executed by the shell is (say, for
iteration 7, and without the redirection to /dev/null) :
RSFgen -nt 300 -num_stimts 3 -nreps 1 50 -nreps 2 50
-nreps 3 50 -seed 1234574 -prefix RSF.stim.007.
# -----
regarding: '>& /dev/null'
Another noteworthy part of this line is redirection to the special
file '/dev/null'. This is done to hide all of the output from the
command (throwing it away). In this example, the text that 'RSFgen'
outputs to the screen is not of interest, so it is thrown away.
First recall that redirection (using '>') is done to send all
normal text output (which would otherwise appear on the screen) to
a file. The use of '>&' means to redirect both normal text
output (stdout) and "error" text output (stderr) from the running
program to some file. In this case, the file getting the output of
the command is /dev/null.
The special file /dev/null is basically like a black hole. Any
data that is written to it is lost, so no output from the 'RSFgen'
command will appear on the screen.
If the output of a program were actually of interest, then it would
be good to redirect that output to some normal file, which the user
could read later on.
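As a quick illustration of the difference (these commands and the file
name 'junk.txt' are only for demonstration, not part of the script):
    ls /no/such/dir >  junk.txt     # the error message still appears on the screen
    ls /no/such/dir >& junk.txt     # both stdout and stderr go into junk.txt
    ls /no/such/dir >& /dev/null    # both are simply thrown away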
# ------------------------------------------------------------
# from the random order stim files, make ideal reference functions
waver -GAM -dt 1.0 -input RSF.stim.${iter}.1.1D > wav.hrf.${iter}.1.1D
waver -GAM -dt 1.0 -input RSF.stim.${iter}.2.1D > wav.hrf.${iter}.2.1D
waver -GAM -dt 1.0 -input RSF.stim.${iter}.3.1D > wav.hrf.${iter}.3.1D
Recall that the 'RSF.stim.' files are "binary" stimulus timing
files (consisting of zeros and ones, in ascii format). The 'waver'
program will, in this case, produce "ideal hemodynamic response"
data, which gets redirected from screen output to 'wav.hrf.' files.
Again, there are three stimulus types, and an ideal response
function is generated for each one (and for each iteration).
# Unix
Note that stderr is not being redirected here, since error messages
are not what is sought. Only the standard output (stdout) from
'waver' is redirected to a file.
# ------------------------------------------------------------
# now evaluate the stimulus timings
3dDeconvolve                                    \
    -nodata                                     \
    -nfirst 4                                   \
    -nlast 299                                  \
    -polort 1                                   \
    -num_stimts ${stim}                         \
    -stim_file 1 "wav.hrf.${iter}.1.1D"         \
    -stim_label 1 "stim_A"                      \
    -stim_file 2 "wav.hrf.${iter}.2.1D"         \
    -stim_label 2 "stim_B"                      \
    -stim_file 3 "wav.hrf.${iter}.3.1D"         \
    -stim_label 3 "stim_C"                      \
    -glt 1 contrasts/contrast_AB                \
    -glt 1 contrasts/contrast_AC                \
    -glt 1 contrasts/contrast_BC                \
    > 3dD.nodata.${iter}
The program '3dDeconvolve' is run to test the effectiveness of the
experimental design. The output, which normally goes to the screen,
is redirected to a '3dD.nodata.' file for further analysis.
# Unix
As with the 'RSFgen' command above, the command options are
broken up across lines with the '\' character, just to make the
command easier to read. In this case, only standard output (stdout)
is redirected to a file. For example, the output file for iteration
number 007 is '3dD.nodata.007' (since each iteration number is 3 digits).
# ------------------------------------------------------------
# save the sum of the 3 LC values
set nums = ( `awk -F= '/LC/ {print $2 * 10000}' 3dD.nodata.${iter}` )
This command creates a shell variable '$nums'. It is going to be
a list of standard deviations for the general linear tests, taken
from the redirected output file of the '3dDeconvolve' command.
The standard deviations are multiplied by 10000 here, to allow
them to be treated as integers. Since 3dDeconvolve outputs the
standard deviations to 4 decimal places, multiplying them by 10000
will guarantee that they are integers.
# Unix
This command has a few parts. First off, consider the 'awk'
command:
awk -F= '/LC/ {print $2 * 10000}'
At a high-level, this reads:
"Search for any lines containing the character pair 'LC'.
Separate any such line into pieces, where the '=' character is
the separator. From that, print out the value in the second
field, multiplied by 10000."
So the standard deviation from each general linear test will be
output, after it is multiplied by 10000.
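To see the 'awk' command in isolation, one can feed it a single made-up
line (the text "some LC label" below is only a stand-in; the real lines
come from the '3dDeconvolve' output file):
    echo "some LC label = 0.0012" | awk -F= '/LC/ {print $2 * 10000}'
This should display 12: the line contains 'LC', the '=' splits it into
two fields, and the second field (0.0012) is multiplied by 10000.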
# -----
Next, consider the `` characters around the 'awk' command. Note
that those are backward quotes (not the single quote often found
on the same key as the double quote). As in:
`awk -F= ...`
The shell handles these in a special way. First, the command
inside the quotes is executed (the 'awk' command, in this example).
Then the quoted part of the line is replaced by the output of that
command.
For example, the output file '3dD.nodata.007' contains standard
deviations of 0.0011, 0.0012 and 0.0013. So for iteration 007,
the quoted part of the line will be replaced by "11 12 13". Try
out the following command from the 'stim_results' directory:
echo `awk -F= '/LC/ {print $2 * 10000}' 3dD.nodata.007`
The output should be:
11 12 13
# -----
Last, consider the set command:
set nums = ( `awk ...` )
Recall that setting a variable to be something in parentheses
makes that variable an array. So taking iteration 007 for example,
'$nums' will now be the list:
11 12 13
# ------------------------------------------------------------
# set num_sum to be the sum of the three values in the '$nums' array
@ num_sum = $nums[1] + $nums[2] + $nums[3]
Just above, '$nums' was set to be the array of the 3 standard
deviations from the General Linear Tests of '3dDeconvolve' (again,
note that the numbers were multiplied by 10000 to get integers).
This line assigns the sum of those 3 integers to '$num_sum'.
# Unix
In this case, '$nums' is an array of 3 elements. If one were to
"echo $nums", the output might be "11 12 13". To access individual
elements of an array, use '[]', where the indices start at 1.
It is perhaps a little troublesome that the first index is 1 (and
not 0), but c'est la vie. The user will have to keep in mind when
to use an initial index of 0, and when to use one of 1. Each way
makes sense, and each makes more sense than the other under various
circumstances. Here, when using T-shell scripts, array indexing
starts with 1.
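A small array example (not part of the script):
    set words = ( red green blue )
    echo $words[2]          # displays: green
    echo $#words            # displays: 3  (the number of elements)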
Recall from above the use of the '@' command. The '@' command
works like the 'set' command does, except that the right hand side
is evaluated numerically first. Consider the current command:
@ num_sum = $nums[1] + $nums[2] + $nums[3]
Here, '$num_sum' might take on the value "36" (for iteration 007).
If the 'set' command were used in place of '@', as in:
set num_sum = $nums[1] + $nums[2] + $nums[3]
then (aside from the fact that the command itself would fail, which
we will ignore for the time being), '$num_sum' might take on the
value "11 + 12 + 13". It would just be the resulting string, not
a resulting numerical evaluation of the string.
The reason the above 'set' command would fail is because 'set'
actually takes an entire list of assignments. The basic form of
set is:
set var1 var2 = val2 var3 var4 = val4 ...
Here, '$var1' would be set (it would exist), but its value would be
an empty string (as in: set var1 = ""). The same holds true for
'$var3'. Anyplace where an '=' character is found, that variable
is set to the value to the right of it.
On the other hand, '$var2' would take on the string value "val2",
and '$var4' would take on the string value "val4".
So the reason the "set num_sum = ..." command would fail is because
the command says to set '+' as a variable. But '+' is an illegal
variable name, so the command fails. Legal variable names start
with a letter, and contain a sequence of letters, digits, and
underscore '_' characters.
In order to set a variable to a text string that looks like a sum,
use quotes. For example:
    set my_sum_string = "$nums[1] + $nums[2] + $nums[3]"
In this form, using the double quotes, 'my_sum_string' would
take on a string value like "11 + 12 + 13".
As an additional example, consider the first command in the next
section.
# ------------------------------------------------------------
# append the results from this iteration to the output file
echo -n "$num_sum = $nums[1] + $nums[2] + $nums[3] : " >> $LCfile
echo "iteration $iter, seed $seed" >> $LCfile
These two 'echo' commands append one line of data to the output
file '$LCfile' (here, 'stim_results/LC_sums'). That line contains
the sum of the standard deviations, the individual (scaled) values,
the iteration number, and the current seed.
Consider iteration 007, for example. The line that would be
appended is:
36 = 11 + 12 + 13 : iteration 007, seed 1234574
The reason the sum comes first on the line is so the file can be
sorted later. If the smallest sum is the goal, that can be found
by sorting the output file.
# Unix
Note the '-n' option in the first 'echo' command. That option
tells 'echo' not to output a "newline" character at the end of
the line. This way, both sets of output appear on the same line
of the output file.
Consider also the end of the commands:
>> $LCfile
Recall first that '>' is used to redirect standard output from
the command to the file (following the '>' symbol). But one
aspect of simple redirection is that the output file is basically
created as new each time, i.e. if it did not yet exist, it would
be created, but if it did exist, the existing contents would be
destroyed.
The symbol '>>' is an "append" symbol. It works like
redirection does, except it does not destroy the current contents
of the output file.
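A tiny demonstration (the file name 'notes.txt' is just for this example,
not part of the script):
    echo "first line"  >  notes.txt     # create (or overwrite) notes.txt
    echo "second line" >> notes.txt     # append to the existing contents
    cat notes.txt                       # displays both lines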
# ------------------------------------------------------------
# move the output files from individual steps into the output directory
mv RSF.stim.*.1D wav.hrf.*.1D 3dD.nodata* $outdir
The 'RSF.stim.*.1D' files were created from the 'RSFgen' commands.
The 'wav.hrf.*.1D' files were created from the 'waver' commands.
The '3dD.nodata*' file was created from the '3dDeconvolve' command.
Move all of these files into the output directory, in case the
user wants to look at them.
# Unix
Recall that when moving multiple files with 'mv', the last
command argument should be a directory (a place to put all of the
files). If the last argument does not exist, or exists but is not
a directory, 'mv' treats it as a plain file name.
If the last argument is a file, one of two things will happen.
1. On some operating systems, the user will get an error
message saying "mv: when moving multiple files, last
argument must be a directory", and the command will exit
without doing anything else.
This is the good case.
2. On some other systems, one by one, each file is moved (or
renamed) to the last file name.
THIS IS VERY BAD (which is why many 'mv' implementations
were later fixed to refuse it, as in case 1).
The result is that all of the files are gone, except for the
second to last one, which takes on the name of the
last argument. For example, consider the command:
mv a b c d e f g
Suppose 'g' is a file, or does not yet exist, and further
suppose that the command actually "succeeds". The result
is that the files 'a b c d e' are gone, and the file 'f' is
now named 'g'. Most likely, that is not what the user intended.
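This script is already safe, since it verified near the top that '$outdir'
exists as a directory. For a stand-alone command, one cautious variation
(not part of the original script) is to repeat such a check first:
    if ( -d $outdir ) then
        mv RSF.stim.*.1D wav.hrf.*.1D 3dD.nodata* $outdir
    else
        echo "** missing output directory '$outdir', not moving files"
    endif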
# ------------------------------------------------------------
# let the user know the current iteration
echo -n "\b\b\b$iter"
This is an entertaining little piece of the script. First note that
this is the only output to the screen within the 'foreach' loop.
What this does is backspace over the last 3 characters of output,
and then display the value of '$iter' (and do not print a newline
character).
Since '$iter' is a 3 character number, this will be erased each
time, getting overwritten by the new value. So the user will just
see the number quickly count up. It looks cool. :)
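To see the effect in isolation, try this little loop at the tcsh prompt
(it assumes AFNI's 'count' program is available, and that 'echo'
interprets '\b' the same way the script relies on; the 'sleep' is only
there to slow things down enough to watch):
    echo -n "counting: 000"
    foreach num (`count -digits 3 1 10`)
        echo -n "\b\b\b$num"
        sleep 1
    end
    echo ""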
# ------------------------------------------------------------
# terminate the 'foreach' loop
end
# Unix
Just recall that 'end' tells the shell where the previous 'foreach'
loop finishes. If there are more iterations for the shell to do, then
processing will continue up at the corresponding 'foreach' statement,
with the variable '$iter' taking on the next value.
If the final iteration is completed, then processing will continue
after this 'end' statement.
# ------------------------------------------------------------
# display final information for the user
echo ""
echo "done, results are in '$outdir', LC sums are in '$LCfile'"
echo consider the command: "sort $LCfile | head -1"
Print a blank line, and then remind the user where the output files are,
and in particular, where the main output file is. Then suggest a command
for sorting the main output file, so that a line with the smallest sum
ends up on top.
As an alternate command, consider:
    sort $LCfile | head
This will output the first 10 lines of the sorted file, not just the first one.
# Unix
Consider the command:
sort $LCfile | head -1
The first part of the command will 'sort' the output file, writing the
results to the screen (stdout). The results of this are now sent to the
'head' program, which just displays the top lines of a file. Since the
results of 'sort' are sent to "head -1", the 'head' command will display
the first line of data that is sent to it.
The '|' character is called a "pipe". Recall that the '>' character
redirects output from a program to a file. The '|' (pipe) character
instead redirects output from one program to the input of another. So
the output of 'sort' (i.e. the sorted file) goes to the 'head' command.
The result of this is a display of the top line, the one with the smallest
sum of standard deviations. Recall that the output file just contains lines like:
36 = 11 + 12 + 13 : iteration 007, seed 1234574
If these are sorted, the lines with the smallest sum end up at the top of the
sorted list. Printing the top line gives one of those "smallest summed
standard deviations", which may have been the exact goal of running this
script.
In this example, the line that is output from that command is:
33 = 11 + 11 + 11 : iteration 039, seed 1234606
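One caution about plain 'sort': it compares lines alphabetically, so a sum
such as "100" would sort before "33". If the sums ever differ in their
number of digits, the numeric sorting option of 'sort' is safer:
    sort -n $LCfile | head -1
The '-n' option tells 'sort' to compare the leading numbers numerically
rather than as text.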
# ------------------------------------------------------------
# final note
The goal of this script is to find a good set of stimulus timing files
(generated by 'RSFgen'). Since, in this example, 33 is the smallest sum
from the 100 iterations, occurring at iteration 039, we might be satisfied
with the stimulus timing sequences from the files:
stim_results/RSF.stim.039.1.1D
stim_results/RSF.stim.039.2.1D
stim_results/RSF.stim.039.3.1D
Thank you, and good night.
"I'll miss you most of all, Scarecrow!"