Unix help

Unix help - description of the @stim_analyze script from a Unix perspective

#!/bin/tcsh

    Use the /bin/tcsh program to interpret this file (as a T-shell script).

#======================================================================
#  variable initializations
#======================================================================


# ----------------------------------------------------------------------
# experiment parameters

set ts          = 300		# length of time series
set stim        =   3		# number of input stimuli
set num_on      =  50		# time points per stimulus (between 0 and $ts/3)

    These are variables that relate directly to the user's experiment.  If the
    user would like to alter the number of time points (ts) from 300, then it
    is very likely that the number of points for each stimulus (num_on) should
    be varied accordingly.  In this case, with 50 trials of each stimulus type,
    150 time points are left for a "rest" condition.

    If the user would alter the number of stimuli, each of commands 'RSFgen',
    'waver' and '3dDeconvolve' would have to be changed accordingly.  It is
    likely that the three contrast files would have to be adjusted also, along
    with the number of General Linear Tests ('-glt') in '3dDeconvolve'.

	(see also: AFNI howto)


# ----------------------------------------------------------------------
# execution parameters

set iterations  = 100		# number of iterations
set seed        = 1234567	# initial random seed
set outdir      = stim_results
set LCfile      = $outdir/LC_sums

    These variables are specifically for the shell script.  The user can
    adjust any of these without having to alter the rest of the script.
    Though if 'iterations' is set to greater than 999, then perhaps the
    number of digits in the 'count' command of the 'foreach' loop should
    be increased accordingly.

    'iterations' : This is the number of times that the body of this script
		   will execute (in the 'foreach' loop).  It is the number of
                   times that a random timing sequence will be generated and
		   tested.

    'seed'       : This is passed to RSFgen to generate each random stimulus
		   sequence.  Passing the same number to RSFgen will result in
                   the same stimulus sequence.

		   A pseudo-random number (i.e., one that is generated with
		   an algorithm, but is meant "to seem" random) is usually
		   generated from a seed, or a previous random number.  Each
		   new "random" number is simply computed from the previous
		   one, such that if the same seed is passed twice, the same
		   number will be generated twice.

		   As a result, if we know the seed passed to 'RSFgen', then
		   we can regenerate the corresponding stimulus timing sequence.

    'outdir'	 : This is the directory into which the results will be placed,
		   including output files from 'RSFgen' (e.g. RSF.stim.017.2.1D,
		   the output file for stimulus 2, at iteration 17), output
		   files from 'waver' (e.g. wav.hrf.017.2.1D, the output file
		   for stimulus 2, at iteration 17), output files from
		   '3dDeconvolve' (e.g. 3dD.nodata.017, for iteration 17), and
		   finally 'LC_sums', containing the sum of the 3 resulting
		   variances from 3dD.nodata.017, along with the iteration
		   number and random number seed.

		   Since this directory will contain 7 (= 3 + 3 + 1) files for
		   each iteration, plus the 'LC_sums' file, there will be many
		   files in it (in this case, 701).

    'LCfile'	 : This variable is for the file containing the main results.
		   It has one line per iteration, consisting of the following:

			o the three variances and their sum (each variance is
			  from the corresponding general linear test done by
			  '3dDeconvolve'

		   	o the iteration number

			o the seed which was passed to RSFgen to generate the
			  stimulus timing sequence

		   Note that each line begins with the aforementioned sum,
		   allowing a sorting of the file to locate the smallest sum of
		   variances, e.g. 33, from the results of iteration 039:

			33 = 11 + 11 + 11 : iteration 039, seed 1234606


#======================================================================
#  final setup and test
#======================================================================

# ------------------------------------------------------------
# make sure $outdir exists

if ( ! -d $outdir ) then
    mkdir $outdir
    if ( $status ) then
	echo "failure, cannot create output directory, $outdir"
	exit
    endif
endif

    Make sure that the output directory exists.  If it is not already here
    (probably from a previous execution), create it.  All of the output files
    will be put into this directory.

    If the attempt to create it fails, the user probably does not have write
    permissions to this directory.  Whine and exit.

    # Unix

    The "-d $outdir" test is true when '$outdir' exists, and is a directory.
    Recall that the "!" character acts as a logical negation.  So the first
    "if statement" reads, "if $outdir does not exist as a directory".  In this
    case, use 'mkdir' to create the directory.

    The '$status' variable shows the success of the previous statement.  So in
    the second "if statement", '$status' will be 0 if the 'mkdir' command was
    successful.  Otherwise, it failed.

    The test "if ( $status )" is logically equivalent to "if ( $status != 0 )".
    So it is saying, "if the 'mkdir' command failed, then...".  In this case,
    tell the user that there was a problem, and 'exit' the script.


# ------------------------------
# create empty LC file
echo -n "" > $LCfile

    Create an empty output file, to contain the results of each iteration.

    # Unix

    The "-n" option to echo says not to echo a newline character.  So this
    command will echo absolutely nothing, sending all data to '$LCfile'.
    This has 2 effects.  The first is, if '$LCfile' did not yet exist, it is
    created.  The second effect is, either way, make sure '$LCfile' is empty.

# ------------------------------
echo -n "iteration: 000"

    Keep the user informed of what iteration the script is on.  Since no
    iterations have been performed yet, start with 000.


#======================================================================
#  the main loop
#======================================================================

    During a single iteration of the loop, go through the following steps:

	o note the current random number seed
	o run 'RSFgen' to generate three random stimulus files, based on
	  the input seed
	o use 'waver' to convert the three "binary" stimulus files into
	  files representing "ideal" hemodynamic response functions
	o use '3dDeconvolve' to compute the amount of underlying variance,
	  assuming no noise, and "perfect" activation

# ------------------------------------------------------------
# run the test '$iterations' times

foreach iter (`count -digits 3 1 $iterations`)

    This "foreach" statement controls the looping process.  The loop will
    execute '$iterations' times (100 times, in this example).

    # Unix

    Recall that 'count' is an AFNI program that outputs the numbers from
    the bottom value to the top value.  With the '-digits' option present,
    each number will have 3 digits (unless they exceed 999).

    Here, the command "count -digits 3 1 100" will result in the output of:

	001 001 002 003 004 005 006 ... 096 097 098 099 100

    Recall also that the backward quotes around the command (``) tell the
    shell to first execute the command, and then substitute the results for
    the expression in backward quotes.  So in this case, the 'foreach' command
    that gets evaluated will look like:

	foreach iter ( 001 002 003 004 ... 997 998 999 100 )

    Thus, the variable '$iter' will take on those strings, one by one, as the
    loop gets executed those 100 times.


	# ------------------------------
	# make some other random seed

	@ seed = $seed + 1

	    Here the seed is incremented during each iteration of the loop.
	    Since this number will be passed to 'RSFgen' to create the random
	    stimulus timing sequences (one for each of the three conditions),
	    a new set of sequences will be generated each time.  In this
	    example, 'seed' will be 1234568 for the first execution of RSFgen.

	    # Unix

	    The T-shell (tcsh) is designed only for simple, integral numerical
 	    computations.  That is to say, it only works with integers, and
	    the basic +,-,*,/ operations.

	    The '@' command is similar to the 'set' command of the shell.  It
	    is used to assign some string value to a variable.  What makes '@'
 	    special is that it tells the shell to evaluate the right hand side
 	    numerically, before assigning the resulting string to the variable.


	# ------------------------------------------------------------
	# create random order stim files

	RSFgen -nt ${ts}        \
	    -num_stimts ${stim} \
	    -nreps 1 ${num_on}  \
	    -nreps 2 ${num_on}  \
	    -nreps 3 ${num_on}  \
	    -seed ${seed}       \
	    -prefix RSF.stim.${iter}. >& /dev/null

	    This 'RSFgen' command is used to actually create a random stimulus
	    time series for each of the 3 stimuli.  Note the options:

	    '-nt 300'		: the overall time course should have 300 points

		The variable ${ts} was set to 300 at the top of the script.
		It relates to ${num_on}, below.

	    '-num_stimts 3'     : there are 3 stimuli

		The variable ${stim} was set to 3 at the top of the script.
		It specifies the number of stimuli.

	    '-nreps 1 50'	: stimulus 1 should occur 50 times (out of 300)

		The variable ${num_on} was set to 50 at the top of the script.
		It specifies the number of "on" points for each stimulus.

	    '-nreps 2 50'	: stimulus 2 should occur 50 times (out of 300)
	    '-nreps 3 50'	: stimulus 3 should occur 50 times (out of 300)
	    '-seed ${seed}'     : use ${seed} to generate random numbers
				  (starting from 1234567, and incrementing by 1)
	    '-prefix RSF.stim.${iter}.'
				: the prefix for the resulting stimulus files
				  (note that the current iteration number is
				   part of the file name)

	    # Unix

	    Note the use of the '\' character (for line continuation).  When
	    it is the last character on the line, the shell reads the next
	    line as if it were on the current one.  This is basically to make
	    long commands easier for us humans to read.

	    The actual command that gets executed by the shell is (say, for
  	    iteration 7, and without the redirection to /dev/null) :

		RSFgen -nt 300 -num_stimts 3 -nreps 1 50 -nreps 2 50
		       -nreps 3 50 -seed 1234574 -prefix RSF.stim.007.

	    # -----

		regarding:  '>& /dev/null'

	    Another noteworthy part of this line is redirection to the special
	    file '/dev/null'.  This is done to hide all of the output from the
	    command (throwing it away).  In this example, the text that 'RSFgen'
	    outputs to the screen is not of interest, so it is thrown away.

	    First recall that redirection (using '>') is done to send all
	    normal text output (which would otherwise appear on the screen) to 
	    a file.  The use of '>&' means to redirect both normal text
	    output (stdout) and "error" text output (stderr) from the running
	    program to some file.  In this case, the file getting the output of
	    the command is /dev/null.

	    The special file /dev/null is basically like a black hole.  Any
	    data that is written to it is lost, so no output from the 'RSFgen'
	    command will appear on the screen.

	    If the output of a program were actually of interest, then it would
	    be good to redirect that output to some normal file, which the user
	    could read later on.


	# ------------------------------------------------------------
	# from the random order stim files, make ideal reference functions

	waver -GAM -dt 1.0 -input RSF.stim.${iter}.1.1D > wav.hrf.${iter}.1.1D
	waver -GAM -dt 1.0 -input RSF.stim.${iter}.2.1D > wav.hrf.${iter}.2.1D
	waver -GAM -dt 1.0 -input RSF.stim.${iter}.3.1D > wav.hrf.${iter}.3.1D

	    Recall that the 'RSF.stim.' files are "binary" stimulus timing
	    files (consisting of zeros and ones, in ascii format).  The 'waver'
	    program will, in this case, produce "ideal hemodynamic response"
	    data, which gets redirected from screen output to 'wav.hrf.' files.

	    Again, there are three stimulus types, and an ideal response
	    function is generated for each one (and for each iteration).

	    # Unix

	    Note that stderr is not being redirected here, since error messages
	    are not what is sought.  Only the standard output (stdout) from
	    'waver' is redirected to a file.


	# ------------------------------------------------------------
	# now evaluate the stimulus timings

	3dDeconvolve				\
	    -nodata				\
	    -nfirst 4				\
	    -nlast 299				\
	    -polort 1				\
	    -num_stimts ${stim}			\
	    -stim_file 1 "wav.hrf.${iter}.1.1D"	\
	    -stim_label 1 "stim_A"		\
	    -stim_file 2 "wav.hrf.${iter}.2.1D"	\
	    -stim_label 2 "stim_B"		\
	    -stim_file 3 "wav.hrf.${iter}.3.1D"	\
	    -stim_label 3 "stim_C"		\
	    -glt 1 contrasts/contrast_AB	\
	    -glt 1 contrasts/contrast_AC	\
	    -glt 1 contrasts/contrast_BC	\
		> 3dD.nodata.${iter}


	    The program '3dDeconvolve' is run to test the effectiveness of the
	    experimental design.  The output, which normally goes to the screen,
	    is redirected to a '3dD.nodata.' file for further analysis.

	    # Unix

	    Like above with the 'waver' program, the command options are
	    broken up across lines with the '\' character, just to make it
	    easier to read.  In this case, only standard out (stdout) is
	    redirected to a file.  For example, the output file for iteration
	    number 007 is '3dD.nodata.007' (since each iteration is 3 digits).


	# ------------------------------------------------------------
	# save the sum of the 3 LC values

	set nums = ( `awk -F= '/LC/ {print $2 * 10000}' 3dD.nodata.${iter}` )

	    This command creates a shell variable '$nums'.  It is going to be
	    a list of standard deviations for the general linear tests, taken
	    from the redirected output file of the '3dDeconvolve' command.

	    The standard deviations are multiplied by 10000 here, to allow
	    them to be treated as integers.  Since 3dDeconvolve outputs the
	    standard deviations to 4 decimal places, multiplying them by 10000
	    will guarantee that they are integers.

	    # Unix

	    This command has a few parts.  First off, consider the 'awk'
	    command:

		awk -F= '/LC/ {print $2 * 10000}'

	    At a high-level, this reads:

		"Search for any lines containing the character pair 'LC'.
		Separate any such line into pieces, where the '=' character is
		the separator.  From that, print out the value in the second
		field, multiplied by 10000."

	    So the standard deviation from each general linear test will be
	    output, after it is multiplied by 10000.

	    # -----

	    Next, consider the `` characters around the 'awk' command.  Note
	    that those are backward quotes (not the single quote often found
	    on the same key as the double quote).  As in:

		`awk -F= ...`

	    The shell handles these in a special way.  First, the command
	    inside the quotes is executed (the 'awk' command, in this example).
	    Then the quoted part of the line is replaced by the output of that
	    command.

	    For example, the output file '3dD.nodata.007' contains standard
	    deviations of 0.0011, 0.0012 and 0.0013.  So for iteration 007,
	    the quoted part of the line will be replaced by "11 12 13".  Try
	    out the following command from the 'stim_results' directory:
		
	        echo `awk -F= '/LC/ {print $2 * 10000}' 3dD.nodata.007`

	    The output should be:

		11 12 13

	    # -----

	    Last, consider the set command:

		set nums = ( `awk ...` )

	    Recall that setting a variable to be something in parentheses
	    makes that variable an array.  So taking iteration 007 for example,
	    '$nums' will now be the list:

		11 12 13


	# ------------------------------------------------------------
	# set num_sum to be the sum of the three values in the '$nums' array

	@ num_sum = $nums[1] + $nums[2] + $nums[3]

	    Just above, '$nums' was set to be the array of the 3 standard
	    deviations from the General Linear Tests of '3dDeconvolve' (again,
	    note that the numbers were multiplied by 10000 to get integers).
	    This line assigns the sum of those 3 integers to '$num_sum'.

	    # Unix

	    In this case, '$nums' is an array of 3 elements.  If one were to
	    "echo $nums", the output might be "11 12 13".  To access individual
	    elements of an array, use '[]', where the indices start at 1.

	    It is perhaps a little troublesome that the first index is 1 (and
	    not 0), but C'est la via.  The user will have to keep in mind when
	    to use an initial index of 0, and when to use one of 1.  Each way
	    makes sense, and each makes more sense than the other under various
	    circumstances.  Here, when using T-shell scripts, array indexing
	    starts with 1.

	    Recall from above the use of the '@' command.  The '@' command
	    works like the 'set' command does, except that the right hand side
	    is evaluated numerically first.  Consider the current command:

		@ num_sum = $nums[1] + $nums[2] + $nums[3]

	    Here, '$num_sum' might take on the value "36" (for iteration 007).
	    If the 'set' command were used in place of '@', as in:

		set num_sum = $nums[1] + $nums[2] + $nums[3]

	    then (aside from the fact that the command itself would fail, which
	    we will ignore for the time being), '$num_sum' might take on the
	    value "11 + 12 + 13".  It would just be the resulting string, not
	    a resulting numerical evaluation of the string.

	    The reason the above 'set' command would fail is because 'set'
	    actually takes an entire list of assignments.  The basic form of
	    set is:

		set var1 var2 = val2 var3 var4 = val4 ...

	    Here, '$var1' would be set (it would exist), but its value would be
  	    an empty string (as in: set var1 = "").  The same holds true for
	    '$var3'.  Anyplace where an '=' character is found, that variable
	    is set to the value to the right of it.

	    On the other hand, '$var2' would take on the string value "val2",
	    and '$var4' would take on the string value "val4".

	    So the reason the "set num_sum = ..." command would fail is because
	    the command says to set '+' as a variable.  But '+' is an illegal
	    variable name, so the command fails.  Legal variable names start
	    with a letter, and contain a sequence of letters, digits, and
	    underscore '_' characters.

	    In order set a variable to a text string that looks like a sum,
	    use quotes.  For example:

		set my_sum_string = "$nums[1] + $nums[2] + $nums[3]"

	    In this form, using the double quotes, 'my_sum_string' would
	    take on a string value like, "10 + 11 + 12".

	    As an additional example, consider the first command in the next
	    section.


	# ------------------------------------------------------------
	# append the results from this iteration to the output file

	echo -n "$num_sum = $nums[1] + $nums[2] + $nums[3] : " >> $LCfile
        echo    "iteration $iter, seed $seed"                  >> $LCfile

	    These two 'echo' commands will append one line of data to the
	    output file '$LCfile' (called 'stim_results/LC_sums').  This data
	    is the standard deviation sum, along with the individual values,
	    along with the iteration number and the current seed.

	    Consider iteration 007, for example.  The line that would be
	    appended is:

		36 = 11 + 12 + 13 : iteration 007, seed 1234574

	    The reason the sum comes first on the line is so the file can be
	    sorted later.  If the smallest sum is the goal, that can be found
	    by sorting the output file.

	    # Unix

	    Note the '-n' option in the first 'echo' command.  That option
	    tells 'echo' not to output a "newline" character at the end of
	    the line.  This way, both sets of output appear on the same line
	    of the output file.

	    Consider also the end of the commands:

		>> $LCfile

	    Recall first that '>' is used to redirect standard output from
	    the command to the file (following the '>' symbol).  But one
	    aspect of simple redirection is that the output file is basically
	    created as new each time,  i.e. if it did not yet exist, it would
	    be created, but if it did exist, the existing contents would be
	    destroyed.

	    The symbol '>>' is an "append" symbol.  It works like
	    redirection does, except it does not destroy the current contents
	    of the output file.


	# ------------------------------------------------------------
	# move the output files from individual steps into the output directory

	mv RSF.stim.*.1D wav.hrf.*.1D 3dD.nodata* $outdir

	    The 'RSF.stim.*.1D' files were created from the 'RSFgen' commands.
	    The 'wav.hrf.*.1D' files were created from the 'waver' commands.
	    The '3dD.nodata*' file was created from the '3dDeconvolve' command.

	    Move all of these files into the output directory, in case the
	    user wants to look at them.

	    # Unix

	    Recall that when moving multiple files with 'mv', the last
	    command argument should be a directory (a place to put all of the
	    files).  Note that if the last argument does not exist (and as a 
	    directory), the shell takes it to be a file.

	    If the last argument is a file, one of two things will happen.

		1. On some operating systems, the user will get an error
	 	   message saying "mv: when moving multiple files, last
	  	   argument must be a directory", and the command will exit
	  	   without doing anything else.

		   This is the good case.

		2. On some other systems, one by one, each file is moved (or
		   renamed) to the last file name.

		   THIS IS VERY BAD.  And so the command was fixed.

		   The result is all of the files are gone, except for the
	   	   second to the last one, which takes on the name of the
	   	   last argument.  For example, consider the command:

		 	mv a b c d e f g

	  	   Suppose 'g' is a file, or does not yet exist, and further
		   suppose that the command actually "succeeds".  The result
		   is that files "a b c d e" are gone.  And the file 'f' is
	  	   now named 'g'.  Most likely, not what the user intended.


	# ------------------------------------------------------------
	# let the user know the current iteration

	echo -n "\b\b\b$iter"

	    This is an entertaining little piece of the script.  First note that
	    this is the only output to the screen within the 'foreach' loop.
	    What this does is backspace over the last 3 characters of output,
	    and then display the value of '$iter' (and do not print a newline
	    character).

	    Since '$iter' is a 3 character number, this will be erased each
	    time, getting overwritten by the new value.  So the user will just
	    see the number quickly count up.  It looks cool.  :)


# ------------------------------------------------------------
# terminate the 'foreach' loop

end

    # Unix

    Just recall that 'end' tells the shell where the previous 'foreach'
    loop finishes.  If there are more iterations for the shell to do, then
    processing will continue up at the corresponding 'foreach' statement,
    with the variable '$iter' taking on the next value.

    If the final iteration is completed, then processing will continue
    after this 'end' statement.


# ------------------------------------------------------------
# display final information for the user

echo ""
echo "done, results are in '$outdir', LC sums are in '$LCfile'"
echo consider the command: "sort $LCfile | head -1"

    Print a blank line, and then remind the user where the output files are,
    and in particular, where the main output file is.  Then suggest a command
    for sorting the main output file, to get one of the multiple smallest
    values on top.

    As an alternate command, consider:

	sort $LCfile | head

    This will output first 10 lines of the sorted file, not just the first 1.

    # Unix

    Consider the command:

	sort $LCfile | head -1

    The first part of the command will 'sort' the output file, writing the
    results to the screen (stdout).  The results of this are now sent to the
    'head' program, which just displays the top lines of a file.  Since the
    results of 'sort' are sent to "head -1", the 'head' command will display
    the first line of data that is sent to it.

    The '|' character is called a "pipe".  Recall that the '>' character
    redirects output from a program to a file.  The '|' (pipe) character
    instead redirects output from one program to the input of another.  So
    the output of 'sort' (i.e. the sorted file) goes to the 'head' command.

    The result of this is displaying the top line with the smalled sum of
    standard deviations.  Recall that the output file just contains lines like:

	36 = 11 + 12 + 13 : iteration 007, seed 1234574

    If these are sorted, the lines with the smallest sum are at the top of
    sorted list.  Printing the top line gives one of those "smallest summed
    standard deviations", which may have been the exact goal of running this
    script.

    In this example, the line that is output from that command is:

	33 = 11 + 11 + 11 : iteration 039, seed 1234606


# ------------------------------------------------------------
# final note

The goal of this script is to find a good set of stimulus timing files
(generated by 'RSFgen').  Since, for example, 33 is the smallest sum (from
100 iterations), and occurring on iteration 039, we might be satisfied with
the stimulus timing sequences from the files:

	stim_results/RSF.stim.039.1.1D
	stim_results/RSF.stim.039.2.1D
	stim_results/RSF.stim.039.3.1D


    Thank you, and good night.
    "I'll miss you most of all, Scarecrow!"