Unix help - description of the @DDmb_analyze script from a Unix perspective

#!/bin/tcsh

#======================================================================
#  start with basic variable initializations
#======================================================================

    It is common practice to set general variables at the top of a shell
    script, making them easy to find.  In particular, variables that one
    might expect to change from script use to script use should go at the
    top, such as a subject's name or ID code, the number of runs, or
    perhaps the directory to store the results in.

# ----------------------------------------------------------------------
# save a couple of directory names for future use

set topdir      = `pwd`
set outlier_dir = $topdir/afni/outliers

    The '$topdir' directory is where the script is run from.  Referring to
    directories with respect to '$topdir' makes it more clear where they
    are located within the directory tree.

    The '$outlier_dir' is where the outlier files will be placed, when they
    are produced by executions of AFNI's 'to3d' program.  Those text files
    will then be available to read over, checking for data that might be
    invalid.

    Variables that are specific to a small part a script are usually
    assigned at the beginning of that part.  So it would certainly be
    reasonable to assign '$outlier_dir' just before running 'to3d',
    instead of assigning it here.

    # Unix

    Recall that when backward quotes are used, as in `pwd`, the command
    is executed, with the output actually replacing the command.  Thus,
    multiple steps are taken for the execution of the line:

		set topdir = `pwd`

	1. The 'pwd' command is executed, resulting in, for example,

	       /home/afni_user/howto/ht02_DDmb

	2. That output replaces the command in backward quotes.
	   So the actual shell command reads:

	       set topdir = /home/afni_user/howto/ht02_DDmb

	3. The resulting command is executed.  In this case, 'topdir'
	   is now assigned that directory name.

    Now given the example of what '$topdir' might be, the assignment
    of 'outlier_dir' ends up looking like:

	set outlier_dir = /home/afni_user/howto/ht02_DDmb/afni/outliers


#======================================================================
#  now perform various tests to make sure the world is okay
#======================================================================

    After initial variables have been assigned, it is good to do some
    basic checks, so at least the script won't crash for any obvious
    reason.  In this example, the following tests will be done:

	- Verify that the script being run from the correct directory.
	- Verify that the script has not been previously run, or at
  	    least that it will not overwrite any existing files.  This
	    script only puts new data in the 'afni' directory.
	- Attempt to create a directory.  This may fail, commonly when
	    the user does not have write permissions here.
	- Verify that the AFNI programs are in the user's path.  Simply
	    looking for one of them should be enough ('to3d').


# ----------------------------------------------------------------------
# verify that the script is being run from the proper directory

if ( ! -d DDmb_data ) then
    echo ""
    echo "error: this script must be run from the 'ht02_DDmb' directory"
    echo "       under the top level HOWTO directory"
    echo "exiting..."
    echo ""

    exit
endif

    The 'if' statement is a check to find whether there is a directory
    called 'DDmb_data' here.  If there is not, the script cannot be run.
    So the user is informed, and the script is terminated.

    # Unix

    The '!' character is a logical NOT operator, and the '-d' operator is
    a "directory exists" test.  So if there is no such directory here, then
    the script exits.


# ----------------------------------------------------------------------
# if the afni directory is already here, complain and exit

if ( -e afni ) then
    echo "failure: the 'afni' directory already exists"
    exit
endif

    The existence of the 'afni' directory means the script has been
    previously run.  Since over-writing previous results is considered
    unacceptable, the user must either move or remove that directory.

    # Unix

    Note that the '-e' operator is used here (as opposed to '-d' above).
    This is a check for existence in any way, not just as a directory.


# ----------------------------------------------------------------------
# make a directory to store the outlier files from the to3d commands

mkdir -p $outlier_dir

    Create the outlier directory.  Recall from above that this was assigned
    the value of '$topdir/afni/outliers'.

    # Unix

    The '-p' option is to make parent directories, as needed.  It is used
    here because the 'afni' directory does not yet exist.  With this command,
    the 'afni' directory is created, along with the sub-directory 'outliers'.


# ----------------------------------------------------------------------
# verify that the previous 'mkdir' command succeeded

if ( ! -d afni ) then
    echo "failure: cannot create 'afni' directory (no write permission?)"
    exit
endif

    Testing for the existence of the 'afni' directory after the previous
    mkdir command will verify that the command succeeded.  This is a good
    check to make, in case the user does not have write permission in this
    directory.



# ----------------------------------------------------------------------
# verify that an AFNI program is in the PATH

which to3d >& /dev/null
if ( $status != 0 ) then
    echo "failure: cannot find 'to3d' - is AFNI installed?"
    exit
endif

    # Unix

    The 'which to3d' command requests the shell to output the location
    (the full pathname) of the 'to3d' program.  This will fail if 'to3d'
    is not in the user's path.

    In either case (success or failure), the output should not go to
    the screen.  Send all output to the special Unix file, /dev/null,
    never to be seen again.

    The second part of this test looks at the value of the '$status'
    variable.  This is a special shell variable that stores the result
    of the previous command.  Note that it must therefore be checked
    or stored immediately, before any other commands are executed.

    If '$status' is 0, the 'which' command must have been successful.
    That means 'to3d' is in the user's path.  Otherwise, if '$status'
    is not 0, we inform the user and terminate the script.


#======================================================================
#  begin pre-processing the data
#======================================================================

    - create an AFNI dataset from the spoiled grass anatomical images
    - create an AFNI dataset from EPI run 1
    - create AFNI datasets from the other EPI runs
    - concatenate all EPI datasets into 1 AFNI EPI dataset
    - register the EPI volumes to the last one

# ----------------------------------------
# Change directories to the location of the SPGR I-files.

cd DDmb_data/SPGR_data


# ----------------------------------------------------------------------
# create the SPGR dataset, putting it in the afni directory

to3d -prefix DDSPGR -session $topdir/afni I.*

    Since the input to 'to3d' is a list of I-files, it is not necessary to
    specify information like alignment, orientation or grid spacing.  This
    information is contained in the header portion of each I-file, which is
    then followed by the actual image data.

    # Unix      (see also 'to3d -help')

    This command creates 'DDSPGR+orig.HEAD' and 'DDSPGR+orig.BRIK'.

    Note the '-session' option here.  In this way, 'to3d' knows to place
    the resulting dataset files (the HEAD and BRIK files) in the top level
    'afni' directory, which was created earlier as part of '$outlier_dir'.


# ----------------------------------------------------------------------
# put run 1 EPI data into a 3D+time AFNI dataset
# - run 1 will be the '-geomparent' for the other run data

# go to the directory containing the run data
cd ../EPI_data

to3d -session $topdir/afni -prefix DDr1                   \
     -save_outliers $outlier_dir/DDr1.outliers            \
     -orient SPR -zorigin 69 -epan                        \
     -xSLAB 118.125S-I -ySLAB 118.125P-A -zSLAB 69R-61L   \
     -time:tz 110 27 2500 alt+z 3Ds:0:0:64:64:1:"DDr1*"

    There are many options used here that were not needed for the creation
    of the anatomical dataset, specifying the geometry and orientation of
    the dataset.  The difference here is that the input files, "DDr1_*",
    do not contain any such information, where the SPGR data files, the
    I-files, do.  In fact, these files are just image files, containing
    nothing but 4096 2-byte integers (they are 64x64 voxel images).

    # Unix      (see also 'to3d -help')

    This command creates 'DDr1+orig.HEAD' and 'DDr1+orig.BRIK', and
    stores them in the top 'afni' directory, where we put the SPGR
    dataset, previously.

    With the '-save_outliers' option, 'to3d' is told to store the outlier
    information in a file under '$outlier_dir' (which was created earlier).

    Note also the quotes around 'DDr1*'.  Since an input image format
    is specified (see 'to3d -help'), starting with '3Ds:0:0:64:64:1:',
    the quotes around are filenames are actually required.  Without them,
    the shell will expand that expression,

        3Ds:0:0:64:64:1:DDr1*

    trying to find filenames that match.  But the '3Ds:...' is the image
    format information that we want to pass to 'to3d'.  Only the 'DDr1*'
    refers to the input files, and so it is put in quotes, in order to
    pass that information directly to 'to3d'.  The 'to3d' program must
    expand the file list, not the shell.



# ----------------------------------------------------------------------
# put the other run data into 3D+time AFNI datasets

foreach run ( 2 3 4 )
    to3d -geomparent $topdir/afni/DDr1+orig -session $topdir/afni         \
         -prefix DDr${run} -save_outliers $outlier_dir/DDr${run}.outliers \
         -time:tz 110 27 2500 alt+z                                       \
         3Ds:0:0:64:64:1:"DDr${run}*"
end

    Now that the geometry information has been provided for the run 1
    dataset, that dataset can be used as the geometry parent for the others.
    So for each of the other runs, use basically the same command as for
    run 1, except replace the '-orient', '-zorigin', '-epan', '-xSLAB',
    '-ySLAB' and '-zSLAB' options with '-geomparent'.

    # Unix      (see also 'to3d -help')

    The foreach command takes a variable name and then a list of values within
    parentheses.  The script will process the body of the foreach loop once
    for each value in parentheses.  Within a single loop iteration, that
    variable will take on the corresponding value.

    In this case, 'run' is initially set to the value "2".  Then the 'to3d'
    command is executed, with 3 occurrences of that variable '$run'.  So the
    first iteration will execute the command:

	to3d -geomparent $topdir/afni/DDr1+orig -session $topdir/afni   \
	     -prefix DDr2 -save_outliers $outlier_dir/DDr2.outliers     \
	     -time:tz 110 27 2500 alt+z                                 \
	     3Ds:0:0:64:64:1:"DDr2*"

    Note that $topdir will also be replaced by its value.

    Once this command is finished, the loop starts the next iteration, where
    'run' is set to the value "3".  That corresponding 'to3d' command is
    executed, 'run' is set to "4", the last 'to3d' command is executed,
    and the loop terminates.

    The body of a 'foreach' loop may contain many statements.  In this case,
    only a single 'to3d' command is used.  The end of the 'foreach' body is
    signified by the 'end' statement.


    Something new to note here is the pair of '{', '}' characters around
    our variable, '$run'.  These are used to separate the variable from
    surrounding characters.  They are not always needed.  In this case,
    they are more of a visual separator than a syntactical one.

    As an example, consider the EPI image file, DDr1_06.058 (this is for
    run 1, slice 06, time point 058).  Suppose variables were set for
    '$run', '$slice', and '$time', and for some reason, it was desired to
    print out the name of this file using them.  One might try:

	echo "file is: DDr$run_$slice.$time"

    However, this would actually result in the error message:

    	run_: Undefined variable.

    Since the '_' character is a valid variable name character, the shell
    interprets all of '$run_' as the variable.  In order to get syntactical
    separation, it would be necessary to use '{' and '}':

	echo "file is: DDr${run}_$slice.$time"

    To get better visual separation, making it more clear, one might use:

	echo "file is: DDr${run}_${slice}.${time}"



# ----------------------------------------------------------------------
# concatenate all 4 runs (time points 2 through 109) into a single dataset
# - this makes a dataset with 432 = 108*4 time points (0 through 431)

cd $topdir/afni

3dTcat -prefix DDrall                            \
       DDr1+orig'[2..109]' DDr2+orig'[2..109]'   \
       DDr3+orig'[2..109]' DDr4+orig'[2..109]'

    Go to the to top level 'afni' directory, where all of the datasets are
    now located (via '-session' options).  This includes the anatomical
    dataset, and all 4 run datasets.

    Put all the runs into a single 3d+time dataset.  Sub-bricks 0 and 1 are
    intentionally omitted from each run, leaving 108 sub-bricks from each
    of the 4 run datasets.  The program '3dTcat' will simply concatenate
    the input datasets, resulting in a dataset with 432 time points, called
    'DDrall+orig'.  The data is unaltered.

    # Unix      (see also '3dTcat -help')

    Note that the input datasets have sub-brick selectors in quotes.  These
    are to inform 3dTcat that, from 'DDr1+orig' for instance, the sub-bricks
    from 2 through 109 should be copied into the resulting dataset, skipping
    the first sub-bricks 0 and 1.

    Recall that such quotes are necessary any time it is desired to prevent
    the shell from processing the special characters like '[', ']' or '*',
    as in the 'to3d' command, above.

    # review the 3 types of quotes that have been used in this script, so far

    The backward quotes, or ``, were used on the command assigning topdir.
    Those quotes, and the command they contain, get replaced with the output
    from that command, before the rest of the statement gets evaluated.

    The single quotes, or '', were used in this '3dTcat' command.  Single
    quotes prevent the shell from evaluating any special character contained
    between them.

    The double quotes, or "", were used at the end of the 'to3d' command
    within the 'foreach' loop.  Double quotes are the same as single quotes,
    except that they allow variable substitutions between them.  At the end
    of the 'to3d' command is the line:

	"DDr${run}*"

    This is a great example.  Between the quotes can be found '${run}' and
    '*'.  Since '${run}' represents a variable substitution, the shell 
    replaces this with the corresponding value (either "2", "3" or "4",
    depending on the iteration in the 'foreach' loop).  However the '*'
    character does not get evaluated, not within those quotes.


# ----------------------------------------------------------------------
# register each of the 432 sub-bricks (time points) to the last one

3dvolreg -dfile DDrallvrout -base 431 -prefix DDrallvr DDrall+orig

         (see also '3dvolreg -help')

    All of the run data is now in 'DDrall+orig', after the use of '3dTcat'.
    Since the anatomical data was collected at the end of the scanning
    session, after the last run, the data at the end of the final run
    should be closely aligned with the anatomical data.

    One thing to realize about 3-D volume registration is that it really
    is a 3-D registration.  In order to align one volume on top of another,
    it can be rotated or shifted in any way.  But it is merely moving one
    3-D object around, to make it most closely aligned with another.

    The DDrall+orig dataset has 432 sub-bricks, each of which is a 3-D
    volume of the brain, taken within a single TR (2.5 seconds).  This is
    basically considered a single time point, a "snapshot" of the brain.
    Each of the sub-bricks will be registered to the final one, number
    431 (starting from 0).  That is to say, each 3-D volume "snapshot"
    will be shifted and rotated (by very small amounts) so that it is most
    closely aligned with sub-brick 431.

    The resulting dataset, where all sub-bricks are aligned, is 'DDrallvr',
    as is specified by the '-prefix' option.  One other output file is
    created here, called 'DDrallvrout', which contains the resulting
    parameters for every sub-brick registration.


#======================================================================
# We have our AFNI data, now analyze it (produce functional data bricks).
#======================================================================

# ----------------------------------------------------------------------
# make a directory to store the various text files

mkdir $topdir/afni/regressors

    This is where the stimulus and regressor files will be stored
    and processed.


# ----------------------------------------------------------------------
# go to the directory containing the permanent stimulus files

cd $topdir/stim_files

    This directory holds the basic stimulus files.  There are stimulus
    files for each the 4 stimulus types, for each of the 4 runs (so 16
    files).

    For each stimulus type, create 1 long stimulus file, spanning all
    4 runs.  Except for the fact that the baseline may change from 1 run
    to the next, the intention is to treat these 4 runs as 1 long run.

    The long stimulus file (for each of the 4 stimulus types) will be
    stored in the 'regressors' directory.


# ----------------------------------------------------------------------
# begin foreach loop, over the 4 stimulus types

foreach stim_type ( a t h l )

    # Unix

    While it is not necessary to indent the body of a 'foreach' loop, it
    is highly recommended, simply to make the script easier to read.


    # ----------------------------------------
    # For this stimulus type, concatenate the stimulus timings
    # from all 4 runs into a single file.

    cat scan1$stim_type.txt scan2$stim_type.txt               \
        scan3$stim_type.txt scan4$stim_type.txt               \
            > $topdir/afni/regressors/scan1to4$stim_type.1D

    # Unix

    This 'cat' command is used to display the stimulus files sequentially.
    Normally this data would then appear on the terminal window, except
    that the output has been redirected to another file, using the '>'
    symbol.  So the 'scan1to4X.1D' file will be created, under the 
    'regressors' directory, where 'X' is the current stimulus type.


    # ----------------------------------------
    # also, can do:
    #     cat scan[1-4]$stim_type.txt \
    #           > $topdir/afni/regressors/scan1to4$stim_type.1D

    # Unix

    This alternate, commented command matches the one above.  This command
    shows an example of how to use '[' and ']', and what they mean to the
    shell.  Note that these characters are not contained within quotes, thus
    allowing the shell to interpret them.  The shell replaces '[]', and
    whatever list is between them, with any single character in that list.

    Suppose '$stim_type' is 'h' in this case.  Then the command (ignoring
    redirection) starts as 'cat scan[1-4]h.txt'.  The shell then searches
    for any filename that matches this expression, meaning any name
    starting with 'scan', followed by a single matching character (either
    '1', '2', '3' or '4'), followed by 'h.txt'.  So the expression '[1-4]'
    is replaced by any single character between '1' and '4'.

    Expansion examples:

        [1-4]      : one of '1', '2', '3', or '4'
        [1234]     : one of '1', '2', '3', or '4'
        [abc]      : one of 'a', 'b', or 'c'
        [a-c]      : one of 'a', 'b', or 'c'
        [qf7@]     : one of 'q', 'f', '7', or '@'



# ----------------------------------------
# end foreach loop

end

    # Unix

    The 'end' statement terminates the 'foreach' loop.  At this point, the
    shell assigns 'stim_type' the next value in the control list (of 'a',
    't', 'h' and 'l') and starts the loop body over.

    If the entire list has been processed (i.e., if 'l' was the last
    'stim_type'), then the shell moves on, past the 'end' statement.


# ----------------------------------------------------------------------
# create ideal reference function, one for each stimulus type

cd $topdir/afni/regressors

    Recall that the 4 scan1to4X.1D files were placed under 'regressors'.

# ----------------------------------------
# create 'ideal' reference functions from the stimulus timing files

waver -dt 2.5 -GAM -input scan1to4a.1D > scan1to4a_hrf.1D
waver -dt 2.5 -GAM -input scan1to4t.1D > scan1to4t_hrf.1D
waver -dt 2.5 -GAM -input scan1to4h.1D > scan1to4h_hrf.1D
waver -dt 2.5 -GAM -input scan1to4l.1D > scan1to4l_hrf.1D

    #  Unix    (see also 'waver -help')

    It wouldn't save much, but this could be done with a 'foreach' loop:
    
    foreach ST ( a t h l )
	waver -dt 2.5 -GAM -input scan1to4${ST}.1D > scan1to4${ST}_hrf.1D
    end


# ----------------------------------------------------------------------
# create a functional dataset, according to the stimulus timings
# and the contrasts

# ----------------------------------------
# go to the 'afni' directory, where all of the AFNI datasets are being put

cd $topdir/afni

    # Unix

    This is the same as 'cd ..', though it is more descriptive.

# ----------------------------------------
# actually create a functional dataset

3dDeconvolve -xout -input DDrallvr+orig -num_stimts 4                   \
    -stim_file 1 regressors/scan1to4a_hrf.1D  -stim_label 1 Actions     \
    -stim_file 2 regressors/scan1to4t_hrf.1D  -stim_label 2 Tool        \
    -stim_file 3 regressors/scan1to4h_hrf.1D  -stim_label 3 HCMS        \
    -stim_file 4 regressors/scan1to4l_hrf.1D  -stim_label 4 LCMS        \
    -full_first -fout -tout -concat $topdir/contrasts/runs.1D           \
    -glt 1 $topdir/contrasts/DDcontrv1.txt -glt_label 1 AvsT            \
    -glt 1 $topdir/contrasts/DDcontrv2.txt -glt_label 2 HvsL            \
    -glt 1 $topdir/contrasts/DDcontrv3.txt -glt_label 3 ATvsHL          \
    -bucket DDrallvrMRv1

    # Unix      (see also '3dDeconvolve -help')

    Many of the input files for this command reside in different directories.
    This command is being run from the top 'afni' directory, under which is
    the regressors directory, which contains each of the ideal reference
    function files, one per stimulus type.

    The 3 contrast files are located under the 'contrasts' directory, along
    with 'runs.1D'.  Since this 'contrasts' directory is at the same level
    as the 'afni' directory, each of the '$topdir/contrasts' directories
    in this '3dDeconvolve' command could be replaced with '../contrasts',
    without affecting the script.