3dAllineate


Usage: 3dAllineate [options] sourcedataset

--------------------------------------------------------------------------
      Program to align one dataset (the 'source') to a 'base'
      dataset, using an affine (matrix) transformation of space.
--------------------------------------------------------------------------

--------------------------------------------------------------------------
    ***** Please check your results visually, or at some point  *****
    ***** in time you will have bad results and not know it :-( *****
    *****                                                       *****
    ***** No method for 3D image alignment, however tested it   *****
    ***** was, can be relied upon 100% of the time, and anyone  *****
    ***** who tells you otherwise is a madman or is a liar!!!!  *****
    *****                                                       *****
    ***** In particular, if you are aligning two datasets with  *****
    ***** significantly different spatial coverage (e.g.,       *****
    ***** -source = whole head T1w and -base = MNI template),   *****
    ***** the be careful to check the results. In such a case,  *****
    ***** using '-twobest MAX' should increase the chance of    *****
    ***** getting a good alignment (at the cost of CPU time).   *****
    *****                                                       *****
    ***** Furthermore, don't EVER think that "I have so much    *****
    ***** data that a few errors will not matter"!!!!           *****
--------------------------------------------------------------------------

* Options (lots of them!) are available to control:
 ++ How the matching between the source and the base is computed
    (i.e., the 'cost functional' measuring image mismatch).
 ++ How the resliced source is interpolated to the base space.
 ++ The complexity of the spatial transformation ('warp') used.
 ++ And many many technical options to control the process in detail,
    if you know what you are doing (or just like to fool around).

* This program is a generalization of and improvement on the older
    software 3dWarpDrive.

* For nonlinear transformations, see progam 3dQwarp.

* 3dAllineate can also be used to apply a pre-computed matrix to a dataset
  to produce the transformed output. In this mode of operation, it just
  skips the alignment process, whose function is to compute the matrix,
  and instead it reads the matrix in, computes the output dataset,
  writes it out, and stops.

* If you are curious about the stepwise process used, see the section below
  titled 'SUMMARY of the Default Allineation Process'.

=====----------------------------------------------------------------------
NOTES: For most 3D image registration purposes, we now recommend that you
=====  use Daniel Glen's script align_epi_anat.py (which, despite its name,
       can do many more registration problems than EPI-to-T1-weighted).
  -->> In particular, using 3dAllineate with the 'lpc' cost functional
       (to align EPI and T1-weighted volumes) requires using a '-weight'
       volume to get good results, and the align_epi_anat.py script will
       automagically generate such a weight dataset that works well for
       EPI-to-structural alignment.
  -->> This script can also be used for other alignment purposes, such
       as T1-weighted alignment between field strengths using the
       '-lpa' cost functional.  Investigate align_epi_anat.py to
       see if it will do what you need -- you might make your life
       a little easier and nicer and happier and more tranquil.
  -->> Also, if/when you ask for registration help on the AFNI
       message board, we'll probably start by recommending that you
       try align_epi_anat.py if you haven't already done so.
  -->> For aligning EPI and T1-weighted volumes, we have found that
       using a flip angle of 50-60 degrees for the EPI works better than
       a flip angle of 90 degrees.  The reason is that there is more
       internal contrast in the EPI data when the flip angle is smaller,
       so the registration has some image structure to work with.  With
       the 90 degree flip angle, there is so little internal contrast in
       the EPI dataset that the alignment process ends up being just
       trying to match brain outlines -- which doesn't always give accurate
       results: see http://dx.doi.org/10.1016/j.neuroimage.2008.09.037
  -->> Although the total MRI signal is reduced at a smaller flip angle,
       there is little or no loss in FMRI/BOLD information, since the bulk
       of the time series 'noise' is from physiological fluctuation signals,
       which are also reduced by the lower flip angle -- for more details,
       see http://dx.doi.org/10.1016/j.neuroimage.2010.11.020
---------------------------------------------------------------------------
  **** New (Summer 2013) program 3dQwarp is available to do nonlinear  ****
  ***  alignment between a base and source dataset, including the use   ***
  **   of 3dAllineate for the preliminary affine alignment.  If you are  **
  *    interested, see the output of '3dQwarp -help' for the details.     *
---------------------------------------------------------------------------

COMMAND LINE OPTIONS:
====================
 -base bbb   = Set the base dataset to be the #0 sub-brick of 'bbb'.
               If no -base option is given, then the base volume is
               taken to be the #0 sub-brick of the source dataset.
               (Base must be stored as floats, shorts, or bytes.)
            ** -base is not needed if you are just applying a given
               transformation to the -source dataset to produce
               the output, using -1Dmatrix_apply or -1Dparam_apply
            ** Unless you use the -master option, the aligned
               output dataset will be stored on the same 3D grid
               as the -base dataset.

 -source ttt = Read the source dataset from 'ttt'.  If no -source
   *OR*        (or -input) option is given, then the source dataset
 -input ttt    is the last argument on the command line.
               (Source must be stored as floats, shorts, or bytes.)
            ** This is the dataset to be transformed, to match the
               -base dataset, or directly with one of the options
               -1Dmatrix_apply or -1Dparam_apply
            ** 3dAllineate can register 2D datasets (single slice),
               but both the base and source must be 2D -- you cannot
               use this program to register a 2D slice into a 3D volume!
               -- However, the 'lpc' and 'lpa' cost functionals do not
                  work properly with 2D images, as they are designed
                  around local 3D neighborhoods and that code has not
                  been patched to work with 2D neighborhoods :(
               -- You can input .jpg files as 2D 'datasets', register
                  them with 3dAllineate, and write the result back out
                  using a prefix that ends in '.jpg'; HOWEVER, the color
                  information will not be used in the registration, as
                  this program was written to deal with monochrome medical
                  datasets. At the end, if the source was RGB (color), then
                  the output will be also be RGB, and then a color .jpg
                  can be output.
               -- The above remarks also apply to aligning 3D RGB datasets:
                  it will be done using only the 3D volumes converted to
                  grayscale, but the final output will be the source
                  RGB dataset transformed to the (hopefully) aligned grid.
                 * However, I've never tested aligning 3D color datasets;
                   you can be the first one ever!
            ** See the script @2dwarper.Allin for an example of using
               3dAllineate to do slice-by-slice nonlinear warping to
               align 3D volumes distorted by time-dependent magnetic
               field inhomogeneities.

 ** NOTA BENE: The base and source dataset do NOT have to be defined **
 ** [that's]   on the same 3D grids; the alignment process uses the  **
 ** [Latin ]   coordinate systems defined in the dataset headers to  **
 ** [  for ]   make the match between spatial locations, rather than **
 ** [ NOTE ]   matching the 2 datasets on a voxel-by-voxel basis     **
 ** [ WELL ]   (as 3dvolreg and 3dWarpDrive do).                     **
 **       -->> However, this coordinate-based matching requires that **
 **            image volumes be defined on roughly the same patch of **
 **            of (x,y,z) space, in order to find a decent starting  **
 **            point for the transformation.  You might need to use  **
 **            the script @Align_Centers to do this, if the 3D       **
 **            spaces occupied by the images do not overlap much.    **
 **       -->> Or the '-cmass' option to this program might be       **
 **            sufficient to solve this problem, maybe, with luck.   **
 **            (Another reason why you should use align_epi_anat.py) **
 **       -->> If the coordinate system in the dataset headers is    **
 **            WRONG, then 3dAllineate will probably not work well!  **
 **            And I say this because we have seen this in several   **
 **            datasets downloaded from online archives.             **

 -prefix ppp = Output the resulting dataset to file 'ppp'.  If this
   *OR*        option is NOT given, no dataset will be output!  The
 -out ppp      transformation matrix to align the source to the base will
               be estimated, but not applied.  You can save the matrix
               for later use using the '-1Dmatrix_save' option.
        *N.B.: By default, the new dataset is computed on the grid of the
                base dataset; see the '-master' and/or the '-mast_dxyz'
                options to change this grid.
        *N.B.: If 'ppp' is 'NULL', then no output dataset will be produced.
                This option is for compatibility with 3dvolreg.

 -floatize   = Write result dataset as floats.  Internal calculations
 -float        are all done on float copies of the input datasets.
               [Default=convert output dataset to data format of  ]
               [        source dataset; if the source dataset was ]
               [        shorts with a scale factor, then the new  ]
               [        dataset will get a scale factor as well;  ]
               [        if the source dataset was shorts with no  ]
               [        scale factor, the result will be unscaled.]

 -1Dparam_save ff   = Save the warp parameters in ASCII (.1D) format into
                      file 'ff' (1 row per sub-brick in source).
                    * A historical synonym for this option is '-1Dfile'.
                    * At the top of the saved 1D file is a #comment line
                      listing the names of the parameters; those parameters
                      that are fixed (e.g., via '-parfix') will be marked
                      by having their symbolic names end in the '$' character.
                      You can use '1dcat -nonfixed' to remove these columns
                      from the 1D file if you just want to further process the
                      varying parameters somehow (e.g., 1dsvd).
                    * However, the '-1Dparam_apply' option requires the
                      full list of parameters, including those that were
                      fixed, in order to work properly!

 -1Dparam_apply aa  = Read warp parameters from file 'aa', apply them to
                      the source dataset, and produce a new dataset.
                      (Must also use the '-prefix' option for this to work!  )
                      (In this mode of operation, there is no optimization of)
                      (the cost functional by changing the warp parameters;  )
                      (previously computed parameters are applied directly.  )
               *N.B.: If you use -1Dparam_apply, you may also want to use
                       -master to control the grid on which the new
                       dataset is written -- the base dataset from the
                       original 3dAllineate run would be a good possibility.
                       Otherwise, the new dataset will be written out on the
                       3D grid coverage of the source dataset, and this
                       might result in clipping off part of the image.
               *N.B.: Each row in the 'aa' file contains the parameters for
                       transforming one sub-brick in the source dataset.
                       If there are more sub-bricks in the source dataset
                       than there are rows in the 'aa' file, then the last
                       row is used repeatedly.
               *N.B.: A trick to use 3dAllineate to resample a dataset to
                       a finer grid spacing:
                         3dAllineate -input dataset+orig         \
                                     -master template+orig       \
                                     -prefix newdataset          \
                                     -final wsinc5               \
                                     -1Dparam_apply '1D: 12@0'\'
                       Here, the identity transformation is specified
                       by giving all 12 affine parameters as 0 (note
                       the extra \' at the end of the '1D: 12@0' input!).
                     ** You can also use the word 'IDENTITY' in place of
                        '1D: 12@0'\' (to indicate the identity transformation).
              **N.B.: Some expert options for modifying how the wsinc5
                       method works are described far below, if you use
                       '-HELP' instead of '-help'.
            ****N.B.: The interpolation method used to produce a dataset
                       is always given via the '-final' option, NOT via
                       '-interp'.  If you forget this and use '-interp'
                       along with one of the 'apply' options, this program
                       will chastise you (gently) and change '-final'
                       to match what the '-interp' input.

 -1Dmatrix_save ff  = Save the transformation matrix for each sub-brick into
                      file 'ff' (1 row per sub-brick in the source dataset).
                      If 'ff' does NOT end in '.1D', then the program will
                      append '.aff12.1D' to 'ff' to make the output filename.
               *N.B.: This matrix is the coordinate transformation from base
                       to source DICOM coordinates. In other terms:
                          Xin = Xsource = M Xout = M Xbase
                                   or
                          Xout = Xbase = inv(M) Xin = inv(M) Xsource
                       where Xin or Xsource is the 4x1 coordinates of a
                       location in the input volume. Xout is the
                       coordinate of that same location in the output volume.
                       Xbase is the coordinate of the corresponding location
                       in the base dataset. M is ff augmented by a 4th row of
                       [0 0 0 1], X. is an augmented column vector [x,y,z,1]'
                       To get the inverse matrix inv(M)
                       (source to base), use the cat_matvec program, as in
                         cat_matvec fred.aff12.1D -I

 -1Dmatrix_apply aa = Use the matrices in file 'aa' to define the spatial
                      transformations to be applied.  Also see program
                      cat_matvec for ways to manipulate these matrix files.
               *N.B.: You probably want to use either -base or -master
                      with either *_apply option, so that the coordinate
                      system that the matrix refers to is correctly loaded.
                     ** You can also use the word 'IDENTITY' in place of a
                        filename to indicate the identity transformation --
                        presumably for the purpose of resampling the source
                        dataset to a new grid.

  * The -1Dmatrix_* options can be used to save and re-use the transformation *
  * matrices.  In combination with the program cat_matvec, which can multiply *
  * saved transformation matrices, you can also adjust these matrices to      *
  * other alignments. These matrices can also be combined with nonlinear      *
  * warps (from 3dQwarp) using programs 3dNwarpApply or 3dNwarpCat.           *

  * The script 'align_epi_anat.py' uses 3dAllineate and 3dvolreg to align EPI *
  * datasets to T1-weighted anatomical datasets, using saved matrices between *
  * the two programs.  This script is our currently recommended method for    *
  * doing such intra-subject alignments.                                      *

 -cost ccc   = Defines the 'cost' function that defines the matching
               between the source and the base; 'ccc' is one of
                ls   *OR*  leastsq         = Least Squares [Pearson Correlation]
                mi   *OR*  mutualinfo      = Mutual Information [H(b)+H(s)-H(b,s)]
                crM  *OR*  corratio_mul    = Correlation Ratio (Symmetrized*)
                nmi  *OR*  norm_mutualinfo = Normalized MI [H(b,s)/(H(b)+H(s))]
                hel  *OR*  hellinger       = Hellinger metric
                crA  *OR*  corratio_add    = Correlation Ratio (Symmetrized+)
                crU  *OR*  corratio_uns    = Correlation Ratio (Unsym)
                lpc  *OR*  localPcorSigned = Local Pearson Correlation Signed
                lpa  *OR*  localPcorAbs    = Local Pearson Correlation Abs
                lpc+ *OR*  localPcor+Others= Local Pearson Signed + Others
                lpa+ *OR*  localPcorAbs+Others= Local Pearson Abs + Others
               You can also specify the cost functional using an option
               of the form '-mi' rather than '-cost mi', if you like
               to keep things terse and cryptic (as I do).
               [Default == '-hel' (for no good reason, but it sounds nice).]
               **NB** See more below about lpa and lpc, which are typically
                      what we would recommend as first-choice cost functions
                      now:
                        lpa if you have similar contrast vols to align;
                        lpc if you have *non*similar contrast vols to align!

 -interp iii = Defines interpolation method to use during matching
               process, where 'iii' is one of
                 NN      *OR* nearestneighbour *OR nearestneighbor
                 linear  *OR* trilinear
                 cubic   *OR* tricubic
                 quintic *OR* triquintic
               Using '-NN' instead of '-interp NN' is allowed (e.g.).
               Note that using cubic or quintic interpolation during
               the matching process will slow the program down a lot.
               Use '-final' to affect the interpolation method used
               to produce the output dataset, once the final registration
               parameters are determined.  [Default method == 'linear'.]
            ** N.B.: Linear interpolation is used during the coarse
                     alignment pass; the selection here only affects
                     the interpolation method used during the second
                     (fine) alignment pass.
            ** N.B.: '-interp' does NOT define the final method used
                     to produce the output dataset as warped from the
                     input dataset.  If you want to do that, use '-final'.

 -final iii  = Defines the interpolation mode used to create the
               output dataset.  [Default == 'cubic']
            ** N.B.: If you are applying a transformation to an
                       integer-valued dataset (such as an atlas),
                       then you should use '-final NN' to avoid
                       interpolation of the integer labels.
            ** N.B.: For '-final' ONLY, you can use 'wsinc5' to specify
                       that the final interpolation be done using a
                       weighted sinc interpolation method.  This method
                       is so SLOW that you aren't allowed to use it for
                       the registration itself.
                  ++ wsinc5 interpolation is highly accurate and should
                       reduce the smoothing artifacts from lower
                       order interpolation methods (which are most
                       visible if you interpolate an EPI time series
                       to high resolution and then make an image of
                       the voxel-wise variance).
                  ++ On my Intel-based Mac, it takes about 2.5 s to do
                       wsinc5 interpolation, per 1 million voxels output.
                       For comparison, quintic interpolation takes about
                       0.3 s per 1 million voxels: 8 times faster than wsinc5.
                  ++ The '5' refers to the width of the sinc interpolation
                       weights: plus/minus 5 grid points in each direction;
                       this is a tensor product interpolation, for speed.

TECHNICAL OPTIONS (used for fine control of the program):
=================
 -nmatch nnn = Use at most 'nnn' scattered points to match the
               datasets.  The smaller nnn is, the faster the matching
               algorithm will run; however, accuracy may be bad if
               nnn is too small.  If you end the 'nnn' value with the
               '%' character, then that percentage of the base's
               voxels will be used.
               [Default == 47% of voxels in the weight mask]

 -nopad      = Do not use zero-padding on the base image.
               (I cannot think of a good reason to use this option.)
               [Default == zero-pad, if needed; -verb shows how much]

 -zclip      = Replace negative values in the input datasets (source & base)
 -noneg        with zero.  The intent is to clip off a small set of negative
               values that may arise when using 3dresample (say) with
               cubic interpolation.

 -conv mmm   = Convergence test is set to 'mmm' millimeters.
               This doesn't mean that the results will be accurate
               to 'mmm' millimeters!  It just means that the program
               stops trying to improve the alignment when the optimizer
               (NEWUOA) reports it has narrowed the search radius
               down to this level.
               * To set this value to the smallest allowable, use '-conv 0'.
               * A coarser value for 'quick-and-dirty' alignment is 0.05.

 -verb       = Print out verbose progress reports.
               [Using '-VERB' will give even more prolix reports :]
 -quiet      = Don't print out verbose stuff. (But WHY?)

 -usetemp    = Write intermediate stuff to disk, to economize on RAM.
               Using this will slow the program down, but may make it
               possible to register datasets that need lots of space.
       **N.B.: Temporary files are written to the directory given
               in environment variable TMPDIR, or in /tmp, or in ./
               (preference in that order).  If the program crashes,
               these files are named TIM_somethingrandom, and you
               may have to delete them manually. (TIM=Temporary IMage)
       **N.B.: If the program fails with a 'malloc failure' type of
               message, then try '-usetemp' (malloc=memory allocator).
             * If the program just stops with a message 'killed', that
               means the operating system (Unix/Linux) stopped the
               program, which almost always is due to the system running
               low on memory -- so it starts killing programs to save itself.

 -nousetemp  = Don't use temporary workspace on disk [the default].

 -check hhh  = After cost functional optimization is done, start at the
               final parameters and RE-optimize using the new cost
               function 'hhh'.  If the results are too different, a
               warning message will be printed.  However, the final
               parameters from the original optimization will be
               used to create the output dataset. Using '-check'
               increases the CPU time, but can help you feel sure
               that the alignment process did not go wild and crazy.
               [Default == no check == don't worry, be happy!]
       **N.B.: You can put more than one function after '-check', as in
                 -nmi -check mi hel crU crM
               to register with Normalized Mutual Information, and
               then check the results against 4 other cost functionals.
       **N.B.: On the other hand, some cost functionals give better
               results than others for specific problems, and so
               a warning that 'mi' was significantly different than
               'hel' might not actually mean anything useful (e.g.).

 ** PARAMETERS THAT AFFECT THE COST OPTIMIZATION STRATEGY **

 -onepass    = Use only the refining pass -- do not try a coarse
               resolution pass first.  Useful if you know that only
               SMALL amounts of image alignment are needed.
               [The default is to use both passes.]

 -twopass    = Use a two pass alignment strategy, first searching for
               a large rotation+shift and then refining the alignment.
               [Two passes are used by default for the first sub-brick]
               [in the source dataset, and then one pass for the others.]
               ['-twopass' will do two passes for ALL source sub-bricks.]
            *** The first (coarse) pass is relatively slow, as it tries
                 to search a large volume of parameter (rotations+shifts)
                 space for initial guesses at the alignment transformation.
              * A lot of these initial guesses are kept and checked to
                 see which ones lead to good starting points for the
                 further refinement.
              * The winners of this competition are then passed to the
                 '-twobest' (infra) successive optimization passes.
              * The ultimate winner of THAT stage is what starts
                 the second (fine) pass alignment. Usually, this starting
                 point is so good that the fine pass optimization does
                 not provide a lot of improvement; that is, most of the
                 run time ends up in coarse pass with its multiple stages.
              * All of these stages are intended to help the program avoid
                 stopping at a 'false' minimum in the cost functional.
                 They were added to the software as we gathered experience
                 with difficult 3D alignment problems. The combination of
                 multiple stages of partial optimization of multiple
                 parameter candidates makes the coarse pass slow, but also
                 makes it (usually) work well.

 -twoblur rr = Set the blurring radius for the first pass to 'rr'
               millimeters.  [Default == 11 mm]
       **N.B.: You may want to change this from the default if
               your voxels are unusually small or unusually large
               (e.g., outside the range 1-4 mm along each axis).

 -twofirst   = Use -twopass on the first image to be registered, and
               then on all subsequent images from the source dataset,
               use results from the first image's coarse pass to start
               the fine pass.
               (Useful when there may be large motions between the   )
               (source and the base, but only small motions within   )
               (the source dataset itself; since the coarse pass can )
               (be slow, doing it only once makes sense in this case.)
       **N.B.: [-twofirst is on by default; '-twopass' turns it off.]

 -twobest bb = In the coarse pass, use the best 'bb' set of initial
               points to search for the starting point for the fine
               pass.  If bb==0, then no search is made for the best
               starting point, and the identity transformation is
               used as the starting point.  [Default=5; min=0 max=29]
       **N.B.: Setting bb=0 will make things run faster, but less reliably.
               Setting bb = 'MAX' will make it be the max allowed value.

 -fineblur x = Set the blurring radius to use in the fine resolution
               pass to 'x' mm.  A small amount (1-2 mm?) of blurring at
               the fine step may help with convergence, if there is
               some problem, especially if the base volume is very noisy.
               [Default == 0 mm = no blurring at the final alignment pass]

   **NOTES ON
   **STRATEGY: * If you expect only small-ish (< 2 voxels?) image movement,
                 then using '-onepass' or '-twobest 0' makes sense.
               * If you expect large-ish image movements, then do not
                 use '-onepass' or '-twobest 0'; the purpose of the
                 '-twobest' parameter is to search for large initial
                 rotations/shifts with which to start the coarse
                 optimization round.
               * If you have multiple sub-bricks in the source dataset,
                 then the default '-twofirst' makes sense if you don't expect
                 large movements WITHIN the source, but expect large motions
                 between the source and base.
               * '-twopass' re-starts the alignment process for each sub-brick
                 in the source dataset -- this option can be time consuming,
                 and is really intended to be used when you might expect large
                 movements between sub-bricks; for example, when the different
                 volumes are gathered on different days.  For most purposes,
                 '-twofirst' (the default process) will be adequate and faster,
                 when operating on multi-volume source datasets.

 -cmass        = Use the center-of-mass calculation to determin an initial shift
                   [This option is OFF by default]
                 can be given as cmass+a, cmass+xy, cmass+yz, cmass+xz
                 where +a means to try determine automatically in which
                 direction the data is partial by looking for a too large shift
                 If given in the form '-cmass+xy' (for example), means to
                 do the CoM calculation in the x- and y-directions, but
                 not the z-direction.
               * MY OPINION: This option is REALLY useful in most cases.
                             However, if you only have partial coverage in
                             the -source dataset, you will need to use
                             one of the '+' additions to restrict the
                             use of the CoM limits.

 -nocmass      = Don't use the center-of-mass calculation. [The default]
                  (You would not want to use the C-o-M calculation if the  )
                  (source sub-bricks have very different spatial locations,)
                  (since the source C-o-M is calculated from all sub-bricks)

 **EXAMPLE: You have a limited coverage set of axial EPI slices you want to
            register into a larger head volume (after 3dSkullStrip, of course).
            In this case, '-cmass+xy' makes sense, allowing CoM adjustment
            along the x = R-L and y = A-P directions, but not along the
            z = I-S direction, since the EPI doesn't cover the whole brain
            along that axis.

 -autoweight = Compute a weight function using the 3dAutomask
               algorithm plus some blurring of the base image.
       **N.B.: '-autoweight+100' means to zero out all voxels
                 with values below 100 before computing the weight.
               '-autoweight**1.5' means to compute the autoweight
                 and then raise it to the 1.5-th power (e.g., to
                 increase the weight of high-intensity regions).
               These two processing steps can be combined, as in
                 '-autoweight+100**1.5'
               ** Note that '**' must be enclosed in quotes;
                  otherwise, the shell will treat it as a wildcard
                  and you will get an error message before 3dAllineate
                  even starts!!
               ** UPDATE: one can now use '^' for power notation, to
                  avoid needing to enclose the string in quotes.
       **N.B.: Some cost functionals do not allow -autoweight, and
               will use -automask instead.  A warning message
               will be printed if you run into this situation.
               If a clip level '+xxx' is appended to '-autoweight',
               then the conversion into '-automask' will NOT happen.
               Thus, using a small positive '+xxx' can be used trick
               -autoweight into working on any cost functional.

 -automask   = Compute a mask function, which is like -autoweight,
               but the weight for a voxel is set to either 0 or 1.
       **N.B.: '-automask+3' means to compute the mask function, and
               then dilate it outwards by 3 voxels (e.g.).
               ** Note that '+' means something very different
                  for '-automask' and '-autoweight'!!

 -autobox    = Expand the -automask function to enclose a rectangular
               box that holds the irregular mask.
       **N.B.: This is the default mode of operation!
               For intra-modality registration, '-autoweight' may be better!
             * If the cost functional is 'ls', then '-autoweight' will be
               the default, instead of '-autobox'.

 -nomask     = Don't compute the autoweight/mask; if -weight is not
               also used, then every voxel will be counted equally.

 -weight www = Set the weighting for each voxel in the base dataset;
               larger weights mean that voxel counts more in the cost
               function.
       **N.B.: The weight dataset must be defined on the same grid as
               the base dataset.
       **N.B.: Even if a method does not allow -autoweight, you CAN
               use a weight dataset that is not 0/1 valued.  The
               risk is yours, of course (!*! as always in AFNI !*!).

 -wtprefix p = Write the weight volume to disk as a dataset with
               prefix name 'p'.  Used with '-autoweight/mask', this option
               lets you see what voxels were important in the algorithm.

 -emask ee   = This option lets you specify a mask of voxels to EXCLUDE from
               the analysis. The voxels where the dataset 'ee' is nonzero
               will not be included (i.e., their weights will be set to zero).
             * Like all the weight options, it applies in the base image
               coordinate system.
            ** Like all the weight options, it means nothing if you are using
               one of the 'apply' options.

    Method  Allows -autoweight
    ------  ------------------
     ls     YES
     mi     NO
     crM    YES
     nmi    NO
     hel    NO
     crA    YES
     crU    YES
     lpc    YES
     lpa    YES
     lpc+   YES
     lpa+   YES

 -source_mask sss = Mask the source (input) dataset, using 'sss'.
 -source_automask = Automatically mask the source dataset.
                      [By default, all voxels in the source]
                      [dataset are used in the matching.   ]
            **N.B.: You can also use '-source_automask+3' to dilate
                    the default source automask outward by 3 voxels.

 -warp xxx   = Set the warp type to 'xxx', which is one of
                 shift_only         *OR* sho =  3 parameters
                 shift_rotate       *OR* shr =  6 parameters
                 shift_rotate_scale *OR* srs =  9 parameters
                 affine_general     *OR* aff = 12 parameters
               [Default = affine_general, which includes image]
               [      shifts, rotations, scaling, and shearing]
             * MY OPINION: Shearing is usually unimportant, so
                            you can omit it if you want: '-warp srs'.
                           But it doesn't hurt to keep shearing,
                            except for a little extra CPU time.
                           On the other hand, scaling is often
                            important, so should not be omitted.

 -warpfreeze = Freeze the non-rigid body parameters (those past #6)
               after doing the first sub-brick.  Subsequent volumes
               will have the same spatial distortions as sub-brick #0,
               plus rigid body motions only.
             * MY OPINION: This option is almost useless.

 -replacebase   = If the source has more than one sub-brick, and this
                  option is turned on, then after the #0 sub-brick is
                  aligned to the base, the aligned #0 sub-brick is used
                  as the base image for subsequent source sub-bricks.
                * MY OPINION: This option is almost useless.

 -replacemeth m = After sub-brick #0 is aligned, switch to method 'm'
                  for later sub-bricks.  For use with '-replacebase'.
                * MY OPINION: This option is almost useless.

 -EPI        = Treat the source dataset as being composed of warped
               EPI slices, and the base as comprising anatomically
               'true' images.  Only phase-encoding direction image
               shearing and scaling will be allowed with this option.
       **N.B.: For most people, the base dataset will be a 3dSkullStrip-ed
               T1-weighted anatomy (MPRAGE or SPGR).  If you don't remove
               the skull first, the EPI images (which have little skull
               visible due to fat-suppression) might expand to fit EPI
               brain over T1-weighted skull.
       **N.B.: Usually, EPI datasets don't have as complete slice coverage
               of the brain as do T1-weighted datasets.  If you don't use
               some option (like '-EPI') to suppress scaling in the slice-
               direction, the EPI dataset is likely to stretch the slice
               thickness to better 'match' the T1-weighted brain coverage.
       **N.B.: '-EPI' turns on '-warpfreeze -replacebase'.
               You can use '-nowarpfreeze' and/or '-noreplacebase' AFTER the
               '-EPI' on the command line if you do not want these options used.

  ** OPTIONS to change search ranges for alignment parameters **

 -smallrange   = Set all the parameter ranges to be smaller (about half) than
                 the default ranges, which are rather large for many purposes.
                * Default angle range    is plus/minus 30 degrees
                * Default shift range    is plus/minus 32% of grid size
                * Default scaling range  is plus/minus 20% of grid size
                * Default shearing range is plus/minus 0.1111

 -parfix n v   = Fix parameter #n to be exactly at value 'v'.

 -parang n b t = Allow parameter #n to range only between 'b' and 't'.
                 If not given, default ranges are used.

 -parini n v   = Initialize parameter #n to value 'v', but then
                 allow the algorithm to adjust it.
         **N.B.: Multiple '-par...' options can be used, to constrain
                 multiple parameters.
         **N.B.: -parini has no effect if -twopass is used, since
                 the -twopass algorithm carries out its own search
                 for initial parameters.

 -maxrot dd    = Allow maximum rotation of 'dd' degrees.  Equivalent
                 to '-parang 4 -dd dd -parang 5 -dd dd -parang 6 -dd dd'
                 [Default=30 degrees]

 -maxshf dd    = Allow maximum shift of 'dd' millimeters.  Equivalent
                 to '-parang 1 -dd dd -parang 2 -dd dd -parang 3 -dd dd'
                 [Default=32% of the size of the base image]
         **N.B.: This max shift setting is relative to the center-of-mass
                 shift, if the '-cmass' option is used.

 -maxscl dd    = Allow maximum scaling factor to be 'dd'.  Equivalent
                 to '-parang 7 1/dd dd -parang 8 1/dd dd -paran2 9 1/dd dd'
                 [Default=1.4=image can go up or down 40% in size]

 -maxshr dd    = Allow maximum shearing factor to be 'dd'. Equivalent
                 to '-parang 10 -dd dd -parang 11 -dd dd -parang 12 -dd dd'
                 [Default=0.1111 for no good reason]

 NOTE: If the datasets being registered have only 1 slice, 3dAllineate
       will automatically fix the 6 out-of-plane motion parameters to
       their 'do nothing' values, so you don't have to specify '-parfix'.

 -master mmm = Write the output dataset on the same grid as dataset
               'mmm'.  If this option is NOT given, the base dataset
               is the master.
       **N.B.: 3dAllineate transforms the source dataset to be 'similar'
               to the base image.  Therefore, the coordinate system
               of the master dataset is interpreted as being in the
               reference system of the base image.  It is thus vital
               that these finite 3D volumes overlap, or you will lose data!
       **N.B.: If 'mmm' is the string 'SOURCE', then the source dataset
               is used as the master for the output dataset grid.
               You can also use 'BASE', which is of course the default.

 -mast_dxyz del = Write the output dataset using grid spacings of
  *OR*            'del' mm.  If this option is NOT given, then the
 -newgrid del     grid spacings in the master dataset will be used.
                  This option is useful when registering low resolution
                  data (e.g., EPI time series) to high resolution
                  datasets (e.g., MPRAGE) where you don't want to
                  consume vast amounts of disk space interpolating
                  the low resolution data to some artificially fine
                  (and meaningless) spatial grid.

----------------------------------------------
DEFINITION OF AFFINE TRANSFORMATION PARAMETERS
----------------------------------------------
The 3x3 spatial transformation matrix is calculated as [S][D][U],
where [S] is the shear matrix,
      [D] is the scaling matrix, and
      [U] is the rotation (proper orthogonal) matrix.
Thes matrices are specified in DICOM-ordered (x=-R+L,y=-A+P,z=-I+S)
coordinates as:

  [U] = [Rotate_y(param#6)] [Rotate_x(param#5)] [Rotate_z(param #4)]
        (angles are in degrees)

  [D] = diag( param#7 , param#8 , param#9 )

        [    1        0     0 ]        [ 1 param#10 param#11 ]
  [S] = [ param#10    1     0 ]   OR   [ 0    1     param#12 ]
        [ param#11 param#12 1 ]        [ 0    0        1     ]

The shift vector comprises parameters #1, #2, and #3.

The goal of the program is to find the warp parameters such that
   I([x]_warped) 'is similar to' J([x]_in)
as closely as possible in some sense of 'similar', where J(x) is the
base image, and I(x) is the source image.

Using '-parfix', you can specify that some of these parameters
are fixed.  For example, '-shift_rotate_scale' is equivalent
'-affine_general -parfix 10 0 -parfix 11 0 -parfix 12 0'.
Don't even think of using the '-parfix' option unless you grok
this example!

----------- Special Note for the '-EPI' Option's Coordinates -----------
In this case, the parameters above are with reference to coordinates
  x = frequency encoding direction (by default, first axis of dataset)
  y = phase encoding direction     (by default, second axis of dataset)
  z = slice encoding direction     (by default, third axis of dataset)
This option lets you freeze some of the warping parameters in ways that
make physical sense, considering how echo-planar images are acquired.
The x- and z-scaling parameters are disabled, and shears will only affect
the y-axis.  Thus, there will be only 9 free parameters when '-EPI' is
used.  If desired, you can use a '-parang' option to allow the scaling
fixed parameters to vary (put these after the '-EPI' option):
  -parang 7 0.833 1.20     to allow x-scaling
  -parang 9 0.833 1.20     to allow z-scaling
You could also fix some of the other parameters, if that makes sense
in your situation; for example, to disable out-of-slice rotations:
  -parfix 5 0  -parfix 6 0
and to disable out of slice translation:
  -parfix 3 0
NOTE WELL: If you use '-EPI', then the output warp parameters (e.g., in
           '-1Dparam_save') apply to the (freq,phase,slice) xyz coordinates,
           NOT to the DICOM xyz coordinates, so equivalent transformations
           will be expressed with different sets of parameters entirely
           than if you don't use '-EPI'!  This comment does NOT apply
           to the output of '-1Dmatrix_save', since that matrix is
           defined relative to the RAI (DICOM) spatial coordinates.

*********** CHANGING THE ORDER OF MATRIX APPLICATION ***********
   {{{ There is no good reason to ever use these options! }}}

  -SDU or -SUD }= Set the order of the matrix multiplication
  -DSU or -DUS }= for the affine transformations:
  -USD or -UDS }=   S = triangular shear (params #10-12)
                    D = diagonal scaling matrix (params #7-9)
                    U = rotation matrix (params #4-6)
                  Default order is '-SDU', which means that
                  the U matrix is applied first, then the
                  D matrix, then the S matrix.

  -Supper      }= Set the S matrix to be upper or lower
  -Slower      }= triangular [Default=lower triangular]
                  NOTE: There is no '-Lunch' option.
                        There is no '-Faster' option.

  -ashift OR   }= Apply the shift parameters (#1-3) after OR
  -bshift      }= before the matrix transformation. [Default=after]

            ==================================================
        ===== RWCox - September 2006 - Live Long and Prosper =====
            ==================================================

         ********************************************************
        *** From Webster's Dictionary: Allineate == 'to align' ***
         ********************************************************

===========================================================================
                       FORMERLY SECRET HIDDEN OPTIONS
---------------------------------------------------------------------------
        ** N.B.: Most of these are experimental! [permanent beta] **
===========================================================================

 -num_rtb n  = At the beginning of the fine pass, the best set of results
               from the coarse pass are 'refined' a little by further
               optimization, before the single best one is chosen for
               for the final fine optimization.
              * This option sets the maximum number of cost functional
                evaluations to be used (for each set of parameters)
                in this step.
              * The default is 99; a larger value will take more CPU
                time but may give more robust results.
              * If you want to skip this step entirely, use '-num_rtb 0'.
                then, the best of the coarse pass results is taken
                straight to the final optimization passes.
       **N.B.: If you use '-VERB', you will see that one extra case
               is involved in this initial fine refinement step; that
               case is starting with the identity transformation, which
               helps insure against the chance that the coarse pass
               optimizations ran totally amok.
             * MY OPINION: This option is mostly useless - but not always!
                         * Every step in the multi-step alignment process
                            was added at some point to solve a difficult
                            alignment problem.
                         * Since you usually don't know if YOUR problem
                            is difficult, you should not reduce the default
                            process without good reason.

 -nocast     = By default, parameter vectors that are too close to the
               best one are cast out at the end of the coarse pass
               refinement process. Use this option if you want to keep
               them all for the fine resolution pass.
             * MY OPINION: This option is nearly useless.

 -norefinal  = Do NOT re-start the fine iteration step after it
               has converged.  The default is to re-start it, which
               usually results in a small improvement to the result
               (at the cost of CPU time).  This re-start step is an
               an attempt to avoid a local minimum trap.  It is usually
               not necessary, but sometimes helps.

 -realaxes   = Use the 'real' axes stored in the dataset headers, if they
               conflict with the default axes.  [For Jedi AFNI Masters only!]

 -savehist sss = Save start and final 2D histograms as PGM
                 files, with prefix 'sss' (cost: cr mi nmi hel).
                * if filename contains 'FF', floats is written
                * these are the weighted histograms!
                * -savehist will also save histogram files when
                  the -allcost evaluations takes place
                * this option is mostly useless unless '-histbin' is
                  also used
               * MY OPINION: This option is mostly for debugging.
 -median       = Smooth with median filter instead of Gaussian blur.
                 (Somewhat slower, and not obviously useful.)
               * MY OPINION: This option is nearly useless.

 -powell m a   = Set the Powell NEWUOA dimensional parameters to
                 'm' and 'a' (cf. source code in powell_int.c).
                 The number of points used for approximating the
                 cost functional is m*N+a, where N is the number
                 of parameters being optimized.  The default values
                 are m=2 and a=3.  Larger values will probably slow
                 the program down for no good reason.  The smallest
                 allowed values are 1.
               * MY OPINION: This option is nearly useless.

 -target ttt   = Same as '-source ttt'.  In the earliest versions,
                 what I now call the 'source' dataset was called the
                 'target' dataset:
                    Try to remember the kind of September (2006)
                    When life was slow and oh so mellow
                    Try to remember the kind of September
                    When grass was green and source was target.
 -Xwarp       =} Change the warp/matrix setup so that only the x-, y-, or z-
 -Ywarp       =} axis is stretched & sheared.  Useful for EPI, where 'X',
 -Zwarp       =} 'Y', or 'Z' corresponds to the phase encoding direction.
 -FPS fps      = Generalizes -EPI to arbitrary permutation of directions.

 -histpow pp   = By default, the number of bins in the histogram used
                 for calculating the Hellinger, Mutual Information, and
                 Correlation Ratio statistics is n^(1/3), where n is
                 the number of data points.  You can change that exponent
                 to 'pp' with this option.
 -histbin nn   = Or you can just set the number of bins directly to 'nn'.
 -eqbin   nn   = Use equalized marginal histograms with 'nn' bins.
 -clbin   nn   = Use 'nn' equal-spaced bins except for the bot and top,
                 which will be clipped (thus the 'cl').  If nn is 0, the
                 program will pick the number of bins for you.
                 **N.B.: '-clbin 0' is now the default [25 Jul 2007];
                         if you want the old all-equal-spaced bins, use
                         '-histbin 0'.
                 **N.B.: '-clbin' only works when the datasets are
                         non-negative; any negative voxels in either
                         the input or source volumes will force a switch
                         to all equal-spaced bins.
               * MY OPINION: The above histogram-altering options are useless.

 -wtmrad  mm   = Set autoweight/mask median filter radius to 'mm' voxels.
 -wtgrad  gg   = Set autoweight/mask Gaussian filter radius to 'gg' voxels.
 -nmsetup nn   = Use 'nn' points for the setup matching [default=98756]
 -ignout       = Ignore voxels outside the warped source dataset.

 -blok bbb     = Blok definition for the 'lp?' (Local Pearson) cost
                 functions: 'bbb' is one of
                   'BALL(r)' or 'CUBE(r)' or 'RHDD(r)' or 'TOHD(r)'
                 corresponding to
                   spheres or cubes or rhombic dodecahedra or
                   truncated octahedra
                 where 'r' is the size parameter in mm.
                 [Default is 'TOHD(r)' = truncated octahedron]
                 [with 'radius' r chosen to include about 500]
                 [voxels in the base dataset 3D grid.        ]
               * Changing the 'blok' definition/radius should only be
                 needed in unusual situations, as when you are trying
                 to have fun fun fun.
               * You can change the blok shape but leave the program
                 to set the radius, using (say) 'RHDD(0)'.
               * The old default blok shape/size was 'RHDD(6.54321)',
                 so if you want to maintain backward compatibility,
                 you should use option '-blok "RHDD(6.54321)"'
                 * Only voxels in the weight mask will be used
                   inside a blok.
               * HISTORICAL NOTES:
                 * CUBE, RHDD, and TOHD are space filling polyhedra.
                   That is, they are shapes that fit together without
                   overlaps or gaps to fill up 3D space.
                 * To even approximately fill space, BALLs must overlap,
                   unlike the other blok shapes. Which means that BALL
                   bloks will use some voxels more than once.
                 * Kepler discovered/invented the RHDD (honeybees also did).
                 * The TOHD is the 'most compact' or 'most ball-like'
                   of the known convex space filling polyhedra.
                   [Which is why TOHD is the default blok shape.]

 -PearSave sss = Save the final local Pearson correlations into a dataset
   *OR*          with prefix 'sss'. These are the correlations from
 -SavePear sss   which the lpc and lpa cost functionals are calculated.
                * The values will be between -1 and 1 in each blok.
                   See the 'Too Much Detail' section below for how
                   these correlations are used to compute lpc and lpa.
                * Locations not used in the matching will get 0.
               ** Unless you use '-nmatch 100%', there will be holes
                   of 0s in the bloks, as not all voxels are used in
                   the matching algorithm (speedup attempt).
                * All the matching points in a given blok will get
                   the same value, which makes the resulting dataset
                   look jauntily blocky, especially in color.
                * This saved dataset will be on the grid of the base
                   dataset, and may be zero padded if the program
                   chose to do so in it wisdom. This padding means
                   that the voxels in this output dataset may not
                   match one-to-one with the voxels in the base
                   dataset; however, AFNI displays things using
                   coordinates, so overlaying this dataset on the
                   base dataset (say) should work OK.
                * If you really want this saved dataset to be on the
                   grid as the base dataset, you'll have use
                     3dZeropad -master {Base Dataset} ....
                * Option '-PearSave' works even if you don't use the
                   'lpc' or 'lpa' cost functionals.
                * If you use this option combined with '-allcostX', then
                   the local correlations will be saved from the INITIAL
                   alignment parameters, rather than from the FINAL
                   optimized parameters.
                   (Of course, with '-allcostX', there IS no final result.)
                * This option does NOT work with '-allcost' or '-allcostX1D'.

 -allcost        = Compute ALL available cost functionals and print them
                   at various points in the optimization progress.
 -allcostX       = Compute and print ALL available cost functionals for the
                   un-warped inputs, and then quit.
                  * This option is for testing purposes (AKA 'fun').
 -allcostX1D p q = Compute ALL available cost functionals for the set of
                   parameters given in the 1D file 'p' (12 values per row),
                   write them to the 1D file 'q', then exit. (For you, Zman)
                  * N.B.: If -fineblur is used, that amount of smoothing
                          will be applied prior to the -allcostX evaluations.
                          The parameters are the rotation, shift, scale,
                          and shear values, not the affine transformation
                          matrix. An identity matrix could be provided as
                          "0 0 0  0 0 0  1 1 1  0 0 0" for instance or by
                          using the word "IDENTITY"
                  * This option is for testing purposes (even more 'fun').

===========================================================================

Too Much Detail -- How Local Pearson Correlations Are Computed and Used
-----------------------------------------------------------------------
 * The automask region of the base dataset is divided into a discrete
    set of 'bloks'. Usually there are several thousand bloks.
 * In each blok, the voxel values from the base and the source (after
    the alignment transformation is applied) are extracted and the
    correlation coefficient is computed -- either weighted or unweighted,
    depending on the options used in 3dAllineate (usually weighted).
 * Let p[i] = correlation coefficient in blok #i,
       w[i] = sum of weights used in blok #i, or = 1 if unweighted.
** The values of p[i] are what get output via the '-PearSave' option.
 * Define pc[i] = arctanh(p[i]) = 0.5 * log( (1+p[i]) / (1-p[i]) )
     This expression is designed to 'stretch' out larger correlations,
     giving them more emphasis in psum below. The same reasoning
     is why pc[i]*abs(pc[i]) is used below, to make bigger correlations
     have a bigger impact in the final result.
 * psum = SUM_OVER_i { w[i]*pc[i]*abs(pc[i]) }
   wsum = SUM_OVER_i { w[i] }
   lpc  = psum / wsum   ==> negative correlations are good (smaller lpc)
   lpa  = 1 - abs(lpc)  ==> positive correlations are good (smaller lpa)

===========================================================================

Modifying '-final wsinc5' -- for the truly crazy people out there
-----------------------------------------------------------------
 * The windowed (tapered) sinc function interpolation can be modified
     by several environment variables.  This is expert-level stuff, and
     you should understand what you are doing if you use these options.
     The simplest way to use these would be on the command line, as in
       -DAFNI_WSINC5_RADIUS=9 -DAFNI_WSINC5_TAPERFUN=Hamming

 * AFNI_WSINC5_TAPERFUN lets you choose the taper function.
     The default taper function is the minimum sidelobe 3-term cosine:
       0.4243801 + 0.4973406*cos(PI*x) + 0.0782793*cos(2*PI*x)
     If you set this environment variable to 'Hamming', then the
     minimum sidelobe 2-term cosine will be used instead:
       0.53836 + 0.46164*cos(PI*x)
     Here, 'x' is between 0 and 1, where x=0 is the center of the
     interpolation mask and x=1 is the outer edge.
 ++  Unfortunately, the 3-term cosine doesn't have a catchy name; you can
       find it (and many other) taper functions described in the paper
         AH Nuttall, Some Windows with Very Good Sidelobe Behavior.
         IEEE Trans. ASSP, 29:84-91 (1981).
       In particular, see Fig.14 and Eq.36 in this paper.

 * AFNI_WSINC5_TAPERCUT lets you choose the start 'x' point for tapering:
     This value should be between 0 and 0.8; for example, 0 means to taper
     all the way from x=0 to x=1 (maximum tapering).  The default value
     is 0.  Setting TAPERCUT to 0.5 (say) means only to taper from x=0.5
     to x=1; thus, a larger value means that fewer points are tapered
     inside the interpolation mask.

 * AFNI_WSINC5_RADIUS lets you choose the radius of the tapering window
     (i.e., the interpolation mask region).  This value is an integer
     between 3 and 21.  The default value is 5 (which used to be the
     ONLY value, thus 'wsinc5').  RADIUS is measured in voxels, not mm.

 * AFNI_WSINC5_SPHERICAL lets you choose the shape of the mask region.
     If you set this value to 'Yes', then the interpolation mask will be
     spherical; otherwise, it defaults to cubical.

 * The Hamming taper function is a little faster than the 3-term function,
     but will have a little more Gibbs phenomenon.
 * A larger TAPERCUT will give a little more Gibbs phenomenon; compute
     speed won't change much with this parameter.
 * Compute time goes up with (at least) the 3rd power of the RADIUS; setting
     RADIUS to 21 will be VERY slow.
 * Visually, RADIUS=3 is similar to quintic interpolation.  Increasing
     RADIUS makes the interpolated images look sharper and more well-
     defined.  However, values of RADIUS greater than or equal to 7 appear
     (to Zhark's eagle eye) to be almost identical.  If you really care,
     you'll have to experiment with this parameter yourself.
 * A spherical mask is also VERY slow, since the cubical mask allows
     evaluation as a tensor product.  There is really no good reason
     to use a spherical mask; I only put it in for fun/experimental purposes.
** For most users, there is NO reason to ever use these environment variables
     to modify wsinc5.  You should only do this kind of thing if you have a
     good and articulable reason!  (Or if you really like to screw around.)
** The wsinc5 interpolation function is parallelized using OpenMP, which
     makes its usage moderately tolerable.

===========================================================================

Hidden experimental cost functionals:
-------------------------------------
   sp   *OR*  spearman        = Spearman [rank] Correlation
   je   *OR*  jointentropy    = Joint Entropy [H(b,s)]
   lss  *OR*  signedPcor      = Signed Pearson Correlation

Notes for the new [Feb 2010] lpc+ cost functional:
--------------------------------------------------
 * The cost functional named 'lpc+' is a combination of several others:
     lpc + hel*0.4 + crA*0.4 + nmi*0.2 + mi*0.2 + ov*0.4
   ++ 'hel', 'crA', 'nmi', and 'mi' are the histogram-based cost
      functionals also available as standalone options.
   ++ 'ov' is a measure of the overlap of the automasks of the base and
      source volumes; ov is not available as a standalone option.

 * The purpose of lpc+ is to avoid situations where the pure lpc cost
   goes wild; this especially happens if '-source_automask' isn't used.
   ++ Even with lpc+, you should use '-source_automask+2' (say) to be safe.

 * You can alter the weighting of the extra functionals by giving the
   option in the form (for example)
     '-lpc+hel*0.5+nmi*0+mi*0+crA*1.0+ov*0.5'

 * The quotes are needed to prevent the shell from wild-card expanding
   the '*' character.
   --> You can now use ':' in place of '*' to avoid this wildcard problem:
         -lpc+hel:0.5+nmi:0+mi:0+crA:1+ov:0.5+ZZ

 * Notice the weight factors FOLLOW the name of the extra functionals.
   ++ If you want a weight to be 0 or 1, you have to provide for that
      explicitly -- if you leave a weight off, then it will get its
      default value!
   ++ The order of the weight factor names is unimportant here:
        '-lpc+hel*0.5+nmi*0.8' == '-lpc+nmi*0.8+hel*0.5'

 * Only the 5 functionals listed (hel,crA,nmi,mi,ov) can be used in '-lpc+'.

 * In addition, if you want the initial alignments to be with '-lpc+' and
   then finish the Final alignment with pure '-lpc', you can indicate this
   by putting 'ZZ' somewhere in the option string, as in '-lpc+ZZ'.
 ***** '-cost lpc+ZZ' is very useful for aligning EPI to T1w volumes *****

 * [28 Nov 2018]
   All of the above now applies to the 'lpa+' cost functional,
   which can be used as a robust method for like-to-like alignment.
   For example, aligning 3T and 7T T1-weighted datasets from the same person.
 * [28 Sep 2021]
   However, the default multiplier constants for cost 'lpa+' are now
   different from the 'lpc+' multipliers -- to make 'lpa+' more
   robust. The new default for 'lpa+' is
     lpa + hel*0.4 + crA*0.4 + nmi*0.2 + mi*0.0 + ov*0.4
 ***** '-cost lpa+ZZ' is very useful for T1w to T1w volumes (or any     *****
 ***** similar-contrast datasets).                                      *****

 *** Note that in trial runs, we have found that lpc+ZZ and lpa+ZZ are    ***
 *** more robust than lpc+ and lpa+ -- which is why the '+ZZ' amendment   ***
 *** was created.                                                         ***

Cost functional descriptions (for use with -allcost output):
------------------------------------------------------------
   ls  :: 1 - abs(Pearson correlation coefficient)
   sp  :: 1 - abs(Spearman correlation coefficient)
   mi  :: - Mutual Information = H(base,source)-H(base)-H(source)
   crM :: 1 - abs[ CR(base,source) * CR(source,base) ]
   nmi :: 1/Normalized MI = H(base,source)/[H(base)+H(source)]
   je  :: H(base,source) = joint entropy of image pair
   hel :: - Hellinger distance(base,source)
   crA :: 1 - abs[ CR(base,source) + CR(source,base) ]
   crU :: CR(source,base) = Var(source|base) / Var(source)
   lss :: Pearson correlation coefficient between image pair
   lpc :: nonlinear average of Pearson cc over local neighborhoods
   lpa :: 1 - abs(lpc)
   lpc+:: lpc + hel + mi + nmi + crA + overlap
   lpa+:: lpa + hel +      nmi + crA + overlap

 * N.B.: Some cost functional values (as printed out above)
   are negated from their theoretical descriptions (e.g., 'hel')
   so that the best image alignment will be found when the cost
   is minimized.  See the descriptions above and the references
   below for more details for each functional.

 * MY OPINIONS:
   * Some of these cost functionals were implemented only for
      the purposes of fun and/or comparison and/or experimentation
      and/or special circumstances. These are
        sp je lss crM crA crM hel mi nmi
   * For many purposes, lpc+ZZ and lpa+ZZ are the most robust
      cost functionals, but usually the slowest to evaluate.
   * HOWEVER, just because some method is best MOST of the
      time does not mean it is best ALL of the time.
      Please check your results visually, or at some point
      in time you will have bad results and not know it!
   * For speed and for 'like-to-like' alignment, '-cost ls'
      can work well.

 * For more information about the 'lpc' functional, see
     ZS Saad, DR Glen, G Chen, MS Beauchamp, R Desai, RW Cox.
       A new method for improving functional-to-structural
       MRI alignment using local Pearson correlation.
       NeuroImage 44: 839-848, 2009.
     http://dx.doi.org/10.1016/j.neuroimage.2008.09.037
     https://pubmed.ncbi.nlm.nih.gov/18976717
   The '-blok' option can be used to control the regions
   (size and shape) used to compute the local correlations.
 *** Using the 'lpc' functional wisely requires the use of
     a proper weight volume.  We HIGHLY recommend you use
     the align_epi_anat.py script if you want to use this
     cost functional!  Otherwise, you are likely to get
     less than optimal results (and then swear at us unjustly).

 * For more information about the 'cr' functionals, see
     http://en.wikipedia.org/wiki/Correlation_ratio
   Note that CR(x,y) is not the same as CR(y,x), which
   is why there are symmetrized versions of it available.

 * For more information about the 'mi', 'nmi', and 'je'
   cost functionals, see
     http://en.wikipedia.org/wiki/Mutual_information
     http://en.wikipedia.org/wiki/Joint_entropy
     http://www.cs.jhu.edu/~cis/cista/746/papers/mutual_info_survey.pdf

 * For more information about the 'hel' functional, see
     http://en.wikipedia.org/wiki/Hellinger_distance

 * Some cost functionals (e.g., 'mi', 'cr', 'hel') are
   computed by creating a 2D joint histogram of the
   base and source image pair.  Various options above
   (e.g., '-histbin', etc.) can be used to control the
   number of bins used in the histogram on each axis.
   (If you care to control the program in such detail!)

 * Minimization of the chosen cost functional is done via
   the NEWUOA software, described in detail in
     MJD Powell. 'The NEWUOA software for unconstrained
       optimization without derivatives.' In: GD Pillo,
       M Roma (Eds), Large-Scale Nonlinear Optimization.
       Springer, 2006.
     http://www.damtp.cam.ac.uk/user/na/NA_papers/NA2004_08.pdf

===========================================================================

SUMMARY of the Default Allineation Process
------------------------------------------
As mentioned earlier, each of these steps was added to deal with a problem
 that came up over the years. The resulting process is reasonably robust :),
 but then tends to be slow :(. If you use the '-verb' or '-VERB' option, you
 will get a lot of fun fun fun progress messages that show the results from
 this sequence of steps.

Below, I refer to different scales of effort in the optimizations at each
 step. Easier/faster optimization is done using: matching with fewer points
 from the datasets; more smoothing of the base and source datasets; and by
 putting a smaller upper limit on the number of trials the optimizer is
 allowed to take. The Coarse phase starts with the easiest optimization,
 and increases the difficulty a little at each refinement. The Fine phase
 starts with the most difficult optimization setup: the most points for
 matching, little or no smoothing, and a large limit on the number of
 optimizer trials.

 0. Preliminary Setup [Goal: create the basis for the following steps]
  a. Create the automask and/or autoweight from the '-base' dataset.
     The cost functional will only be computed from voxels inside the
     automask, and only a fraction of those voxels will actually be used
     for evaluating the cost functional (unless '-nmatch 100%' is used).
  b. If the automask is 'too close' to the outside of the base 3D volume,
     zeropad the base dataset to avoid edge effects.
  c. Determine the 3D (x,y,z) shifts for the '-cmass' center-of-mass
     crude alignment, if ordered by the user.
  d. Set ranges of transformation parameters and which parameters are to
     be frozen at fixed values.

 1. Coarse Phase [Goal: explore the vastness of 6-12D parameter space]
  a. The first step uses only the first 6 parameters (shifts + rotations),
     and evaluates thousands of potential starting points -- selected from
     a 6D grid in parameter space and also from random points in 6D
     parameter space. This step is fairly slow. The best 45 parameter
     sets (in the sense of the cost functional) are kept for the next step.
  b. Still using only the first 6 parameters, the best 45 sets of parameters
     undergo a little optimization. The best 6 parameter sets after this
     refinement are kept for the next step. (The number of sets chosen
     to go on to the next step can be set by the '-twobest' option.)
     The optimizations in this step use the blurring radius that is
     given by option '-twoblur', which defaults to 7.77 mm, and use
     relatively few points in each dataset for computing the cost functional.
  c. These 6 best parameter sets undergo further, more costly, optimization,
     now using all 12 parameters. This optimization runs in 3 passes, each
     more costly (less smoothing, more matching points) than the previous.
     (If 2 sets get too close in parameter space, 1 of them will be cast out
     -- this does not happen often.) Output parameter sets from the 3rd pass
     of successive refinement are inputs to the fine refinement phase.

 2. Fine Phase [Goal: use more expensive optimization on good starting points]
  a. The 6 outputs from step 1c have the null parameter set (all 0, except
     for the '-cmass' shifts) appended. Then a small amount of optimization
     is applied to each of these 7 parameter sets ('-num_rtb'). The null
     parameter set is added here to insure against the possibility that the
     coarse optimizations 'ran away' to some unpleasant locations in the 12D
     parameter space. These optimizations use the full set of points specified
     by '-nmatch', and the smoothing specified by '-fineblur' (default = 0),
     but the number of functional evaluations is small, to make this step fast.
  b. The best (smallest cost) set from step 2a is chosen for the final
     optimization, which is run until the '-conv' limit is reached.
     These are the 'Finalish' parameters (shown using '-verb').
  c. The set of parameters from step 2b is used as the starting point
     for a new optimization, in an attempt to avoid a false minimum.
     The results of this optimization are the final parameter set.

 3. The final set of parameters is used to produce the output volume,
    using the '-final' interpolation method.

In practice, the output from the Coarse phase successive refinements is
usually so good that the Fine phase runs quickly and makes only small
adjustments. The quality resulting from the Coarse phase steps is mostly
due, in my opinion, to the large number of initial trials (1ab), followed by
by the successive refinements of several parameter sets (1c) to help usher
'good' candidates to the starting line for the Fine phase.

For some 'easy' registration problems -- such as T1w-to-T1w alignment, high
quality images, a lot of overlap to start with -- the process can be sped
up by reducing the number of steps. For example, '-num_rtb 0 -twobest 0'
would eliminate step 2a and speed up step 1c. Even more extreme, '-onepass'
could be used to skip all of the Coarse phase. But be careful out there!

For 'hard' registration problems, cleverness is usually needed. Choice
of cost functional matters. Preprocessing the datasets may be necessary.
Using '-twobest 29' could help by providing more candidates for the
Fine phase -- at the cost of CPU time. If you run into trouble -- which
happens sooner or later -- try the AFNI Message Board -- and please
give details, including the exact command line(s) you used.

 =========================================================================
* This binary version of 3dAllineate is compiled using OpenMP, a semi-
   automatic parallelizer software toolkit, which splits the work across
   multiple CPUs/cores on the same shared memory computer.
* OpenMP is NOT like MPI -- it does not work with CPUs connected only
   by a network (e.g., OpenMP doesn't work across cluster nodes).
* For some implementation and compilation details, please see
   https://afni.nimh.nih.gov/pub/dist/doc/misc/OpenMP.html
* The number of CPU threads used will default to the maximum number on
   your system. You can control this value by setting environment variable
   OMP_NUM_THREADS to some smaller value (including 1).
* Un-setting OMP_NUM_THREADS resets OpenMP back to its default state of
   using all CPUs available.
   ++ However, on some systems, it seems to be necessary to set variable
      OMP_NUM_THREADS explicitly, or you only get one CPU.
   ++ On other systems with many CPUS, you probably want to limit the CPU
      count, since using more than (say) 16 threads is probably useless.
* You must set OMP_NUM_THREADS in the shell BEFORE running the program,
   since OpenMP queries this variable BEFORE the program actually starts.
   ++ You can't usefully set this variable in your ~/.afnirc file or on the
      command line with the '-D' option.
* How many threads are useful? That varies with the program, and how well
   it was coded. You'll have to experiment on your own systems!
* The number of CPUs on this particular computer system is ...... 2.
* The maximum number of CPUs that will be used is now set to .... 2.
* OpenMP may or may not speed up the program significantly.  Limited
   tests show that it provides some benefit, particularly when using
   the more complicated interpolation methods (e.g., '-cubic' and/or
   '-final wsinc5'), for up to 3-4 CPU threads.
* But the speedup is definitely not linear in the number of threads, alas.
   Probably because my parallelization efforts were pretty limited.
 =========================================================================

++ Compile date = Oct 13 2022 {AFNI_22.3.03:linux_ubuntu_16_64}