:orphan:

.. _ahelp_3dAllineate:

***********
3dAllineate
***********

.. contents::
   :local:

|

.. code-block:: none

Usage: 3dAllineate [options] sourcedataset

--------------------------------------------------------------------------
      Program to align one dataset (the 'source') to a 'base'
      dataset, using an affine (matrix) transformation of space.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
***** Please check your results visually, or at some point  *****
***** in time you will have bad results and not know it :-( *****
*****                                                       *****
***** No method for 3D image alignment, however tested it   *****
***** was, can be relied upon 100% of the time, and anyone  *****
***** who tells you otherwise is a madman or is a liar!!!!  *****
*****                                                       *****
***** In particular, if you are aligning two datasets with  *****
***** significantly different spatial coverage (e.g.,       *****
***** -source = whole head T1w and -base = MNI template),   *****
***** then be careful to check the results. In such a case, *****
***** using '-twobest MAX' should increase the chance of    *****
***** getting a good alignment (at the cost of CPU time).   *****
*****                                                       *****
***** Furthermore, don't EVER think that "I have so much    *****
***** data that a few errors will not matter"!!!!           *****
--------------------------------------------------------------------------

* Options (lots of them!) are available to control:
   ++ How the matching between the source and the base is computed
      (i.e., the 'cost functional' measuring image mismatch).
   ++ How the resliced source is interpolated to the base space.
   ++ The complexity of the spatial transformation ('warp') used.
   ++ And many many technical options to control the process in detail,
      if you know what you are doing (or just like to fool around).
* This program is a generalization of and improvement on the older
  software 3dWarpDrive.
* For nonlinear transformations, see program 3dQwarp.
* 3dAllineate can also be used to apply a pre-computed matrix to a dataset
  to produce the transformed output. In this mode of operation, it just
  skips the alignment process, whose function is to compute the matrix,
  and instead it reads the matrix in, computes the output dataset,
  writes it out, and stops.
* If you are curious about the stepwise process used, see the section
  below titled: SUMMARY of the Default Allineation Process.

=====----------------------------------------------------------------------
NOTES: For most 3D image registration purposes, we now recommend that you
=====  use Daniel Glen's script align_epi_anat.py (which, despite its name,
       can do many more registration problems than EPI-to-T1-weighted).
  -->> In particular, using 3dAllineate with the 'lpc' cost functional
       (to align EPI and T1-weighted volumes) requires using a '-weight'
       volume to get good results, and the align_epi_anat.py script will
       automagically generate such a weight dataset that works well for
       EPI-to-structural alignment.
  -->> This script can also be used for other alignment purposes, such as
       T1-weighted alignment between field strengths using the '-lpa'
       cost functional. Investigate align_epi_anat.py to see if it will
       do what you need -- you might make your life a little easier and
       nicer and happier and more tranquil.
  -->> Also, if/when you ask for registration help on the AFNI message
       board, we'll probably start by recommending that you try
       align_epi_anat.py if you haven't already done so.
-->> For aligning EPI and T1-weighted volumes, we have found that using a flip angle of 50-60 degrees for the EPI works better than a flip angle of 90 degrees. The reason is that there is more internal contrast in the EPI data when the flip angle is smaller, so the registration has some image structure to work with. With the 90 degree flip angle, there is so little internal contrast in the EPI dataset that the alignment process ends up being just trying to match brain outlines -- which doesn't always give accurate results: see http://dx.doi.org/10.1016/j.neuroimage.2008.09.037 -->> Although the total MRI signal is reduced at a smaller flip angle, there is little or no loss in FMRI/BOLD information, since the bulk of the time series 'noise' is from physiological fluctuation signals, which are also reduced by the lower flip angle -- for more details, see http://dx.doi.org/10.1016/j.neuroimage.2010.11.020 --------------------------------------------------------------------------- **** New (Summer 2013) program 3dQwarp is available to do nonlinear **** *** alignment between a base and source dataset, including the use *** ** of 3dAllineate for the preliminary affine alignment. If you are ** * interested, see the output of '3dQwarp -help' for the details. * --------------------------------------------------------------------------- COMMAND LINE OPTIONS: ==================== -base bbb = Set the base dataset to be the #0 sub-brick of 'bbb'. If no -base option is given, then the base volume is taken to be the #0 sub-brick of the source dataset. (Base must be stored as floats, shorts, or bytes.) ** -base is not needed if you are just applying a given transformation to the -source dataset to produce the output, using -1Dmatrix_apply or -1Dparam_apply ** Unless you use the -master option, the aligned output dataset will be stored on the same 3D grid as the -base dataset. -source ttt = Read the source dataset from 'ttt'. If no -source *OR* (or -input) option is given, then the source dataset -input ttt is the last argument on the command line. (Source must be stored as floats, shorts, or bytes.) ** This is the dataset to be transformed, to match the -base dataset, or directly with one of the options -1Dmatrix_apply or -1Dparam_apply ** 3dAllineate can register 2D datasets (single slice), but both the base and source must be 2D -- you cannot use this program to register a 2D slice into a 3D volume! -- However, the 'lpc' and 'lpa' cost functionals do not work properly with 2D images, as they are designed around local 3D neighborhoods and that code has not been patched to work with 2D neighborhoods :( -- You can input .jpg files as 2D 'datasets', register them with 3dAllineate, and write the result back out using a prefix that ends in '.jpg'; HOWEVER, the color information will not be used in the registration, as this program was written to deal with monochrome medical datasets. At the end, if the source was RGB (color), then the output will be also be RGB, and then a color .jpg can be output. -- The above remarks also apply to aligning 3D RGB datasets: it will be done using only the 3D volumes converted to grayscale, but the final output will be the source RGB dataset transformed to the (hopefully) aligned grid. * However, I've never tested aligning 3D color datasets; you can be the first one ever! ** See the script @2dwarper.Allin for an example of using 3dAllineate to do slice-by-slice nonlinear warping to align 3D volumes distorted by time-dependent magnetic field inhomogeneities. 
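 ** A minimal illustrative command combining the options above (the dataset
    and prefix names here are placeholders, not fixed AFNI names):

      3dAllineate -base anat_template+tlrc       \
                  -source subj_T1w+orig          \
                  -prefix subj_T1w_aligned       \
                  -1Dmatrix_save subj_T1w_al_mat

    This estimates the affine alignment of the source to the base, writes
    the aligned dataset on the base grid, and saves the matrix for re-use.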
** NOTA BENE: The base and source dataset do NOT have to be defined  **
** [that's]   on the same 3D grids; the alignment process uses the   **
** [Latin ]   coordinate systems defined in the dataset headers to   **
** [ for  ]   make the match between spatial locations, rather than  **
** [ NOTE ]   matching the 2 datasets on a voxel-by-voxel basis      **
** [ WELL ]   (as 3dvolreg and 3dWarpDrive do).                      **
**     -->> However, this coordinate-based matching requires that    **
**          image volumes be defined on roughly the same patch of    **
**          (x,y,z) space, in order to find a decent starting        **
**          point for the transformation. You might need to use      **
**          the script @Align_Centers to do this, if the 3D          **
**          spaces occupied by the images do not overlap much.       **
**     -->> Or the '-cmass' option to this program might be          **
**          sufficient to solve this problem, maybe, with luck.      **
**          (Another reason why you should use align_epi_anat.py)    **
**     -->> If the coordinate system in the dataset headers is       **
**          WRONG, then 3dAllineate will probably not work well!     **
**          And I say this because we have seen this in several      **
**          datasets downloaded from online archives.                **

-prefix ppp = Output the resulting dataset to file 'ppp'. If this
 *OR*         option is NOT given, no dataset will be output! The
 -out ppp     transformation matrix to align the source to the base
              will be estimated, but not applied. You can save the
              matrix for later use via the '-1Dmatrix_save' option.
      *N.B.: By default, the new dataset is computed on the grid of the
             base dataset; see the '-master' and/or the '-mast_dxyz'
             options to change this grid.
      *N.B.: If 'ppp' is 'NULL', then no output dataset will be produced.
             This option is for compatibility with 3dvolreg.

-floatize   = Write result dataset as floats. Internal calculations
 -float       are all done on float copies of the input datasets.
              [Default=convert output dataset to data format of  ]
              [        source dataset; if the source dataset was ]
              [        shorts with a scale factor, then the new  ]
              [        dataset will get a scale factor as well;  ]
              [        if the source dataset was shorts with no  ]
              [        scale factor, the result will be unscaled.]

-1Dparam_save ff = Save the warp parameters in ASCII (.1D) format into
                   file 'ff' (1 row per sub-brick in source).
                 * A historical synonym for this option is '-1Dfile'.
                 * At the top of the saved 1D file is a #comment line
                   listing the names of the parameters; those parameters
                   that are fixed (e.g., via '-parfix') will be marked by
                   having their symbolic names end in the '$' character.
                   You can use '1dcat -nonfixed' to remove these columns
                   from the 1D file if you just want to further process
                   the varying parameters somehow (e.g., 1dsvd).
                 * However, the '-1Dparam_apply' option requires the
                   full list of parameters, including those that were
                   fixed, in order to work properly!

-1Dparam_apply aa = Read warp parameters from file 'aa', apply them to
                    the source dataset, and produce a new dataset.
                    (Must also use the '-prefix' option for this to work!  )
                    (In this mode of operation, there is no optimization of)
                    (the cost functional by changing the warp parameters;  )
                    (previously computed parameters are applied directly.  )
      *N.B.: If you use -1Dparam_apply, you may also want to use -master
             to control the grid on which the new dataset is written --
             the base dataset from the original 3dAllineate run would be
             a good possibility. Otherwise, the new dataset will be
             written out on the 3D grid coverage of the source dataset,
             and this might result in clipping off part of the image.
*N.B.: Each row in the 'aa' file contains the parameters for transforming one sub-brick in the source dataset. If there are more sub-bricks in the source dataset than there are rows in the 'aa' file, then the last row is used repeatedly. *N.B.: A trick to use 3dAllineate to resample a dataset to a finer grid spacing: 3dAllineate -input dataset+orig \ -master template+orig \ -prefix newdataset \ -final wsinc5 \ -1Dparam_apply '1D: 12@0'\' Here, the identity transformation is specified by giving all 12 affine parameters as 0 (note the extra \' at the end of the '1D: 12@0' input!). ** You can also use the word 'IDENTITY' in place of '1D: 12@0'\' (to indicate the identity transformation). **N.B.: Some expert options for modifying how the wsinc5 method works are described far below, if you use '-HELP' instead of '-help'. ****N.B.: The interpolation method used to produce a dataset is always given via the '-final' option, NOT via '-interp'. If you forget this and use '-interp' along with one of the 'apply' options, this program will chastise you (gently) and change '-final' to match what the '-interp' input. -1Dmatrix_save ff = Save the transformation matrix for each sub-brick into file 'ff' (1 row per sub-brick in the source dataset). If 'ff' does NOT end in '.1D', then the program will append '.aff12.1D' to 'ff' to make the output filename. *N.B.: This matrix is the coordinate transformation from base to source DICOM coordinates. In other terms: Xin = Xsource = M Xout = M Xbase or Xout = Xbase = inv(M) Xin = inv(M) Xsource where Xin or Xsource is the 4x1 coordinates of a location in the input volume. Xout is the coordinate of that same location in the output volume. Xbase is the coordinate of the corresponding location in the base dataset. M is ff augmented by a 4th row of [0 0 0 1], X. is an augmented column vector [x,y,z,1]' To get the inverse matrix inv(M) (source to base), use the cat_matvec program, as in cat_matvec fred.aff12.1D -I -1Dmatrix_apply aa = Use the matrices in file 'aa' to define the spatial transformations to be applied. Also see program cat_matvec for ways to manipulate these matrix files. *N.B.: You probably want to use either -base or -master with either *_apply option, so that the coordinate system that the matrix refers to is correctly loaded. ** You can also use the word 'IDENTITY' in place of a filename to indicate the identity transformation -- presumably for the purpose of resampling the source dataset to a new grid. * The -1Dmatrix_* options can be used to save and reuse the transformation * * matrices. In combination with the program cat_matvec, which can multiply * * saved transformation matrices, you can also adjust these matrices to * * other alignments. These matrices can also be combined with nonlinear * * warps (from 3dQwarp) using programs 3dNwarpApply or 3dNwarpCat. * * The script 'align_epi_anat.py' uses 3dAllineate and 3dvolreg to align EPI * * datasets to T1-weighted anatomical datasets, using saved matrices between * * the two programs. This script is our currently recommended method for * * doing such intra-subject alignments. 
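 ** A sketch of the save-then-apply workflow described above (all dataset
    and file names are placeholders):

      # Run 1: estimate the alignment and save the matrix; no dataset
      #        is written, since -prefix is not given
      3dAllineate -base anat+orig -source epi_vol+orig \
                  -1Dmatrix_save epi_to_anat

      # Run 2: apply the saved matrix (named epi_to_anat.aff12.1D, per
      #        the naming rule above) to another dataset, on the anat grid
      3dAllineate -source epi_run2+orig                  \
                  -1Dmatrix_apply epi_to_anat.aff12.1D   \
                  -master anat+orig -final wsinc5        \
                  -prefix epi_run2_aligned

      # (If the dataset being transformed holds integer labels, e.g. an
      #  atlas, '-final NN' would be the appropriate choice instead.)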
* -cost ccc = Defines the 'cost' function that defines the matching between the source and the base; 'ccc' is one of ls *OR* leastsq = Least Squares [Pearson Correlation] mi *OR* mutualinfo = Mutual Information [H(b)+H(s)-H(b,s)] crM *OR* corratio_mul = Correlation Ratio (Symmetrized*) nmi *OR* norm_mutualinfo = Normalized MI [H(b,s)/(H(b)+H(s))] hel *OR* hellinger = Hellinger metric crA *OR* corratio_add = Correlation Ratio (Symmetrized+) crU *OR* corratio_uns = Correlation Ratio (Unsym) lpc *OR* localPcorSigned = Local Pearson Correlation Signed lpa *OR* localPcorAbs = Local Pearson Correlation Abs lpc+ *OR* localPcor+Others= Local Pearson Signed + Others lpa+ *OR* localPcorAbs+Others= Local Pearson Abs + Others You can also specify the cost functional using an option of the form '-mi' rather than '-cost mi', if you like to keep things terse and cryptic (as I do). [Default == '-hel' (for no good reason, but it sounds nice).] **NB** See more below about lpa and lpc, which are typically what we would recommend as first-choice cost functions now: lpa if you have similar contrast vols to align; lpc if you have *non*similar contrast vols to align! -interp iii = Defines interpolation method to use during matching process, where 'iii' is one of NN *OR* nearestneighbour *OR nearestneighbor linear *OR* trilinear cubic *OR* tricubic quintic *OR* triquintic Using '-NN' instead of '-interp NN' is allowed (e.g.). Note that using cubic or quintic interpolation during the matching process will slow the program down a lot. Use '-final' to affect the interpolation method used to produce the output dataset, once the final registration parameters are determined. [Default method == 'linear'.] ** N.B.: Linear interpolation is used during the coarse alignment pass; the selection here only affects the interpolation method used during the second (fine) alignment pass. ** N.B.: '-interp' does NOT define the final method used to produce the output dataset as warped from the input dataset. If you want to do that, use '-final'. -final iii = Defines the interpolation mode used to create the output dataset. [Default == 'cubic'] ** N.B.: If you are applying a transformation to an integer-valued dataset (such as an atlas), then you should use '-final NN' to avoid interpolation of the integer labels. ** N.B.: For '-final' ONLY, you can use 'wsinc5' to specify that the final interpolation be done using a weighted sinc interpolation method. This method is so SLOW that you aren't allowed to use it for the registration itself. ++ wsinc5 interpolation is highly accurate and should reduce the smoothing artifacts from lower order interpolation methods (which are most visible if you interpolate an EPI time series to high resolution and then make an image of the voxel-wise variance). ++ On my Intel-based Mac, it takes about 2.5 s to do wsinc5 interpolation, per 1 million voxels output. For comparison, quintic interpolation takes about 0.3 s per 1 million voxels: 8 times faster than wsinc5. ++ The '5' refers to the width of the sinc interpolation weights: plus/minus 5 grid points in each direction; this is a tensor product interpolation, for speed. TECHNICAL OPTIONS (used for fine control of the program): ================= -nmatch nnn = Use at most 'nnn' scattered points to match the datasets. The smaller nnn is, the faster the matching algorithm will run; however, accuracy may be bad if nnn is too small. If you end the 'nnn' value with the '%' character, then that percentage of the base's voxels will be used. 
[Default == 47% of voxels in the weight mask] -nopad = Do not use zero-padding on the base image. (I cannot think of a good reason to use this option.) [Default == zero-pad, if needed; -verb shows how much] -zclip = Replace negative values in the input datasets (source & base) -noneg with zero. The intent is to clip off a small set of negative values that may arise when using 3dresample (say) with cubic interpolation. -conv mmm = Convergence test is set to 'mmm' millimeters. This doesn't mean that the results will be accurate to 'mmm' millimeters! It just means that the program stops trying to improve the alignment when the optimizer (NEWUOA) reports it has narrowed the search radius down to this level. * To set this value to the smallest allowable, use '-conv 0'. * A coarser value for 'quick-and-dirty' alignment is 0.05. -verb = Print out verbose progress reports. [Using '-VERB' will give even more prolix reports :] -quiet = Don't print out verbose stuff. (But WHY?) -usetemp = Write intermediate stuff to disk, to economize on RAM. Using this will slow the program down, but may make it possible to register datasets that need lots of space. **N.B.: Temporary files are written to the directory given in environment variable TMPDIR, or in /tmp, or in ./ (preference in that order). If the program crashes, these files are named TIM_somethingrandom, and you may have to delete them manually. (TIM=Temporary IMage) **N.B.: If the program fails with a 'malloc failure' type of message, then try '-usetemp' (malloc=memory allocator). * If the program just stops with a message 'killed', that means the operating system (Unix/Linux) stopped the program, which almost always is due to the system running low on memory -- so it starts killing programs to save itself. -nousetemp = Don't use temporary workspace on disk [the default]. -check hhh = After cost functional optimization is done, start at the final parameters and RE-optimize using the new cost function 'hhh'. If the results are too different, a warning message will be printed. However, the final parameters from the original optimization will be used to create the output dataset. Using '-check' increases the CPU time, but can help you feel sure that the alignment process did not go wild and crazy. [Default == no check == don't worry, be happy!] **N.B.: You can put more than one function after '-check', as in -nmi -check mi hel crU crM to register with Normalized Mutual Information, and then check the results against 4 other cost functionals. **N.B.: On the other hand, some cost functionals give better results than others for specific problems, and so a warning that 'mi' was significantly different than 'hel' might not actually mean anything useful (e.g.). ** PARAMETERS THAT AFFECT THE COST OPTIMIZATION STRATEGY ** -onepass = Use only the refining pass -- do not try a coarse resolution pass first. Useful if you know that only SMALL amounts of image alignment are needed. [The default is to use both passes.] -twopass = Use a two pass alignment strategy, first searching for a large rotation+shift and then refining the alignment. [Two passes are used by default for the first sub-brick] [in the source dataset, and then one pass for the others.] ['-twopass' will do two passes for ALL source sub-bricks.] *** The first (coarse) pass is relatively slow, as it tries to search a large volume of parameter (rotations+shifts) space for initial guesses at the alignment transformation. 
* A lot of these initial guesses are kept and checked to see which ones lead to good starting points for the further refinement. * The winners of this competition are then passed to the '-twobest' (infra) successive optimization passes. * The ultimate winner of THAT stage is what starts the second (fine) pass alignment. Usually, this starting point is so good that the fine pass optimization does not provide a lot of improvement; that is, most of the run time ends up in coarse pass with its multiple stages. * All of these stages are intended to help the program avoid stopping at a 'false' minimum in the cost functional. They were added to the software as we gathered experience with difficult 3D alignment problems. The combination of multiple stages of partial optimization of multiple parameter candidates makes the coarse pass slow, but also makes it (usually) work well. -twoblur rr = Set the blurring radius for the first pass to 'rr' millimeters. [Default == 11 mm] **N.B.: You may want to change this from the default if your voxels are unusually small or unusually large (e.g., outside the range 1-4 mm along each axis). -twofirst = Use -twopass on the first image to be registered, and then on all subsequent images from the source dataset, use results from the first image's coarse pass to start the fine pass. (Useful when there may be large motions between the ) (source and the base, but only small motions within ) (the source dataset itself; since the coarse pass can ) (be slow, doing it only once makes sense in this case.) **N.B.: [-twofirst is on by default; '-twopass' turns it off.] -twobest bb = In the coarse pass, use the best 'bb' set of initial points to search for the starting point for the fine pass. If bb==0, then no search is made for the best starting point, and the identity transformation is used as the starting point. [Default=5; min=0 max=29] **N.B.: Setting bb=0 will make things run faster, but less reliably. Setting bb = 'MAX' will make it be the max allowed value. -fineblur x = Set the blurring radius to use in the fine resolution pass to 'x' mm. A small amount (1-2 mm?) of blurring at the fine step may help with convergence, if there is some problem, especially if the base volume is very noisy. [Default == 0 mm = no blurring at the final alignment pass] **NOTES ON **STRATEGY: * If you expect only small-ish (< 2 voxels?) image movement, then using '-onepass' or '-twobest 0' makes sense. * If you expect large-ish image movements, then do not use '-onepass' or '-twobest 0'; the purpose of the '-twobest' parameter is to search for large initial rotations/shifts with which to start the coarse optimization round. * If you have multiple sub-bricks in the source dataset, then the default '-twofirst' makes sense if you don't expect large movements WITHIN the source, but expect large motions between the source and base. * '-twopass' re-starts the alignment process for each sub-brick in the source dataset -- this option can be time consuming, and is really intended to be used when you might expect large movements between sub-bricks; for example, when the different volumes are gathered on different days. For most purposes, '-twofirst' (the default process) will be adequate and faster, when operating on multi-volume source datasets. 
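 ** Two illustrative command fragments for the strategies above (all
    dataset names are placeholders):

      # expecting only small movements (e.g., same-session re-acquisition):
      3dAllineate -base day1_T1w+orig -source day1_T1w_rerun+orig \
                  -prefix rerun_aligned -onepass

      # expecting large or uncertain movements (e.g., head T1w to template):
      3dAllineate -base MNI_template+tlrc -source subj_T1w+orig \
                  -prefix subj_T1w_aligned -twobest MAX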
-cmass      = Use the center-of-mass calculation to determine an initial
              shift. [This option is OFF by default]
              It can be given as cmass+a, cmass+xy, cmass+yz, or cmass+xz,
              where '+a' means to try to determine automatically in which
              direction the data coverage is partial, by looking for a
              too-large shift.
              If given in the form '-cmass+xy' (for example), the CoM
              calculation is done in the x- and y-directions, but not
              the z-direction.
            * MY OPINION: This option is REALLY useful in most cases.
              However, if you only have partial coverage in the -source
              dataset, you will need to use one of the '+' additions to
              restrict the use of the CoM limits.
-nocmass    = Don't use the center-of-mass calculation. [The default]
              (You would not want to use the C-o-M calculation if the )
              (source sub-bricks have very different spatial locations,)
              (since the source C-o-M is calculated from all sub-bricks)
 **EXAMPLE: You have a limited coverage set of axial EPI slices you want
            to register into a larger head volume (after 3dSkullStrip, of
            course). In this case, '-cmass+xy' makes sense, allowing CoM
            adjustment along the x = R-L and y = A-P directions, but not
            along the z = I-S direction, since the EPI doesn't cover the
            whole brain along that axis.

-autoweight = Compute a weight function using the 3dAutomask algorithm
              plus some blurring of the base image.
     **N.B.: '-autoweight+100' means to zero out all voxels with values
             below 100 before computing the weight.
             '-autoweight**1.5' means to compute the autoweight and then
             raise it to the 1.5-th power (e.g., to increase the weight
             of high-intensity regions). These two processing steps can
             be combined, as in '-autoweight+100**1.5'
          ** Note that '**' must be enclosed in quotes; otherwise, the
             shell will treat it as a wildcard and you will get an error
             message before 3dAllineate even starts!!
          ** UPDATE: one can now use '^' for power notation, to avoid
             needing to enclose the string in quotes.
     **N.B.: Some cost functionals do not allow -autoweight, and will use
             -automask instead. A warning message will be printed if you
             run into this situation. If a clip level '+xxx' is appended
             to '-autoweight', then the conversion into '-automask' will
             NOT happen. Thus, a small positive '+xxx' can be used to
             trick -autoweight into working on any cost functional.
-automask   = Compute a mask function, which is like -autoweight, but the
              weight for a voxel is set to either 0 or 1.
     **N.B.: '-automask+3' means to compute the mask function, and then
             dilate it outwards by 3 voxels (e.g.).
          ** Note that '+' means something very different for '-automask'
             and '-autoweight'!!
-autobox    = Expand the -automask function to enclose a rectangular box
              that holds the irregular mask.
     **N.B.: This is the default mode of operation!
             For intra-modality registration, '-autoweight' may be better!
           * If the cost functional is 'ls', then '-autoweight' will be
             the default, instead of '-autobox'.
-nomask     = Don't compute the autoweight/mask; if -weight is not also
              used, then every voxel will be counted equally.
-weight www = Set the weighting for each voxel in the base dataset;
              larger weights mean that voxel counts more in the cost
              function.
     **N.B.: The weight dataset must be defined on the same grid as the
             base dataset.
     **N.B.: Even if a method does not allow -autoweight, you CAN use a
             weight dataset that is not 0/1 valued. The risk is yours, of
             course (!*! as always in AFNI !*!).
-wtprefix p = Write the weight volume to disk as a dataset with prefix
              name 'p'.
              Used with '-autoweight/mask', this option lets you see what
              voxels were important in the algorithm.
-emask ee   = This option lets you specify a mask of voxels to EXCLUDE
              from the analysis. The voxels where the dataset 'ee' is
              nonzero will not be included (i.e., their weights will be
              set to zero).
            * Like all the weight options, it applies in the base image
              coordinate system.
           ** Like all the weight options, it means nothing if you are
              using one of the 'apply' options.

             Method  Allows -autoweight
             ------  ------------------
              ls            YES
              mi            NO
              crM           YES
              nmi           NO
              hel           NO
              crA           YES
              crU           YES
              lpc           YES
              lpa           YES
              lpc+          YES
              lpa+          YES

-source_mask sss = Mask the source (input) dataset, using 'sss'.
-source_automask = Automatically mask the source dataset.
                   [By default, all voxels in the source]
                   [dataset are used in the matching.   ]
          **N.B.: You can also use '-source_automask+3' to dilate the
                  default source automask outward by 3 voxels.
-warp xxx   = Set the warp type to 'xxx', which is one of
                shift_only         *OR* sho =  3 parameters
                shift_rotate       *OR* shr =  6 parameters
                shift_rotate_scale *OR* srs =  9 parameters
                affine_general     *OR* aff = 12 parameters
              [Default = affine_general, which includes image]
              [          shifts, rotations, scaling, and shearing]
            * MY OPINION: Shearing is usually unimportant, so you can
              omit it if you want: '-warp srs'. But it doesn't hurt to
              keep shearing, except for a little extra CPU time.
              On the other hand, scaling is often important, so should
              not be omitted.
-warpfreeze = Freeze the non-rigid body parameters (those past #6) after
              doing the first sub-brick. Subsequent volumes will have the
              same spatial distortions as sub-brick #0, plus rigid body
              motions only.
            * MY OPINION: This option is almost useless.
-replacebase = If the source has more than one sub-brick, and this option
               is turned on, then after the #0 sub-brick is aligned to
               the base, the aligned #0 sub-brick is used as the base
               image for subsequent source sub-bricks.
             * MY OPINION: This option is almost useless.
-replacemeth m = After sub-brick #0 is aligned, switch to method 'm' for
                 later sub-bricks. For use with '-replacebase'.
               * MY OPINION: This option is almost useless.
-EPI        = Treat the source dataset as being composed of warped EPI
              slices, and the base as comprising anatomically 'true'
              images. Only phase-encoding direction image shearing and
              scaling will be allowed with this option.
     **N.B.: For most people, the base dataset will be a 3dSkullStrip-ed
             T1-weighted anatomy (MPRAGE or SPGR). If you don't remove
             the skull first, the EPI images (which have little skull
             visible due to fat-suppression) might expand to fit EPI
             brain over T1-weighted skull.
     **N.B.: Usually, EPI datasets don't have as complete slice coverage
             of the brain as do T1-weighted datasets. If you don't use
             some option (like '-EPI') to suppress scaling in the slice-
             direction, the EPI dataset is likely to stretch the slice
             thickness to better 'match' the T1-weighted brain coverage.
     **N.B.: '-EPI' turns on '-warpfreeze -replacebase'. You can use
             '-nowarpfreeze' and/or '-noreplacebase' AFTER the '-EPI'
             on the command line if you do not want these options used.

 ** OPTIONS to change search ranges for alignment parameters **

-smallrange = Set all the parameter ranges to be smaller (about half)
              than the default ranges, which are rather large for many
              purposes.
            * Default angle range    is plus/minus 30 degrees
            * Default shift range    is plus/minus 32% of grid size
            * Default scaling range  is plus/minus 20% of grid size
            * Default shearing range is plus/minus 0.1111

-parfix n v   = Fix parameter #n to be exactly at value 'v'.
-parang n b t = Allow parameter #n to range only between 'b' and 't'.
                If not given, default ranges are used.
-parini n v   = Initialize parameter #n to value 'v', but then allow
                the algorithm to adjust it.
       **N.B.: Multiple '-par...' options can be used, to constrain
               multiple parameters.
       **N.B.: -parini has no effect if -twopass is used, since the
               -twopass algorithm carries out its own search for
               initial parameters.

-maxrot dd  = Allow maximum rotation of 'dd' degrees. Equivalent to
              '-parang 4 -dd dd -parang 5 -dd dd -parang 6 -dd dd'
              [Default=30 degrees]
-maxshf dd  = Allow maximum shift of 'dd' millimeters. Equivalent to
              '-parang 1 -dd dd -parang 2 -dd dd -parang 3 -dd dd'
              [Default=32% of the size of the base image]
     **N.B.: This max shift setting is relative to the center-of-mass
             shift, if the '-cmass' option is used.
-maxscl dd  = Allow maximum scaling factor to be 'dd'. Equivalent to
              '-parang 7 1/dd dd -parang 8 1/dd dd -parang 9 1/dd dd'
              [Default=1.4=image can go up or down 40% in size]
-maxshr dd  = Allow maximum shearing factor to be 'dd'. Equivalent to
              '-parang 10 -dd dd -parang 11 -dd dd -parang 12 -dd dd'
              [Default=0.1111 for no good reason]

NOTE: If the datasets being registered have only 1 slice, 3dAllineate
      will automatically fix the 6 out-of-plane motion parameters to
      their 'do nothing' values, so you don't have to specify '-parfix'.

-master mmm = Write the output dataset on the same grid as dataset 'mmm'.
              If this option is NOT given, the base dataset is the master.
     **N.B.: 3dAllineate transforms the source dataset to be 'similar'
             to the base image. Therefore, the coordinate system of the
             master dataset is interpreted as being in the reference
             system of the base image. It is thus vital that these
             finite 3D volumes overlap, or you will lose data!
     **N.B.: If 'mmm' is the string 'SOURCE', then the source dataset is
             used as the master for the output dataset grid. You can also
             use 'BASE', which is of course the default.

-mast_dxyz del = Write the output dataset using grid spacings of
  *OR*           'del' mm. If this option is NOT given, then the
 -newgrid del    grid spacings in the master dataset will be used.
                 This option is useful when registering low resolution
                 data (e.g., EPI time series) to high resolution datasets
                 (e.g., MPRAGE) where you don't want to consume vast
                 amounts of disk space interpolating the low resolution
                 data to some artificially fine (and meaningless)
                 spatial grid.

----------------------------------------------
DEFINITION OF AFFINE TRANSFORMATION PARAMETERS
----------------------------------------------

The 3x3 spatial transformation matrix is calculated as [S][D][U],
where [S] is the shear matrix, [D] is the scaling matrix, and [U]
is the rotation (proper orthogonal) matrix. These matrices are
specified in DICOM-ordered (x=-R+L,y=-A+P,z=-I+S) coordinates as:

  [U] = [Rotate_y(param#6)] [Rotate_x(param#5)] [Rotate_z(param#4)]
        (angles are in degrees)

  [D] = diag( param#7 , param#8 , param#9 )

        [    1        0       0 ]        [ 1  param#10  param#11 ]
  [S] = [ param#10    1       0 ]   OR   [ 0     1      param#12 ]
        [ param#11 param#12   1 ]        [ 0     0         1     ]

The shift vector comprises parameters #1, #2, and #3.
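 ** An illustrative use of the range-limiting options above, for a case
    where the two volumes are expected to start nearly aligned (dataset
    names are placeholders):

      3dAllineate -base anat+orig -source anat_followup+orig \
                  -prefix followup_aligned                   \
                  -maxrot 10 -maxshf 15 -maxscl 1.1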
The goal of the program is to find the warp parameters such that I([x]_warped) 'is similar to' J([x]_in) as closely as possible in some sense of 'similar', where J(x) is the base image, and I(x) is the source image. Using '-parfix', you can specify that some of these parameters are fixed. For example, '-shift_rotate_scale' is equivalent '-affine_general -parfix 10 0 -parfix 11 0 -parfix 12 0'. Don't even think of using the '-parfix' option unless you grok this example! ----------- Special Note for the '-EPI' Option's Coordinates ----------- In this case, the parameters above are with reference to coordinates x = frequency encoding direction (by default, first axis of dataset) y = phase encoding direction (by default, second axis of dataset) z = slice encoding direction (by default, third axis of dataset) This option lets you freeze some of the warping parameters in ways that make physical sense, considering how echo-planar images are acquired. The x- and z-scaling parameters are disabled, and shears will only affect the y-axis. Thus, there will be only 9 free parameters when '-EPI' is used. If desired, you can use a '-parang' option to allow the scaling fixed parameters to vary (put these after the '-EPI' option): -parang 7 0.833 1.20 to allow x-scaling -parang 9 0.833 1.20 to allow z-scaling You could also fix some of the other parameters, if that makes sense in your situation; for example, to disable out-of-slice rotations: -parfix 5 0 -parfix 6 0 and to disable out of slice translation: -parfix 3 0 NOTE WELL: If you use '-EPI', then the output warp parameters (e.g., in '-1Dparam_save') apply to the (freq,phase,slice) xyz coordinates, NOT to the DICOM xyz coordinates, so equivalent transformations will be expressed with different sets of parameters entirely than if you don't use '-EPI'! This comment does NOT apply to the output of '-1Dmatrix_save', since that matrix is defined relative to the RAI (DICOM) spatial coordinates. *********** CHANGING THE ORDER OF MATRIX APPLICATION *********** {{{ There is no good reason to ever use these options! }}} -SDU or -SUD }= Set the order of the matrix multiplication -DSU or -DUS }= for the affine transformations: -USD or -UDS }= S = triangular shear (params #10-12) D = diagonal scaling matrix (params #7-9) U = rotation matrix (params #4-6) Default order is '-SDU', which means that the U matrix is applied first, then the D matrix, then the S matrix. -Supper }= Set the S matrix to be upper or lower -Slower }= triangular [Default=lower triangular] NOTE: There is no '-Lunch' option. There is no '-Faster' option. -ashift OR }= Apply the shift parameters (#1-3) after OR -bshift }= before the matrix transformation. [Default=after] ================================================== ===== RWCox - September 2006 - Live Long and Prosper ===== ================================================== ******************************************************** *** From Webster's Dictionary: Allineate == 'to align' *** ******************************************************** =========================================================================== FORMERLY SECRET HIDDEN OPTIONS --------------------------------------------------------------------------- ** N.B.: Most of these are experimental! 
            [permanent beta] **
===========================================================================

-num_rtb n  = At the beginning of the fine pass, the best set of results
              from the coarse pass is 'refined' a little by further
              optimization, before the single best one is chosen for
              the final fine optimization.
            * This option sets the maximum number of cost functional
              evaluations to be used (for each set of parameters) in
              this step.
            * The default is 99; a larger value will take more CPU time
              but may give more robust results.
            * If you want to skip this step entirely, use '-num_rtb 0';
              then the best of the coarse pass results is taken straight
              to the final optimization passes.
     **N.B.: If you use '-VERB', you will see that one extra case is
             involved in this initial fine refinement step; that case is
             starting with the identity transformation, which helps
             insure against the chance that the coarse pass optimizations
             ran totally amok.
            * MY OPINION: This option is mostly useless - but not always!
            * Every step in the multi-step alignment process was added at
              some point to solve a difficult alignment problem.
            * Since you usually don't know if YOUR problem is difficult,
              you should not reduce the default process without good
              reason.
-nocast     = By default, parameter vectors that are too close to the
              best one are cast out at the end of the coarse pass
              refinement process. Use this option if you want to keep
              them all for the fine resolution pass.
            * MY OPINION: This option is nearly useless.
-norefinal  = Do NOT re-start the fine iteration step after it has
              converged. The default is to re-start it, which usually
              results in a small improvement to the result (at the cost
              of CPU time). This re-start step is an attempt to avoid a
              local minimum trap. It is usually not necessary, but
              sometimes helps.
-realaxes   = Use the 'real' axes stored in the dataset headers, if they
              conflict with the default axes.
              [For Jedi AFNI Masters only!]
-savehist sss = Save start and final 2D histograms as PGM files, with
                prefix 'sss' (cost: cr mi nmi hel).
              * if the filename contains 'FF', floats are written
              * these are the weighted histograms!
              * -savehist will also save histogram files when the
                -allcost evaluations take place
              * this option is mostly useless unless '-histbin' is
                also used
              * MY OPINION: This option is mostly for debugging.
-median     = Smooth with median filter instead of Gaussian blur.
              (Somewhat slower, and not obviously useful.)
            * MY OPINION: This option is nearly useless.
-powell m a = Set the Powell NEWUOA dimensional parameters to 'm' and 'a'
              (cf. source code in powell_int.c). The number of points
              used for approximating the cost functional is m*N+a, where
              N is the number of parameters being optimized. The default
              values are m=2 and a=3. Larger values will probably slow
              the program down for no good reason. The smallest allowed
              values are 1.
            * MY OPINION: This option is nearly useless.
-target ttt = Same as '-source ttt'. In the earliest versions, what I
              now call the 'source' dataset was called the 'target'
              dataset:
                  Try to remember the kind of September (2006)
                  When life was slow and oh so mellow
                  Try to remember the kind of September
                  When grass was green and source was target.
-Xwarp      =} Change the warp/matrix setup so that only the x-, y-, or z-
-Ywarp      =} axis is stretched & sheared. Useful for EPI, where 'X',
-Zwarp      =} 'Y', or 'Z' corresponds to the phase encoding direction.
-FPS fps    = Generalizes -EPI to arbitrary permutation of directions.
-histpow pp = By default, the number of bins in the histogram used for calculating the Hellinger, Mutual Information, and Correlation Ratio statistics is n^(1/3), where n is the number of data points. You can change that exponent to 'pp' with this option. -histbin nn = Or you can just set the number of bins directly to 'nn'. -eqbin nn = Use equalized marginal histograms with 'nn' bins. -clbin nn = Use 'nn' equal-spaced bins except for the bot and top, which will be clipped (thus the 'cl'). If nn is 0, the program will pick the number of bins for you. **N.B.: '-clbin 0' is now the default [25 Jul 2007]; if you want the old all-equal-spaced bins, use '-histbin 0'. **N.B.: '-clbin' only works when the datasets are non-negative; any negative voxels in either the input or source volumes will force a switch to all equal-spaced bins. * MY OPINION: The above histogram-altering options are useless. -wtmrad mm = Set autoweight/mask median filter radius to 'mm' voxels. -wtgrad gg = Set autoweight/mask Gaussian filter radius to 'gg' voxels. -nmsetup nn = Use 'nn' points for the setup matching [default=98756] -ignout = Ignore voxels outside the warped source dataset. -blok bbb = Blok definition for the 'lp?' (Local Pearson) cost functions: 'bbb' is one of 'BALL(r)' or 'CUBE(r)' or 'RHDD(r)' or 'TOHD(r)' corresponding to spheres or cubes or rhombic dodecahedra or truncated octahedra where 'r' is the size parameter in mm. [Default is 'TOHD(r)' = truncated octahedron] [with 'radius' r chosen to include about 500] [voxels in the base dataset 3D grid. ] * Changing the 'blok' definition/radius should only be needed in unusual situations, as when you are trying to have fun fun fun. * You can change the blok shape but leave the program to set the radius, using (say) 'RHDD(0)'. * The old default blok shape/size was 'RHDD(6.54321)', so if you want to maintain backward compatibility, you should use option '-blok "RHDD(6.54321)"' * Only voxels in the weight mask will be used inside a blok. * HISTORICAL NOTES: * CUBE, RHDD, and TOHD are space filling polyhedra. That is, they are shapes that fit together without overlaps or gaps to fill up 3D space. * To even approximately fill space, BALLs must overlap, unlike the other blok shapes. Which means that BALL bloks will use some voxels more than once. * Kepler discovered/invented the RHDD (honeybees also did). * The TOHD is the 'most compact' or 'most ball-like' of the known convex space filling polyhedra. [Which is why TOHD is the default blok shape.] -PearSave sss = Save the final local Pearson correlations into a dataset *OR* with prefix 'sss'. These are the correlations from -SavePear sss which the lpc and lpa cost functionals are calculated. * The values will be between -1 and 1 in each blok. See the 'Too Much Detail' section below for how these correlations are used to compute lpc and lpa. * Locations not used in the matching will get 0. ** Unless you use '-nmatch 100%', there will be holes of 0s in the bloks, as not all voxels are used in the matching algorithm (speedup attempt). * All the matching points in a given blok will get the same value, which makes the resulting dataset look jauntily blocky, especially in color. * This saved dataset will be on the grid of the base dataset, and may be zero padded if the program chose to do so in it wisdom. 
This padding means that the voxels in this output dataset may not match one-to-one with the voxels in the base dataset; however, AFNI displays things using coordinates, so overlaying this dataset on the base dataset (say) should work OK. * If you really want this saved dataset to be on the grid as the base dataset, you'll have use 3dZeropad -master {Base Dataset} .... * Option '-PearSave' works even if you don't use the 'lpc' or 'lpa' cost functionals. * If you use this option combined with '-allcostX', then the local correlations will be saved from the INITIAL alignment parameters, rather than from the FINAL optimized parameters. (Of course, with '-allcostX', there IS no final result.) * This option does NOT work with '-allcost' or '-allcostX1D'. -allcost = Compute ALL available cost functionals and print them at various points in the optimization progress. -allcostX = Compute and print ALL available cost functionals for the un-warped inputs, and then quit. * This option is for testing purposes (AKA 'fun'). -allcostX1D p q = Compute ALL available cost functionals for the set of parameters given in the 1D file 'p' (12 values per row), write them to the 1D file 'q', then exit. (For you, Zman) * N.B.: If -fineblur is used, that amount of smoothing will be applied prior to the -allcostX evaluations. The parameters are the rotation, shift, scale, and shear values, not the affine transformation matrix. An identity matrix could be provided as "0 0 0 0 0 0 1 1 1 0 0 0" for instance or by using the word "IDENTITY" * This option is for testing purposes (even more 'fun'). =========================================================================== Too Much Detail -- How Local Pearson Correlations Are Computed and Used ----------------------------------------------------------------------- * The automask region of the base dataset is divided into a discrete set of 'bloks'. Usually there are several thousand bloks. * In each blok, the voxel values from the base and the source (after the alignment transformation is applied) are extracted and the correlation coefficient is computed -- either weighted or unweighted, depending on the options used in 3dAllineate (usually weighted). * Let p[i] = correlation coefficient in blok #i, w[i] = sum of weights used in blok #i, or = 1 if unweighted. ** The values of p[i] are what get output via the '-PearSave' option. * Define pc[i] = arctanh(p[i]) = 0.5 * log( (1+p[i]) / (1-p[i]) ) This expression is designed to 'stretch' out larger correlations, giving them more emphasis in psum below. The same reasoning is why pc[i]*abs(pc[i]) is used below, to make bigger correlations have a bigger impact in the final result. * psum = SUM_OVER_i { w[i]*pc[i]*abs(pc[i]) } wsum = SUM_OVER_i { w[i] } lpc = psum / wsum ==> negative correlations are good (smaller lpc) lpa = 1 - abs(lpc) ==> positive correlations are good (smaller lpa) =========================================================================== Modifying '-final wsinc5' -- for the truly crazy people out there ----------------------------------------------------------------- * The windowed (tapered) sinc function interpolation can be modified by several environment variables. This is expert-level stuff, and you should understand what you are doing if you use these options. The simplest way to use these would be on the command line, as in -DAFNI_WSINC5_RADIUS=9 -DAFNI_WSINC5_TAPERFUN=Hamming * AFNI_WSINC5_TAPERFUN lets you choose the taper function. 
The default taper function is the minimum sidelobe 3-term cosine: 0.4243801 + 0.4973406*cos(PI*x) + 0.0782793*cos(2*PI*x) If you set this environment variable to 'Hamming', then the minimum sidelobe 2-term cosine will be used instead: 0.53836 + 0.46164*cos(PI*x) Here, 'x' is between 0 and 1, where x=0 is the center of the interpolation mask and x=1 is the outer edge. ++ Unfortunately, the 3-term cosine doesn't have a catchy name; you can find it (and many other) taper functions described in the paper AH Nuttall, Some Windows with Very Good Sidelobe Behavior. IEEE Trans. ASSP, 29:84-91 (1981). In particular, see Fig.14 and Eq.36 in this paper. * AFNI_WSINC5_TAPERCUT lets you choose the start 'x' point for tapering: This value should be between 0 and 0.8; for example, 0 means to taper all the way from x=0 to x=1 (maximum tapering). The default value is 0. Setting TAPERCUT to 0.5 (say) means only to taper from x=0.5 to x=1; thus, a larger value means that fewer points are tapered inside the interpolation mask. * AFNI_WSINC5_RADIUS lets you choose the radius of the tapering window (i.e., the interpolation mask region). This value is an integer between 3 and 21. The default value is 5 (which used to be the ONLY value, thus 'wsinc5'). RADIUS is measured in voxels, not mm. * AFNI_WSINC5_SPHERICAL lets you choose the shape of the mask region. If you set this value to 'Yes', then the interpolation mask will be spherical; otherwise, it defaults to cubical. * The Hamming taper function is a little faster than the 3-term function, but will have a little more Gibbs phenomenon. * A larger TAPERCUT will give a little more Gibbs phenomenon; compute speed won't change much with this parameter. * Compute time goes up with (at least) the 3rd power of the RADIUS; setting RADIUS to 21 will be VERY slow. * Visually, RADIUS=3 is similar to quintic interpolation. Increasing RADIUS makes the interpolated images look sharper and more well- defined. However, values of RADIUS greater than or equal to 7 appear (to Zhark's eagle eye) to be almost identical. If you really care, you'll have to experiment with this parameter yourself. * A spherical mask is also VERY slow, since the cubical mask allows evaluation as a tensor product. There is really no good reason to use a spherical mask; I only put it in for fun/experimental purposes. ** For most users, there is NO reason to ever use these environment variables to modify wsinc5. You should only do this kind of thing if you have a good and articulable reason! (Or if you really like to screw around.) ** The wsinc5 interpolation function is parallelized using OpenMP, which makes its usage moderately tolerable. =========================================================================== Hidden experimental cost functionals: ------------------------------------- sp *OR* spearman = Spearman [rank] Correlation je *OR* jointentropy = Joint Entropy [H(b,s)] lss *OR* signedPcor = Signed Pearson Correlation Notes for the new [Feb 2010] lpc+ cost functional: -------------------------------------------------- * The cost functional named 'lpc+' is a combination of several others: lpc + hel*0.4 + crA*0.4 + nmi*0.2 + mi*0.2 + ov*0.4 ++ 'hel', 'crA', 'nmi', and 'mi' are the histogram-based cost functionals also available as standalone options. ++ 'ov' is a measure of the overlap of the automasks of the base and source volumes; ov is not available as a standalone option. 
* The purpose of lpc+ is to avoid situations where the pure lpc cost goes wild; this especially happens if '-source_automask' isn't used. ++ Even with lpc+, you should use '-source_automask+2' (say) to be safe. * You can alter the weighting of the extra functionals by giving the option in the form (for example) '-lpc+hel*0.5+nmi*0+mi*0+crA*1.0+ov*0.5' * The quotes are needed to prevent the shell from wild-card expanding the '*' character. --> You can now use ':' in place of '*' to avoid this wildcard problem: -lpc+hel:0.5+nmi:0+mi:0+crA:1+ov:0.5+ZZ * Notice the weight factors FOLLOW the name of the extra functionals. ++ If you want a weight to be 0 or 1, you have to provide for that explicitly -- if you leave a weight off, then it will get its default value! ++ The order of the weight factor names is unimportant here: '-lpc+hel*0.5+nmi*0.8' == '-lpc+nmi*0.8+hel*0.5' * Only the 5 functionals listed (hel,crA,nmi,mi,ov) can be used in '-lpc+'. * In addition, if you want the initial alignments to be with '-lpc+' and then finish the Final alignment with pure '-lpc', you can indicate this by putting 'ZZ' somewhere in the option string, as in '-lpc+ZZ'. ***** '-cost lpc+ZZ' is very useful for aligning EPI to T1w volumes ***** * [28 Nov 2018] All of the above now applies to the 'lpa+' cost functional, which can be used as a robust method for like-to-like alignment. For example, aligning 3T and 7T T1-weighted datasets from the same person. * [28 Sep 2021] However, the default multiplier constants for cost 'lpa+' are now different from the 'lpc+' multipliers -- to make 'lpa+' more robust. The new default for 'lpa+' is lpa + hel*0.4 + crA*0.4 + nmi*0.2 + mi*0.0 + ov*0.4 ***** '-cost lpa+ZZ' is very useful for T1w to T1w volumes (or any ***** ***** similar-contrast datasets). ***** *** Note that in trial runs, we have found that lpc+ZZ and lpa+ZZ are *** *** more robust than lpc+ and lpa+ -- which is why the '+ZZ' amendment *** *** was created. *** Cost functional descriptions (for use with -allcost output): ------------------------------------------------------------ ls :: 1 - abs(Pearson correlation coefficient) sp :: 1 - abs(Spearman correlation coefficient) mi :: - Mutual Information = H(base,source)-H(base)-H(source) crM :: 1 - abs[ CR(base,source) * CR(source,base) ] nmi :: 1/Normalized MI = H(base,source)/[H(base)+H(source)] je :: H(base,source) = joint entropy of image pair hel :: - Hellinger distance(base,source) crA :: 1 - abs[ CR(base,source) + CR(source,base) ] crU :: CR(source,base) = Var(source|base) / Var(source) lss :: Pearson correlation coefficient between image pair lpc :: nonlinear average of Pearson cc over local neighborhoods lpa :: 1 - abs(lpc) lpc+:: lpc + hel + mi + nmi + crA + overlap lpa+:: lpa + hel + nmi + crA + overlap * N.B.: Some cost functional values (as printed out above) are negated from their theoretical descriptions (e.g., 'hel') so that the best image alignment will be found when the cost is minimized. See the descriptions above and the references below for more details for each functional. * MY OPINIONS: * Some of these cost functionals were implemented only for the purposes of fun and/or comparison and/or experimentation and/or special circumstances. These are sp je lss crM crA crM hel mi nmi * For many purposes, lpc+ZZ and lpa+ZZ are the most robust cost functionals, but usually the slowest to evaluate. * HOWEVER, just because some method is best MOST of the time does not mean it is best ALL of the time. 
Please check your results visually, or at some point in time you will have bad results and not know it! * For speed and for 'like-to-like' alignment, '-cost ls' can work well. * For more information about the 'lpc' functional, see ZS Saad, DR Glen, G Chen, MS Beauchamp, R Desai, RW Cox. A new method for improving functional-to-structural MRI alignment using local Pearson correlation. NeuroImage 44: 839-848, 2009. http://dx.doi.org/10.1016/j.neuroimage.2008.09.037 https://pubmed.ncbi.nlm.nih.gov/18976717 The '-blok' option can be used to control the regions (size and shape) used to compute the local correlations. *** Using the 'lpc' functional wisely requires the use of a proper weight volume. We HIGHLY recommend you use the align_epi_anat.py script if you want to use this cost functional! Otherwise, you are likely to get less than optimal results (and then swear at us unjustly). * For more information about the 'cr' functionals, see http://en.wikipedia.org/wiki/Correlation_ratio Note that CR(x,y) is not the same as CR(y,x), which is why there are symmetrized versions of it available. * For more information about the 'mi', 'nmi', and 'je' cost functionals, see http://en.wikipedia.org/wiki/Mutual_information http://en.wikipedia.org/wiki/Joint_entropy http://www.cs.jhu.edu/~cis/cista/746/papers/mutual_info_survey.pdf * For more information about the 'hel' functional, see http://en.wikipedia.org/wiki/Hellinger_distance * Some cost functionals (e.g., 'mi', 'cr', 'hel') are computed by creating a 2D joint histogram of the base and source image pair. Various options above (e.g., '-histbin', etc.) can be used to control the number of bins used in the histogram on each axis. (If you care to control the program in such detail!) * Minimization of the chosen cost functional is done via the NEWUOA software, described in detail in MJD Powell. 'The NEWUOA software for unconstrained optimization without derivatives.' In: GD Pillo, M Roma (Eds), Large-Scale Nonlinear Optimization. Springer, 2006. http://www.damtp.cam.ac.uk/user/na/NA_papers/NA2004_08.pdf =========================================================================== SUMMARY of the Default Allineation Process ------------------------------------------ As mentioned earlier, each of these steps was added to deal with a problem that came up over the years. The resulting process is reasonably robust :), but then tends to be slow :(. If you use the '-verb' or '-VERB' option, you will get a lot of fun fun fun progress messages that show the results from this sequence of steps. Below, I refer to different scales of effort in the optimizations at each step. Easier/faster optimization is done using: matching with fewer points from the datasets; more smoothing of the base and source datasets; and by putting a smaller upper limit on the number of trials the optimizer is allowed to take. The Coarse phase starts with the easiest optimization, and increases the difficulty a little at each refinement. The Fine phase starts with the most difficult optimization setup: the most points for matching, little or no smoothing, and a large limit on the number of optimizer trials. 0. Preliminary Setup [Goal: create the basis for the following steps] a. Create the automask and/or autoweight from the '-base' dataset. The cost functional will only be computed from voxels inside the automask, and only a fraction of those voxels will actually be used for evaluating the cost functional (unless '-nmatch 100%' is used). b. 
===========================================================================
SUMMARY of the Default Allineation Process
------------------------------------------
As mentioned earlier, each of these steps was added to deal with a problem
that came up over the years. The resulting process is reasonably robust :),
but it also tends to be slow :(. If you use the '-verb' or '-VERB' option,
you will get a lot of fun fun fun progress messages that show the results
from this sequence of steps.

Below, I refer to different scales of effort in the optimizations at each
step. Easier/faster optimization is done by matching with fewer points from
the datasets, by smoothing the base and source datasets more heavily, and
by putting a smaller upper limit on the number of trials the optimizer is
allowed to take. The Coarse phase starts with the easiest optimization and
increases the difficulty a little at each refinement. The Fine phase starts
with the most difficult optimization setup: the most points for matching,
little or no smoothing, and a large limit on the number of optimizer trials.

0. Preliminary Setup [Goal: create the basis for the following steps]
   a. Create the automask and/or autoweight from the '-base' dataset.
      The cost functional will only be computed from voxels inside the
      automask, and only a fraction of those voxels will actually be used
      for evaluating the cost functional (unless '-nmatch 100%' is used).
   b. If the automask is 'too close' to the outside of the base 3D volume,
      zeropad the base dataset to avoid edge effects.
   c. Determine the 3D (x,y,z) shifts for the '-cmass' center-of-mass
      crude alignment, if ordered by the user.
   d. Set the ranges of the transformation parameters and which parameters
      are to be frozen at fixed values.

1. Coarse Phase [Goal: explore the vastness of 6-12D parameter space]
   a. The first step uses only the first 6 parameters (shifts + rotations),
      and evaluates thousands of potential starting points -- selected from
      a 6D grid in parameter space and also from random points in 6D
      parameter space. This step is fairly slow. The best 45 parameter sets
      (in the sense of the cost functional) are kept for the next step.
   b. Still using only the first 6 parameters, the best 45 sets of
      parameters undergo a little optimization. The best 6 parameter sets
      after this refinement are kept for the next step. (The number of sets
      chosen to go on to the next step can be set by the '-twobest' option.)
      The optimizations in this step use the blurring radius given by the
      '-twoblur' option, which defaults to 7.77 mm, and use relatively few
      points in each dataset for computing the cost functional.
   c. These 6 best parameter sets undergo further, more costly,
      optimization, now using all 12 parameters. This optimization runs in
      3 passes, each more costly (less smoothing, more matching points)
      than the previous. (If 2 sets get too close in parameter space, 1 of
      them will be cast out -- this does not happen often.) Output
      parameter sets from the 3rd pass of successive refinement are the
      inputs to the fine refinement phase.

2. Fine Phase [Goal: use more expensive optimization on good starting points]
   a. The 6 outputs from step 1c have the null parameter set (all 0, except
      for the '-cmass' shifts) appended. Then a small amount of optimization
      is applied to each of these 7 parameter sets ('-num_rtb'). The null
      parameter set is added here to insure against the possibility that the
      coarse optimizations 'ran away' to some unpleasant locations in the
      12D parameter space. These optimizations use the full set of points
      specified by '-nmatch', and the smoothing specified by '-fineblur'
      (default = 0), but the number of functional evaluations is small, to
      make this step fast.
   b. The best (smallest cost) set from step 2a is chosen for the final
      optimization, which is run until the '-conv' limit is reached. These
      are the 'Finalish' parameters (shown using '-verb').
   c. The set of parameters from step 2b is used as the starting point for
      a new optimization, in an attempt to avoid a false minimum. The
      results of this optimization are the final parameter set.

3. The final set of parameters is used to produce the output volume, using
   the '-final' interpolation method.

In practice, the output from the Coarse phase successive refinements is
usually so good that the Fine phase runs quickly and makes only small
adjustments. The quality resulting from the Coarse phase steps is mostly
due, in my opinion, to the large number of initial trials (steps 1a-1b),
followed by the successive refinements of several parameter sets (step 1c)
to help usher 'good' candidates to the starting line for the Fine phase.

For some 'easy' registration problems -- such as T1w-to-T1w alignment,
high quality images, a lot of overlap to start with -- the process can be
sped up by reducing the number of steps. For example, '-num_rtb 0 -twobest 0'
would eliminate step 2a and speed up step 1c. Even more extreme, '-onepass'
could be used to skip all of the Coarse phase. But be careful out there!
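For example, a quick run of that kind might look something like the
following (the dataset names and the choice of '-cost ls' are purely
illustrative):

    # fast like-to-like alignment: simple 'ls' cost, plus
    # '-num_rtb 0 -twobest 0' to eliminate step 2a and speed up step 1c
    3dAllineate -base anat_scan1+orig -source anat_scan2+orig  \
                -cost ls -num_rtb 0 -twobest 0                 \
                -prefix anat_scan2_aligned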
For 'hard' registration problems, cleverness is usually needed. The choice
of cost functional matters. Preprocessing the datasets may be necessary.
Using '-twobest 29' could help by providing more candidates for the Fine
phase -- at the cost of CPU time. If you run into trouble -- which happens
sooner or later -- try the AFNI Message Board -- and please give details,
including the exact command line(s) you used.

=========================================================================
* This binary version of 3dAllineate is compiled using OpenMP, a semi-
  automatic parallelizer software toolkit, which splits the work across
  multiple CPUs/cores on the same shared memory computer.
* OpenMP is NOT like MPI -- it does not work with CPUs connected only
  by a network (e.g., OpenMP doesn't work across cluster nodes).
* For some implementation and compilation details, please see
    https://afni.nimh.nih.gov/pub/dist/doc/misc/OpenMP.html
* The number of CPU threads used will default to the maximum number on
  your system. You can control this value by setting the environment
  variable OMP_NUM_THREADS to some smaller value (including 1).
* Un-setting OMP_NUM_THREADS resets OpenMP back to its default state of
  using all CPUs available.
  ++ However, on some systems, it seems to be necessary to set the
     variable OMP_NUM_THREADS explicitly, or you only get one CPU.
  ++ On other systems with many CPUs, you probably want to limit the
     CPU count, since using more than (say) 16 threads is probably useless.
* You must set OMP_NUM_THREADS in the shell BEFORE running the program,
  since OpenMP queries this variable BEFORE the program actually starts.
  ++ You can't usefully set this variable in your ~/.afnirc file or on
     the command line with the '-D' option.
* How many threads are useful? That varies with the program and how well
  it was coded. You'll have to experiment on your own systems!
* The number of CPUs on this particular computer system is ...... 2.
* The maximum number of CPUs that will be used is now set to .... 2.
* OpenMP may or may not speed up the program significantly. Limited tests
  show that it provides some benefit, particularly when using the more
  complicated interpolation methods (e.g., '-cubic' and/or '-final wsinc5'),
  for up to 3-4 CPU threads.
* But the speedup is definitely not linear in the number of threads, alas.
  Probably because my parallelization efforts were pretty limited.

=========================================================================
++ Compile date = Apr 23 2024 {AFNI_24.1.04:linux_ubuntu_16_64}
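As noted in the OpenMP section above, the thread count must be set in the
shell before 3dAllineate starts; for example, in a bash-type shell (the
dataset names are placeholders):

    # limit the run to 4 OpenMP threads, then align as usual
    export OMP_NUM_THREADS=4
    3dAllineate -base anat+orig -source epi+orig -prefix epi_aligned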