3dMSS


             ================== Welcome to 3dMSS ==================
       Program for Voxelwise Multilevel Smoothing Spline (MSS) Analysis
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Version 0.0.8, April 18, 2021
Author: Gang Chen (gangchen@mail.nih.gov)
Website - https://afni.nimh.nih.gov/gangchen_homepage
SSCC/NIMH, National Institutes of Health, Bethesda MD 20892, USA
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Introduction
------

 Multilevel Smoothing-Spline (MSS) Modeling

 The linearity assumption surrounding a quantitative variable in common
 practice may be a reasonable approximation especially when the variable
 is confined within a narrow range, but can be inappropriate under some
 circumstances when the variable's effect is non-monotonic or tortuous.
 As a more flexible and adaptive approach, multilevel smoothing splines
 (MSS) offers a more powerful analytical tool for population-level
 neuroimaging data analysis that involves one or more quantitative
 predictors. More theoretical discussion can be found in

 Chen et al. (2020). Beyond linearity: Capturing nonlinear relationships
 in neuroimaging. https://doi.org/10.1101/2020.11.01.363838

 To be able to run 3dMSS, one needs to have the following R packaages
 installed: "gamm4" and "snow". To install these R packages, run the
 following command at the terminal:

 rPkgsInstall -pkgs "gamm4,snow"

 Alternatively you may install them in R:

 install.packages("gamm4")
 install.packages("snow")

 It is best to go through all the examples below to get hang of the MSS
 scripting interface. Once the 3dMSS script is constructed, it can be run
 by copying and pasting to the terminal. Alternatively (and probably better)
 you save the script as a text file, for example, called MSS.txt, and execute
 it with the following (assuming on tc shell),

 nohup tcsh -x MSS.txt &

 or,

 nohup tcsh -x MSS.txt > diary.txt &

 or,

 nohup tcsh -x MSS.txt |& tee diary.txt &

 The advantage of the latter commands is that the progression is saved into
 the text file diary.txt and, if anything goes awry, can be examined later.

Example 1 --- simplest case: one group of subjects with a between-subject
  quantitative variable that does not vary within subject. MSS analysis is
  set up to model the trajectory or trend along age, and can be specified
  through the option -mrr, which is solved via a model formuation of ridge
  regression. Again, the following exemplary script assumes that 'age' is
  a between-subjects variable (not varying within subject):

   3dMSS -prefix MSS -jobs 16                     \
          -mrr 's(age)'                           \
          -qVars 'age'                            \
          -mask myMask.nii                        \
          -bounds  -2 2                           \
          -prediction @pred.txt                   \
          -dataTable  @data.txt

  The function 's(age)' indicates that 'age' is modeled via a smooth curve.
  No empty space is allowed in the model formulation. With the option
  -bounds, values beyond [-2, 2] will be treated as outliers and considered
  as missing. If you want to set a range, choose one that make sense with
  your specific input data.

   The file pred.txt lists all the expl1anatory variables (excluding lower-level variables
   such as subject) for prediction. The file should be in a data.frame format as below:

   label  age
   t1      1
   t2      2
   t3      3
    ...
   t8      8
   t9      9
   t10    10
    ...

   The file data.txt stores the information for all the variables and input data in a
   data.frame format. For example:

   Subj   age   InputFile
   S1      1   ~/alex/MSS/S1.nii
   S2      2   ~/alex/MSS/S2.nii
    ...

  In the output the first sub-brick shows the statistical evidence in the
  form of chi-square distribution with 2 degrees of freedom (2 DFs do not mean
  anything, just for the convenience of information coding). This sub-brick is
  the statistical evidence for the trejectory of the group. If you want to
  estimate the trend at the population level, use the option -prediction with a
  table that codes the ages you would like to track the trend. In the output
  there is one predicted value for each age plus the associated uncertainty
  (standard error). For example, with 10 age values, there will be 10 predicted
  values plus 10 standard errors. The sub-bricks for prediction and standard
  errors are interleaved.


Example 2 --- Largely same as Example 1, but with 'age' as a within-subject
  quantitative variable (varying within each subject). The model is better
  specified by replacing the line of -mrr in Example 1 with the following
  two lines:

          -mrr 's(age)+s(Subj,bs="re")'         \
          -vt Subj 's(Subj)'                      \

  The second term 's(Subj,bs="re")' in the model specification means that
  each subject is allowed to have a varying intercept or random effect ('re').
  To estimate the smooth trajectory through the option -prediction, the option
  -vt has to be included in this case to indicate the varying term (usually
  subjects). That is, if prediction is desirable, one has to explicitly
  declare the variable (e.g., Subj) that is associated with the varying term
  (e.g., s(Subj)). No empty space is allowed in the model formulation and the
  the varying term.

  The full script version is

   3dMSS -prefix MSS -jobs 16                     \
          -mrr 's(age)+s(Subj,bs="re")'         \
          -vt Subj 's(Subj)'                      \
          -qVars 'age'                            \
          -mask myMask.nii                        \
          -bounds  -2 2                           \
          -prediction @pred.txt                   \
          -dataTable  @data.txt

  All the rest remains the same as Example 1.

  Alternatively, this model with varying subject-level intercept can be
  specified with

          -lme 's(age)'                        \
          -ranEff 'list(Subj=~1)'                      \

  which is solved through the linear mixed-effect (lme) platform. The -vt is
  not needed when making prediction through the option -prediction. The two
  specifications, -mrr and -lme, would render similar results, but the
  runtime may differ depending on the amount of data and model complexity.


Example 3 --- two groups and one quantitative variable (age). MSS analysis is
  set up to compare the trajectory or trend along age between the two groups,
  which are quantitatively coded as -1 and 1. For example, if the two groups
  are females and males, you can code females as -1 and males as 1. The following
  script applies to the situation when  the quantitative variable does not vary
  within subject,

  3dMSS -prefix MSS -jobs 16                     \
          -mrr 's(age)+s(age,by=grp)'             \
          -qVars 'age'                            \
          -mask myMask.nii                        \
          -bounds  -2 2                           \
          -prediction @pred.txt                   \
          -dataTable  @data.txt

  On the other hand, go with the script below when the quantitative variable
  varies within subject,

  3dMSS -prefix MSS -jobs 16                     \
          -mrr 's(age)+s(age,by=grp)+s(Subj,bs="re")' \
          -vt  Subj 's(Subj)'                \
          -qVars 'age'                            \
          -mask myMask.nii                        \
          -bounds  -2 2                           \
          -prediction @pred.txt                   \
          -dataTable  @data.txt

  or an LME version:

  3dMSS -prefix MSS -jobs 16                     \
          -lme 's(age)+s(age,by=grp)'             \
          -ranEff 'list(Subj=~1)'                      \
          -qVars 'age'                            \
          -mask myMask.nii                        \
          -bounds  -2 2                           \
          -prediction @pred.txt                   \
          -dataTable  @data.txt


Options in alphabetical order:
------------------------------

   -bounds lb ub: This option is for outlier removal. Two numbers are expected from
         the user: the lower bound (lb) and the upper bound (ub). The input data will
         be confined within [lb, ub]: any values in the input data that are beyond
         the bounds will be removed and treated as missing. Make sure the first number
         less than the second. You do not have to use this option to censor your data!

   -cio: Use AFNI's C io functions, which is default. Alternatively -Rio
         can be used.

   -dataTable TABLE: List the data structure with a header as the first line.

         NOTE:

         1) This option has to occur last in the script; that is, no other
         options are allowed thereafter. Each line should end with a backslash
         except for the last line.

         2) The order of the columns should not matter except that the last
         column has to be the one for input files, 'InputFile'. Each row should
         contain only one input file in the table of long format (cf. wide format)
         as defined in R. Input files can be in AFNI, NIfTI or surface format.
         AFNI files can be specified with sub-brick selector (square brackets
         [] within quotes) specified with a number or label.

         3) It is fine to have variables (or columns) in the table that are
         not modeled in the analysis.

         4) When the table is part of the script, a backslash is needed at the end
         of each line to indicate the continuation to the next line. Alternatively,
         one can save the context of the table as a separate file, e.g.,
         calling it table.txt, and then in the script specify the data with
         '-dataTable @table.txt'. However, when the table is provided as a separate
         file, do NOT put any quotes around the square brackets for each sub-brick,
         otherwise the program would not properly read the files, unlike the
         situation when quotes are required if the table is included as part of the
         script. Backslash is also not needed at the end of each line, but it would
         not cause any problem if present. This option of separating the table from
         the script is useful: (a) when there are many input files so that
         the program complains with an 'Arg list too long' error; (b) when
         you want to try different models with the same dataset.

   -dbgArgs: This option will enable R to save the parameters in a
         file called .3dMSS.dbg.AFNI.args in the current directory
          so that debugging can be performed.

   -fixEff FORMULA: Specify the fixed effect components of the model. The
         expression FORMULA with more than one variable has to be surrounded
         within (single or double) quotes. Variable names in the formula
         should be consistent with the ones used in the header of -dataTable.
         In the MSS context the simplest model is "1+(1|Subj1)+(1|Subj2)" in
         which the random effect from each of the two subjects in a pair is
         symmetrically incorporated in the model. Each random-effects factor is
         specified within paratheses per formula convention in R. Any
         effects of interest and confounding variables (quantitative or
         categorical variables) can be added as fixed effects without paratheses.

   -help: this help message

   -IF var_name: var_name is used to specify the column name that is designated for
        input files of effect estimate. The default (when this option is not invoked
        is 'InputFile', in which case the column header has to be exactly as 'InputFile'
        This input file for effect estimates has to be the last column.

   -jobs NJOBS: On a multi-processor machine, parallel computing will speed
         up the program significantly.
         Choose 1 for a single-processor computer.

   -lme FORMULA: Specify the fixed effect components of the model. The
         expression FORMULA with more than one variable has to be surrounded
         within (single or double) quotes. Variable names in the formula
         should be consistent with the ones used in the header of -dataTable.
         In the MSS context the simplest model is "1+(1|Subj1)+(1|Subj2)" in
         which the random effect from each of the two subjects in a pair is
         symmetrically incorporated in the model. Each random-effects factor is
         specified within paratheses per formula convention in R. Any
         effects of interest and confounding variables (quantitative or
         categorical variables) can be added as fixed effects without paratheses.

   -mask MASK: Process voxels inside this mask only.
          Default is no masking.

   -mrr FORMULA: Specify the model formulation through multilevel smoothing splines.
         Expression FORMULA with more than one variable has to be surrounded
         within (single or double) quotes. Variable names in the formula
         should be consistent with the ones used in the header of -dataTable.
         The nonlinear trajectory is specified through the expression of s(x,k=?)
         where s() indicates a smooth function, x is a quantitative variable with
         which one would like to trace the trajectory and k is the number of smooth
         splines (knots). The default (when k is missing) for k is 10, which is good
         enough most of the time when there are more than 10 data points of x. When
         there are less than 10 data points of x, choose a value of k slightly less
         than the number of data points.

   -prediction TABLE: Provide a data table so that predicted values could be generated for
graphical illustration. Usually the table should contain similar structure as the input
file except that 1) reserve the first column for effect labels which will be used for
sub-brick names in the output for those predicted values; 2) columns for those varying
smoothing terms (e.g., subject) and response variable (i.e., Y) should not be includes.
Try to specify equally-spaced values with a small for the quantitative variable of
modeled trajectory (e.g., age) so that smooth curves could be plotted after the
analysis. See Examples in the help for a couple of specific tables used for predictions.

   -prefix PREFIX: Output file name. For AFNI format, provide prefix only,
         with no view+suffix needed. Filename for NIfTI format should have
         .nii attached (otherwise the output would be saved in AFNI format).

   -qVars variable_list: Identify quantitative variables (or covariates) with
         this option. The list with more than one variable has to be
         separated with comma (,) without any other characters such as
         spaces and should be surrounded within (single or double) quotes.
         For example, -qVars "Age,IQ"
         WARNINGS:
         1) Centering a quantitative variable through -qVarsCenters is
         very critical when other fixed effects are of interest.
         2) Between-subjects covariates are generally acceptable.
         However EXTREME caution should be taken when the groups
         differ substantially in the average value of the covariate.


   -ranEff FORMULA: Specify the random effect components of the model. The
         expression FORMULA with more than one variable has to be surrounded
         within (single or double) quotes. Variable names in the formula
         should be consistent with the ones used in the header of -dataTable.
         In the MSS context the simplest model is "list(Subj=~1)" in which the
         varying or random effect from each subject is incorporated in the model.
         Each random-effects factor is
         specified within paratheses per formula convention in R. Any
         effects of interest and confounding variables (quantitative or
         categorical variables) can be added as fixed effects without paratheses.

   -Rio: Use R's io functions. The alternative is -cio.

   -show_allowed_options: list of allowed options

   -vt var formulation: This option is for specifying varying smoothing terms. Two components
         are required: the first one 'var' indicates the varaible (e.g., subject) around
         which the smoothing will vary while the second component specifies the smoothing
         formulation (e.g., s(age,subject)). When there is no varying smoothing terms (e.g.,
         no within-subject variables), do not use this option.