# 3dMSS¶

```
================== Welcome to 3dMSS ==================
Program for Voxelwise Multilevel Smoothing Spline (MSS) Analysis
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Version 0.0.8, April 18, 2021
Author: Gang Chen (gangchen@mail.nih.gov)
Website - https://afni.nimh.nih.gov/gangchen_homepage
SSCC/NIMH, National Institutes of Health, Bethesda MD 20892, USA
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Introduction
------
Multilevel Smoothing-Spline (MSS) Modeling
The linearity assumption surrounding a quantitative variable in common
practice may be a reasonable approximation especially when the variable
is confined within a narrow range, but can be inappropriate under some
circumstances when the variable's effect is non-monotonic or tortuous.
As a more flexible and adaptive approach, multilevel smoothing splines
(MSS) offers a more powerful analytical tool for population-level
neuroimaging data analysis that involves one or more quantitative
predictors. More theoretical discussion can be found in
Chen et al. (2020). Beyond linearity: Capturing nonlinear relationships
in neuroimaging. https://doi.org/10.1101/2020.11.01.363838
To be able to run 3dMSS, one needs to have the following R packaages
installed: "gamm4" and "snow". To install these R packages, run the
following command at the terminal:
rPkgsInstall -pkgs "gamm4,snow"
Alternatively you may install them in R:
install.packages("gamm4")
install.packages("snow")
It is best to go through all the examples below to get hang of the MSS
scripting interface. Once the 3dMSS script is constructed, it can be run
by copying and pasting to the terminal. Alternatively (and probably better)
you save the script as a text file, for example, called MSS.txt, and execute
it with the following (assuming on tc shell),
nohup tcsh -x MSS.txt &
or,
nohup tcsh -x MSS.txt > diary.txt &
or,
nohup tcsh -x MSS.txt |& tee diary.txt &
The advantage of the latter commands is that the progression is saved into
the text file diary.txt and, if anything goes awry, can be examined later.
Example 1 --- simplest case: one group of subjects with a between-subject
quantitative variable that does not vary within subject. MSS analysis is
set up to model the trajectory or trend along age, and can be specified
through the option -mrr, which is solved via a model formuation of ridge
regression. Again, the following exemplary script assumes that 'age' is
a between-subjects variable (not varying within subject):
3dMSS -prefix MSS -jobs 16 \
-mrr 's(age)' \
-qVars 'age' \
-mask myMask.nii \
-bounds -2 2 \
-prediction @pred.txt \
-dataTable @data.txt
The function 's(age)' indicates that 'age' is modeled via a smooth curve.
No empty space is allowed in the model formulation. With the option
-bounds, values beyond [-2, 2] will be treated as outliers and considered
as missing. If you want to set a range, choose one that make sense with
your specific input data.
The file pred.txt lists all the expl1anatory variables (excluding lower-level variables
such as subject) for prediction. The file should be in a data.frame format as below:
label age
t1 1
t2 2
t3 3
...
t8 8
t9 9
t10 10
...
The file data.txt stores the information for all the variables and input data in a
data.frame format. For example:
Subj age InputFile
S1 1 ~/alex/MSS/S1.nii
S2 2 ~/alex/MSS/S2.nii
...
In the output the first sub-brick shows the statistical evidence in the
form of chi-square distribution with 2 degrees of freedom (2 DFs do not mean
anything, just for the convenience of information coding). This sub-brick is
the statistical evidence for the trejectory of the group. If you want to
estimate the trend at the population level, use the option -prediction with a
table that codes the ages you would like to track the trend. In the output
there is one predicted value for each age plus the associated uncertainty
(standard error). For example, with 10 age values, there will be 10 predicted
values plus 10 standard errors. The sub-bricks for prediction and standard
errors are interleaved.
Example 2 --- Largely same as Example 1, but with 'age' as a within-subject
quantitative variable (varying within each subject). The model is better
specified by replacing the line of -mrr in Example 1 with the following
two lines:
-mrr 's(age)+s(Subj,bs="re")' \
-vt Subj 's(Subj)' \
The second term 's(Subj,bs="re")' in the model specification means that
each subject is allowed to have a varying intercept or random effect ('re').
To estimate the smooth trajectory through the option -prediction, the option
-vt has to be included in this case to indicate the varying term (usually
subjects). That is, if prediction is desirable, one has to explicitly
declare the variable (e.g., Subj) that is associated with the varying term
(e.g., s(Subj)). No empty space is allowed in the model formulation and the
the varying term.
The full script version is
3dMSS -prefix MSS -jobs 16 \
-mrr 's(age)+s(Subj,bs="re")' \
-vt Subj 's(Subj)' \
-qVars 'age' \
-mask myMask.nii \
-bounds -2 2 \
-prediction @pred.txt \
-dataTable @data.txt
All the rest remains the same as Example 1.
Alternatively, this model with varying subject-level intercept can be
specified with
-lme 's(age)' \
-ranEff 'list(Subj=~1)' \
which is solved through the linear mixed-effect (lme) platform. The -vt is
not needed when making prediction through the option -prediction. The two
specifications, -mrr and -lme, would render similar results, but the
runtime may differ depending on the amount of data and model complexity.
Example 3 --- two groups and one quantitative variable (age). MSS analysis is
set up to compare the trajectory or trend along age between the two groups,
which are quantitatively coded as -1 and 1. For example, if the two groups
are females and males, you can code females as -1 and males as 1. The following
script applies to the situation when the quantitative variable does not vary
within subject,
3dMSS -prefix MSS -jobs 16 \
-mrr 's(age)+s(age,by=grp)' \
-qVars 'age' \
-mask myMask.nii \
-bounds -2 2 \
-prediction @pred.txt \
-dataTable @data.txt
On the other hand, go with the script below when the quantitative variable
varies within subject,
3dMSS -prefix MSS -jobs 16 \
-mrr 's(age)+s(age,by=grp)+s(Subj,bs="re")' \
-vt Subj 's(Subj)' \
-qVars 'age' \
-mask myMask.nii \
-bounds -2 2 \
-prediction @pred.txt \
-dataTable @data.txt
or an LME version:
3dMSS -prefix MSS -jobs 16 \
-lme 's(age)+s(age,by=grp)' \
-ranEff 'list(Subj=~1)' \
-qVars 'age' \
-mask myMask.nii \
-bounds -2 2 \
-prediction @pred.txt \
-dataTable @data.txt
Options in alphabetical order:
------------------------------
-bounds lb ub: This option is for outlier removal. Two numbers are expected from
the user: the lower bound (lb) and the upper bound (ub). The input data will
be confined within [lb, ub]: any values in the input data that are beyond
the bounds will be removed and treated as missing. Make sure the first number
less than the second. You do not have to use this option to censor your data!
-cio: Use AFNI's C io functions, which is default. Alternatively -Rio
can be used.
-dataTable TABLE: List the data structure with a header as the first line.
NOTE:
1) This option has to occur last in the script; that is, no other
options are allowed thereafter. Each line should end with a backslash
except for the last line.
2) The order of the columns should not matter except that the last
column has to be the one for input files, 'InputFile'. Each row should
contain only one input file in the table of long format (cf. wide format)
as defined in R. Input files can be in AFNI, NIfTI or surface format.
AFNI files can be specified with sub-brick selector (square brackets
[] within quotes) specified with a number or label.
3) It is fine to have variables (or columns) in the table that are
not modeled in the analysis.
4) When the table is part of the script, a backslash is needed at the end
of each line to indicate the continuation to the next line. Alternatively,
one can save the context of the table as a separate file, e.g.,
calling it table.txt, and then in the script specify the data with
'-dataTable @table.txt'. However, when the table is provided as a separate
file, do NOT put any quotes around the square brackets for each sub-brick,
otherwise the program would not properly read the files, unlike the
situation when quotes are required if the table is included as part of the
script. Backslash is also not needed at the end of each line, but it would
not cause any problem if present. This option of separating the table from
the script is useful: (a) when there are many input files so that
the program complains with an 'Arg list too long' error; (b) when
you want to try different models with the same dataset.
-dbgArgs: This option will enable R to save the parameters in a
file called .3dMSS.dbg.AFNI.args in the current directory
so that debugging can be performed.
-fixEff FORMULA: Specify the fixed effect components of the model. The
expression FORMULA with more than one variable has to be surrounded
within (single or double) quotes. Variable names in the formula
should be consistent with the ones used in the header of -dataTable.
In the MSS context the simplest model is "1+(1|Subj1)+(1|Subj2)" in
which the random effect from each of the two subjects in a pair is
symmetrically incorporated in the model. Each random-effects factor is
specified within paratheses per formula convention in R. Any
effects of interest and confounding variables (quantitative or
categorical variables) can be added as fixed effects without paratheses.
-help: this help message
-IF var_name: var_name is used to specify the column name that is designated for
input files of effect estimate. The default (when this option is not invoked
is 'InputFile', in which case the column header has to be exactly as 'InputFile'
This input file for effect estimates has to be the last column.
-jobs NJOBS: On a multi-processor machine, parallel computing will speed
up the program significantly.
Choose 1 for a single-processor computer.
-lme FORMULA: Specify the fixed effect components of the model. The
expression FORMULA with more than one variable has to be surrounded
within (single or double) quotes. Variable names in the formula
should be consistent with the ones used in the header of -dataTable.
In the MSS context the simplest model is "1+(1|Subj1)+(1|Subj2)" in
which the random effect from each of the two subjects in a pair is
symmetrically incorporated in the model. Each random-effects factor is
specified within paratheses per formula convention in R. Any
effects of interest and confounding variables (quantitative or
categorical variables) can be added as fixed effects without paratheses.
-mask MASK: Process voxels inside this mask only.
Default is no masking.
-mrr FORMULA: Specify the model formulation through multilevel smoothing splines.
Expression FORMULA with more than one variable has to be surrounded
within (single or double) quotes. Variable names in the formula
should be consistent with the ones used in the header of -dataTable.
The nonlinear trajectory is specified through the expression of s(x,k=?)
where s() indicates a smooth function, x is a quantitative variable with
which one would like to trace the trajectory and k is the number of smooth
splines (knots). The default (when k is missing) for k is 10, which is good
enough most of the time when there are more than 10 data points of x. When
there are less than 10 data points of x, choose a value of k slightly less
than the number of data points.
-prediction TABLE: Provide a data table so that predicted values could be generated for
graphical illustration. Usually the table should contain similar structure as the input
file except that 1) reserve the first column for effect labels which will be used for
sub-brick names in the output for those predicted values; 2) columns for those varying
smoothing terms (e.g., subject) and response variable (i.e., Y) should not be includes.
Try to specify equally-spaced values with a small for the quantitative variable of
modeled trajectory (e.g., age) so that smooth curves could be plotted after the
analysis. See Examples in the help for a couple of specific tables used for predictions.
-prefix PREFIX: Output file name. For AFNI format, provide prefix only,
with no view+suffix needed. Filename for NIfTI format should have
.nii attached (otherwise the output would be saved in AFNI format).
-qVars variable_list: Identify quantitative variables (or covariates) with
this option. The list with more than one variable has to be
separated with comma (,) without any other characters such as
spaces and should be surrounded within (single or double) quotes.
For example, -qVars "Age,IQ"
WARNINGS:
1) Centering a quantitative variable through -qVarsCenters is
very critical when other fixed effects are of interest.
2) Between-subjects covariates are generally acceptable.
However EXTREME caution should be taken when the groups
differ substantially in the average value of the covariate.
-ranEff FORMULA: Specify the random effect components of the model. The
expression FORMULA with more than one variable has to be surrounded
within (single or double) quotes. Variable names in the formula
should be consistent with the ones used in the header of -dataTable.
In the MSS context the simplest model is "list(Subj=~1)" in which the
varying or random effect from each subject is incorporated in the model.
Each random-effects factor is
specified within paratheses per formula convention in R. Any
effects of interest and confounding variables (quantitative or
categorical variables) can be added as fixed effects without paratheses.
-Rio: Use R's io functions. The alternative is -cio.
-show_allowed_options: list of allowed options
-vt var formulation: This option is for specifying varying smoothing terms. Two components
are required: the first one 'var' indicates the varaible (e.g., subject) around
which the smoothing will vary while the second component specifies the smoothing
formulation (e.g., s(age,subject)). When there is no varying smoothing terms (e.g.,
no within-subject variables), do not use this option.
```