PTA


             ================== Welcome to PTA ==================
               Program for Profile Tracking Analysis (PTA)
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Version 0.0.5, Oct 11, 2023
Author: Gang Chen (gangchen@mail.nih.gov)
Website - https://afni.nimh.nih.gov/gangchen_homepage
SSCC/NIMH, National Institutes of Health, Bethesda MD 20892, USA
#+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Introduction
------

 Profile Tracking Analysis (PTA) estimates nonlinear trajectories or profiles
 through smoothing splines. Currently the program PTA only works through a
 command-line scripting mode. Check the examples below: find one close to your
 specific scenario and use it as a template. The underlying theory is covered in
 the following paper:

 Chen, G., Nash, T.A., Cole, K.M., Kohn, P.D., Wei, S.-M., Gregory, M.D.,
 Eisenberg, D.P., Cox, R.W., Berman, K.F., Shane Kippenhan, J., 2021. Beyond
 linearity in neuroimaging: Capturing nonlinear relationships with application to
 longitudinal studies. NeuroImage 233, 117891.
 https://doi.org/10.1016/j.neuroimage.2021.117891

 To be able to run PTA, one needs to have the R package "mgcv" installed with
 the following command at the terminal:

 rPkgsInstall -pkgs "mgcv"

 Alternatively, you may install it in R:

 install.packages("mgcv")

 When a factor (e.g., groups, conditions) is involved, numerical coding is
 required in formulating the data information. See Examples 3 and 4. The
 following website provides some explanations regarding factor coding that
 might be useful for modeling formulation:

 https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/

 There are two output files generated by PTA: one (with the affix -stat.txt)
 contains the information about the statistical evidence for various effects
 while the other (with the affix -prediction.txt) tabulates the predicted
 values and their standard errors which can be utilized to illustrate the
 inferred trajectories or trends (e.g., using graphical tools such as ggplot2
 in R).


Example 1 --- simplest case: one group of subjects with a between-subject
  quantitative variable that does not vary within subject. Analysis is
  set up to model the trajectory or trend along age:

   PTA -prefix age         \
       -input data.txt     \
       -model 's(age)'     \
       -Y  height          \
       -prediction pred.txt

  The function 's(age)' indicates that 'age' is modeled via a smooth curve.
  No empty space is allowed in the model formulation.

   The file pred.txt lists all the explanatory variables (excluding lower-level variables
   such as subject) for prediction. The file should be in a data.frame format as below:

    age
    10
    12
    14

    20
    22
    24
    ...

   The age step in the above example is 2 years. To obtain a smoother graphical appearance
   in plotted profiles, one can set the age values in pred.txt with a smaller grid size of,
   for example, 0.5.
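
   A finer grid need not be typed by hand. The following is a minimal sketch, assuming
   an age range of 10 to 30 and a step of 0.5 (both are illustrative choices, not PTA
   requirements), that writes a one-column prediction table:

```shell
# Write pred.txt with a header line followed by ages 10.0, 10.5, ..., 30.0.
# The range (10 to 30) and the step (0.5) are assumptions for illustration.
{
  echo "age"
  # An integer loop counter avoids accumulating floating-point error.
  awk 'BEGIN { for (i = 0; i <= 40; i++) printf "%.1f\n", 10 + 0.5 * i }'
} > pred.txt
```

   Adjust the starting value, the step and the number of steps to cover the age range
   present in your own data.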

   The file data.txt stores the information for all the variables and input data in a
   data.frame format as below:

   Subj   age   height
   S1      24   175
   S2      14   163
    ...

   The subject labels in the above table can be characters or mixtures of characters
   and numbers, but they cannot be pure numbers.
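
   If the subject labels in an existing table are pure numbers, one possible fix is to
   prepend a letter to each label. The sketch below uses hypothetical file names
   (toy.txt, toy_fixed.txt) and a toy table, and assumes the subject labels sit in the
   first column:

```shell
# Toy table whose subject labels are purely numeric (for illustration only).
cat > toy.txt <<'EOF'
Subj age height
1 24 175
2 14 163
EOF

# Keep the header as is; prefix "S" to any first-column label made only of digits.
awk 'NR == 1 { print; next }
     $1 ~ /^[0-9]+$/ { $1 = "S" $1 }
     { print }' toy.txt > toy_fixed.txt
```

   The resulting toy_fixed.txt carries the labels S1 and S2 and leaves the other
   columns unchanged.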

  There will be two output files, one age-stat.txt and the other age-prediction.txt:
  the former shows the statistical evidence; the latter contains a predicted value
  for each age plus the associated uncertainty (standard error), which can be
  plotted using tools such as ggplot2.


Example 2 --- Largely the same as Example 1, but with 'age' as a within-subject
  quantitative variable (varying within each subject). The model is now
  specified by replacing the line of -model in Example 1 with the following
  two lines:

          -model 's(age)+s(Subj,bs="re")'         \
          -vt Subj 's(Subj)'                      \

  The second term 's(Subj,bs="re")' in the model specification means that
  each subject is allowed to have a varying intercept or random effect ('re').
  To estimate the smooth trajectory through the option -prediction, the option
  -vt has to be included in this case to indicate the varying term (usually
  subjects). That is, if prediction is desirable, one has to explicitly
  declare the variable (e.g., Subj) that is associated with the varying term
  (e.g., s(Subj)). No empty space is allowed in the model formulation or the
  varying term.

  The full script version is

   PTA -prefix age2                        \
       -input data.txt                     \
       -model 's(age)+s(Subj,bs="re")'     \
       -vt Subj 's(Subj)'                  \
       -Y  height                          \
       -prediction pred.txt

  All the rest remains the same as Example 1.


Example 3 --- two groups and one quantitative variable (age). The analysis is
  set up to compare the trajectory or trend along age between the two groups,
  which are quantitatively coded as -1 and 1. For example, if the two groups
  are females and males, you can code females as -1 and males as 1. The following
  script applies to the situation when the quantitative variable age does not vary
  within subject:

  PTA -prefix age3a                     \
      -input data.txt                   \
      -model 's(age)+s(age,by=MvsF)'    \
      -prediction pred.txt

  The input table in the file data.txt contains the following structure:

  Subj  age grp MvsF
   S1   27   M   1
   S2   21   M   1
   S3   28   F  -1
   S4   18   F  -1
   ...

  The column grp above is not necessary for modeling, but it is included to
  be more indicative for the prediction values in the output file
  age3a-prediction.txt

  Similarly, the prediction file pred.txt looks like (set the age values with
  a small grid so that the graphical illustration would be smooth):

  age grp MvsF
  10   M   1
  12   M   1
  ...
  28   M   1
  30   M   1
  10   F  -1
  12   F  -1
  ...
  28   F  -1
  30   F  -1

  Note that the age values for prediction have a gap of 2 years: the smaller
  the gap, the smoother the plotted predictions.
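
  A two-group prediction table with a finer grid can be generated rather than typed.
  This is a minimal sketch, assuming the same age range as above (10 to 30), an
  illustrative step of 0.5, and the column names age, grp and MvsF shown in the table:

```shell
# Build pred.txt with one block of rows per group: M coded as 1, F coded as -1.
# The step of 0.5 is an assumption for illustration.
awk 'BEGIN {
  print "age grp MvsF"
  n = split("M F", grp, " ")
  split("1 -1", code, " ")
  for (j = 1; j <= n; j++)
    for (i = 0; i <= 40; i++)
      printf "%.1f %s %s\n", 10 + 0.5 * i, grp[j], code[j]
}' > pred.txt
```

  Each group receives the same set of age values, so the two predicted profiles can be
  compared point by point.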

  On the other hand, go with the script below when the quantitative variable age
  varies within subject:

  PTA -prefix age3b                                   \
      -input data.txt                                 \
      -model 's(age)+s(age,by=MvsF)+s(Subj,bs="re")'  \
      -vt  Subj 's(Subj)'                             \
      -prediction pred.txt


Example 4 --- This example demonstrates the situation where more than two
  levels are involved in a between-individual factor. Suppose that there are
  three groups and one quantitative variable (age). The analysis is
  set up to compare the trajectory or trend along age among the three groups,
  A, B and C, which are quantitatively represented using dummy coding.

  PTA -prefix age4a                                 \
      -input data.txt                               \
      -model 's(age)+s(age,by=AvC)+s(age,by=BvC)'   \
      -prediction pred.txt

  The input table in the file data.txt contains the following structure:

  Subj   age grp AvC  BvC
   S1    27   A   1    0
   S2    21   A   1    0
   S3    17   B   0    1
   S4    24   B   0    1
   S5    28   C   0    0
   S6    18   C   0    0
  ...

  The column grp above is not necessary for modeling, but it is included to
  be more indicative for the prediction values in the output file
  age4a-prediction.txt
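
  The dummy-coded columns can be derived from the group labels. The following is a
  minimal sketch, assuming a hypothetical table raw.txt that holds the group label in
  its third column and that group C serves as the reference level:

```shell
# Toy table with the rows shown above, before the coded columns are added.
# The file name raw.txt is hypothetical.
cat > raw.txt <<'EOF'
Subj age grp
S1 27 A
S2 21 A
S3 17 B
S4 24 B
S5 28 C
S6 18 C
EOF

# Append AvC (1 for group A, else 0) and BvC (1 for group B, else 0).
awk 'NR == 1 { print $0, "AvC", "BvC"; next }
     { print $0, (($3 == "A") ? 1 : 0), (($3 == "B") ? 1 : 0) }' raw.txt > data.txt
```

  Group C receives 0 in both columns, so s(age) traces the profile of C while the two
  by-terms capture how A and B deviate from it.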

  On the other hand, go with the script below when the quantitative variable age
  varies within subject:

  PTA -prefix age4b                                                \
      -input data.txt                                              \
      -model 's(age)+s(age,by=AvC)+s(age,by=BvC)+s(Subj,bs="re")'  \
      -vt  Subj 's(Subj)'                                          \
      -prediction pred.txt


Example 5 --- Suppose that we compare the profiles between two conditions
  across space or time that is expressed as a variable x. In this case
  profile estimation and statistical inference are separated into two steps.
  First, estimate the profile for each condition using Example 1 or Example 2
  as a template. Then, make inference about the contrast between the two
  conditions. Obtain the contrast at each value of x for each individual, and
  use the difference values as input. Specify the model as below if there are
  multiple individuals:

          -model 's(x)+s(id,bs="re")'         \
          -vt id 's(id)'                      \

  For one individual, change the model to

          -model 's(x)'                      \
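
  The contrast values used as input can be computed from a long-format table. The
  sketch below is only an illustration: the file names (cond.txt, diff.txt), the
  column layout (id, x, cond, y), the condition labels A and B, and the numbers are
  all made up for the example. The difference is stored in a column named Y, which
  matches the default of the option -Y.

```shell
# Toy long-format table: one row per individual, x value and condition.
cat > cond.txt <<'EOF'
id x cond y
P1 1 A 2.0
P1 1 B 1.5
P1 2 A 2.4
P1 2 B 1.6
P2 1 A 1.8
P2 1 B 1.9
P2 2 A 2.1
P2 2 B 1.7
EOF

# For each (id, x) pair, compute y under condition A minus y under condition B.
awk 'NR == 1 { next }
     { key = $1 " " $2
       if (!(key in seen)) { seen[key] = 1; order[++n] = key }
       if ($3 == "A") a[key] = $4; else b[key] = $4 }
     END { print "id x Y"
           for (i = 1; i <= n; i++)
             printf "%s %.2f\n", order[i], a[order[i]] - b[order[i]] }' cond.txt > diff.txt
```

  The file diff.txt can then be supplied through -input together with the model
  's(x)+s(id,bs="re")'.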


Options in alphabetical order:
------------------------------

   -dbgArgs: This option will enable R to save the parameters in a
         file called .PTA.dbg.AFNI.args in the current directory
         so that debugging can be performed.

   -h: this help message

   -help: this help message

   -input file: input file in a table format (same as the data frame structure of
         long format in R). Use the first row to specify the column names. The subject
         column, if applicable, should not be purely numbers. On the other hand, factors
         (groups, tasks) should be numerically coded using convenient coding methods
         such as deviation or dummy coding.

   -interactive: Currently unavailable.

   -model FORMULA: Specify the model formulation through multilevel smoothing splines.
         An expression FORMULA with more than one variable has to be surrounded by
         (single or double) quotes. Variable names in the formula should be
         consistent with the ones used in the header of the input file.
         The nonlinear trajectory is specified through the expression of s(x,k=?)
         where s() indicates a smooth function, x is a quantitative variable with
         which one would like to trace the trajectory and k is the number of smooth
         splines (knots). The default (when k is missing) for k is 10, which is good
         enough most of the time when there are more than 10 data points of x. When
         there are fewer than 10 data points of x, choose a value of k slightly less
         than the number of data points.
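
         To decide whether the default k is appropriate, it may help to count the
         values of x in the input table. The sketch below assumes that "data points
         of x" refers to distinct values of x, and it uses a hypothetical toy file
         (toy_k.txt) with made-up numbers and a column named age:

```shell
# Toy input table (values are made up for illustration).
cat > toy_k.txt <<'EOF'
Subj age height
S1 10 140
S1 12 150
S2 10 138
S2 14 160
S3 12 149
EOF

# Locate the column named "age" in the header, then count its distinct values.
n=$(awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "age") c = i; next }
         !seen[$c]++ { n++ }
         END { print n + 0 }' toy_k.txt)
echo "distinct age values: $n"
```

         If the count is below 10, set k explicitly in the formula to a value
         slightly smaller than that count.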

   -prediction TABLE: Provide a data table so that predicted values can be generated for
         graphical illustration. Usually the table should have a structure similar to the
         input file, except that columns for the varying smoothing terms (e.g., subject)
         and the response variable (i.e., Y) should not be included. Try to specify
         equally-spaced values with a small step for the quantitative variable of the
         modeled trajectory (e.g., age) so that smooth curves can be plotted after the
         analysis. See the Examples in the help for a couple of specific tables used
         for predictions.

   -prefix PREFIX: Prefix for output files.

   -show_allowed_options: list of allowed options

   -verb VERB: VERB is an integer specifying verbosity level.
             0 for quiet (Default). 1 or more: talkative.

   -vt var formulation: This option is for specifying varying smoothing terms. Two components
         are required: the first one 'var' indicates the variable (e.g., subject) around
         which the smoothing will vary while the second component specifies the smoothing
         formulation (e.g., s(age,subject)). When there are no varying smoothing terms (e.g.,
         no within-subject variables), do not use this option.

   -Y var_name: var_name is used to specify the column name that is designated
        as the response/outcome variable. The default (when this option is not
        invoked) is 'Y'.