:orphan:

.. _ahelp_PTA:

***
PTA
***

.. contents:: :local:


| 

.. code-block:: none

    
                 ================== Welcome to PTA ==================
                   Program for Profile Tracking Analysis (PTA)
    #+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    Version 0.0.3, Mar 30, 2021
    Author: Gang Chen (gangchen@mail.nih.gov)
    Website - https://afni.nimh.nih.gov/gangchen_homepage
    SSCC/NIMH, National Institutes of Health, Bethesda MD 20892, USA
    #+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    
    Introduction
    ------------
    
     Profile Tracking Analysis (PTA) estimates nonlinear trajectories or profiles
     through smoothing splines. Currently the program PTA only works through a
     command-line scripting mode. Check the examples below: find one close to your
     specific scenario and use it as a template. The underlying theory is covered in
     the following paper:
    
     Chen et al. (2020). Beyond linearity: Capturing nonlinear relationships 
     in neuroimaging. https://doi.org/10.1101/2020.11.01.363838
    
     To be able to run PTA, one needs to have the R package "mgcv" installed with
     the following command at the terminal:
    
     rPkgsInstall -pkgs "mgcv"
    
     Alternatively, you may install it in R:
    
     install.packages("mgcv")
    
     When a factor (e.g., groups, conditions) is involved, numerical coding is
     required in formulating the data information. See Examples 3 and 4. The
     following website provides some explanations regarding factor coding that
     might be useful for modeling formulation:
    
     https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/
    
     There are two output files generated by PTA: one (with the suffix -stat.txt)
     contains the information about the statistical evidence for various effects,
     while the other (with the suffix -prediction.txt) tabulates the predicted
     values and their standard errors, which can be used to illustrate the
     inferred trajectories or trends (e.g., using graphical tools such as ggplot2
     in R).
    
    
    Example 1 --- simplest case: one group of subjects with a between-subject 
      quantitative variable that does not vary within subject. Analysis is 
      set up to model the trajectory or trend along age:
    
       PTA -prefix age         \
           -input data.txt     \
           -model 's(age)'     \
           -Y  height          \
           -prediction pred.txt                   
    
      The function 's(age)' indicates that 'age' is modeled via a smooth curve.
      No empty space is allowed in the model formulation.
    
       The file pred.txt lists all the explanatory variables (excluding lower-level variables
       such as subject) for prediction. The file should be in a data.frame format as below:
    
        age 
        10   
        12   
        14
        ...
        20
        22 
        24
        ...
    
       The age step in the above example is 2 years. To obtain a smoother graphical appearance
       in plotted profiles, one can set the age values in pred.txt on a finer grid with a step
       size of, for example, 0.5.
    
       The file data.txt stores the information for all the variables and input data in a
       data.frame format as below:
    
       Subj   age   height
       S1      24   175
       S2      14   163
        ...
    
       The subject labels in the above table can be characters or mixtures of characters
       and numbers, but they cannot be pure numbers.
    
      There will be two output files, one age-stat.txt and the other age-prediction.txt:
      the former shows the statistical evidence; the latter contains a predicted value
      for each age plus the associated uncertainty (standard error),  which can be
      plotted using tools such as ggplot2.
       
    
    Example 2 --- Largely the same as Example 1, but with 'age' as a within-subject
      quantitative variable (varying within each subject). The model is now 
      specified by replacing the line of -model in Example 1 with the following 
      two lines:
    
              -model 's(age)+s(Subj,bs="re")'         \
              -vt Subj 's(Subj)'                      \
    
      The second term 's(Subj,bs="re")' in the model specification means that
      each subject is allowed to have a varying intercept or random effect ('re'). 
      To estimate the smooth trajectory through the option -prediction, the option
      -vt has to be included in this case to indicate the varying term (usually 
      subjects). That is, if prediction is desired, one has to explicitly
      declare the variable (e.g., Subj) that is associated with the varying term
      (e.g., s(Subj)). No empty space is allowed in the model formulation or in
      the varying term.
    
      The full script version is
    
       PTA -prefix age2                        \
           -input data.txt                     \
           -model 's(age)+s(Subj,bs="re")'     \
           -vt Subj 's(Subj)'                  \
           -Y  height                          \
           -prediction pred.txt
    
      All the rest remains the same as Example 1.
      
    
    Example 3 --- two groups and one quantitative variable (age). The analysis is 
      set up to compare the trajectory or trend along age between the two groups,
      which are quantitatively coded as -1 and 1. For example, if the two groups
      are females and males, you can code females as -1 and males as 1. The following
      script applies to the situation when the quantitative variable age does not vary 
      within subject, 
    
      PTA -prefix age3a                      \
          -input data.txt                   \
          -model 's(age)+s(age,by=MvF)'     \
          -prediction pred.txt             
    
      The input table in the file data.txt contains the following structure:
    
      Subj  age grp MvF
       S1   27   M   1
       S2   21   M   1
       S3   28   F  -1
       S4   18   F  -1
       ...
    
      The column grp above is not necessary for modeling, but it is included to
      make the predicted values in the output file age3a-prediction.txt easier
      to interpret.
    
      Similarly, the prediction file pred.txt looks like the following (set the age
      values on a fine grid so that the graphical illustration is smooth):
    
      age grp MvF
      10   M   1
      12   M   1
      ...
      28   M   1
      30   M   1
      10   F  -1
      12   F  -1
      ...
      28   F  -1
      30   F  -1
    
      Note that the age values for prediction have a gap of 2 years: the smaller
      the gap, the smoother the plotted predictions.
    
      On the other hand, go with the script below when the quantitative variable age
      varies within subject,
    
      PTA -prefix age3b                                   \
          -input data.txt                                 \
          -model 's(age)+s(age,by=MvF)+s(Subj,bs="re")'   \
          -vt  Subj 's(Subj)'                             \
          -prediction pred.txt
       
    
    Example 4 --- This example demonstrates the situation where more than two
      levels are involved in a between- or within-subject factor. Suppose that
      there are three groups and one quantitative variable (age). The analysis is
      set up to compare the trajectory or trend along age among the three groups,
      A, B and C, which are quantitatively represented using dummy coding.
    
      PTA -prefix age4a                      \
          -input data.txt                   \
          -model 's(age)+s(age,by=AvC)+s(age,by=BvC)'     \
          -prediction pred.txt             
    
      The input table in the file data.txt contains the following structure:
    
      Subj   age grp AvC  BvC
       S1    27   A   1    0
       S2    21   A   1    0
       S3    17   B   0    1
       S4    24   B   0    1
       S5    28   C   0    0
       S6    18   C   0    0
      ...
    
      The column grp above is not necessary for modeling, but it is included to
      be more indicative for the prediction values in the output file
      age4a-prediction.txt
    
      On the other hand, go with the script below when the quantitative variable age
      varies within subject,
    
      PTA -prefix age4b                                   \
          -input data.txt                                \
          -model 's(age)+s(age,by=AvC)+s(age,by=BvC)+s(Subj,bs="re")'  \
          -vt  Subj 's(Subj)'                            \
          -prediction pred.txt
       
    
    Options in alphabetical order:
    ------------------------------
    
       -dbgArgs: This option will enable R to save the parameters in a
             file called .PTA.dbg.AFNI.args in the current directory
             so that debugging can be performed.
    
       -h: this help message
    
       -help: this help message
    
       -input file: input file in a table format (same as the long-format data frame
             structure in R). Use the first row to specify the column names. The
             subject column, if applicable, should not be purely numeric. Factors
             (groups, tasks), on the other hand, should be numerically coded using
             convenient coding methods such as deviation or dummy coding.
    
       -interactive: Currently unavailable.
    
       -model FORMULA: Specify the model formulation through a multilevel smoothing
             spline expression. A FORMULA with more than one variable has to be
             surrounded by (single or double) quotes. Variable names in the formula
             should be consistent with the ones used in the header of the input file.
             The nonlinear trajectory is specified through the expression s(x,k=?),
             where s() indicates a smooth function, x is a quantitative variable
             along which one would like to trace the trajectory, and k is the number
             of smooth splines (knots). The default for k (when k is not specified)
             is 10, which is good enough most of the time when there are more than
             10 data points of x. When there are fewer than 10 data points of x,
             choose a value of k slightly less than the number of data points.
    
       -prediction TABLE: Provide a data table so that predicted values can be
             generated for graphical illustration. Usually the table should have a
             structure similar to the input file, except that columns for varying
             smoothing terms (e.g., subject) and the response variable (i.e., Y)
             should not be included. Try to specify equally spaced values with a
             small grid size for the quantitative variable of the modeled trajectory
             (e.g., age) so that smooth curves can be plotted after the analysis.
             See the Examples in this help for a couple of specific tables used for
             predictions.
    
       -prefix PREFIX: Prefix for output files. 
    
       -show_allowed_options: list of allowed options
    
       -verb VERB: VERB is an integer specifying verbosity level.
                 0 for quiet (Default). 1 or more: talkative.
    
       -vt var formulation: This option is for specifying varying smoothing terms. Two
             components are required: the first one, 'var', indicates the variable (e.g.,
             subject) around which the smoothing will vary, while the second component
             specifies the smoothing formulation (e.g., s(age,subject)). When there are
             no varying smoothing terms (e.g., no within-subject variables), do not use
             this option.
    
       -Y var_name: var_name is used to specify the column name that is designated
            as the response/outcome variable. The default (when this option is not
            invoked) is 'Y'.
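
The prediction tables described in Examples 1 and 3 can be tedious to type by hand on a fine grid. The following is a minimal shell sketch, not part of PTA itself, of one way such tables might be generated. The helper name ``age_grid``, the output file name ``pred_groups.txt``, the age range of 10 to 30 and the 0.5-year step are all assumptions chosen for illustration; the column names ``age``, ``grp`` and ``MvF`` follow the examples in the help text above.

.. code-block:: shell

    #!/bin/sh
    # Illustrative only: build prediction tables for the -prediction option.
    # The age range (10 to 30) and step (0.5) are assumed values.

    # Force a "." decimal separator regardless of the user's locale.
    LC_ALL=C
    export LC_ALL

    # Hypothetical helper: print ages 10.0, 10.5, ..., 30.0 (41 values).
    age_grid() {
        awk 'BEGIN { for (i = 0; i <= 40; i++) printf "%.1f\n", 10 + 0.5 * i }'
    }

    # Example 1 style: a single 'age' column with a header row.
    {
        echo "age"
        age_grid
    } > pred.txt

    # Example 3 style: the same age grid repeated for each group,
    # with males coded as 1 and females as -1.
    {
        echo "age grp MvF"
        age_grid | awk '{ print $1, "M", 1 }'
        age_grid | awk '{ print $1, "F", -1 }'
    } > pred_groups.txt

Either file could then be supplied to PTA through ``-prediction``; the grouped table is meant for a model such as the one in Example 3.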