:orphan:

.. _ahelp_PTA:

***
PTA
***

.. contents:: :local:

|

.. code-block:: none

    ================== Welcome to PTA ==================
              Program for Profile Tracking Analysis (PTA)
    #+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
     Version 0.0.3, Mar 30, 2021
     Author: Gang Chen (gangchen@mail.nih.gov)
     Website - https://afni.nimh.nih.gov/gangchen_homepage
     SSCC/NIMH, National Institutes of Health, Bethesda MD 20892, USA
    #+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

    Introduction
    ------------

    Profile Tracking Analysis (PTA) estimates nonlinear trajectories or
    profiles through smoothing splines. Currently PTA only works in a
    command-line scripting mode. Check the examples below: find the one
    closest to your specific scenario and use it as a template. The
    underlying theory is covered in the following paper:

    Chen et al. (2020). Beyond linearity: Capturing nonlinear relationships
    in neuroimaging. https://doi.org/10.1101/2020.11.01.363838

    To run PTA, the R package "mgcv" needs to be installed, which can be
    done with the following command at the terminal:

        rPkgsInstall -pkgs "mgcv"

    Alternatively, you may install it in R:

        install.packages("mgcv")

    When a factor (e.g., groups, conditions) is involved, it has to be
    numerically coded in the data table. See Examples 3 and 4. The following
    website provides some explanations of factor coding that might be useful
    for model formulation:

    https://stats.idre.ucla.edu/r/library/r-library-contrast-coding-systems-for-categorical-variables/

    PTA generates two output files: one (with the suffix -stat.txt) contains
    the statistical evidence for the various effects, while the other (with
    the suffix -prediction.txt) tabulates the predicted values and their
    standard errors, which can be used to illustrate the inferred
    trajectories or trends (e.g., with graphical tools such as ggplot2 in R).

    Example 1 --- simplest case: one group of subjects with a between-subject
    quantitative variable that does not vary within subject. The analysis is
    set up to model the trajectory or trend along age:

        PTA -prefix age \
            -input data.txt \
            -model 's(age)' \
            -Y height \
            -prediction pred.txt

    The function 's(age)' indicates that 'age' is modeled via a smooth curve.
    No spaces are allowed in the model formulation.

    The file pred.txt lists all the explanatory variables (excluding
    lower-level variables such as subject) for prediction. The file should be
    in a data.frame format as below:

        age
        10
        12
        14
        20
        22
        24
        ...

    The age step in the above example is 2 years. To obtain a smoother
    graphical appearance in the plotted profiles, one can set the age values
    in pred.txt on a finer grid, for example, with a step of 0.5.

    The file data.txt stores all the variables and input data in a
    data.frame format as below:

        Subj  age  height
        S1     24     175
        S2     14     163
        ...

    The subject labels in the above table can be characters or mixtures of
    characters and numbers, but they cannot be pure numbers.

    There will be two output files, age-stat.txt and age-prediction.txt: the
    former shows the statistical evidence, and the latter contains a
    predicted value for each age plus the associated uncertainty (standard
    error), which can be plotted using tools such as ggplot2.

    Example 2 --- Largely the same as Example 1, but with 'age' as a
    within-subject quantitative variable (varying within each subject). The
    model is now specified by replacing the -model line in Example 1 with the
    following two lines:

            -model 's(age)+s(Subj,bs="re")' \
            -vt Subj 's(Subj)' \

    The second term 's(Subj,bs="re")' in the model specification means that
    each subject is allowed to have a varying intercept or random effect
    ('re'). To estimate the smooth trajectory through the option -prediction,
    the option -vt has to be included in this case to indicate the varying
    term (usually subjects).
    That is, if prediction is desired, one has to explicitly declare the
    variable (e.g., Subj) that is associated with the varying term (e.g.,
    s(Subj)). No spaces are allowed in the model formulation or in the
    varying term. The full script version is

        PTA -prefix age2 \
            -input data.txt \
            -model 's(age)+s(Subj,bs="re")' \
            -vt Subj 's(Subj)' \
            -Y height \
            -prediction pred.txt

    All the rest remains the same as Example 1.

    Example 3 --- two groups and one quantitative variable (age). The
    analysis is set up to compare the trajectory or trend along age between
    the two groups, which are numerically coded as -1 and 1. For example, if
    the two groups are females and males, you can code females as -1 and
    males as 1. The following script applies when the quantitative variable
    age does not vary within subject:

        PTA -prefix age3a \
            -input data.txt \
            -model 's(age)+s(age,by=MvF)' \
            -prediction pred.txt

    The input table in the file data.txt has the following structure:

        Subj  age  grp  MvF
        S1     27    M    1
        S2     21    M    1
        S3     28    F   -1
        S4     18    F   -1
        ...

    Since -Y is not specified here, data.txt also needs a response column
    named Y (the default of -Y), which is not shown in the table above.

    The column grp above is not necessary for modeling, but it is included to
    make the prediction values in the output file age3a-prediction.txt easier
    to identify.

    Similarly, the prediction file pred.txt looks like this (set the age
    values on a fine grid so that the graphical illustration is smooth):

        age  grp  MvF
        10     M    1
        12     M    1
        ...
        28     M    1
        30     M    1
        10     F   -1
        12     F   -1
        ...
        28     F   -1
        30     F   -1

    Note that the age values for prediction above have a gap of 2 years: the
    smaller the gap, the smoother the plotted predictions.

    On the other hand, use the script below when the quantitative variable
    age varies within subject:

        PTA -prefix age3b \
            -input data.txt \
            -model 's(age)+s(age,by=MvF)+s(Subj,bs="re")' \
            -vt Subj 's(Subj)' \
            -prediction pred.txt

    Example 4 --- This example demonstrates situations where more than two
    levels are involved in a between- or within-subject factor. Suppose that
    there are three groups and one quantitative variable (age).
    The analysis is set up to compare the trajectory or trend along age among
    the three groups, A, B and C, which are numerically represented using
    dummy coding (with C as the reference group):

        PTA -prefix age4a \
            -input data.txt \
            -model 's(age)+s(age,by=AvC)+s(age,by=BvC)' \
            -prediction pred.txt

    The input table in the file data.txt has the following structure:

        Subj  age  grp  AvC  BvC
        S1     27    A    1    0
        S2     21    A    1    0
        S3     17    B    0    1
        S4     24    B    0    1
        S5     28    C    0    0
        S6     18    C    0    0
        ...

    The column grp above is not necessary for modeling, but it is included to
    make the prediction values in the output file age4a-prediction.txt easier
    to identify.

    On the other hand, use the script below when the quantitative variable
    age varies within subject:

        PTA -prefix age4b \
            -input data.txt \
            -model 's(age)+s(age,by=AvC)+s(age,by=BvC)+s(Subj,bs="re")' \
            -vt Subj 's(Subj)' \
            -prediction pred.txt

    Options in alphabetical order:
    ------------------------------

    -dbgArgs: This option will enable R to save the parameters in a file
        called .PTA.dbg.AFNI.args in the current directory so that debugging
        can be performed.

    -h: this help message

    -help: this help message

    -input file: input file in a table format (same as the data frame
        structure of long format in R). Use the first row to specify the
        column names. The subject column, if applicable, should not be purely
        numbers. On the other hand, factors (groups, tasks) should be
        numerically coded using convenient coding methods such as deviation
        or dummy coding.

    -interactive: Currently unavailable.

    -model FORMULA: Specify the model through a multilevel smoothing-spline
        expression FORMULA. A FORMULA with more than one variable has to be
        enclosed in (single or double) quotes. Variable names in the formula
        should be consistent with the ones used in the header of the input
        file. The nonlinear trajectory is specified through the expression
        s(x,k=?),
        where s() indicates a smooth function, x is a quantitative variable
        along which one would like to trace the trajectory, and k is the
        basis dimension (number of knots) of the smooth. The default for k
        (when k is missing) is 10, which is good enough most of the time when
        there are more than 10 data points of x. When there are fewer than 10
        data points of x, choose a value of k slightly less than the number
        of data points.

    -prediction TABLE: Provide a data table so that predicted values can be
        generated for graphical illustration. Usually the table should have a
        structure similar to the input file, except that the columns for the
        varying smoothing terms (e.g., subject) and the response variable
        (i.e., Y) should not be included. Try to specify equally spaced values
        with a small step for the quantitative variable of the modeled
        trajectory (e.g., age) so that smooth curves can be plotted after the
        analysis. See the examples above for a couple of specific tables used
        for prediction.

    -prefix PREFIX: Prefix for output files.

    -show_allowed_options: list of allowed options

    -verb VERB: VERB is an integer specifying verbosity level.
        0 for quiet (Default). 1 or more: talkative.

    -vt var formulation: This option is for specifying varying smoothing
        terms. Two components are required: the first one 'var' indicates the
        variable (e.g., subject) around which the smoothing will vary, while
        the second component specifies the smoothing formulation (e.g.,
        s(age,subject)). When there are no varying smoothing terms (e.g., no
        within-subject variables), do not use this option.

    -Y var_name: var_name is used to specify the column name that is
        designated as the response/outcome variable. The default (when this
        option is not invoked) is 'Y'.
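
The help text above recommends filling pred.txt with equally spaced values on a small step (e.g., 0.5 years) so that the plotted profiles look smooth. As a minimal sketch, assuming a POSIX shell with awk, the one-column pred.txt of Example 1 can be generated instead of typed by hand. The helper name make_grid and the age range 10 to 30 are illustrative assumptions, not part of PTA or AFNI.

```shell
#!/bin/sh
# Hypothetical helper (not part of PTA or AFNI): print a one-column
# prediction table with header NAME and values LO, LO+STEP, ..., HI.
make_grid() {
    name=$1; lo=$2; hi=$3; step=$4
    # hi + step/2 guards against floating-point round-off at the end point
    awk -v n="$name" -v lo="$lo" -v hi="$hi" -v s="$step" \
        'BEGIN { print n; for (x = lo; x <= hi + s / 2; x += s) print x }'
}

# Example 1: ages 10 to 30 in steps of 0.5 (the range is an assumption)
make_grid age 10 30 0.5 > pred.txt
```

The resulting file has the header age followed by 41 rows (10, 10.5, ..., 30), which can then be passed to -prediction.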
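
For Example 3, pred.txt repeats the age grid once per group, next to the group label and its -1/1 code. The sketch below makes the same assumptions as a plain POSIX shell with awk; the ages 10 to 30 in steps of 2 follow the pred.txt table of Example 3, and the coded column is named MvF as in the -model formula of that example. Adjust these to match your own data.

```shell
#!/bin/sh
# Build pred.txt for Example 3: every age on the grid for group M (coded 1),
# then every age for group F (coded -1). Range and step are assumptions.
lo=10; hi=30; step=2

awk -v lo="$lo" -v hi="$hi" -v s="$step" 'BEGIN {
    print "age", "grp", "MvF"
    # group labels and their numerical (deviation) codes, M listed first
    split("M F", lab, " "); code["M"] = 1; code["F"] = -1
    for (g = 1; g <= 2; g++)
        for (x = lo; x <= hi + s / 2; x += s)
            print x, lab[g], code[lab[g]]
}' > pred.txt
```

Lowering step (e.g., to 0.5) gives smoother plotted curves at the cost of a longer table.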
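
The dummy-coded columns AvC and BvC in Example 4 can likewise be derived from the grp column rather than entered by hand. This is a sketch assuming a whitespace-delimited table with a header row and a column named grp; the file names data_raw.txt and data_coded.txt are hypothetical, and the three rows are taken from the Example 4 table (the response column is omitted for brevity).

```shell
#!/bin/sh
# Hypothetical input file: header row plus a grp column with levels A, B, C.
cat > data_raw.txt <<'EOF'
Subj age grp
S1 27 A
S3 17 B
S5 28 C
EOF

# Append dummy-coded columns AvC and BvC, with C as the reference (0 0).
awk 'NR == 1 {
         # locate the grp column by its header name
         for (i = 1; i <= NF; i++) if ($i == "grp") g = i
         print $0, "AvC", "BvC"
         next
     }
     { print $0, (($g == "A") ? 1 : 0), (($g == "B") ? 1 : 0) }' \
    data_raw.txt > data_coded.txt
```

Replacing the two conditional expressions with a single one that maps F to -1 and M to 1 would produce the deviation-coded MvF column of Example 3 in the same way.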