Multivariate Auto-Regressive Modeling

 


Modeling strategy: vector auto-regressive analysis 

 

Unlike the pure data-driven approach adopted in 3dGC.R in which the analysis focuses only on the connection of a target voxel with a seed region in a bivariate fashion similar to simple correlation or context-dependent correlation (aka PPI) analysis, we take the approach of vector (or multivariate) auto-regressive (VAR or MAR) analysis, similar to structural equation modeling (SEM), by considering a number of pre-selected regions. But unlike SEM, the primary causality analysis is done with all the regions' time series at individual subject level. 

The modeling approach is a mixture. It is model-based in the sense that we start with a number of pre-selected regions and the between-regions relationship is assumed to be linear. But the analysis is also exploratory in the sense all the path connectivities statistically tested in a data-driven fashion. The pitfall of this approach is that any missing regions that should be in the network but aren't included in the model could ruin the whole analysis and end up with spurious connectivities.

For more discussion, see G. Chen, et al., Vector autoregression, structural equation modeling, and their synthesis in neuroimaging data analysis, Comput. Biol. Med. (2011), doi:10.1016/j.compbiomed.2011.09.004

 

Program 1dGC

1dGC, written in R, can be run on all major platforms such as unix-based systems and Windows, and requires R installation: 

Choose a mirror site geographically close to you, and download the appropriate binary for your platform (or the source code and then compile yourself). Set your path appropriately. For example, my R executable is under /Applications/R.app/Contents/MacOS on my Mac OS X, so I add /Applications/R.app/Contents/MacOS as one of the search paths in my C shell startup configuration file .cshrc

If the installation is successful, start the R interface with the following command on the prompt

R

You can also work with the GUI version of R on Mac OS and Windows.

1dGC.R should be already in the most recent version of AFNI package (say, under ~/abin/). You can also download it from here, and place it wherever you prefer.

If you know the directory (e.g., ~/abin/) 1dGC.R is in, launch it inside R by typing/copying, for example, 

source("~/abin/1dGC.R")

If you are not sure about the location of 1dGC.R, copy the following into R (assuming 1dGC.R is under the same directory as the AFNI graphic viewer):

source(paste(strsplit(system('which afni', intern=TRUE), "afni")[[1]], "1dGC.R", sep=""))

or (assuming 1dGC.R is on one of the search paths)

source(Find(file.exists, file.path(strsplit(system("echo $PATH", intern=TRUE), "\\:")[[1]], "1dGC.R")))

 

1dGC.R works in a procedural or streamlined fashion with a string of inputs about parameters and options, and can run analysis at both individual subject and group levels. Hopefully anything else should be self-evident from there.

In case the program chokes because of failure on installing packages such as vars, network, tcltk, etc. for some reason, run the following commands in R:

install.packages("vars",dependencies=TRUE)
install.packages("network",dependencies=TRUE)
install.packages("gsl",dependencies=TRUE)
install.packages("car",dependencies=TRUE)

To quit R, type

q()

(or hit letter "d" while holding down CTRL key on UNIX-based systems).

 

Input data

Some suggestions about input files below might be too specific for FMRI data. Make proper adjustment if the program is used under other circumstances.

(1) All input data are assumed having a suffix of .1D in the AFNI convention, and they are typically time series at regions of interest (ROIs) and from covariates such as conditions/tasks of no interest, head motion and physiological noises). They should be structured in a pure text format of either multiple one-column files or one multi-column file.

(2) Header is NOT allowed in a one-column input file, but is optional for a multi-column input file: If provided in the first row, it has to be the labels of those ROIs/nodes, as the format of data frame in R. ROI time series as input are required, but covariates are optional. All ROI time series can be stored in one data frame, or multiple one-column files, but not a mixture. 

(3) The minimum pre-processing steps for the ROI time series include slice timing correction and volume registration. Spatial smoothing is typically recommended for noise reduction, but not mandatory. It is the change relative to the baseline that is comparable across blocks/runs, regions, and subjects, therefore signal normalization through scaling in terms of the loose concept of percent signal change is very important, and it can be done during the pre-processing, or you can leave it for 1dGC.R to handle.

(4) Any confounding effects are better entered as covariates in the causality model unless they are orthogonal to the autocorrelation present in the network, which is rarely true.

(5) Since the low frequencies (drifting effect including baseline) in the signal can be modeled with polynomials embedded in the causality model, it's NOT recommended to remove the trend (including baseline) during the pre-processing because of argument (4) above. You don't have to include the polynomial time series from the design matrix as covariates for input, but if you do decide to include them, disable the embedded option in the program by specifying an order of -1 for polynomials in the program.

(6) Covariates should be in separate file(s) from ROI time series files, and can be stripped from the design matrix of the individual subject regression analysis. All the ROI time series can be multiple one-column files or one multi-column file, but not a mixture of both formats. The same is true for covariates, but they don't have to be of the same format as the ROI files. All time series from ROI and covariates must have the same length and match up in temporal sequence. They can be data from multiple blocks and/or runs stitched together.

(7) If you want to censor out a few time points in the time series, create one covariate for each censored time point with the same length of the time series and with all 0's except putting a 1 at the censored time point.

(8) Input files for group analysis are supposed to be path coefficients (plus corresponding t-statistics) saved from analysis at individual subject level.

 

Features of 1dGC

Compared to other generic Granger analysis tools, 1dGC is used in time domain with the following features:

(1) written in an open source language and executable on all computer platforms;

(2) allowing breaks in the data;

(3) extending VAR model with all possible covariates included as part of the analysis instead of being regressed out prior to the analysis, and fine-tuning the model by removing covariates deemed insignificant;  

(4) providing 4 criteria for VAR order selection: AIC, FPE, HQ, and SC;

(5) outputting one network per lag (based on path coefficients and their t-statistics) instead of lumping all lags into one network (based on overall F-statistics across lags);

(6) diagnosing the model from various perspectives: order selection criteria, stationarity (testing roots of characteristic polynomial), residual normality tests (Gaussian process, skewness, and kurtosis), residual autocorrelation (portmanteau, Breusch-Godfrey, and Edgerton-Shukur tests), autoregressive conditional heteroskedasticity (ARCH) test for time-varying volatility, and structure stability (or stationarity detection);

(7) running group analysis on path coefficients and t-statistics per lag from individual subjects instead of overall F values across all lags.

 

Useful links

1. Vector auto-regressive modeling

 

Acknowledgements

I'd like to thank Patrick Brandt and Chris Sims for theoretical consultation, Bernhard Pfaff for programming support, Jim Bjork, Paul Hamilton, Jonathan Omuircheartaigh, Kai Hwang, and Jeremy I. Skipper for help in testing the program and for providing feedback.