3dREMLfit
AFNI's New Approach to Dealing with Serial Correlation in FMRI Linear Regression (GLM)
Conclusions First
 Serial correlation does not appreciably impact the activation magnitudes (βs) estimated using 3dDeconvolve (= Ordinary Least Squares solution)
 Group activation maps made from combining these βs using 3dANOVA, 3dLME, etc., are essentially the same using 3dDeconvolve or 3dREMLfit (= Generalized Least Squares solution)
 In other words, there is no need to re-run old group analyses to see if allowing for serial correlation will change the results
 Thresholded individual subject activation maps are potentially affected, depending on the task timing and on the scanner
 The biggest effect of serial (AKA temporal) correlation, when this correlation is significant, is on the estimates of the variance of the individual subjects' βs
 If the variance is under-estimated using 3dDeconvolve, then the individual subject t- and F-statistics will be over-estimated
 Individual subject variances and statistics are not usually carried forward to the group analysis level
 Since inter-subject variance is much larger than intra-subject variance
 Thus, group results are only marginally affected by serial correlation

3dDeconvolve and Ordinary Least Squares (OLSQ)
OLSQ = consistent estimator of FMRI time series fit parameter vector β
 No matter what the temporal (AKA serial) correlation structure of the noise
"Consistent" means that if you repeated the identical experiment infinitely many times, and averaged the estimated values (e.g., β; variance), the result would be the true value
But OLSQ estimate of time series noise variance is not consistent when serial correlation is present
OLSQ variance estimator will usually be biased too small with serial correlation (illustrated by the simulation sketch at the end of this slide)
Variance estimate is in denominators of formulas for t- and F-statistics
Result: individual subject t- and F-values will be too large and/or their DOF parameters will be too large
Upshot: Significance of individual subject activations will be over-estimated (p-values will be too small)
Thresholded individual subject FMRI maps might show too much activation
Obvious impacts on ROIs generated directly from individual subject activation maps (e.g., for connectivity analysis)
However, statistics taking into account serial correlation can be too conservative, and understate the extent of the "true" regions of activation
 For this reason, and to avoid selection bias, perhaps it is best to define FMRI-derived ROIs using a spherical "punch out" around each activation map peak
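A minimal numpy simulation sketch of the point above (illustrative code only, not AFNI code; the block-design regressor, the AR(1) parameter a = 0.5, and the repetition count are arbitrary choices): with positively correlated noise and a slow regressor, the true sampling variability of β is noticeably larger than the nominal OLSQ standard error.

    import numpy as np

    rng = np.random.default_rng(0)
    N, TR = 200, 2.0                        # time points, seconds per TR
    t = np.arange(N) * TR

    # slow block design: 30 s on / 30 s off, plus a constant baseline
    block = ((t % 60.0) < 30.0).astype(float)
    X = np.column_stack([np.ones(N), block])
    m = X.shape[1]

    a = 0.5                                 # AR(1) lag-1 correlation of the noise
    betas, nominal_se = [], []
    for _ in range(2000):
        eps = rng.standard_normal(N)        # white innovations
        x = np.empty(N)
        x[0] = eps[0] / np.sqrt(1.0 - a**2)
        for i in range(1, N):               # generate AR(1) noise
            x[i] = a * x[i - 1] + eps[i]
        y = X @ np.array([100.0, 1.0]) + x

        bhat, *_ = np.linalg.lstsq(X, y, rcond=None)    # OLSQ fit
        resid = y - X @ bhat
        sig2 = resid @ resid / (N - m)                  # white-noise variance estimate
        covb = sig2 * np.linalg.inv(X.T @ X)            # nominal OLSQ covariance of beta-hat
        betas.append(bhat[1])
        nominal_se.append(np.sqrt(covb[1, 1]))

    print("empirical SD of beta-hat :", np.std(betas))        # true sampling variability
    print("mean nominal OLSQ SE     :", np.mean(nominal_se))  # typically clearly smaller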

A Tiny Amount of Mathematics
 White noise estimate of variance:  σ̂² = [ Σᵢ (residualᵢ)² ] / (N – m)
 N = number of time points; i = time index
 m = number of fit parameters
 N – m = degrees of freedom (DOF) = how many equal-variance independent random values are left after the time series is fit with m regressors
OLSQ assumption is that the N noise values in the data time series are equal-variance and independent (AKA white noise)
 If the noise values aren't independent, then N – m is too large an estimate of the DOF, so the variance estimate is too small
 Two possible solutions are:
Adjust variance estimate (and so the t- and F-values) to allow for too few DOF
Come up with a different variance estimator that has all N – m DOF possible
Requires estimating the temporal correlation structure of the noise as well
Once the temporal correlation matrix is known, use Generalized Least Squares (GLSQ; AKA pre-whitening) to estimate the β parameter vector
GLSQ is consistent and should produce β-values with smaller variance than OLSQ
 Solution #2 is what 3dREMLfit implements (see the formulas below)
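In standard GLSQ notation (a sketch, not copied from the AFNI math notes): with data vector y (N time points), regression matrix X (N rows, m columns), and noise correlation matrix R, Solution #2 computes

   β̂ = (Xᵀ R⁻¹ X)⁻¹ Xᵀ R⁻¹ y
   σ̂² = (y – X β̂)ᵀ R⁻¹ (y – X β̂) / (N – m)

This is equivalent to pre-whitening y and X by R^(–1/2) and then applying OLSQ, and it keeps all N – m DOF, provided R is (approximately) correct.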

Mathematical Model for Serial Correlation
My choice: ARMA(1,1) = AutoRegressive order 1 + Moving Average order 1
 Notation: r_k = correlation at time lag #k, for k = 1, 2, …, N–1
 parameter a = decay rate of the r_k as k increases: for FMRI,  0 ≤ a < 1
 parameter b = affects correlation at lag 1 (r_1):  –1 < b < 1
 For a > 0 and -a < b < 0, ARMA(1,1) noise can be thought of as a sum of AR(1) noise and white noise, with variance proportions determined by b
 Why I prefer the 2-parameter ARMA(1,1) model over the easier 1-parameter AR(1) model (b=0) (lag-correlation formula below)
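For reference (the usual ARMA(1,1) parameterization; the slide itself does not spell out the formulas), the lag correlations implied by (a,b) are

   r_1 = (a + b)(1 + a·b) / (1 + 2·a·b + b²)
   r_k = a^(k–1) · r_1    for k ≥ 2

so a sets the geometric decay rate of the correlations while b raises or lowers r_1; setting b = 0 recovers pure AR(1), with r_k = a^k.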

New Program: 3dREMLfit
 Implements Solution #2: estimate correlation parameters and use GLSQ
 REML is a (partially nonlinear) method for simultaneously estimating variance + correlation parameters and estimating regression fit parameters (βs)
 Each voxel gets a separate estimate of its own correlation parameters (a,b)
Estimates of a and b can be spatially smoothed before they are used to compute the βs
Can also input a and b directly and skip their estimation (the slow part), if desired, and use those values to compute the βs
Variance estimate uses pre-whitened residuals to keep DOF = N – m
 Even if correlation decay parameter a was the same for all voxels, relative amount of white noise (measured by b) mixed in would vary spatially
 Sample analyses using 1-parameter AR(1) and MA(1) models shown later
 Inputs to 3dREMLfit
 Run 3dDeconvolve first to set up the .xmat.1D matrix file, GLTs, etc.
 Don't have to let 3dDeconvolve finish the analysis: -x1D_stop
 3dDeconvolve also outputs a command line to run 3dREMLfit with the same 3D+time dataset and the matrix file just created
 Then, input matrix file and 3D+time dataset to 3dREMLfit
 Output datasets are structured to be similar to those in 3dDeconvolve
 It should be easy to adapt scripts that use 3dDeconvolve output files (e.g., for group analysis) to use the new software

Rapid Event Related Design (NIH 3 T: JJY)
Individual Maps from 17 Subjects
 Color map & Threshold: Full F such that p=0.001  (Underlay = TT_N27+tlrc)

Block Design (15 s blocks: FBIRN-1 SM Task)
1 Individual Map (Subject#106)
Block Design (30 s blocks: NIH 3T; JJY)
Individual Maps from 16 Subjects
 Color map & Threshold: Full F such that p=0.001  (Underlay = TT_N27+tlrc)

Results Thus Far
 Between OLSQ and GLSQ+REML:
 Individual subject thresholded activation maps may differ very little, some, or a lot
 Level of temporal correlation determines how much difference GLSQ makes to individual subject statistics
 Amount of temporal correlation seems to depend on magnetic field strength, other scanner details, pulse sequence, …
 Effect of temporal correlation also seems to depend on stimulus timing
 As theory indicates:
 Temporal correlation means noise variance depends on frequency
 So amount of noise that interferes with ("looks like") the signal will depend on frequencies at which the hemodynamic response is appreciable
 Next slides: Group activation maps, GLSQ+REML vs OLSQ
 2 cases from NIH: Event-related and Block:30s designs
 Don't have enough FBIRN-1 subjects to do a group analysis

Block Design: Group Results (3dANOVA3)

Event-Related Design: Group Results (3dANOVA3)
Tentative Conclusions
 For individual subject thresholded activation maps:
 Use GLSQ/REML estimation, especially for slow block design experiments at 3+ Tesla
 Be aware that there may be many false negatives
 i.e., false acceptances of the null hypothesis
 I am looking into an FDR-like procedure for estimating the false negative rate, similar to how FDR estimates the false positive rate
 For group maps using ANOVA (or similar statistics):
 Differences between OLSQ and GLSQ estimation are small
 Recommendations:
 Don't need to re-visit group activation conclusions!
 Use 3dREMLfit as a near drop-in replacement for 3dDeconvolve for future work
A little extra CPU time (usually 1 to 3 times as long as 3dDeconvolve)

Outline of SPM and FSL Approaches
 SPM5 and SPM2
 Estimate a fixed ARMA(1,1) (more precisely, AR(1)+white noise) model for all "voxels of interest" (those that pass an OLSQ F-test)
By averaging the estimated auto-covariance matrix from the OLSQ residuals over these voxels
SPM assumes AR parameter a ≈ 0.2, and approximates ARMA(1,1) correlations via a linear Taylor series, to make correlation parameter estimation easier to program
 Use GLSQ (same for each voxel) to solve for the βs
SPM99: Uses OLSQ and adjusts the DOF downwards to allow for serial correlation
 FSL and FMRIstat (similar, but differ in important details at several points)
Use OLSQ to get first-pass residuals; use these to estimate each voxel's auto-correlation matrix; smooth these matrices spatially (FSL & FMRIstat vary here)
Estimate the AR(1) parameter for each voxel separately from the smoothed matrices
Use GLSQ (different for each voxel) to solve for the βs
 All these programs use a non-REML method to estimate serial correlation parameter(s) from the OLSQ residual auto-correlation matrix, and then adjust these estimates to reduce the bias thus introduced

Using 3dREMLfit - 1
 Step 1: run 3dDeconvolve as normal, setting up timing, GLTs
  3dDeconvolve ... -bucket Adecon -x1D_stop
Screen output:
++ Wrote matrix values to file Adecon.xmat.1D
++ ========= Things you can do with the matrix file =========
++ (a) Linear regression with ARMA(1,1) modeling of serial correlation:
3dREMLfit -matrix Adecon.xmat.1D -input ss17.AllRuns.norm+orig  -mask ss17_mask+orig -Rbeta Adecon_beta_REML -fout -Rbuck Adecon_REML -Rvar Adecon_REMLvar
++ N.B.: 3dREMLfit command above written to file Adecon.REML_cmd
++ (b) Visualization/analysis of the matrix via ExamineXmat.R
++ (c) Synthesis of sub-model datasets using 3dSynthesize
++ ==========================================================
++ 3dDeconvolve exits: -x1D_stop option was given

Using 3dREMLfit - 2
 Step 2: run 3dREMLfit ; perhaps adding options to the command line:
 -addbase : add extra baseline columns to the regression matrix
 -slibase : add extra baseline columns to the regression matrix, on a per slice basis = intended to aid in removal of physiological noise
  -gltsym    : add extra GLTs (beyond those from 3dDeconvolve)
 -usetemp : -slibase can require a lot of memory
 Generates REML matrices for many (a,b) cases for each slice
 This option writes & reads temporary matrices to disk to reduce RAM usage
 -verb : outputs information about memory usage as program runs
 -Obuck : output OLSQ bucket dataset (etc.)
 -Rbuck   : output GLSQ bucket (stimulus βs and statistics)
 -Rbeta   : output GLSQ βs (all the βs and only the βs; no statistics)
 -Rfitts : output GLSQ fitted model
 -Rvar    : output GLSQ (a,b) parameters and variance estimate (per voxel)
 -NEGcor : allow negative correlations in the estimation
Probably not really needed for FMRI, but option is there just in case
There are more options to control estimation of the (a,b) parameters
 Of course: read the output of 3dREMLfit -help

Potential Add-ons to 3dREMLfit
 Add an option to use this program to the afni_proc.py super-script
 Add -iresp and -sresp options
 Output variances for the βs
e.g., to be carried to the group analysis level?  Need to implement a new approach for this option to be useful.
 Matrix error checking when -addbase or -slibase is used
 In case the bumbling user puts in a collinear column
 Program cannot handle an all-zero column (unlike 3dDeconvolve)
 Re-run with extra GLTs to be added to existing bucket
 Or at least have a GLT-only output option: -Rglt ??
 Finish work with R Birn's physiological noise regressors and integrate these into time series analysis via -slibase
 -jobs option to spread load across multiple CPUs
 Especially loop where parameters (a,b) are estimated: the slowest part
 … ???

Serial Correlation Model & Notation: ARMA(1,1)
 Denote the noise value at time index i by x_i, for i = 0, 1, …, N–1 (noise assumed to have zero mean)
 Variance is the average (AKA expected) value of the noise squared:
     σ² = E[ x_i² ],  where E[·] means "expected value of ·"
 Covariance is similar to variance, measured between different time points:
     E[ x_i x_j ] = σ² · r_|i–j| ,  which depends on the time difference between time points i and j
 Correlation is covariance with variance factored out:
     r_k = E[ x_j x_{j+k} ] / σ²   (with r_0 = 1)
N.B.: r_k measures predictability of the noise value at time j+k given the value at time j
 For the entire time series, express the variance/correlation structure as a matrix: the correlation matrix R, with entries R_ij = r_|i–j|
 Need to have a simplified model for R (i.e., the r_k for k = 1, 2, …, N–1); a small sketch of building R from (a,b) follows this slide's list
 Otherwise, have too many parameters to estimate
 My choice: ARMA(1,1) = AutoRegressive order 1 + Moving Average order 1
 parameter a = decay rate of the r_k as k increases: for FMRI,  0 ≤ a < 1
 parameter b = determines correlation at lag 1 (r_1):  –1 < b < 1
 For a > 0 and -a < b < 0, ARMA(1,1) noise can be thought of as a sum of AR(1) noise and white noise, with variance proportions determined by b
This feature is one reason I prefer ARMA(1,1) as a noise correlation model over AR(1)
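As an illustration (plain numpy, not AFNI source code; the parameter values are arbitrary), a small sketch that fills the correlation matrix R from given (a,b) using the ARMA(1,1) lag-correlation formula quoted earlier:

    import numpy as np

    def arma11_corr_matrix(N, a, b):
        """N x N ARMA(1,1) correlation matrix: R[i,j] = r_|i-j|, with r_0 = 1,
        r_1 = (a+b)(1+a*b)/(1+2*a*b+b*b), and r_k = a**(k-1) * r_1 for k >= 2."""
        r = np.empty(N)
        r[0] = 1.0
        if N > 1:
            r1 = (a + b) * (1.0 + a * b) / (1.0 + 2.0 * a * b + b * b)
            r[1:] = r1 * a ** np.arange(N - 1)
        idx = np.arange(N)
        return r[np.abs(idx[:, None] - idx[None, :])]

    # example: 10 time points, moderate positive serial correlation
    R = arma11_corr_matrix(10, a=0.4, b=-0.2)
    print(np.round(R[0], 3))    # first row = r_0, r_1, r_2, ...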

AR(1): a  vs.  MA(1): b  vs.  ARMA(1,1): a & b
 Check the effectiveness of GLSQ pre-whitening solution by examining pre-whitened residuals
 Pre-whitening: applying a linear transformation to the time series data to de-correlate the noise
Symbolically, multiply by R^(–1/2), where R is the correlation matrix
 After pre-whitening, residuals (difference between data and fitted time series) should be (mostly) uncorrelated
 Power spectrum of white noise is flat
 Power spectrum = expected value of absolute value of Fourier transform, averaged over an infinity of repeated identical experiments
 Visually inspect graph of abs[FFT(pre-whitened residuals)] (a small numpy sketch of this check follows this slide's list)
 Should be flattish, with random excursions
This is noise, after all, and we don't have an infinity of data over which to average
 Next 4 slides:
 Graphs of "spectrum" for OLSQ and GLSQ using ARMA(1,1), AR(1), and MA(1) correlation models (generated using interactive AFNI, of course)
 For 3 strongly "active" voxels in one subject (block design: 30 s blocks; NIH 3T)
 Then the single subject activation maps for 6 types of analysis
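A minimal numpy sketch of the whitening check just described (illustrative only, not AFNI code; for brevity it uses a pure AR(1) correlation matrix and an arbitrary toy design): whiten the data and the design with L⁻¹ (where R = L Lᵀ, one valid choice of R^(–1/2)), fit by least squares, and examine abs[FFT(whitened residuals)], which should be roughly flat.

    import numpy as np

    def ar1_corr_matrix(N, a):
        """AR(1) correlation matrix: R[i,j] = a**|i-j|."""
        idx = np.arange(N)
        return a ** np.abs(idx[:, None] - idx[None, :])

    rng = np.random.default_rng(1)
    N = 256
    t = np.arange(N)
    X = np.column_stack([np.ones(N), np.sin(2 * np.pi * t / 32.0)])   # toy design

    R = ar1_corr_matrix(N, 0.4)
    L = np.linalg.cholesky(R)                    # R = L @ L.T
    y = X @ np.array([100.0, 2.0]) + L @ rng.standard_normal(N)       # correlated noise

    W = np.linalg.inv(L)                         # one valid pre-whitening matrix
    yw, Xw = W @ y, W @ X                        # pre-whitened data and design
    bhat, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid_w = yw - Xw @ bhat                     # pre-whitened residuals

    spectrum = np.abs(np.fft.rfft(resid_w))      # should be flat-ish, apart from noise
    print("spectrum mean / SD over frequencies:", spectrum[1:].mean(), spectrum[1:].std())
    # Whitening with the wrong correlation model would leave visible structure,
    # e.g., excess power at low frequencies for under-corrected AR(1)-like noise.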

Spectrum (slightly smoothed absFFT) of Residuals
Spectrum of Residuals
Spectrum of Residuals
6 Types of Analysis
Conclusions from Previous Slides
 It is possible to find voxels where pre-whitening of different types (AR-only or MA-only or ARMA) is "optimal"
 And voxels where pre-whitening makes little difference
 For many (most?) voxels, the pre-whitening details don't make a lot of difference in the statistics
 As long as something is done that is about right
 e.g., Using a fixed AR(1) or MA(1) single-parameter method was still OK-ish for single subject maps
A few more extraneous small blobs
But fewer than pure OLSQ solution statistics
 Map of r_1 = correlation at neighboring TRs, as output by the REML ARMA(1,1) fit
 Same slice as previous slides (NIH 3 T data)
 In general, cortical gray matter shows more correlation, but this result is not universal

Mathematics and Implementation
 Available in PDF (scanned from hand-written pages) for the truly devoted
 File 3dREMLfit_mathnotes.pdf
 Outline of REML estimation methodology
 What is REML and why do we care?
 Matrix algebra for efficient solution of the many linear systems that must be solved for each voxel
 Sparse matrix factorizations, multiplications, and solvers
 How ARMA(1,1) parameters are estimated in 3dREMLfit
 Optimizing REML log-likelihood function over a discrete grid of (a,b) values, using 2D binary search
 Must solve a GLSQ problem for each (a,b) tested, for each voxel
 How statistics are implemented as GLTs
 Testing null hypothesis Gβ=0 for arbitrary matrix G (one standard form of this test is sketched below)
 Derivation of ARMA(1,1) formulas
 For completeness, and because we all love equations
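For orientation, one standard GLSQ form of such a test (a sketch; the exact expressions in the math notes may differ): with β̂ the GLSQ estimate, σ̂² the pre-whitened variance estimate with N – m DOF, and G a q×m GLT matrix, the F-statistic for the null hypothesis Gβ = 0 is

   F = (G β̂)ᵀ [ G (Xᵀ R⁻¹ X)⁻¹ Gᵀ ]⁻¹ (G β̂) / ( q · σ̂² )

referred to an F distribution with (q, N–m) DOF; for a single-row G this is the square of the usual t-statistic.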