|
|
|
|
|
Serial correlation does not appreciably impact the
activation magnitudes (b s) estimated using 3dDeconvolve
(= Ordinary Least Squares solution) |
|
Group activation maps made from combining these b s using 3dANOVA, 3dLME, etc., are essentially the same
using 3dDeconvolve or 3dREMLfit (= Generalized Least Squares solution) |
|
In other words, there is no need to re-run old group
analyses to see if allowing for serial correlation will change the results |
|
Thresholded individual subject activation maps are
potentially affected, depending on the task timing and on the scanner |
|
The biggest effect of serial (AKA temporal )
correlation—when this correlation is significant—is on the
estimates of the variance of the individual subjectsÕ b s |
|
If the variance is under-estimated using 3dDeconvolve, then
the individual subject t- and F-statistics will be over-estimated |
|
Individual subject variances and statistics are not usually
carried forward to the group analysis level |
|
Since inter-subject variance is much larger than
intra-subject variance |
|
Thus, group results are only marginally affected by serial
correlation |
|
|
|
|
|
OLSQ = consistent estimator of FMRI
time series fit parameter vector b |
|
No matter what the temporal (AKA serial) correlation
structure of the noise |
|
ÒConsistentÓ means that if you repeated
the identical experiment infinitely many times, and averaged the estimated
value (e.g., b ; variance),
result would be the true value |
|
But OLSQ estimate of time series noise variance
is not consistent when serial correlation is present |
|
OLSQ variance estimator will usually be
biased too small with serial correlation |
|
Variance estimate is in denominators of
formulas for t- and F-statistics |
|
Result: individual subject t- and F-values
will be too large and/or their DOF parameters will be too large |
|
Upshot: Significance of individual
subject activations will be over-estimated (p-values will be too small) |
|
Thresholded individual subject FMRI
maps might show too much activation |
|
Obvious impacts on ROIs generated
directly from individual subject activation maps (e.g., for connectivity
analysis) |
|
However, statistics taking into account
serial correlation can be too conservative, and understate the extent of the
ÒtrueÓ regions of activation |
|
For this reason, and to avoid selection bias, perhaps it is
best to define FMRI-derived ROIs using a spherical Òpunch outÓ around each
activation map peak |
|
|
|
|
|
White noise estimate of variance: |
|
N = number of time points; i = time index |
|
m = number of fit parameters |
|
N – m = degrees of freedom (DOF) = how many
equal-variance independent random values are left after the time series is
fit with m regressors |
|
OLSQ assumption is that each of the N
noise values in the data time series are equal-variance and independent (AKA white
noise) |
|
If noise values arenÕt independent, then N – m is too
large an estimate of DOF, so variance estimate is too small |
|
Two possible solutions are: |
|
Adjust variance estimate (and so the t-
and F-values) to allow for too few DOF |
|
Come up with a different variance
estimator that has all N – m DOF possible |
|
Requires estimating the temporal
correlation structure of the noise as well |
|
Once temporal correlation matrix is
known, use Generalized Least Squares (GLSQ; AKA pre-whitening) to estimate b parameter vector |
|
GLSQ is consistent and should produce b-values with smaller variance than OLSQ |
|
Solution #2 is what 3dREMLfit implements |
|
|
|
|
My choice: ARMA(1,1) = AutoRegressive
order 1 + Moving Average order 1 |
|
Notation: rk = correlation
at time lag #k for k =1, 2, É , N-1 |
|
parameter a = decay rate of the rk as k
increases: for FMRI,
0 £ a < 1 |
|
parameter b = affects correlation at lag 1 (r1):
-1 < b < 1 |
|
|
|
For a > 0 and -a < b < 0,
ARMA(1,1) noise can be thought of as a sum of AR(1) noise and white noise,
with variance proportions determined by b |
|
Why I prefer 2 parameter ARMA(1,1) over easier 1 parameter
AR(1) model (b=0) |
|
|
|
|
|
Implements Solution #2: estimate correlation parameters and
use GLSQ |
|
REML is a (partially nonlinear) method for simultaneously
estimating variance + correlation parameters and estimating regression fit
parameters (b s) |
|
Each voxel gets a separate estimate of its own correlation
parameters (a,b) |
|
Estimates of a and b can be spatially
smoothed before they are used to compute the b s |
|
Can also input a and b directly and
skip their estimation (the slow part), if desired, and use those values to compute
the b s |
|
Variance estimate uses pre-whitened
residuals to keep DOF=N – m |
|
Even if correlation decay parameter a was the same for all
voxels, relative amount of white noise (measured by b) mixed in would vary
spatially |
|
Sample analyses using 1-parameter AR(1) and MA(1) models
shown later |
|
Inputs to 3dREMLfit |
|
Run 3dDeconvolve first to setup .xmat.1D matrix file, GLTs,
etc. |
|
DonÕt have to let 3dDeconvolve finish analysis: -x1D_stop |
|
3dDeconvolve also outputs a command line to run 3dREMLfit
with the same 3D+time dataset and the matrix file just created |
|
Then, input matrix file and 3D+time dataset to 3dREMLfit |
|
Output datasets are structured to be similar to those in 3dDeconvolve |
|
It should be easy to adapt scripts that use 3dDeconvolve output
files (e.g., for group analysis) to use the new software |
|
|
|
|
|
Between OLSQ and GLSQ+REML: |
|
Individual subject thresholded activation maps may differ
very little, some, or a lot |
|
Level of temporal correlation determines how much
difference GLSQ makes to individual subject statistics |
|
Amount of temporal correlation seems to depend on magnetic
field strength, other scanner details, pulse sequence, É |
|
Effect of temporal correlation also seems to depend on
stimulus timing |
|
As theory indicates: |
|
Temporal correlation means noise variance depends on
frequency |
|
So amount of noise that interferes with (Òlooks likeÓ) the
signal will depend on frequencies at which the hemodynamic response is
appreciable |
|
Next slides: Group activation maps, GLSQ+REML vs OLSQ |
|
2 cases from NIH: Event-related and Block:30s designs |
|
DonÕt have enough FBIRN-1 subjects to do a group analysis |
|
|
|
|
|
For individual subject thresholded activation maps: |
|
Use GLSQ/REML estimation, especially for slow block design
experiments at 3+ Tesla |
|
Be aware that there may be many false negatives |
|
i.e., false acceptances of the null hypothesis |
|
am looking into an FDR-like procedure for estimating the
false negative rate, similar to how FDR estimates the false positive rate |
|
|
|
For group maps using ANOVA (or similar statistics): |
|
Differences between OLSQ and GLSQ estimation are small |
|
|
|
Recommendations: |
|
DonÕt need to re-visit group activation conclusions! |
|
Use 3dREMLfit as a near drop-in replacment for 3dDeconvolve
for future work |
|
A little extra CPU time (usually from
1..3 times as long) |
|
|
|
|
|
SPM5 and SPM2 |
|
Estimate fixed ARMA(1,1) (more precisely, AR(1)+white
noise) model for all Òvoxels of interestÓ (pass an OLSQ F-test) |
|
By averaging estimated auto-covariance
matrix from OLSQ residuals over these voxels |
|
SPM assumes AR parameter a È 0.2, and approximates ARMA(1,1) correlations via linear
Taylor series, to make correlation parameter estimation easier to program |
|
Use GLSQ (same for each voxel) to solve for b s |
|
SPM99: Use OLSQ and adjusts DOF
downwards to allow for serial correlation |
|
FSL and FMRIstat (similar, but differ in important details
at several points) |
|
Use OLSQ to get first-pass residuals;
use these to estimate each voxelÕs auto-correlation matrix; smooth these
matrices spatially (FSL & FMRIstat vary here) |
|
Estimate AR(1) parameter for each voxel
separately from smoothed matrices |
|
Use GLSQ (different for each voxel) to
solve for b s |
|
All these programs use a non-REML method to estimate serial
correlation parameter(s) from the OLSQ residual auto-correlation matrix, and
then adjust these estimates to reduce the bias thus introduced |
|
|
|
Step 1: run 3dDeconvolve as normal, setting up timing, GLTs |
|
3dDeconvolve ... -bucket Adecon -x1D_stop |
|
Screen output: |
|
++ Wrote matrix values to file
Adecon.xmat.1D |
|
++ ========= Things you can do with the
matrix file ========= |
|
++ (a) Linear regression with ARMA(1,1)
modeling of serial correlation: |
|
|
|
3dREMLfit -matrix Adecon.xmat.1D -input
ss17.AllRuns.norm+orig -mask
ss17_mask+orig -Rbeta Adecon_beta_REML -fout -Rbuck Adecon_REML -Rvar Adecon_REMLvar |
|
|
|
++ N.B.: 3dREMLfit command above
written to file Adecon.REML_cmd |
|
++ (b) Visualization/analysis of the
matrix via ExamineXmat.R |
|
++ (c) Synthesis of sub-model datasets
using 3dSynthesize |
|
++
========================================================== |
|
++ 3dDeconvolve exits: -x1D_stop option
was given |
|
|
|
|
|
|
Step 2: run 3dREMLfit ; perhaps adding options to the
command line: |
|
-addbase : add extra baseline columns to the regression
matrix |
|
-slibase : add extra baseline columns to the regression
matrix, on a per slice basis = intended to aid in removal of physiological
noise |
|
-gltsym : add extra GLTs (beyond those
from 3dDeconvolve) |
|
-usetemp : -slibase can require a lot of memory |
|
Generates REML matrices for many (a,b) cases for each slice |
|
This option writes & reads temporary matrices to disk
to reduce RAM usage |
|
-verb : outputs information about memory usage as program
runs |
|
-Obuck : output OLSQ bucket dataset (etc.) |
|
-Rbuck :
output GLSQ bucket (stimulus bs and statistics) |
|
-Rbeta :
output GLSQ (all the bs and only the bs; no statistics) |
|
-Rfitts : output GLSQ fitted model |
|
-Rvar : output GLSQ (a,b) parameters and variance
estimate (per voxel) |
|
-NEGcor : allow negative correlations in the estimation |
|
Probably not really needed for FMRI,
but option is there just in case |
|
There are more options to control
estimation of the (a,b) parameters |
|
Of course: read the output of 3dREMLfit -help |
|
|
|
|
Add option to use this program to afni_proc.py super-script |
|
Add -iresp and -sresp options |
|
Output variances for bs |
|
e.g., to be carried to the group
analysis level? Need to
implement a new approach for this option to be useful. |
|
Matrix error checking when -addbase or -slibase is used |
|
In case the bumbling user puts in a collinear column |
|
Program cannot handle an all-zero column (unlike 3dDeconvolve) |
|
Re-run with extra GLTs to be added to existing bucket |
|
Or at least have a GLT-only output option: -Rglt ?? |
|
Finish work with R BirnÕs physiological noise regressors
and integrate these into time series analysis via -slibase |
|
-jobs option to spread load across multiple CPUs |
|
Especially loop where parameters (a,b) are estimated: the
slowest part |
|
É ??? |
|
|
|
|
|
|
|
Denote noise value at time index i by xi for i=0..N–1 |
|
Variance is average (AKA expected) value of noise squared: |
|
where E [á] means Òexpected value of áÓ |
|
Covariance is similar to variance, measured between
different time points: |
|
which depends on time difference between time points i and j |
|
Correlation is covariance with variance factored out |
|
(with r0=1) |
|
N.B.: rk measures
predictability of noise value at time j+k given value at time j |
|
For entire time series, express variance/correlation as a
matrix |
|
|
|
Need to have a simplified model for R (i.e., the rk
for k =1, 2, É , N-1) |
|
Otherwise, have too many parameters to estimate |
|
My choice: ARMA(1,1) = AutoRegressive order 1 + Moving Average
order 1 |
|
parameter a = decay rate of the rk as k
increases: for FMRI,
0 £ a < 1 |
|
parameter b = determines correlation at lag 1 (r1):
-1 < b < 1 |
|
|
|
For a > 0 and -a < b < 0,
ARMA(1,1) noise can be thought of as a sum of AR(1) noise and white noise,
with variance proportions determined by b |
|
This feature is one reason I prefer
ARMA(1,1) as a noise correlation model over AR(1) |
|
|
|
|
|
Check the effectiveness of GLSQ pre-whitening solution by
examining pre-whitened residuals |
|
Pre-whitening: applying a linear transformation to the time
series data to de-correlate the noise |
|
Symbolically, R-1/2 where R is the correlation matrix |
|
After pre-whitening, residuals (difference between data and
fitted time series) should be (mostly) uncorrelated |
|
Power spectrum of white noise is flat |
|
Power spectrum = expected value of absolute value of
Fourier transform, averaged over an infinity of repeated identical
experiments |
|
Visually inspect graph of abs[FFT(pre-whitened residuals)] |
|
Should be flattish, with random excursions |
|
This is noise, after all, and we donÕt
have an infinity of data over which to average |
|
Next 4 slides: |
|
Graphs of ÒspectrumÓ for OLSQ and GLSQ using ARMA(1,1),
AR(1), and MA(1) correlation models (generated using interactive AFNI, of
course) |
|
For 3 strongly ÒactiveÓ voxels in one subject (block
design: 30 s blocks; NIH 3T) |
|
Then the single subject activation maps for 6 types of
analysis |
|
|
|
|
|
It is possible to find voxels where pre-whitening of
different types (AR-only or MA-only or ARMA) is ÒoptimalÓ |
|
And voxels where pre-whitening makes little difference |
|
For many (most?) voxels, the pre-whitening details donÕt
make a lot of difference in the statistics |
|
As long as something is done that is about right |
|
e.g., Using fixed AR(1) or MA(1) single parameter method
was still OK-ish for single subject maps |
|
A few more extraneous small blobs |
|
But fewer than pure OLSQ solution
statistics |
|
Map of r1=correlation at neighboring TRs, |
|
as output by REML and ARMA(1,1) fit |
|
Same slice as previous slides (NIH 3 T data) |
|
In general, cortical gray matter shows more |
|
correlation, but this result is not universal |
|
|
|
|
|
|
|
|
Available in PDF (scanned from hand-written pages) for the
truly devoted |
|
File 3dREMLfit_mathnotes.pdf |
|
Outline of REML estimation methodology |
|
What is REML and why do we care? |
|
Matrix algebra for efficient solution of the many linear
systems that must be solved for each voxel |
|
Sparse matrix factorizations, multiplications, and solvers |
|
How ARMA(1,1) parameters are estimated in 3dREMLfit |
|
Optimizing REML log-likelihood function over a discrete
grid of (a,b) values, using 2D binary search |
|
Must solve a GLSQ problem for each (a,b) tested, for each
voxel |
|
How statistics are implemented as GLTs |
|
Testing null hypothesis Gb=0 for arbitrary
matrix G |
|
Derivation of ARMA(1,1) formulas |
|
For completeness, and because we all love equations |