Usage: 3dClustSim [options]
Program to estimate the probability of false positive (noise-only) clusters. An adaptation of Doug Ward’s AlphaSim, streamlined for various purposes.
In particular, this program lets you run with multiple p-value thresholds (the ‘-pthr’ option) and only outputs the cluster size threshold at chosen values of the alpha significance level (the ‘-athr’ option).
In addition, the program allows the output to be formatted for inclusion into an AFNI dataset’s header, where the AFNI Clusterize interface can use it to show approximate alpha values for the displayed clusters. The per-voxel p-value is taken from the interactive threshold slider in the AFNI ‘Define Overlay’ control panel, and the per-cluster alpha value is then interpolated in this table from 3dClustSim. As you change the threshold slider, the per-voxel p-value (shown below the slider) changes, and the interpolated alpha values are updated.
********* IMPORTANT NOTE [Dec 2015] ***********************************
A completely new method for estimating and using noise smoothness values is now available in 3dFWHMx and 3dClustSim. This method is implemented in the ‘-acf’ options to both programs. ‘ACF’ stands for (spatial) AutoCorrelation Function, and it is estimated by calculating moments of differences out to a larger radius than before.
Notably, real FMRI data does not actually have a Gaussian-shaped ACF, so the estimated ACF is then fit (in 3dFWHMx) to a mixed model (Gaussian plus mono-exponential) of the form
ACF(r) = a * exp(-r*r/(2*b*b)) + (1-a)*exp(-r/c)
where ‘r’ is the radius, and ‘a’, ‘b’, ‘c’ are the fitted parameters. The apparent FWHM from this model is usually somewhat larger in real data than the FWHM estimated from just the nearest-neighbor differences used in the ‘classic’ analysis.
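As a rough illustration of the mixed model above, the following Python sketch evaluates ACF(r) and finds an "apparent FWHM" numerically. It assumes that the apparent FWHM means twice the radius at which ACF(r) drops to 0.5; that definition, the function names, and the parameter values are illustrative assumptions, not taken from 3dFWHMx.

```python
import math

def acf(r, a, b, c):
    """Mixed model: Gaussian plus mono-exponential, as in the formula above."""
    return a * math.exp(-r * r / (2.0 * b * b)) + (1.0 - a) * math.exp(-r / c)

def apparent_fwhm(a, b, c, tol=1e-9):
    """Twice the radius where ACF(r) = 0.5 (an assumed definition of FWHM)."""
    lo, hi = 0.0, 1.0
    while acf(hi, a, b, c) > 0.5:      # widen the bracket until ACF < 0.5
        hi *= 2.0
    while hi - lo > tol:               # bisection; ACF decreases with r
        mid = 0.5 * (lo + hi)
        if acf(mid, a, b, c) > 0.5:
            lo = mid
        else:
            hi = mid
    return lo + hi                     # 2 * midpoint of the final bracket

# Hypothetical parameter values, for illustration only
print(apparent_fwhm(1.0, 3.0, 10.0))   # a=1: pure Gaussian, 2*sqrt(2*ln 2)*b
print(apparent_fwhm(0.6, 3.0, 10.0))   # mixed model: the long tail widens it
```

With a=1 the exponential term vanishes and the result reduces to the usual Gaussian relation; lowering ‘a’ lets the mono-exponential tail push the half-height radius outward.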
The longer tails provided by the mono-exponential are also significant. 3dClustSim has also been modified to use the ACF model given above to generate noise random fields.
----------------------------------------------------------------------------
** The take-away (TL;DR or summary) message is that the ‘classic’ 3dFWHMx and **
** 3dClustSim analysis, using a pure Gaussian ACF, is not very correct for    **
** FMRI data -- I cannot speak for PET or MEG data.                           **
----------------------------------------------------------------------------
** ------------------------------------------------------------------------**
** IMPORTANT CHANGES -- February 2015 **************************************
** ------------------------------------------------------------------------**
** In the past, 3dClustSim did ‘1-sided’ testing; that is, the random dataset
** of Gaussian noise-only values is generated, and then it is thresholded on
** the positive side so that the N(0,1) upper tail probability is pthr.
**
** NOW, 3dClustSim does 3 different types of thresholding:
**   1-sided: as above
**   2-sided: where positive and negative values above the threshold
**            are included, and then clustered together
**            (in this case, the threshold on the Gaussian values is)
**            (fixed so that the 1-sided tail probability is pthr/2.)
**  bi-sided: where positive values and negative values above the
**            threshold are clustered SEPARATELY (with the 2-sided threshold)
** For high levels of smoothness, the results from bi-sided and 2-sided are
** very similar -- since for smooth data, it is unlikely that large clusters of
** positive and negative values will be next to each other. With high smoothness,
** it is also true that the 2-sided results for 2*pthr will be similar to the
** 1-sided results for pthr, for the same reason. Since 3dClustSim is meant to be
** useful when the noise is NOT very smooth, we provide tables for all 3 cases.
**
** In particular, note that when the AFNI GUI threshold is set to a t-statistic,
** 2-sided testing is what is usually appropriate -- in that case, the cluster
** size thresholds tend to be smaller than the 1-sided case, which means that
** more clusters tend to be significant than in the past.
**
** In addition, the 3 different NN approaches (NN=1, NN=2, NN=3) are ALL
** always computed now. That is, 9 different tables are produced, each
** of which has its proper place when combined with the AFNI Clusterize GUI.
** The 3 different NN methods are:
**  1 = Use first-nearest neighbor clustering
**      * above threshold voxels cluster together if faces touch
**  2 = Use second-nearest neighbor clustering
**      * voxels cluster together if faces OR edges touch
**  3 = Use third-nearest neighbor clustering
**      * voxels cluster together if faces OR edges OR corners touch
** The clustering method only makes a difference at higher (less significant)
** values of pthr. At small values of pthr (more significant), all three
** clustering methods will give very similar results.
**
**** PLEASE NOTE that the NIML outputs from this new version are not named the
**** same as those from the older version. Thus, any script that takes the NIML
**** format tables and inserts them into an AFNI dataset header must be modified
**** to match the new names. The 3drefit command fragment output at the end of
**** this program (and echoed into file ‘3dClustSim.cmd’) shows the new form
**** of the names involved.
**** -------------------------------------------------------------------------**

**** SMOOTHING CHANGE -- May 2015 ******************************************
** ------------------------------------------------------------------------**
** It was pointed out to me (by Anders Eklund and Tom Nichols) that smoothing
** the simulated data over a finite volume introduces 2 artifacts, which might
** be called ‘edge effects’. To minimize these problems, this program now makes
** extra-large (padded) simulated volumes before blurring, and then trims those
** back down to the desired size, before continuing with the thresholding and
** cluster-counting steps. To run 3dClustSim without this padding added, use
** the new ‘-nopad’ option.
**** -------------------------------------------------------------------------**
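The relation between pthr and the threshold applied to the N(0,1) noise values, as described in the February 2015 note, can be sketched in Python as follows. This is only an illustration of the stated tail probabilities using the standard library; the function name is hypothetical and the code is not taken from 3dClustSim.

```python
from statistics import NormalDist

def gaussian_threshold(pthr, sided):
    """z threshold on N(0,1) noise for a given per-voxel p-value."""
    if sided == "1-sided":
        tail = pthr            # upper tail probability is pthr
    elif sided in ("2-sided", "bi-sided"):
        tail = pthr / 2.0      # each tail gets pthr/2
    else:
        raise ValueError("sided must be '1-sided', '2-sided' or 'bi-sided'")
    return NormalDist().inv_cdf(1.0 - tail)

for p in (0.05, 0.01, 0.001):
    print(p, gaussian_threshold(p, "1-sided"), gaussian_threshold(p, "2-sided"))
```

Note that the 2-sided threshold for 2*pthr equals the 1-sided threshold for pthr, which is why the two sets of results become similar for very smooth noise.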
*** Specify the volume over which the simulation will occur ***
-----** (a) Directly give the spatial domain that will be used **-----
at the center of the grid and touching the edges; this will keep about 1/2 the points in the 3D grid. [default = use all voxels in the 3D grid]
-----** OR: (b) Specify the spatial domain using a dataset mask **-----
128 or more nonzero voxels. However, IF you know what you are doing, and IF you are willing to live life on the edge of statistical catastrophe, then you can use this option to allow smaller masks – in a sense, this is the ‘consent form’ for such strange shenanigans.
- If you use this option, it must come BEFORE ‘-mask’.
- Also read the ‘CAUTION and CAVEAT’ section, far below.
** ‘-mask’ means that ‘-nxyz’ & ‘-dxyz’ & ‘-BALL’ will be ignored. **
---** the remaining options control how the simulation is done **---
[default = 0.0 = no smoothing]
-fwhmxyz sx sy sz to specify the three values separately.
autocorrelation function parameters output by 3dFWHMx to do non-Gaussian (long-tailed) filtering.
* Using ‘-acf’ will make ‘-fwhm’ pointless!
* The ‘a’ parameter must be between 0 and 1.
* The ‘b’ and ‘c’ parameters (scale radii) must be positive.
* The spatial autocorrelation function is given by
ACF(r) = a * exp(-r*r/(2*b*b)) + (1-a)*exp(-r/c)
each face to allow for edge effects of the smoothing process. If you want to turn this feature off, use the ‘-nopad’ option.
the simulation will print out the cluster size thresholds. For each ‘p’ and ‘a’, the smallest cluster size C(p,a) for which the probability of the ‘p’-thresholded image having a noise-only cluster of size C is less than ‘a’ is the output (cf. the sample output, below) [default = 0.10 0.05 0.02 0.01]
** Both lists ‘-pthr’ and ‘-athr’ (of values between 0 and 0.2) **
** should be given in DESCENDING order. They will be sorted to be **
** that way in any case, and such is how the output will be given. **

** The list of values following ‘-pthr’ or ‘-athr’ can be replaced **
** with the single word ‘LOTS’, which will tell the program to use **
** a longer list of values for these probabilities [try it & see!] **
** (i.e., ‘-pthr LOTS’ and/or ‘-athr LOTS’ are legal options)      **
-LOTS = the same as using ‘-pthr LOTS -athr LOTS’
-iter n = number of Monte Carlo simulations [default = 10000]
for k=1, 2, 3, and for X=1sided, 2sided, bisided.
* If ‘-prefix’ is not used, results go to standard output.
* If ‘-niml’ is used, the filename is ‘ppp.NNk_X.niml’.
* To be clear, the 9 files will be named
    ppp.NN1_1sided.niml ppp.NN1_2sided.niml ppp.NN1_bisided.niml
    ppp.NN2_1sided.niml ppp.NN2_2sided.niml ppp.NN2_bisided.niml
    ppp.NN3_1sided.niml ppp.NN3_2sided.niml ppp.NN3_bisided.niml
-ssave:TYPE ssprefix = Save the un-thresholded generated random volumes into
                       datasets (‘-iter’ of them). Here, ‘TYPE’ is one of these:
                         * blurred == save the blurred 3D volume before masking
                         * masked  == save the blurred volume after masking
                       The output datasets will actually get prefixes generated
                       with the string ‘ssprefix’ being appended by a 6 digit
                       integer (the iteration index), starting at 000000.
                       (You can use SOMETHING.nii as a prefix; it will work OK.)
                       N.B.: This option will slow the program down a lot,
                             and is intended to help just one specific user.
++ One reason for 3dClustSim to be used in place of AlphaSim is that it will
   be faster than running AlphaSim multiple times.
++ Another reason is that the resulting table can be stored in an AFNI
   dataset’s header, and used in the AFNI Clusterize GUI to see estimated
   cluster significance (alpha) levels.
++ So if your cluster is larger than the C(p,0.01) threshold in size (say),
   then it is very unlikely that noise BY ITSELF produced this result.
++ This statement does not mean that ALL the voxels in the cluster are
   ‘truly’ active -- it means that at least SOME of them are (very probably)
   active. The statement of low probability (0.01 in this example) of a false
   positive result applies to the cluster as a whole, not to each voxel
   within the cluster.
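To make the meaning of C(p,alpha) concrete, here is a simplified Python sketch. It assumes that each Monte Carlo iteration has already been reduced to the size of its largest noise-only cluster at threshold p, and it returns the smallest whole-voxel size C for which the fraction of iterations with a cluster of size C or more is below alpha. The function name and the toy data are hypothetical. The program's own tables contain fractional sizes, and whatever step produces those is not reproduced here.

```python
def cluster_size_threshold(max_sizes, alpha):
    """Smallest integer C with P(largest noise-only cluster >= C) < alpha.

    max_sizes: largest cluster size found in each simulated noise volume.
    """
    n = len(max_sizes)
    c = 1
    # Raise C until too few simulations reach a cluster of that size
    while sum(1 for s in max_sizes if s >= c) / n >= alpha:
        c += 1
    return c

# Toy data: largest cluster size from 10 hypothetical noise simulations
toy_max_sizes = [3, 5, 5, 7, 8, 9, 10, 12, 15, 30]
print(cluster_size_threshold(toy_max_sizes, 0.20))
print(cluster_size_threshold(toy_max_sizes, 0.05))
```

A real run uses thousands of iterations (see ‘-iter’), so the estimated tail probabilities are far less coarse than in this toy example.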
To add the cluster simulation C(p,alpha) table to the header of an AFNI dataset, something like the following can be done [tcsh syntax]:
   # backquotes capture the 3dFWHMx output as a word list
   set fwhm = ( `3dFWHMx -combine -detrend time_series_dataset+orig` )
   3dClustSim -mask mask+orig -fwhm $fwhm[4] -niml -prefix CStemp
   3drefit -atrstring AFNI_CLUSTSIM_NN1_1sided file:CStemp.NN1_1sided.niml \
           -atrstring AFNI_CLUSTSIM_MASK file:CStemp.mask \
           statistics_dataset+orig
   rm -f CStemp.*
AFNI’s Clusterize GUI makes use of these attributes, if stored in a statistics dataset (e.g., something from 3dDeconvolve, 3dREMLfit, etc.).
** Nota Bene: afni_proc.py will automatically run 3dClustSim, and **
** put the results into the statistical results dataset for you.   **
++ To be clear, the per-voxel p-value is taken from the AFNI GUI threshold
   slider (the p-value is shown beneath the slider), and then the C(p,alpha)
   table is inverse-interpolated to find the per-cluster alpha value for each
   different cluster size.
++ As you move the AFNI threshold slider, the per-voxel (uncorrected for
   multiple comparisons) p-value changes, the cluster sizes change (as fewer
   or more voxels are included), and so the reported per-cluster alpha values
   change for both reasons -- different p and different cluster size.
++ The alpha values reported are ‘per-cluster’, and are not themselves
   corrected for multiple comparisons ACROSS clusters. These alpha values are
   corrected for multiple comparisons WITHIN a cluster.
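The inverse interpolation described above can be sketched in Python for a single table row (one fixed per-voxel p). The interpolation scheme here, linear in log(alpha) between tabulated sizes, is an assumption made for illustration; the scheme AFNI actually uses, and its interpolation across p values, are not specified in this help text. The row values are copied from the sample output for pthr = 0.005.

```python
import math

def alpha_for_cluster(size, alphas, sizes):
    """Interpolate a per-cluster alpha for `size` voxels from one table row.

    alphas: descending alpha levels; sizes: matching ascending C(p,alpha).
    Returns None if the cluster is below the threshold for the largest alpha.
    """
    if size < sizes[0]:
        return None
    if size >= sizes[-1]:
        return alphas[-1]              # at or below the smallest tabulated alpha
    for i in range(len(sizes) - 1):
        if sizes[i] <= size < sizes[i + 1]:
            frac = (size - sizes[i]) / (sizes[i + 1] - sizes[i])
            # assumed scheme: linear interpolation in log(alpha)
            log_a = (1 - frac) * math.log(alphas[i]) + frac * math.log(alphas[i + 1])
            return math.exp(log_a)

alphas = [0.100, 0.050, 0.020, 0.010]
sizes = [28.0, 31.2, 34.9, 38.1]       # sample-output row for pthr = 0.005
print(alpha_for_cluster(33, alphas, sizes))
```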
AFNI will use the NN1, NN2, NN3 tables as needed in its Clusterize interface if they are all stored in the statistics dataset header, depending on the NN level chosen in the Clusterize controller.
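The three NN levels differ only in which neighboring voxels count as touching: faces (NN1), faces or edges (NN2), or faces, edges, or corners (NN3). A minimal Python sketch of the corresponding neighbor offsets on a regular 3D grid follows; the function name is hypothetical and this is not AFNI's clustering code.

```python
from itertools import product

def neighbor_offsets(nn):
    """(dx, dy, dz) offsets of voxels that touch the center voxel at level nn.

    An offset with 1 nonzero component shares a face, 2 share an edge,
    and 3 share a corner.
    """
    offsets = []
    for d in product((-1, 0, 1), repeat=3):
        nonzero = sum(1 for v in d if v != 0)
        if 1 <= nonzero <= nn:
            offsets.append(d)
    return offsets

for nn in (1, 2, 3):
    print(nn, len(neighbor_offsets(nn)))   # 6, 18, and 26 neighbors
```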
The blur estimates (provided via -fwhm, say) can come from various sources.
- If ‘3dmerge -1blur_fwhm SIZE’ is used to apply the blur to EPI data, that blur is on top of what is already in the data. It is then appropriate to estimate the final blur size using 3dFWHMx on the residual EPI time series (after regression). The final blur will generally be a bit larger than SIZE. Consider how this is done by afni_proc.py.
- If ‘3dBlurToFWHM -FWHM SIZE’ is used, then one can use SIZE directly (since the resulting blur is SIZE, it is not on top of what is in the data to begin with).
- Some people prefer to estimate the smoothness from the stdev of error in the given statistical test, rather than the residuals.
3dClustSim -nxyz 8 8 8 -dxyz 2 2 2 -fwhm 8 -niter 10000
AlphaSim -nxyz 8 8 8 -dxyz 2 2 2 -fwhm 8 -niter 10000 -quiet -fast -pthr 0.0005
From the 3dClustSim command above, you should get a warning message similar to this, right after the table (only the first 2 lines are shown):
# 3dClustSim -fwhm 7
# 1-sided thresholding
# Grid: 64x64x32 3.50x3.50x3.50 mm^3 (131072 voxels)
#
# CLUSTER SIZE THRESHOLD(pthr,alpha) in Voxels
# -NN 1  | alpha = Prob(Cluster >= given size)
#  pthr  |  0.100  0.050  0.020  0.010
# ------ | ------ ------ ------ ------
 0.050000   162.5  182.2  207.8  225.7
 0.020000    64.3   71.0   80.5   88.5
 0.010000    40.3   44.7   50.7   55.1
 0.005000    28.0   31.2   34.9   38.1
 0.002000    19.0   21.2   24.2   26.1
 0.001000    14.6   16.3   18.9   20.5
 0.000500    11.5   13.0   15.1   16.7
 0.000200     8.7   10.0   11.6   12.8
 0.000100     7.1    8.3    9.7   10.9
e.g., for this sample volume, if the per-voxel p-value threshold is set at 0.005, then to keep the probability of getting a single noise-only cluster at 0.05 or less, the cluster size threshold should be 32 voxels (the next integer above 31.2).
If you ran the same simulation with the ‘-nodec’ option, then the last line above would be
0.000100 8 9 10 11
If you set the per-voxel p-value to 0.0001 (1e-4), and want the chance of a noise-only false-positive cluster to be 5% or less, then the cluster size threshold would be 9 -- that is, you would keep all NN1 clusters with 9 or more voxels.
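The rounding described above (take the next whole number of voxels at or above the tabulated value) can be checked with a few lines of Python. The two rows are copied from the sample output; the names are hypothetical.

```python
import math

# alpha levels and two rows copied from the sample output above
ALPHAS = (0.100, 0.050, 0.020, 0.010)
SAMPLE_ROWS = {
    0.005:  (28.0, 31.2, 34.9, 38.1),
    0.0001: (7.1, 8.3, 9.7, 10.9),
}

def min_cluster_voxels(pthr, alpha):
    """Smallest whole number of voxels at or above the tabulated threshold."""
    row = SAMPLE_ROWS[pthr]
    return math.ceil(row[ALPHAS.index(alpha)])

print(min_cluster_voxels(0.005, 0.050))                  # 32, as in the text
print([min_cluster_voxels(0.0001, a) for a in ALPHAS])   # [8, 9, 10, 11]
```

The second printed list matches the ‘-nodec’ row shown above.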
The header lines start with the ‘#’ (commenting) character so that the result is a correctly formatted AFNI .1D file – it can be used in 1dplot, etc.
automatic parallelizer software toolkit, which splits the work across multiple CPUs/cores on the same shared memory computer.
by a network (e.g., OpenMP doesn’t work with ‘cluster’ setups).
your system. You can control this value by setting environment variable OMP_NUM_THREADS to some smaller value (including 1).
using all CPUs available.
++ However, on some systems (such as the NIH Biowulf), it seems to be
   necessary to set OMP_NUM_THREADS explicitly, or you only get one CPU.
count, since using more than (say) 16 threads is probably useless.
since OpenMP queries this variable BEFORE the program actually starts.
++ You can’t usefully set this variable in your ~/.afnirc file or on the
   command line with the ‘-D’ option.
it was coded. You’ll have to experiment on your own systems!
The number of CPUs on this particular computer system is ...... 16.
The maximum number of CPUs that will be used is now set to .... 7.
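Because OpenMP reads OMP_NUM_THREADS before the program starts, the variable has to be present in the environment of the launched process. When 3dClustSim is started from a Python driver script, one way to arrange that is sketched below. The helper names are hypothetical, and the sketch assumes 3dClustSim is on the PATH; the program is not actually run here.

```python
import os
import subprocess

def clustsim_env(num_threads):
    """Copy of the current environment with OMP_NUM_THREADS set."""
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(num_threads)
    return env

def run_clustsim(args, num_threads):
    """Launch 3dClustSim (assumed to be on PATH) with a capped thread count."""
    return subprocess.run(["3dClustSim", *args],
                          env=clustsim_env(num_threads), check=True)

# Only build and inspect the environment; run_clustsim() is not called here
env = clustsim_env(4)
print(env["OMP_NUM_THREADS"])
```

In a tcsh session the equivalent is a ‘setenv OMP_NUM_THREADS 4’ before running the program.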
++ Compile date = Dec 16 2015