The thresholding process envisioned and implemented in 3dClustSim has 2 steps:
(1) Per-voxel thresholding at some p-value. Smaller p-values are more strict: fewer voxels will "survive".
(2) From the "surviving" voxels, form spatially contiguous clusters. Delete those that are smaller than the cluster-size threshold.
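The two steps above can be sketched in a few lines of Python. This is only an illustration on a small 2D grid of per-voxel p-values, assuming face-touching neighbors count as contiguous; the function names (find_clusters, two_step_threshold) are made up for this example and are not part of 3dClustSim or AFNI.

```python
from collections import deque


def find_clusters(mask):
    """Group the True cells of a 2D mask into face-connected clusters."""
    nrows, ncols = len(mask), len(mask[0])
    seen = [[False] * ncols for _ in range(nrows)]
    clusters = []
    for r in range(nrows):
        for c in range(ncols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one contiguous cluster.
            queue = deque([(r, c)])
            seen[r][c] = True
            cluster = []
            while queue:
                i, j = queue.popleft()
                cluster.append((i, j))
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= ni < nrows and 0 <= nj < ncols
                            and mask[ni][nj] and not seen[ni][nj]):
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            clusters.append(cluster)
    return clusters


def two_step_threshold(pvals, p_thresh, min_cluster_size):
    """Return the clusters that survive both thresholding steps."""
    # Step 1: per-voxel thresholding at p_thresh.
    mask = [[p < p_thresh for p in row] for row in pvals]
    # Step 2: delete clusters smaller than the cluster-size threshold.
    return [cl for cl in find_clusters(mask) if len(cl) >= min_cluster_size]
```

For example, with per-voxel p=0.01 and a cluster-size threshold of 2, a grid containing one pair of adjacent surviving voxels and one isolated surviving voxel keeps only the pair.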
Consider the case where you lower step 1's per-voxel p a lot (raise the t-statistic threshold high). Say you have Nvox=10^5 voxels, and set per-voxel p=5x10^-7=0.05/Nvox. Then the Bonferroni correction tells you that the probability of ANY false positive voxel is less than 0.05 -- that is, a cluster-size threshold of 1 voxel is good.
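The Bonferroni arithmetic in this example can be checked directly. The snippet below is a small sketch using the numbers from the text; the last quantity additionally assumes the voxels are statistically independent, which is an assumption made here for illustration and is generally not true of real imaging data.

```python
n_vox = 10**5            # number of voxels, from the example in the text
alpha = 0.05             # desired global false positive probability
p_voxel = alpha / n_vox  # Bonferroni per-voxel threshold, about 5x10^-7

# Union (Bonferroni) bound: P(any false positive voxel) <= n_vox * p_voxel.
bonferroni_bound = n_vox * p_voxel

# Under the extra ASSUMPTION of independent voxels, the probability of at
# least one false positive is 1 - (1 - p)^N, slightly below the bound.
fwer_independent = 1.0 - (1.0 - p_voxel) ** n_vox

print(p_voxel, bonferroni_bound, fwer_independent)
```

The bound holds whether or not the voxels are correlated, which is why the Bonferroni threshold is safe but conservative.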
Now imagine increasing p (lowering t) -- then there is a bigger chance of false positive voxels. At some point as p increases, when more false positive voxels occur, they will clump together, and the chance of getting a cluster of size 2 voxels FROM NOISE ALONE will get large.
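And so on. A larger per-voxel p means more false positive voxels, which in turn means larger clusters arising from noise alone. To keep the overall (or global) false positive probability at 5%, the cluster-size threshold has to go up, so that more of these noise-only clusters are eliminated.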
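The trade-off between per-voxel p and cluster-size threshold can be illustrated with a toy Monte Carlo simulation. The sketch below is not the 3dClustSim algorithm: it assumes a small 2D grid of independent, uniformly distributed p-values (no spatial smoothness, no brain mask), and the function names and default parameters are hypothetical. It only shows the general idea of simulating noise, thresholding it, and picking the smallest cluster size that noise alone reaches in at most 5% of simulations.

```python
import random


def max_cluster_size(pvals, p_thresh):
    """Size of the largest face-connected cluster of cells with p < p_thresh."""
    nrows, ncols = len(pvals), len(pvals[0])
    seen = [[False] * ncols for _ in range(nrows)]
    best = 0
    for r in range(nrows):
        for c in range(ncols):
            if seen[r][c] or pvals[r][c] >= p_thresh:
                continue
            # Flood fill from this cell, counting the cells reached.
            seen[r][c] = True
            stack, size = [(r, c)], 0
            while stack:
                i, j = stack.pop()
                size += 1
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= ni < nrows and 0 <= nj < ncols
                            and not seen[ni][nj] and pvals[ni][nj] < p_thresh):
                        seen[ni][nj] = True
                        stack.append((ni, nj))
            best = max(best, size)
    return best


def cluster_size_threshold(p_thresh, alpha=0.05, shape=(20, 20),
                           n_sims=400, seed=0):
    """Smallest cluster size that noise-only simulations reach or exceed
    in at most a fraction alpha of the runs."""
    rng = random.Random(seed)  # fixed seed so results are repeatable
    maxima = []
    for _ in range(n_sims):
        # ASSUMPTION: independent uniform p-values stand in for pure noise.
        noise = [[rng.random() for _ in range(shape[1])]
                 for _ in range(shape[0])]
        maxima.append(max_cluster_size(noise, p_thresh))
    size = 1
    while sum(m >= size for m in maxima) / n_sims > alpha:
        size += 1
    return size


for p in (0.001, 0.01, 0.05, 0.2):
    print(p, cluster_size_threshold(p))
```

Because the same seed produces the same noise grids for every p, the voxels surviving at a smaller p are always a subset of those surviving at a larger p, so the reported threshold can only stay the same or grow as p increases. Real data are spatially smooth, which makes noise clusters larger than in this independent-voxel toy.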