(1) The slowest part of 3dBlurToFWHM is the local estimation and control of smoothness. If you turn this feature off via '-nbhd NULL', the program will run much faster. Then the program will just blur until the global smoothness estimate matches your target (more or less).
(2) The program does not use multiple CPUs.
(3) If you just want to ADD smoothness, then the new-ish program 3dBlurInMask will do the trick -- it is pretty much like a standard Gaussian blur with a given FWHM, but only inside a mask. (This program DOES use multiple CPUs, but is generally so fast that the extra CPUs don't really help much.)
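
As a rough illustration of notes (1) and (3), the sketch below builds one command of each kind. The dataset name, output prefixes, and the FWHM value of 8 are hypothetical placeholders, not recommendations, and '-automask' is just one way to supply a mask. The script only prints the commands (a dry run); it does not call AFNI itself.

```shell
# Hypothetical dataset name and FWHM value (in mm); replace with your own.
in_dset="epi+orig"
target_fwhm=8

# Note (1): turn off local smoothness control with '-nbhd NULL', so the
# program blurs until the global smoothness estimate reaches the target.
fast_cmd="3dBlurToFWHM -input $in_dset -prefix epi_to${target_fwhm} -automask -FWHM $target_fwhm -nbhd NULL"

# Note (3): just ADD a Gaussian blur of the given FWHM, inside the mask only.
add_cmd="3dBlurInMask -input $in_dset -prefix epi_add${target_fwhm} -automask -FWHM $target_fwhm"

# Dry run: print the commands for inspection instead of executing them.
echo "$fast_cmd"
echo "$add_cmd"
```

To actually run either command, paste the printed line into a shell where AFNI is installed. Keep in mind that the two '-FWHM' values mean different things: for 3dBlurToFWHM it is the target smoothness of the result, while for 3dBlurInMask it is the amount of blur added.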