Hi,
I don't have any services to offer yet, but we do have an interest in reducing the compute time for extended calculations. Here's our story...
We also run analyses that take tens of hours to complete per subject, and we would like to streamline execution by dividing the work among multiple processors. As I understand parallel processing, the separate processes actually exchange information at points along the calculation stream. If that is true, I'm not sure we need true parallel processing, just multiple serial processes run side by side. We are looking into upgrading our computing facilities to give us the ability to do this.
Our typical deconvolutions take 10 to 20 hours to run per subject, and some studies may eventually have upwards of 50 subjects. Deconvolutions/regressions take so long, I think, because we have approximately 500-1000 48-slice EPI volumes and 70-100 predictors. We do masking and other things to cut down the number of calculations, but the runs still require several hours. We could streamline the process by, say, breaking a deconvolution/regression into several "slices" (as shown in some of the FAQ examples) and farming each of these slice-based timeseries out to a separate processor for analysis. Right now we do this to some degree on our SGI Origin 2000, which has 6 processors, but a system with more processors would, of course, be faster.
I think the key feature of the SGI that allows us to farm out multiple processes (via a single terminal session) is the shared memory, but I am not sure of this. Any system with such an architecture (be it shared memory or otherwise) would be one we would consider using. Since Beowulf clusters appear to allow the spawning of multiple serial processes, they look like an attractive solution to our "problem."
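To illustrate the kind of thing I mean, here is a minimal shell sketch of farming out independent per-slab jobs as separate serial processes. The `process_slab` function is a hypothetical stand-in for the real per-slice regression; the point is just that the jobs never communicate, so plain backgrounding is enough:

```shell
#!/bin/sh
# Hypothetical sketch: one independent serial job per slice subset.
# process_slab is a placeholder for the actual slice-based regression.
process_slab() {
    echo "slab $1 done"
}

# Launch each slab's analysis as a background process, then wait
# for all of them to finish before collecting results.
for slab in 1 2 3 4 5 6; do
    process_slab "$slab" &
done
wait
```

Because no inter-process communication happens between jobs, this pattern works the same on a shared-memory machine or a cluster with a shared filesystem.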
-jim