Hi, Dante-
OK, both the raid and biowulf are using multiple CPUs. My guess about the "not so big difference" is that it's due in part to:
1) different architectures
2) different I/O speeds (also related to architecture)
3) the diminishing returns of more CPUs in parallelization after some point.
Probably the raid has faster file I/O (even when using the temporary scratch disk on biowulf, though that should make things more similar).
Note that running on my desktop with multiple CPUs, I also ran faster than on biowulf, so the raid's architecture might be more similar to that.
If you wanted a detailed comparison, you could specify the partition on biowulf in your sbatch command and run the same thing with 8 CPUs and, say, 24 CPUs (rather than 72, which might leave you waiting in the queue a lot). I don't think you can directly compare 8 CPUs on the raid vs. 72 on biowulf, because there are too many differences.
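For example, that kind of timing comparison might look something like this (the script name "my_analysis.tcsh" is just a placeholder for whatever you are running, and you'd adjust --mem and --time for your job):

```shell
# Hypothetical timing comparison: same script, same partition, two CPU counts.
sbatch --partition=norm --cpus-per-task=8  --mem=4g --time=4:00:00 my_analysis.tcsh
sbatch --partition=norm --cpus-per-task=24 --mem=4g --time=4:00:00 my_analysis.tcsh
```

Keeping the partition fixed means the two runs land on the same node hardware, so the only thing changing is the CPU count.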
Note that when I sbatch/swarm processes that should last <4 hours, I run with "--partition=norm,quick", so that I get a larger number of nodes to choose from (and likely less waiting time). Here is an example of submitting a job with more specifications (you might leave out the "--gres=..." part if you aren't using a temporary scratch disk to write to, which is described in the running-FS-on-biowulf page, earlier in this thread):
sbatch \
--partition=norm \
--cpus-per-task=4 \
--mem=4g \
--time=12:00:00 \
--gres=lscratch:10 \
do_*.tcsh
The real power of biowulf comes from being able to start, say, 500 processes and just let the computers sort things out, running each job when resources are available, even if each individual run is a bit slower. As the biowulf help notes, sbatch/swarm isn't reeeally meant for a handful of tiny jobs, but for larger jobs and many of them---that is where its biggest benefit comes in.
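As a sketch of that many-jobs pattern (filenames here are hypothetical): you put one command per line in a swarm file and submit the whole set with a single swarm call, where -g is GB of memory per process and -t is threads per process:

```shell
# Hypothetical swarm file "my_jobs.swarm", one command per line, e.g.:
#   tcsh do_subj_001.tcsh
#   tcsh do_subj_002.tcsh
#   ... (one line per subject/run, could be 500 of them)
# Submit them all at once; the scheduler runs each as resources free up.
swarm -f my_jobs.swarm -g 4 -t 4 --partition=norm,quick --time=4:00:00
```

Then you can walk away and let the queue drain on its own, which is where the benefit really shows up.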
--pt