For real data, you can compare the results using enorm values (the Euclidean norm of the motion parameter derivatives) and similar derivative-based measures of motion. See 1d_tool.py for the possibilities. For affine methods that estimate 12 parameters (like 3dAllineate) rather than just the 6 rigid-body ones that 3dvolreg uses, it's not clear what the weights should be for the last 6 parameters. No matter which of these metrics you use, though, none of them really tells you how good the alignment is for real data, since there is no ground truth to compare against. For that you will have to rely on visual verification by looking at the data. You can, however, synthesize a dataset in which you impose a known motion and then see how well you can detect and correct it. In that case, I would look at the individual alignment/motion parameters.
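To make the two ideas above concrete, here is a minimal Python sketch. It is only an illustration, not how 1d_tool.py is implemented: the function names `enorm` and `max_abs_error` are hypothetical, the numbers are made up, and the norm is unweighted, so it mixes rotations and shifts as-is. It computes an enorm-style time series from the backward differences of the parameters, and for the synthetic case it compares imposed and estimated parameters one column at a time.

```python
import math

def enorm(params):
    """Euclidean norm of the backward first differences, per time point.

    params: list of rows, one row per volume, each row holding the motion
    parameters for that volume (e.g. 6 rigid-body values).
    The first time point has no predecessor, so it is assigned 0.
    """
    if not params:
        return []
    out = [0.0]
    for prev, cur in zip(params, params[1:]):
        out.append(math.sqrt(sum((c - p) ** 2 for p, c in zip(prev, cur))))
    return out

def max_abs_error(true_params, est_params):
    """Largest absolute estimation error for each parameter (column)."""
    n_cols = len(true_params[0])
    return [
        max(abs(t[j] - e[j]) for t, e in zip(true_params, est_params))
        for j in range(n_cols)
    ]

# Made-up example: 4 volumes, 6 parameters each (assumed values, not real data).
imposed = [
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
]
estimated = [
    [0.00, 0.0, 0.0, 0.00, 0.0, 0.0],
    [0.45, 0.0, 0.0, 0.02, 0.0, 0.0],
    [0.52, 0.0, 0.0, 0.90, 0.0, 0.0],
    [0.01, 0.0, 0.0, 1.05, 0.0, 0.0],
]

print("enorm of imposed motion:", enorm(imposed))
print("max abs error per parameter:", max_abs_error(imposed, estimated))
```

The per-parameter error is the more informative number in the synthetic case, because a single collapsed norm can hide which parameter was recovered poorly.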