You can use 3dMean instead of 3dcalc to get a mean or sum across subjects. It shouldn't make any difference in the image which one you use though. With the mean, you generate a kind of probability map. That map gives the fraction of subjects that had grey matter identified at each voxel given the alignment method and parameters you used. You could smooth the input images or cluster the output.
Nonlinear warping to a template with @auto_warp would likely improve correspondence across subjects. The grey matter mask could be determined from the template instead of the subjects.
Yet another way to go would be to generate your own template with @toMNI_Awarp and @toMNI_Qwarpar. I would start with the first and simplest way, and then progress to the others if you think it will be useful for you and as you understand the commands better.