Perhaps you could run 3dDeconvolve with -iresp first, which gives you a voxel-by-voxel estimated impulse response for each stimulus type. Each frame (i.e. each sub-brick) of an impulse response dataset is then the estimated signal for that stimulus type at that lag, voxelwise. Running 3dmaskave on that dataset with an ROI mask will then convert it from voxelwise values to one average per ROI.
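So you'll end up with a separate set of averages for each ROI, for each stimulus type. Note that 3dmaskave prints its results as text (one line per sub-brick) rather than writing a BRIK, so you'd redirect each run to its own file. Within each one, each line will be the ROI-average signal at that sub-brick's lag.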
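
If it helps to see what the ROI-averaging step boils down to, here is a rough Python sketch of the idea. It does not call AFNI or read BRIK files; the function name roi_average_per_lag, the flat-list data layout, and the toy numbers are all made up for illustration. It just shows "one mean per sub-brick, over the voxels in the mask", which is what you'd expect 3dmaskave to give you per ROI.

```python
# Conceptual sketch only: mimics per-ROI averaging, does not use AFNI itself.
from statistics import fmean


def roi_average_per_lag(iresp, mask, roi_value):
    """Return one mean per sub-brick (lag) over voxels where mask == roi_value.

    iresp: list of sub-bricks, each a flat list of voxel values
           (hypothetical layout, one sub-brick per lag)
    mask:  flat list of integer ROI labels, same length as each sub-brick
    """
    # Indices of the voxels that belong to the requested ROI.
    idx = [i for i, m in enumerate(mask) if m == roi_value]
    if not idx:
        raise ValueError(f"no voxels with ROI label {roi_value}")
    # Average those voxels separately within each sub-brick.
    return [fmean(brick[i] for i in idx) for brick in iresp]


# Toy data: 3 lags, 4 voxels; voxels 0-1 are in ROI 1, voxels 2-3 in ROI 2.
iresp = [
    [0.0, 2.0, 10.0, 10.0],  # lag 0
    [1.0, 3.0, 20.0, 40.0],  # lag 1
    [4.0, 4.0, 0.0, 2.0],    # lag 2
]
mask = [1, 1, 2, 2]

roi1 = roi_average_per_lag(iresp, mask, 1)
roi2 = roi_average_per_lag(iresp, mask, 2)
print(roi1)
print(roi2)
```

You'd repeat this once per ROI and once per stimulus type, which is why the number of output files is (number of ROIs) x (number of stimulus types).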
-dave