Interesting, as usual...
The rigid_equiv might work for this application too, but I think adding a nonlinear warp on top of the rigid transformation is a good idea. You may want to limit the nonlinear warping so that it starts and stops at moderate levels (around 4).
I'm surprised that you're seeing an error like that. We routinely combine nonlinear warps in native space and in template space with affine transformations in between. 3dNwarpApply should handle that, and I would expect 3dNwarpCat to handle it as well. You may want to send us those warps and transformations so we can take a look. The workaround is similar to what you tried, except that you zeropad the grids so they encompass both datasets. It helps to center the datasets around the template first, so that the zeropadding does not become excessive.
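To see why centering first matters, here is a rough Python sketch of the arithmetic behind the zeropadding workaround. It assumes both grids are axis-aligned with the same orientation (no obliquity), describes each grid only by its extent in mm along each axis, and uses made-up numbers. The helper names are hypothetical and are not part of AFNI; the actual padding would still be done with AFNI tools such as 3dZeropad.

```python
import math
from typing import Tuple

# Hypothetical helpers, not AFNI functions. An "extent" is one (min_mm, max_mm)
# pair per axis, assuming both grids share the same orientation and are not oblique.
Extent = Tuple[Tuple[float, float], ...]


def union_extent(a: Extent, b: Extent) -> Extent:
    """Smallest axis-aligned box that contains both extents."""
    return tuple((min(a0, b0), max(a1, b1)) for (a0, a1), (b0, b1) in zip(a, b))


def pad_voxels(ext: Extent, target: Extent, voxel: Tuple[float, ...]):
    """Voxel layers to add on the low and high side of each axis so `ext` covers `target`."""
    pads = []
    for (lo, hi), (tlo, thi), dx in zip(ext, target, voxel):
        pads.append((max(0, math.ceil((lo - tlo) / dx)),
                     max(0, math.ceil((thi - hi) / dx))))
    return tuple(pads)


def center_on(ext: Extent, ref: Extent) -> Extent:
    """Shift `ext` so that its center coincides with the center of `ref`."""
    out = []
    for (lo, hi), (rlo, rhi) in zip(ext, ref):
        shift = (rlo + rhi) / 2 - (lo + hi) / 2
        out.append((lo + shift, hi + shift))
    return tuple(out)


def total_padding(a: Extent, b: Extent, voxel: Tuple[float, ...]) -> int:
    """Total voxel layers (slices) added across all axes so both grids cover their union."""
    target = union_extent(a, b)
    return sum(p for ext in (a, b)
               for side in pad_voxels(ext, target, voxel) for p in side)


# Made-up example numbers: a native dataset whose corner sits at the origin,
# and a template-like box roughly centered on (0, 0, 0).
native = ((0.0, 200.0), (0.0, 240.0), (0.0, 160.0))
template = ((-90.0, 90.0), (-126.0, 90.0), (-72.0, 108.0))
voxel = (2.0, 2.0, 2.0)

print(total_padding(native, template, voxel))                       # 300
print(total_padding(center_on(native, template), template, voxel))  # 32
```

With these example numbers, centering the native dataset on the template first cuts the padding from 300 to 32 voxel layers, which is the reason to center before zeropadding.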