Have you tried overlaying the output of 3dAllineate on the TT_N27+tlrc brain? If this registration is significantly off, then I would suspect the output of 3dQwarp would also be incorrect.
I would suggest mimicking how afni_proc.py performs the warps via auto_warp.py, which will perform the affine transform (via @auto_tlrc) and then 3dQwarp. Your 3dNwarpApply command looks correct. You could also concatenate several transforms (as afni_proc.py does with 3dNwarpApply) to warp the EPI to standard space with fewer interpolation steps.
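As a rough sketch of what that could look like (not a tested pipeline): the dataset names `anat_ns+orig` and `epi_r01+orig` and the output prefix are hypothetical placeholders, and the file names under `awpy/` are what I would expect auto_warp.py to write for this input, so please check your own `awpy/` directory before relying on them. The `run` helper is only a convenience for this example; by default it prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: all dataset names below are hypothetical placeholders.
anat=anat_ns+orig      # skull-stripped anatomical (assumed name)
epi=epi_r01+orig       # EPI already aligned to the anatomical (assumed name)
base=TT_N27+tlrc       # template mentioned above

# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN=${DRY_RUN:-1}
run() {
    echo "$@"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

# Step 1: affine alignment (via @auto_tlrc) followed by 3dQwarp.
# Results are written to the awpy/ directory.
run auto_warp.py -base "$base" -input "$anat" -skull_strip_input no

# Step 2: pass the nonlinear warp and the affine matrix together so the
# EPI is resampled only once. File names are assumed, check awpy/.
# Further affine matrices (e.g. EPI-to-anatomy) could be appended here.
nwarp="awpy/anat.un.aff.qw_WARP.nii awpy/anat.un.aff.Xat.1D"
run 3dNwarpApply -master awpy/anat_ns.aw.nii \
                 -source "$epi" \
                 -nwarp "$nwarp" \
                 -prefix epi_r01_tlrc
```

After running, overlaying `epi_r01_tlrc` on the template in the AFNI GUI is the same sanity check I suggested for the 3dAllineate output.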