You should be able to take the output and convert it to NIFTI with 3dcopy or 3dAFNItoNIFTI without reapplying the transformation matrix. If you want to go that route instead, note that the computed transformation matrix is defined as base to source. In the script, 3dAllineate is what applies the matrix, and it can write NIFTI output directly if you like.
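If you do apply the matrix yourself, the direction matters. A base-to-source matrix maps coordinates in the base dataset to coordinates in the source, so going from source to base means inverting it. Below is a minimal pure-Python sketch of that inversion. It assumes the matrix is stored as 12 numbers in row-major 3x4 order (as far as I know, this is the layout 3dAllineate uses for its saved .1D matrices). The function names and the example matrix are hypothetical, just for illustration.

```python
# Sketch only: invert a 3x4 affine stored as 12 numbers in row-major order.
# ASSUMED layout: a11 a12 a13 t1  a21 a22 a23 t2  a31 a32 a33 t3
# Forward map (base -> source): x_src = A @ x_base + t
# Inverse map (source -> base): x_base = inv(A) @ x_src - inv(A) @ t


def split_affine(vals):
    """Split 12 row-major numbers into the 3x3 linear part A and translation t."""
    if len(vals) != 12:
        raise ValueError("expected 12 values")
    A = [[float(vals[4 * i + j]) for j in range(3)] for i in range(3)]
    t = [float(vals[4 * i + 3]) for i in range(3)]
    return A, t


def join_affine(A, t):
    """Pack A and t back into the same 12-number row-major layout."""
    out = []
    for i in range(3):
        out.extend(A[i] + [t[i]])
    return out


def inverse_3x3(A):
    """Invert a 3x3 matrix via the adjugate divided by the determinant."""
    (a, b, c), (d, e, f), (g, h, k) = A
    det = a * (e * k - f * h) - b * (d * k - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular; it cannot be inverted")
    return [
        [(e * k - f * h) / det, (c * h - b * k) / det, (b * f - c * e) / det],
        [(f * g - d * k) / det, (a * k - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


def apply_affine(vals, p):
    """Map a 3D point p through the affine given as 12 row-major numbers."""
    A, t = split_affine(vals)
    return [sum(A[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]


def invert_affine(vals):
    """Return the inverse affine (e.g. source -> base) in the same layout."""
    A, t = split_affine(vals)
    Ai = inverse_3x3(A)
    ti = [-sum(Ai[i][j] * t[j] for j in range(3)) for i in range(3)]
    return join_affine(Ai, ti)


if __name__ == "__main__":
    # Hypothetical base -> source matrix: 90-degree rotation about z plus a shift.
    base_to_source = [0, -1, 0, 10, 1, 0, 0, 0, 0, 0, 1, -5]
    source_to_base = invert_affine(base_to_source)
    p_src = apply_affine(base_to_source, [1.0, 2.0, 3.0])
    print(p_src)                                  # [8.0, 1.0, -2.0]
    print(apply_affine(source_to_base, p_src))    # [1.0, 2.0, 3.0]
```

You only need this if you apply the matrix in some other tool. If you keep the dataset 3dAllineate already resampled, the conversion to NIFTI is all that's left.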
To visually assess the registration, use the -AddEdge option. The script then runs the @AddEdge program for you on both the initial and the (hopefully) aligned datasets and displays each with enhanced edges.
If things still don't make sense, I'd be glad to look at the data for you. Use the upload data link on the left.
We tested the script with the same versions of Python and tcsh, so I'm not sure where the difference lies, but I'll leave that alone for now.