I had actually done something a bit different. Let's say you have six kinds of files:
1. DWI data (native space)
2. FA map (native space)
3. T2-weighted anatomy (native space)
4. T1-weighted anatomy (MPRAGE or similar; native space)
5. Template based on T1 data (standard space, MNI)
6. Atlas (segmentation) based on the template (standard space, MNI)
There are then several ways to get the segmentation data into FA space:
1. FA->Template. This is the way I suggested. Use the inverse transformation to move the atlas data back to the FA's native space.
2. DWI->T1->Template. Combine the transformations and invert the result to bring the atlas back, as before. Usually the B0 volume from the DWI data is used for the DWI->T1 step.
3. DWI->T2->Template. Like the previous option, but the T2 is used because its contrast is similar to the B0 volume. TORTOISE uses this approach and combines it with distortion and motion correction.
4. FA->T1->Template. This is closest to your approach. You just need the T1->template part, which can be computed with @auto_tlrc (affine) or auto_warp.py (nonlinear).
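To make the "combine and invert" idea concrete, here is a minimal sketch that assumes each registration step (FA->T1, T1->template) is available as a plain 4x4 affine matrix that maps points from the moving image into the fixed image. The helper names and matrix values are hypothetical and for illustration only. Real tools store and apply transforms in their own file formats and conventions, and nonlinear warps such as those from auto_warp.py are not modeled here.

```python
# Sketch: chain two affine steps (FA->T1, T1->template), then invert the
# chain so template/atlas coordinates map back into native FA space.
# Assumption (hypothetical): each transform is a 4x4 homogeneous affine
# with last row [0, 0, 0, 1], mapping moving-image points to fixed-image points.

def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def inv3(m):
    """Invert a 3x3 matrix via its adjugate and determinant."""
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    if abs(det) < 1e-12:
        raise ValueError("linear part is singular; transform is not invertible")
    adj = [
        [m[1][1] * m[2][2] - m[1][2] * m[2][1], m[0][2] * m[2][1] - m[0][1] * m[2][2], m[0][1] * m[1][2] - m[0][2] * m[1][1]],
        [m[1][2] * m[2][0] - m[1][0] * m[2][2], m[0][0] * m[2][2] - m[0][2] * m[2][0], m[0][2] * m[1][0] - m[0][0] * m[1][2]],
        [m[1][0] * m[2][1] - m[1][1] * m[2][0], m[0][1] * m[2][0] - m[0][0] * m[2][1], m[0][0] * m[1][1] - m[0][1] * m[1][0]],
    ]
    return [[adj[i][j] / det for j in range(3)] for i in range(3)]

def inv_affine(t):
    """Invert x' = L x + b as x = L^-1 x' - L^-1 b."""
    li = inv3([row[:3] for row in t[:3]])
    b = [t[i][3] for i in range(3)]
    new_b = [-sum(li[i][k] * b[k] for k in range(3)) for i in range(3)]
    return [li[i] + [new_b[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def apply(t, p):
    """Apply a 4x4 affine to a 3D point."""
    x = list(p) + [1.0]
    return [sum(t[i][k] * x[k] for k in range(4)) for i in range(3)]

# Hypothetical example transforms (made-up values).
fa_to_t1 = [[1, 0, 0, 2], [0, 1, 0, -3], [0, 0, 1, 5], [0, 0, 0, 1]]
t1_to_template = [[0, -2, 0, 10], [2, 0, 0, 0], [0, 0, 2, -2], [0, 0, 0, 1]]

# FA->T1 is applied first, so it sits on the right of the product.
fa_to_template = matmul(t1_to_template, fa_to_t1)
template_to_fa = inv_affine(fa_to_template)

atlas_point = apply(fa_to_template, [1, 2, 3])   # [12.0, 6.0, 14.0]
print(apply(template_to_fa, atlas_point))        # [1.0, 2.0, 3.0]
```

The order of the product matters: the step applied first goes on the right. When atlas label images are actually resampled into FA space, nearest-neighbour interpolation is the usual choice so that the labels stay integers.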
If you work in the forward direction instead (moving the diffusion data to template space), you would need to rotate all the gradient directions properly so that the eigenvectors point in the right directions. That way tractography will work better. All of these transformations are invertible, so you can transform the segmentation back onto the original native-space data.
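The gradient rotation in the forward direction can be sketched as follows. This assumes a single global affine whose 3x3 linear part F maps native coordinates into template coordinates, with the b-vectors expressed in that same frame. The rotation is taken from the polar decomposition F = RU (one common choice, often called the finite-strain approach), computed with the Newton iteration R <- (R + R^-T) / 2. The helper names and the example matrix are hypothetical. Tool-specific b-vector conventions (voxel vs. world axes, axis flips) and the per-voxel rotations that a nonlinear warp would need are not handled.

```python
import math

# Sketch: reorient b-vectors by the rotation part of an affine's linear block.
# Assumption (hypothetical): f maps native coordinates to template coordinates
# and the b-vectors live in that same coordinate frame.

def transpose(m):
    return [list(row) for row in zip(*m)]

def inv3(m):
    """Invert a 3x3 matrix via its adjugate and determinant."""
    det = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
           - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
           + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    adj = [
        [m[1][1] * m[2][2] - m[1][2] * m[2][1], m[0][2] * m[2][1] - m[0][1] * m[2][2], m[0][1] * m[1][2] - m[0][2] * m[1][1]],
        [m[1][2] * m[2][0] - m[1][0] * m[2][2], m[0][0] * m[2][2] - m[0][2] * m[2][0], m[0][2] * m[1][0] - m[0][0] * m[1][2]],
        [m[1][0] * m[2][1] - m[1][1] * m[2][0], m[0][1] * m[2][0] - m[0][0] * m[2][1], m[0][0] * m[1][1] - m[0][1] * m[1][0]],
    ]
    return [[adj[i][j] / det for j in range(3)] for i in range(3)]

def polar_rotation(f, iters=50, tol=1e-12):
    """Orthogonal factor R of f = R U via the Newton iteration R <- (R + R^-T)/2.

    Converges for nonsingular f; if det(f) < 0 the result is a reflection.
    """
    r = [row[:] for row in f]
    for _ in range(iters):
        r_inv_t = transpose(inv3(r))
        new_r = [[0.5 * (r[i][j] + r_inv_t[i][j]) for j in range(3)] for i in range(3)]
        diff = max(abs(new_r[i][j] - r[i][j]) for i in range(3) for j in range(3))
        r = new_r
        if diff < tol:
            break
    return r

def rotate_bvecs(bvecs, f):
    """Rotate each gradient direction by R and renormalize to unit length."""
    r = polar_rotation(f)
    out = []
    for g in bvecs:
        rg = [sum(r[i][k] * g[k] for k in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in rg))
        # b=0 entries are often stored as zero vectors; leave them unchanged.
        out.append([c / norm for c in rg] if norm > 1e-12 else list(g))
    return out

# Hypothetical linear part: 90-degree rotation about z after stretching x by 2.
F = [[0, -1, 0], [2, 0, 0], [0, 0, 1]]
s = 1 / math.sqrt(2)
print(rotate_bvecs([[s, s, 0.0], [0.0, 0.0, 0.0]], F))
```

Only the rotation is applied: applying the full F to a gradient direction and renormalizing would also let the stretch along x tilt it, which is why the stretch is factored out first.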