Although I didn't run into the exact problem you did with your data, I did have some difficulty with the size of the anatomical dataset (0.5 mm isotropic voxels). Programs took quite a while to process it, and 3dAllineate crashed with a malloc error indicating it had run out of memory. To get around this, you can either reduce the resolution of the data or restrict the anatomical data to just the part you are interested in. I tried both and chose the latter. These are the steps I followed:
# create a single sub-brick copy of the EPI dataset, zero-padded by
# 6 slices on each of its six faces
3dZeropad -R 6 -L 6 -A 6 -P 6 -I 6 -S 6 -prefix epi_zp6b SN01_S2_EP_Demo+orig.'[0]'
# keep only the region of the anatomical data covered by the zero-padded EPI, still at 0.5 mm resolution
3dresample -master epi_zp6b+orig. -dxyz 0.5 0.5 0.5 -prefix anat_zp_reduced -input SN01_S2_EXP_MP_S_oblique+orig.
# align the reduced anatomical dataset to the EPI data
align_epi_anat.py -anat anat_zp_reduced+orig -epi SN01_S2_EP_Demo+orig. -epi_base 0 -cmass nocmass -anat_has_skull no -deoblique off -volreg off -suffix _al_zp_red -AddEdge -child_anat SN01_S2_EXP_MP_S_oblique+orig
A brief explanation of the options for the alignment script may be helpful here. Because you had already skull-stripped the data, "-anat_has_skull no" was used. The data had also already been matched in obliquity with 3dWarp, so "-deoblique off" was used. The datasets started off very close, and the EPI dataset was composed of only partial-coverage coronal slices, so "-cmass nocmass" was used to skip the center-of-mass shift. The "-epi_base 0" option uses the first sub-brick of the EPI as the alignment base, and "-volreg off" skips volume registration of the EPI time series. The "-AddEdge" option produces edge display images that make it easier to visualize the match. Finally, the "-child_anat" option applies the resulting transform to the original dataset as well, rather than only to the resampled, reduced dataset.
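As a rough illustration of why the 0.5 mm data is so heavy, here is a small sketch that estimates the memory needed for one volume. The grid dimensions are hypothetical (they are not read from your dataset; 3dinfo will report the real ones), and it assumes 4 bytes per voxel, as when the data is held as floats. The vol_mb function name is just for illustration.

```shell
#!/bin/sh
# Sketch only: grid sizes below are hypothetical, and 4 bytes per voxel
# (float storage) is an assumption.

# Print the size in MB of one volume with NX x NY x NZ voxels.
# usage: vol_mb NX NY NZ
vol_mb() {
    echo $(( $1 * $2 * $3 * 4 / 1048576 ))
}

# Hypothetical 0.5 mm grid of 512 x 512 x 384 voxels
vol_mb 512 512 384    # prints 384

# The same field of view at 1 mm: half as many voxels along each axis
vol_mb 256 256 192    # prints 48

# The lower-resolution route mentioned at the start would be something
# like the line below (the 1 mm size and the output prefix are hypothetical):
# 3dresample -dxyz 1.0 1.0 1.0 -prefix anat_1mm -input SN01_S2_EXP_MP_S_oblique+orig.
```

Halving the resolution along each axis cuts the voxel count by a factor of 8, and an alignment program typically holds several working copies of a volume at once, so the savings add up quickly. Restricting the anatomical data to the EPI coverage, as in the steps above, reduces the voxel count without giving up the 0.5 mm resolution.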