With such a long TR, treating the response to the stimulus as an impulse makes sense -- that is what you are doing with the 3dfim analysis. You could also do this analysis with 3dDeconvolve (with minlag=maxlag=0), which would add the ability to compute contrasts between different tasks (e.g., responses to word sounds vs. responses to pure tones). For more details on 3dDeconvolve, see its manual and the AFNI educational materials page.
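To make the idea concrete, here is a minimal sketch in plain Python (not AFNI code) of what a lag-0 regression with a contrast amounts to for one voxel. Each stimulus regressor is just the 0/1 timing vector, with no response shape spread over later TRs. The names `fit_lag0`, `words`, `tones`, and all the numbers are made up for illustration; the time series is synthetic and noiseless. 3dDeconvolve itself also handles baseline drift and the associated statistics, which this sketch leaves out.

```python
# Sketch: lag-0 ("impulse") regression for two tasks, plus a contrast.
# All names and numbers here are hypothetical, for illustration only.

def demean(xs):
    """Return xs with its mean subtracted."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def fit_lag0(y, words, tones):
    """Least-squares fit of y = b0 + bw*words + bt*tones at lag 0.

    Removing the mean from every column absorbs the constant baseline b0,
    leaving a 2x2 system of normal equations solved by Cramer's rule.
    """
    yc, w, t = demean(y), demean(words), demean(tones)
    ww, tt, wt = dot(w, w), dot(t, t), dot(w, t)
    wy, ty = dot(w, yc), dot(t, yc)
    det = ww * tt - wt * wt
    if det == 0:
        raise ValueError("stimulus regressors are collinear")
    bw = (wy * tt - wt * ty) / det
    bt = (ww * ty - wt * wy) / det
    return bw, bt

# One value per TR: 1 if that stimulus was presented in that TR, else 0.
words = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
tones = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1]

# Synthetic voxel time series: baseline 100, words add 3, tones add 1.
y = [100 + 3 * w + 1 * t for w, t in zip(words, tones)]

beta_words, beta_tones = fit_lag0(y, words, tones)

# The contrast between tasks is a weighted sum of the fitted amplitudes,
# here +1 for words and -1 for tones.
contrast = beta_words - beta_tones
print(beta_words, beta_tones, contrast)
```

The contrast is the piece that 3dfim does not give you directly: once both task amplitudes are fitted in the same model, their difference can be estimated and tested per voxel.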
bob cox