There are several related discussions of this on the message board that might be useful.
https://afni.nimh.nih.gov/afni/community/board/read.php?1,73472,73475
Consider the suggestions in that thread. Splitting the dataset into chunks rather than single slices can help with the 3dDeconvolve step, and the chunks can still be created with 3dZcutup. Chunks have less per-job overhead than slices, but you would have to decide how to split the dataset so that each chunk fits in memory. If you are running this on a cluster, also consider how many nodes you have available.

Note that many other steps, such as motion correction and alignment, require whole volumes. For those, consider options like downsampling your data.
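As a rough sketch of the chunking idea, the script below computes even slice ranges and prints (dry-run) the 3dZcutup / 3dDeconvolve commands you might issue for each chunk. The dataset name (epi+orig), slice count, chunk count, and output prefixes are all assumptions; fill in your own regression options where the ellipsis is.

```shell
#!/bin/sh
# Dry-run sketch: split a dataset into z-chunks for per-chunk 3dDeconvolve.
# nz, nchunk, and the dataset/prefix names are assumptions -- edit to taste.
nz=40          # total number of slices in the dataset (assumed)
nchunk=4       # number of chunks (assumed)
per=$((nz / nchunk))

i=0
while [ "$i" -lt "$nchunk" ]; do
  bot=$((i * per))
  top=$((bot + per - 1))
  # let the last chunk absorb any remainder slices
  [ "$i" -eq $((nchunk - 1)) ] && top=$((nz - 1))

  # echo the commands rather than running them (dry run)
  echo "3dZcutup -prefix chunk${i} -keep $bot $top epi+orig"
  echo "3dDeconvolve -input chunk${i}+orig ... -bucket stats_chunk${i}"

  i=$((i + 1))
done

# afterwards, glue the per-chunk results back together, e.g.:
echo "3dZcat -prefix stats_all stats_chunk0+orig stats_chunk1+orig stats_chunk2+orig stats_chunk3+orig"
```

Each chunk job is independent, so on a cluster the per-chunk 3dDeconvolve runs could be submitted in parallel, one per node.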