If memory is a concern, I think it would be largely fine to analyze each session (or even each run) separately. Concatenating versus not concatenating differs in whether the residuals are pooled across runs or kept separate for each run, and the consequence is likely negligible.
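To illustrate the pooled-versus-separate point, here is a minimal sketch in Python/NumPy. It is a toy, not the actual pipeline, and it rests on assumptions: plain ordinary least squares, white noise (no temporal autocorrelation modeling), and every regressor, including the baseline, being specific to its run. The sizes and variable names are made up for the example. Under those assumptions the betas come out the same either way, and only the residual variance estimate changes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2 runs, 100 time points per run, 3 stimuli per run.
n_runs, n_tp, n_stim = 2, 100, 3
p = n_stim + 1  # stimulus regressors plus a per-run baseline


def make_run():
    """Simulate one run: a design matrix and a noisy time series."""
    X = np.column_stack([np.ones(n_tp), rng.standard_normal((n_tp, n_stim))])
    true_beta = rng.standard_normal(p)
    y = X @ true_beta + rng.standard_normal(n_tp)
    return X, y


runs = [make_run() for _ in range(n_runs)]

# Separate analysis: one OLS fit and one residual variance per run.
separate_betas, separate_sigma2 = [], []
for X, y in runs:
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    separate_betas.append(beta)
    separate_sigma2.append(resid @ resid / (n_tp - p))

# Concatenated analysis: block-diagonal design, so regressors stay run-specific.
X_cat = np.zeros((n_runs * n_tp, n_runs * p))
for i, (X, _) in enumerate(runs):
    X_cat[i * n_tp:(i + 1) * n_tp, i * p:(i + 1) * p] = X
y_cat = np.concatenate([y for _, y in runs])

beta_cat, *_ = np.linalg.lstsq(X_cat, y_cat, rcond=None)
resid_cat = y_cat - X_cat @ beta_cat
pooled_sigma2 = resid_cat @ resid_cat / (len(y_cat) - X_cat.shape[1])

# The betas agree; only the residual variance is pooled instead of per run.
betas_match = np.allclose(beta_cat, np.concatenate(separate_betas))
print("betas identical:", betas_match)
print("per-run residual variances:", separate_sigma2)
print("pooled residual variance:", pooled_sigma2)
```

With equal run lengths the pooled variance is just the average of the per-run variances, so the beta maps themselves are unaffected and only the statistics built on the variance would shift slightly. If the noise model includes autocorrelation, the two approaches may differ a bit more than this toy suggests.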
> I am considering how to model it to get a beta map for each stimulus, which would be the input for further analysis.
Just curious: what kind of further analysis?
Gang