I am exploring the BOLD5000 dataset (Chang et al., 2019;
link to webpage;
link to paper). I am considering how to model it to get a beta map for each stimulus, which would be the input for further analysis.
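For concreteness, here is a minimal sketch of the kind of model I have in mind: a GLM with one regressor per stimulus (a least-squares-all design), using nilearn. The path, TR, and events table are made-up placeholders, not BOLD5000 specifics.

```python
# Sketch: per-stimulus GLM for a single run (least-squares-all).
# Path, TR, and the events table below are hypothetical placeholders.
import pandas as pd
from nilearn.glm.first_level import FirstLevelModel

# One row per trial; a unique trial_type per image gives each stimulus
# its own design-matrix column, and hence its own beta map.
events = pd.DataFrame({
    "onset": [10.0, 20.0, 30.0],                      # seconds (made up)
    "duration": [1.0, 1.0, 1.0],
    "trial_type": ["img_0001", "img_0002", "img_0003"],
})

model = FirstLevelModel(t_r=2.0, hrf_model="spm", noise_model="ar1")
model = model.fit("sub-01_run-01_bold.nii.gz", events=events)

# A unit contrast on one regressor yields that stimulus's beta map.
beta_0001 = model.compute_contrast("img_0001", output_type="effect_size")
```

Fitting all ~5,000 stimuli in one model would just scale this events table up, at the cost of a very wide design matrix.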
The dataset has four participants who each completed a slow event-related design over ~16 sessions of ~10 runs each. Over this time, they viewed ~5,000 distinct images. If concatenated, the full dataset would have on the order of 30,000 time points. Compressed and stored as 16-bit integers, the functional data for one subject is about 20 GB as nii.gz; concatenation decompresses it and converts it to 32-bit floats.
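As a rough sanity check on those numbers, here is the arithmetic (the voxel grid is an assumed typical whole-brain matrix, not the actual BOLD5000 acquisition):

```python
# Back-of-envelope memory for a concatenated 4D array; the grid size
# is an assumed typical whole-brain matrix, not the actual BOLD5000 one.
nx, ny, nz = 72, 72, 48        # hypothetical voxel grid
n_timepoints = 30_000          # ~16 sessions x ~10 runs

int16_gb = nx * ny * nz * n_timepoints * 2 / 1e9    # on-disk dtype
float32_gb = nx * ny * nz * n_timepoints * 4 / 1e9  # after conversion

print(f"int16:   {int16_gb:5.1f} GB")   # ~15 GB
print(f"float32: {float32_gb:5.1f} GB") # ~30 GB
```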
Of course, since the design is slow and event-related, I could just extract some volumes for analysis without modeling. But I am interested in thinking about how to approach the modeling problem at this scale. Since the images are all distinct, it seems that modeling all sessions/runs at once would be ideal.
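For the no-modeling route, something like the following would grab the volume nearest the expected HRF peak for each trial; the TR, peak delay, onset, and file path are all assumptions:

```python
# Sketch: skip the GLM and extract the volume nearest the expected HRF
# peak (~5 s after onset). TR, onsets, and the path are hypothetical.
from nilearn.image import index_img

TR = 2.0
HRF_PEAK = 5.0  # seconds after stimulus onset, a common approximation

def peak_volume(bold_path, onset):
    """Return the single 3D volume closest to onset + HRF_PEAK."""
    t_index = round((onset + HRF_PEAK) / TR)
    return index_img(bold_path, t_index)

vol = peak_volume("sub-01_run-01_bold.nii.gz", onset=10.0)
```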