The issue of scaling beforehand versus percent signal change conversion afterwards seems to be a recurring theme. On one hand, the latter looks more accurate than scaling the signal beforehand by the voxel-wise average, in the sense that baseline values are better than the averages. However, such an argument carries an assumption: that the BOLD response remains the same across runs regardless of the baseline value, since one regressor (one beta) is used for the same stimulus type across runs. A formula such as "0.25*Ort[0]+0.25*Ort[4]+0.25*Ort[8]+0.25*Ort[12]" also implies this underlying assumption. But the assumption may become questionable if the baseline values vary significantly across runs. When the assumption is violated, both the regression model and the post-hoc conversion would suffer in terms of accuracy.
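A toy numeric sketch of the point above (not AFNI code; all numbers are invented for illustration). Two runs have different baselines; when the per-run responses differ in percent terms, a single shared beta converted post hoc by the grand-mean baseline no longer matches the mean of the per-run percent signal changes:

```python
import numpy as np

# Hypothetical per-run baselines (raw scanner units)
baselines = np.array([1000.0, 1400.0])

# Case 1: true percent signal change (PSC) is identical (1%) in both runs.
psc_true = 1.0
betas_per_run = baselines * psc_true / 100.0   # raw-unit amplitudes per run
shared_beta = betas_per_run.mean()             # roughly what one shared regressor yields
psc_post_hoc = 100.0 * shared_beta / baselines.mean()  # post-hoc conversion

# Case 2: the true PSC differs across runs (assumption violated).
psc_true_runs = np.array([1.0, 2.0])
betas2 = baselines * psc_true_runs / 100.0
shared_beta2 = betas2.mean()
psc_post2 = 100.0 * shared_beta2 / baselines.mean()   # single-beta conversion
psc_per_run = 100.0 * betas2 / baselines              # per-run conversion

print(psc_post_hoc)          # matches the true 1% when PSC is constant
print(psc_post2)             # ~1.58%, off from the true mean
print(psc_per_run.mean())    # 1.5%, the true mean PSC
```

When the per-run PSC is constant, both routes agree; once baselines and responses vary together across runs, the single-beta conversion drifts away from the per-run answer.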
My guess is that most of the time the two approaches probably don't differ much. But if the conversion accuracy of percent signal change becomes a real issue, and if the sample size in each run is big enough, a better solution out of the dilemma seems to me to be: (1) do NOT concatenate multiple runs - analyze each run separately, with multiple analyses per subject; or (2) concatenate multiple runs, but create a regressor for each stimulus type in each run separately in the analysis. There would then be one beta per stimulus type per run, and the post-hoc conversion would be more accurate. Another benefit is that we could also test cross-run differences or run a trend analysis. Then just take those multiple betas to the group analysis.
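A small sketch of option (2), with hypothetical numbers: one beta per stimulus type per run, each converted with its own run's baseline, then summarized for group analysis and checked for a cross-run trend:

```python
import numpy as np

# Hypothetical raw-unit betas, one per run, from per-run regressors
betas = np.array([10.0, 28.0])
baselines = np.array([1000.0, 1400.0])   # each run's own baseline

# Per-run percent signal change: each beta scaled by its own baseline
psc = 100.0 * betas / baselines          # e.g. [1.0, 2.0]

# Subject-level value carried to the group analysis
subject_psc = psc.mean()

# The per-run values also permit a cross-run trend test, e.g. a linear slope
runs = np.arange(len(psc))
slope = np.polyfit(runs, psc, 1)[0]

print(subject_psc)   # mean PSC across runs
print(slope)         # PSC change per run
```

The point is simply that per-run betas keep the conversion honest (each beta meets its own baseline) while leaving the cross-run structure available for testing.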
Gang