Processing block: post-regress Finish the single subject processing script, starting after 3dDeconvolve. Background: The end of the proc.FT script includes the following steps. Review them. 0. check for failure of 3dDeconvolve 1. check the X-matrix for correlation matrix warnings 2. create all_runs (though that has been mentioned) 3. create ideal regressors (has been mentioned), as well as a sum 4. estimate the blur in the data 5. use the blur and mask to enable correction for multiple comparisons 6. generate a script to quickly preview the EPI data 7. delete temporary files and return to parent directory 8. comment that shows afni_proc.py command =========================================================================== 0. check for failure of 3dDeconvolve When programs are run from the shell they return a code that tells the shell whether they were successful or had some error. This value is stored in the $status variable by the shell, so that the user can check it. Generally, a $status of 0 indicates success, while non-zero values may indicate different sorts of errors. The proc.FT script checks whether $status is 0 after 3dDeconvolve. If it is not, the script warns the user and terminates with an 'exit' command. Note that if the user runs the proc script as recommended (tcsh -xef ...), then the script will automatically terminate on any non-zero return value. ------------------------------ 1. check the X-matrix for correlation matrix warnings The proc.FT script: 1d_tool.py -show_cormat_warnings -infile X.xmat.1D \ |& tee out.cormat_warn.txt This 1d_tool.py command looks at the correlation matrix (normalized X'X, the matrix of pairwise correlations between the regressors) and prints a message about anything that looks large (starting at a correlation of 0.4, by default). Run the main command on the X.xmat.1D file in a terminal window. % 1d_tool.py -show_cormat_warnings -infile X.xmat.1D We see 4 warnings, 1 about our regressors of interest, and 3 of all of the polort baseline pairs. Note that fewer runs means higher correlation between constant polort terms. With only 3 runs, the polort baseline terms have correlations of -0.5. It looks high, but just means they can predict each other to some degree. If we had only 2 runs, the constant polort terms would have a correlation of -1.0! That seems awful, but in that case they are essentially just negatives of each other (subject to setting the mean to 0). To put it in different terms, given that the value is 0 in one baseline regressor, what is the probability that it is 1 an another? In the case of 3 runs, the answer is 0.5. In the case of 2 runs the answer is 1.0. This happens to work out to be equal to the correlation values for any number of runs (greater than 1, of course). So anyway, they predict one another. --- What about our regressors of interest? A correlation value of -0.612 seems very high. Indeed, it is, for the very same reason. If one regressor is 1, the other is probably 0; they help predict one another. However there is enough variance across time, and there are no additional regressors that would lead towards multi-collinearity. So the moral of the story is: think about these warnings, but do not panic. A high correlation does not actually mean there is a problem (which is why they are warnings and not errors). --- Recall the last part of the command: |& tee out.cormat_warn.txt This is to 'pipe' all output (stdout and stderr) to the 'tee' program, which sends the text: 1. back to the terminal, so we can see it as it runs 2. duplicated to a text file, so we can read it later, too That is a good way to run scripts in general. Text files are small. ------------------------------ 2. create all_runs (though that has been mentioned) We talked about the all_runs dataset in t16 and t18, mostly. Recall that 3dDeconvolve does not need the EPI data input as a single dataset (though that works as well). We gave 3dDeconvolve the list of scaled datasets as input. But to look at all of the runs, particularly along with the fitts data, it is useful to have all of the runs in a single dataset. From the proc.FT script: 3dTcat -prefix all_runs.$subj pb04.$subj.r??.scale+orig.HEAD This just catenates the scaled datasets into one. ------------------------------ 3. create ideal regressors (has been mentioned), as well as a sum Again, this was mentioned in earlier pages. Here we use 1dcat to extract single regressors (of interest) from the X-matrix, output by 3dDeconvolve. From the proc.FT script: 1dcat X.xmat.1D'[12]' > ideal_Vrel.1D 1dcat X.xmat.1D'[13]' > ideal_Arel.1D Recall that there were 3 runs times 4 baseline regressors per run (cubic polort), giving 12 baseline polort regressors in total. So columns 0..11 of the X-matrix are those polort regressors, and the next ones are as specified in the 3dDeconvolve command. We gave the regressors of interest next, followed by the motion. So regressors 12 and 13 are our ones of interest. We looked at those in the afni Graph window, along with our EPI data, as well as plotting them individually or as a pair. For example: % 1dplot X.xmat.1D'[12,13]' or equivalently: % 1dplot ideal_Vrel.1D ideal_Arel.1D Recall that the first time series is plotted on the bottom. Advanced: In the case of censoring, afni_proc.py generates the ideal files from the uncensored X-matrix (including all TRs), X.uncensored.xmat.1D. This matrix is also generated by 3dDeconvolve using the option: -x1D_uncensored X.uncensored.xmat.1D To see the censored regressors (what was actually used in the regression) stick with X.xmat.1D. Note that if any TRs are censored, X.xmat.1D will be shorter than X.uncensored.xmat.1D (by that number of TRs). The ideals are generated from the uncensored data so that they still align with the EPI data over time. The censored regressors are essentially squeezed in when TRs are deleted. For example, if 10 TRs are censored early in the time series, the rest of each regressor will be shifted left by 10 TRs, relative to the all_runs dataset. So it becomes less clear in matching the stimulus events with the BOLD responses. ------------------------------ 4. estimate the blur in the data In this step blur_est.$subj.1D is created, containing blur estimates for any time series that were asked for (in the afni_proc.py command). The blur can be estimated from any of: all_runs - input to 3dDeconvolve errts - residuals output from 3dDeconvolve errts_REML - residuals output from 3dREMLfit (advanced) It is most proper to estimate the blur from the residuals (either errts or errts_REML, depending on what is being used). Though in truth the numbers tend to come out very close, so it is not so important. The 3dFWHMx program is used to measure the blur one run at a time (over the mask 'full_mask', though it may not have been applied to the EPI data). The scripting looks a little messy, but mostly because it is not assumed that all runs are of the same length. It computes starting and ending indexes for each run, and uses them to input the data to 3dFWHMx. Blurs are averaged across the runs, with independent values for x, y and z. These averages are put into the output file, blur_est.$subj.1D. For this script, the file contains just one line. It might contain up to 3, as noted above. 5.8063 5.99902 4.89269 # errts blur estimates This script used 3dmerge to apply a 4 mm FWHM blur to the data, but we see the blur estimates are easily higher than that. It is because there is blur right out of the scanner, as well as extra for any spatial interpolation of the data. These numbers are used when deciding on minimum cluster sizes for multiple comparison corrections. For single subject results (which we usually do not care about) the script later runs 3dClustSim and adds the cluster table to the stats dataset, output by 3dDeconvolve. The more important use is to average these numbers across all subjects, and then run 3dClustSim on the group mask. That gives cluster sizes needed to get the corrected p-values at the group level. ------------------------------ 5. use the blur and mask to enable correction for multiple comparisons Run 3dClustSim (with the blur estimates just mentioned above) to get some approximate alpha values (corrected p-values) corresponding with cluster sizes. Note that 3dClustSim is just like AlphaSim, but it does the process repeatedly (and more efficiently), getting results for many input p-values (uncorrected p-values). From the proc.FP script: 3dClustSim -both -NN 123 -mask full_mask.$subj+tlrc \ -fwhmxyz $fxyz[1-3] -prefix ClustSim Expanding the $subj and $fxyz variables shows the command as: 3dClustSim -both -NN 123 -mask full_mask.FT+orig \ -fwhmxyz 5.8063 5.99902 4.89269 -prefix ClustSim Ignoring the options -both and -NN 123 for now, 3dClustSim creates a table of cluster sizes (in voxels, or fractions there of). Each row of the table is for an uncorrected p-value (that one might use as a threshold in afni). Each column is for a corrected p-value (that one might report on a paper (of course both are probably reported)). Look at the output text file ClustSim.NN1.1D (using cat, or some editor if that is vastly preferred). % cat ClustSim.NN1.1D Again, the left column shows uncorrected p-values (pthr), while the column headings show corrected p-values. So each table entry is a cluster size required to obtain such a corrected p-value. For example, if we have an uncorrected p-value of 0.0001 (10^-4) and want to report clusters at a corrected p-value of 0.05, the clusters must be of size 5.4 voxels (round up to 6). Look at the ClustSim output for the row with uncorrected p of 0.0001 and move towards the right until the column with an alpha heading of 0.050. The value of 5.4 is the minimum cluster size required for this corrected p, in voxels. That means in the afni GUI we can set the threshold of the t-value of the V-A contrast (for example, though it makes no difference) to 3.916. This t-value equates to p=1.0*10^4, as shown in the afni GUI. The results are now thresholded at an uncorrected p of 0.0001. To threshold at a corrected p=0.05, we need to require clusters of minimum size 6 voxels. So Clusterize in the GUI, changing the minimum size from 20 down to 6, and click Set. Now while each voxel is still at an uncorrected p of 0.0001, the clusters are displayed at a corrected p of 0.05. Then look at the Cluster Report (Rpt button under Clusterize). In the afni GUI: - controller [A]: Clusterize (leave NN=1, set Voxels=6), and Set - controller [A]: Rpt (under Clusterize) The Cluster Results window shows 98 clusters (scroll all the way down) of size at least 6. Note the Alpha values in the right column of the report. At the top the values are <<0.01 (<< means MUCH less than), then they go to just < 0.01 at cluster 36 (size 15 voxels), then change to 0.02 at #70 (for clusters of size 7), and finally end up at 0.04 for the clusters of size 6. Note that the GUI basically applies one row of the ClustSim report table. At an uncorrected p of 0.0001, given different cluster sizes, it can show various corrected p-values, a different one for each different cluster size (though limiting the results). Note that no cluster below size 6 is shown, just because we decided on that limit. If we wanted to have a corrected p=0.01, that would mean a minimum cluster size 8 (from 7.6 in the ClustSim table). If we scroll up to #69 of the Cluster Report in the afni GUI, we see that clusters starting at size 8 get an Alpha value of 0.01, as expected. So if we had specified 8 voxels as a cluster minimum, the report would have just shown those first 69 clusters, rather than the 98 that we got when requiring only 6 voxels. It would have merely been a truncated version. If we required 20 voxels, we are literally "off the chart" with respect to the ClustSim table. The minimum corrected p-value in that table is 0.01. We would have to ask 3dClustSim to report more strict Alpha values, if we really cared for that. But at 20 voxels in the GUI, we see there would still be 32 clusters (starting at p=0.0001). * Note, the single subject results are extremely significant, because: a. The effects are strong (visual and audio stimuli, comparing to a rest condition without them). b. This is a block design, with 30 second stimuli. The BOLD response is very large in this case (5.5 times as large as it might be for a similar but instantaneous stimulus). The BOLD signal has saturated. c. The df (degrees of freedom) come from the number TRs (minus the number of regressors). Here, df=430 (450 TRs minus 20 regressors). d. At the group level, df comes from the number of subjects! This is generally far smaller than the number of TRs. So at the same t-value, the p-value is generally much larger in a group result. ------------------------------ 6. generate a script to quickly preview the EPI data From the proc.FT script (squeezing the wildcard onto one line...): gen_epi_review.py -script @epi_review.$subj -dsets pb00.*.tcat*.HEAD The input to this gen_epi_review.py is just a list of EPI datasets. The output from it is the file @epi_review.FT, which is a t-shell script. Briefly view the script with less or some editor. % less @epi_review.FT This script: - makes a list of EPI datasets (the tcat output, unprocessed) - starts afni - opens an axial image, a sagittal image and a sagittal graph (one point is to look for head motion, hence a sagittal image) - sends a command to afni to view the first EPI dataset - takes a brief nap - has a loop over each dataset to: - view the current run - auto-scale the graph - hit 'v' to to into video mode (scrolling over time) ---> at this point, the user just watches the EPI data, looking for motion or other artifacts ---> the user can hit to stop the video mode - the user is prompted (in the terminal window) to hit to begin viewing the next run - stop video mode when done () The purpose is to provide an example of how to write automation scripts, as well as show how to quickly look at single subject data. ------------------------------ 7. delete temporary files and return to parent directory Files name rm.SOMETHING are considered temporary, and are deleted here. The 'cd ..' command to return to the parent directory is symmetric with the 'cd $output_dir' command at the top of the script (the top on enters the results directory, the bottom one exits it). Returning to the parent directory is useless for cases where the script is run via 'tcsh', but is a good scripting practice and useful in more esoteric cases. ------------------------------ 8. comment that shows afni_proc.py command The script ends with commented lines showing the afni_proc.py command that was used to create the script. Note that the input datasets (such as FT/FT_epi_r1+orig.HEAD) and timing files (such as FT/AV1_vis.txt) are listed fully here, though they are as wildcards in the afni_proc.py command in s01.ap.simple. Recall that the shell expands the wildcards into filename lists, before the called program is executed. This is to say afni_proc.py does not see the '*', it sees the list of files, as if the user had typed them all.