Intra-Class Correlation with REML Method

In contrast to Pearson's correlation, which concerns the relationship between two different variables (measures such as the height and body weight of a person), intraclass correlation (ICC) is defined as the correlation of a single variable (measure) between two or more members of the same group. In the context of brain imaging, ICC can be used, for example, as an indicator of scanning reliability or consistency across sessions, days, or sites, and in that setting it is defined as the proportion of variability across subjects relative to the total variability in the data.
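
As a minimal sketch of this definition, assume the simplest case of a single random factor (subjects). The symbols below are introduced here only for illustration: \sigma^2_{\text{subj}} is the between-subject variance and \sigma^2_{\varepsilon} is the residual variance.

```latex
\mathrm{ICC} = \frac{\sigma^2_{\text{subj}}}{\sigma^2_{\text{subj}} + \sigma^2_{\varepsilon}}
```

Under this assumption the ICC lies between 0 and 1 and approaches 1 when nearly all of the variability in the data comes from differences between subjects.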

Here we provide a program, 3dICC_REML.R, that calculates the ICC on brain volume data rather than on the ROI data typically seen in the literature. The program extends the classical definitions of ICC [Shrout and Fleiss (1979), Psychological Bulletin, Vol. 86, No. 2, 420-428], and it differs from the older program 3dICC.R in the following respects:

1. It always generates non-negative ICC values, avoiding the awkward situation of interpreting negative ICC numbers;

2. It does not limit the number of random factors (3dICC only allows 2- and 3-way ANOVA);

3. Missing data are allowed;

4. It is based on linear mixed-effects modeling with restricted maximum likelihood (REML) estimation.
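
To illustrate the first point in the list, here is a minimal Python sketch (not part of 3dICC_REML) of how a classical ANOVA-based ICC can turn negative. It computes ICC(1,1) of Shrout and Fleiss (1979) from one-way ANOVA mean squares on hypothetical toy data. The second function truncates the between-subject variance estimate at zero; this is only a simple moment-based stand-in for the non-negativity constraint on variance components, not the REML estimation that the program performs.

```python
from statistics import mean

def icc_oneway_anova(groups):
    """ICC(1,1) of Shrout and Fleiss (1979) from one-way ANOVA mean squares.

    groups: list of equal-length lists, one list of measurements per subject.
    """
    n = len(groups)        # number of subjects
    k = len(groups[0])     # measurements per subject
    grand = mean(x for g in groups for x in g)
    # Between-subject and within-subject mean squares
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    msw = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def icc_truncated(groups):
    """Same data, but the between-subject variance is truncated at zero.

    Illustrative stand-in only: a moment estimator, not REML.
    """
    n = len(groups)
    k = len(groups[0])
    grand = mean(x for g in groups for x in g)
    msb = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    msw = sum((x - mean(g)) ** 2 for g in groups for x in g) / (n * (k - 1))
    var_between = max(0.0, (msb - msw) / k)   # variance cannot be negative
    return var_between / (var_between + msw)

# Hypothetical toy data: every subject has the same mean, so all of the
# variability is within subjects.
toy = [[1.0, 3.0], [3.0, 1.0], [2.0, 2.0]]
icc_anova = icc_oneway_anova(toy)    # negative for these data
icc_bounded = icc_truncated(toy)     # bounded below by zero
```

With these toy data the ANOVA-based value is negative, which has no meaningful reading as a proportion of variance, while the truncated version stays at zero.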

If you need to cite the methods here, use the following:

Chen, G., Saad, Z.S., Britton, J.C., Pine, D.S., Cox, R.W. (2013). Linear Mixed-Effects Modeling Approach to FMRI Group Analysis. NeuroImage 73:176-190.

Note (March 27, 2015): You may still use 3dICC_REML.R, but its functionality has been fully incorporated into 3dLME. See Example 4 in the 3dLME help.

Program 3dICC_REML: 3dICC_REML, written in R, runs on all major platforms, including Unix-based systems and Windows, and requires an R installation. The program will be part of the AFNI distribution soon, but can also be downloaded from here. In addition, the lme4 package should be installed by running the following command inside R:

install.packages("lme4", repos="")

First, create a text file myModel.txt (any file name will do) in the following format, which stores the output file name, the input files, and the other model information:

Output:OutputFileName <-- Optional: prefix only, and no view (e.g., tlrc) needed
FixEff:1 <-- specifies fixed effects such as covariates
RanEff:Session+Site+Subj <-- list all the random-effects variables
MASK:../data/Mask+tlrc.BRIK <-- Optional: a mask will significantly reduce runtime
Clusters:4 <-- Optional: number of parallel jobs; default is 1 if this line is not provided
Subj Session Site InputFile
Jim one site1 Jim1+tlrc
Jim two site1 Jim2+tlrc
Jim one site2 Jim3+tlrc
Jim two site2 Jim4+tlrc
Carol one site1 Carol1+tlrc

Note: sub-brick selector is allowed for input files. For example, Jim1+tlrc[0] (no quotes around the square brackets []) is acceptable.

As indicated in the table above, which exemplifies a model with three random effects, three of the header lines (Output, MASK, and Clusters) are optional, and "InputFile" is reserved as the label of the last column in the title row.
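
When there are many input files, a specification file of this shape can be generated rather than typed by hand. The following Python sketch is one way to do it. The helper name build_model_spec and its parameters are hypothetical and not part of AFNI. It also assumes that the "<--" annotations shown in the table are explanatory only and are not written to the file, and that the same factor order can be used in the RanEff line and in the title row.

```python
def build_model_spec(factors, rows, fix_eff="1", output=None, mask=None, clusters=None):
    """Return the text of a model specification file (hypothetical helper).

    factors: random-effects column names, e.g. ["Subj", "Session", "Site"]
    rows:    one tuple per input file: a value for each factor, then the file name
    The Output, MASK, and Clusters lines are written only when provided.
    """
    lines = []
    if output is not None:
        lines.append(f"Output:{output}")
    lines.append(f"FixEff:{fix_eff}")
    lines.append("RanEff:" + "+".join(factors))
    if mask is not None:
        lines.append(f"MASK:{mask}")
    if clusters is not None:
        lines.append(f"Clusters:{clusters}")
    # Title row: factor labels followed by the reserved label "InputFile"
    lines.append(" ".join(list(factors) + ["InputFile"]))
    for row in rows:
        if len(row) != len(factors) + 1:
            raise ValueError(f"expected {len(factors) + 1} fields, got {len(row)}: {row}")
        lines.append(" ".join(row))
    return "\n".join(lines) + "\n"

spec = build_model_spec(
    ["Subj", "Session", "Site"],
    [
        ("Jim", "one", "site1", "Jim1+tlrc"),
        ("Jim", "two", "site1", "Jim2+tlrc"),
        ("Carol", "one", "site1", "Carol1+tlrc"),
    ],
    output="OutputFileName",
    clusters=4,
)
# To save it: open("myModel.txt", "w").write(spec)
```

Raising an error on a row with the wrong number of fields catches a missing factor value before the analysis is started.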

Once myModel.txt is available, run the following command at the prompt in the directory that contains myModel.txt. If no model file is given on the command line, 3dICC_REML assumes that the model specification is stored in a file named model.txt.

3dICC_REML.R myModel.txt MyOutput.txt &

Or if you run it remotely:

nohup 3dICC_REML.R myModel.txt MyOutput &


You can open file MyOutput to check the running progress. 

If there are n factors (categorical variables) in the model, there will be n+1 sub-bricks in the output file: one for each factor, plus one for the residuals.
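
The n+1 values can be pictured with a small Python sketch. It assumes that each value is the share of the total variance attributed to one factor, and that the residual entry is the remaining share; the latter is an assumption about the output, not something stated in the program's documentation. The function name and the variance numbers are hypothetical.

```python
def icc_from_variances(factor_vars, residual_var):
    """Turn variance components into n+1 proportions of the total variance.

    factor_vars:  dict mapping each random factor to its estimated variance
    residual_var: estimated residual variance
    """
    total = sum(factor_vars.values()) + residual_var
    iccs = {name: var / total for name, var in factor_vars.items()}
    iccs["Residual"] = residual_var / total
    return iccs

# Hypothetical variance components at a single voxel
iccs = icc_from_variances({"Subj": 6.0, "Session": 1.0, "Site": 1.0}, residual_var=2.0)
```

With three factors the result has four entries, and under the stated assumption they add up to 1.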

ICC interpretation

There are three ways to interpret the ICC value. The first is probably the most popular: ICC measures the proportion of total variance that is attributable to an explanatory variable (typically a categorical variable such as session, scanner, etc.). However, I feel the second interpretation is more intuitive: ICC is the expected correlation between any two effect estimates randomly drawn from the same level of the categorical variable.

Using session with two levels (sessions 1 and 2) as an example, an ICC value of 0.75 for session can be interpreted from the following two perspectives. 1) It shows the differentiation among the levels of the variable (sessions 1 and 2); that is, 75% of the variability is accounted for by the difference between the two sessions. 2) It also indicates that the more distinguishable the two sessions are from each other, the more similar (or correlated) the effects are to each other within each session. It is the second interpretation that quintessentially carries the connotation of "correlation" in ICC. Let's use language to drive the point home. The more dissimilar two languages are (English and Korean, for example), the more similar any two speakers of the same language are.
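
The link between the two interpretations can be checked with a small simulation. The Python sketch below is an illustration only and is unrelated to how 3dICC_REML computes its estimates. It assumes a grouping variable whose level effects have variance 3 and residuals with variance 1, so the variance proportion is 3 / (3 + 1) = 0.75. Many levels are simulated, rather than two, purely so that the correlation estimate is stable.

```python
import random

def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulate_pair_correlation(var_level, var_resid, n_levels=20000, seed=0):
    """Draw two effects from each level and correlate them across levels."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_levels):
        # Shared effect of the level, plus independent noise for each draw
        level_effect = rng.gauss(0.0, var_level ** 0.5)
        first.append(level_effect + rng.gauss(0.0, var_resid ** 0.5))
        second.append(level_effect + rng.gauss(0.0, var_resid ** 0.5))
    return pearson(first, second)

expected_icc = 3.0 / (3.0 + 1.0)           # variance proportion: 0.75
r = simulate_pair_correlation(3.0, 1.0)    # correlation within levels
```

The simulated correlation r comes out close to the variance proportion, which is why the same number can be read either as a share of variance or as a correlation.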

Related to the second interpretation, a third one bears a flavor of information conservation: the ICC value shows the loss of information. If two effects are drawn from the same session, about 75% of the information is lost due to their similarity; that is, it is the amount of information that could be obtained only if the two effects were totally independent.

Useful links

1. Wikipedia: Intraclass correlation



I'd like to thank Javier Gonzalez-Castillo for motivating me to write the program.