Usage: 1dCorrelate [options] 1Dfile 1Dfile ...
------
* Each input 1D column is a collection of data points.
* The correlation coefficient between each column pair is computed, along
with its confidence interval (via a bias-corrected bootstrap procedure).
* The minimum sensible column length is 7.
* At least 2 columns are needed [in 1 or more .1D files].
* If there are N input columns, there will be N*(N-1)/2 output rows.
* Output appears on stdout; redirect ('>' or '>>') as needed.
* Only one correlation method can be used in one run of this program.
* This program is basically the bastard offspring of program 1ddot.
* Also see http://en.wikipedia.org/wiki/Confidence_interval
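As a small illustration of the N*(N-1)/2 rule above, the following Python sketch enumerates the unordered column pairs. The function name is hypothetical, and the order in which the program actually prints its rows is not stated here, so the ordering below is only an assumption.

```python
from itertools import combinations

def column_pairs(n_columns):
    """Return every unordered pair (i, j), i < j, of column indices."""
    return list(combinations(range(n_columns), 2))

if __name__ == "__main__":
    pairs = column_pairs(4)
    # 4 input columns give 4*3/2 = 6 pairs, hence 6 output rows
    print(len(pairs), pairs)
```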
-------
Methods [actually, only the first letter is needed to choose a method]
------- [and the case doesn't matter: '-P' and '-p' both = '-Pearson']
-Pearson = Pearson correlation [the default method]
-Spearman = Spearman (rank) correlation [more robust vs. outliers]
-Quadrant = Quadrant (binarized) correlation [most robust, but weaker]
-Ktaub = Kendall's tau_b 'correlation' [popular somewhere, maybe]
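For readers who want to see what the four methods measure, here is a minimal Python sketch using common textbook definitions. It is not taken from the program's source: in particular, the quadrant correlation below (mean product of signs about the medians) and the handling of ties are assumptions, and the program's own binarization may differ in detail.

```python
import math
import statistics

def sign(v):
    return (v > 0) - (v < 0)

def pearson(x, y):
    """Ordinary product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def quadrant(x, y):
    """Mean product of signs about the medians (assumed definition)."""
    mx, my = statistics.median(x), statistics.median(y)
    return sum(sign(a - mx) * sign(b - my) for a, b in zip(x, y)) / len(x)

def kendall_tau_b(x, y):
    """Kendall's tau_b, O(n^2) pair counting with tie correction."""
    conc = disc = only_x_tied = only_y_tied = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = sign(x[i] - x[j]), sign(y[i] - y[j])
            if dx * dy > 0:
                conc += 1
            elif dx * dy < 0:
                disc += 1
            elif dx != 0:
                only_y_tied += 1
            elif dy != 0:
                only_x_tied += 1
    denom = math.sqrt((conc + disc + only_y_tied) * (conc + disc + only_x_tied))
    return (conc - disc) / denom

if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7]
    y = [v ** 3 for v in x]  # monotone but nonlinear relationship
    for f in (pearson, spearman, quadrant, kendall_tau_b):
        print(f.__name__, round(f(x, y), 5))
```

On monotone but nonlinear data such as this, the rank-based measures report a perfect relationship while Pearson does not, which is one way to see why they are described as more robust.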
-------------
Other Options [these options cannot be abbreviated!]
-------------
-nboot B = Set the number of bootstrap replicates to 'B'.
* The default value of B is 4000.
* A larger number will give somewhat more accurate
confidence intervals, at the cost of more CPU time.
-alpha A = Set the 2-sided confidence interval width to '100-A' percent.
* The default value of A is 5, giving the 2.5..97.5% interval.
* The smallest allowed A is 1 (0.5%..99.5%) and the largest
allowed value of A is 20 (10%..90%).
* If you are interested in assessing if the 'p-value' of a
correlation is smaller than 5% (say), then you should use
'-alpha 10' and see if the confidence interval includes 0.
-block = Attempt to allow for serial correlation in the data by doing
 *OR*    variable-length block resampling, rather than completely
-blk     random resampling as in the usual bootstrap.
* You should NOT do this unless you believe that serial
correlation (along each column) is present and significant.
* Block resampling requires at least 20 data points in each
input column. Fewer than 20 will turn off this option.
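To make the difference between ordinary and block resampling concrete, here is a hedged Python sketch of how the resampled row indices might be drawn. Only the 20-point minimum comes from the text above. The block-length range, the circular wrap-around, and all function names are hypothetical choices for illustration, not the program's actual scheme.

```python
import random

MIN_BLOCK_N = 20  # from the help text: fewer points turns block resampling off

def iid_indices(n, rng):
    """Usual bootstrap: n row indices drawn independently with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def block_indices(n, rng, max_block=None):
    """Variable-length blocks: glue randomly placed runs of consecutive
    indices together until n indices have been drawn."""
    if max_block is None:
        max_block = max(2, n // 4)          # hypothetical upper block length
    out = []
    while len(out) < n:
        length = rng.randint(1, max_block)  # block length, inclusive range
        start = rng.randrange(n)
        # wrap around the end of the column (an assumption)
        out.extend((start + k) % n for k in range(length))
    return out[:n]

def resample_indices(n, rng, block=False):
    """Fall back to the usual bootstrap when the column is too short."""
    if block and n >= MIN_BLOCK_N:
        return block_indices(n, rng)
    return iid_indices(n, rng)

if __name__ == "__main__":
    rng = random.Random(1)
    print(resample_indices(24, rng, block=True))
    print(resample_indices(12, rng, block=True))  # too short: plain resampling
```

The same index list would be applied to both columns of a pair, so that the pairing of data points is preserved while runs of neighboring points stay together.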
-----
Notes
-----
* For each pair of columns, the output includes the correlation value
as directly calculated, plus the bias-corrected bootstrap value, and
the desired (100-A)% confidence interval [also via bootstrap].
* The primary purpose of this program is to provide an easy way to get
the bootstrap confidence intervals, since people almost always seem to use
the asymptotic normal theory to decide if a correlation is 'significant',
and this often seems misleading to me [especially for short columns].
* Bootstrapping confidence intervals for the inverse correlation matrix
(i.e., partial correlations) would be interesting -- anyone out there
need this ability?
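The general idea of a bootstrap confidence interval for a correlation can be sketched in Python as below. This uses Efron's bias-corrected (BC) percentile method as one plausible reading of 'bias-corrected bootstrap'. Whether the program follows exactly this recipe is an assumption. The function names, the seed argument, and the percentile indexing are illustrative only. The defaults for nboot and alpha, and the allowed alpha range, mirror the options described above.

```python
import math
import random
from statistics import NormalDist

def pearson(x, y):
    """Pearson correlation, or None if either column is constant."""
    if len(set(x)) < 2 or len(set(y)) < 2:
        return None
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bc_bootstrap_ci(x, y, nboot=4000, alpha=5.0, seed=0):
    """Return (r, lower, upper) for a two-sided (100-alpha)% interval."""
    if not 1.0 <= alpha <= 20.0:
        raise ValueError("alpha must be between 1 and 20 (percent)")
    r_hat = pearson(x, y)
    if r_hat is None:
        raise ValueError("a constant column has no defined correlation")
    rng = random.Random(seed)
    n = len(x)
    reps = []
    while len(reps) < nboot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows in pairs
        r = pearson([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:                           # skip degenerate resamples
            reps.append(r)
    reps.sort()
    nd = NormalDist()
    # bias correction z0: how far the replicates sit from the direct estimate
    frac = sum(r < r_hat for r in reps) / nboot
    frac = min(max(frac, 1.0 / nboot), 1.0 - 1.0 / nboot)
    z0 = nd.inv_cdf(frac)

    def point(p):
        q = nd.cdf(2.0 * z0 + nd.inv_cdf(p))        # shifted percentile
        k = int(round(q * (nboot - 1)))
        return reps[min(nboot - 1, max(0, k))]

    return r_hat, point(alpha / 200.0), point(1.0 - alpha / 200.0)

if __name__ == "__main__":
    x = list(range(20))
    y = [2 * v + 0.1 * ((v * 7) % 5) for v in x]
    print(bc_bootstrap_ci(x, y, nboot=1000, alpha=10))
```

A larger nboot makes the sorted replicate list denser near the tails, which is why more replicates give somewhat more stable interval endpoints.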
-------------
Sample output [command was '1dCorrelate -alpha 10 A2.1D B2.1D']
-------------
# Pearson correlation [n=12 #col=2]
# Name      Name       Value   BiasCorr   5.00%   95.00%  N: 5.00% N:95.00%
# --------  --------  -------- -------- -------- -------- -------- --------
  A2.1D[0]  B2.1D[0]  +0.57254 +0.57225 -0.03826 +0.86306 +0.10265 +0.83353
* Bias correction of the correlation had little effect; this is very common.
++ To be clear, the bootstrap bias correction is to allow for potential bias
in the statistical estimate of correlation when the sample size is small.
++ It cannot correct for biases that result from faulty data (or faulty
assumptions about the data).
* The correlation is NOT significant at this level, since the CI (confidence
interval) includes 0 in its range.
* For the Pearson method ONLY, the last two columns ('N:', as above) also
show the widely used asymptotic normal theory confidence interval. As in
the example, the bootstrap interval is often (but not always) wider than
the theoretical interval.
* In the example, the normal theory interval excludes 0, which would indicate
that the correlation is significant (at the one-sided 5% level), but the
bootstrap CI includes 0, showing that this is not a reasonable conclusion.
++ The principal reason that I wrote this program was to make it easy
to check if the normal (Gaussian) theory for correlation significance
testing is reasonable in any given case -- for small samples, it often
is NOT reasonable!
* Using the same data with the '-S' option gives the table below, again
indicating that there is no significant correlation between the columns
(note also the lack of the 'N:' results for Spearman correlation):
# Spearman correlation [n=12 #col=2]
# Name      Name       Value   BiasCorr   5.00%   95.00%
# --------  --------  -------- -------- -------- --------
  A2.1D[0]  B2.1D[0]  +0.46154 +0.42756 -0.23063 +0.86078
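The asymptotic normal theory interval shown in the 'N:' columns of the Pearson table can be approximated with the usual Fisher z-transform. The Python sketch below assumes that this is the formula behind those columns, which the text does not state explicitly. With the sample values (r=0.57254, n=12, '-alpha 10') it gives endpoints that agree closely with the 'N:' entries printed above.

```python
import math
from statistics import NormalDist

def normal_theory_ci(r, n, alpha=5.0):
    """Two-sided (100-alpha)% interval for a Pearson r via Fisher's z."""
    if n <= 3:
        raise ValueError("need more than 3 data points")
    z = math.atanh(r)                     # Fisher z-transform of r
    # half-width: normal quantile times the standard error 1/sqrt(n-3)
    half = NormalDist().inv_cdf(1.0 - alpha / 200.0) / math.sqrt(n - 3)
    return math.tanh(z - half), math.tanh(z + half)

if __name__ == "__main__":
    lo, hi = normal_theory_ci(0.57254, 12, alpha=10)
    print(f"{lo:+.5f} {hi:+.5f}")
```

Because the standard error depends only on n, this interval cannot reflect the actual shape of the data, which is the motivation for comparing it with the bootstrap interval.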
-------------
SAMPLE SCRIPT
-------------
This script generates random data and correlates it until it is
statistically significant at some level (default=2%). Then it
plots the data that looks correlated. The point is to show what
purely random stuff that appears correlated can look like.
(Like most AFNI scripts, this is written in tcsh, not bash.)
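#!/bin/tcsh
# number of points per column, and alpha (in percent) for the CI
set npt = 20
set alp = 2
foreach fred ( `count_afni -dig 1 1 1000` )
  # generate 2 columns of random data, $npt points each
  1dcat jrandom1D:${npt},2 > qqq.1D
  # keep only the 2 confidence interval bounds (drop the first 42 characters)
  set aabb = ( `1dCorrelate -spearman -alpha $alp qqq.1D | grep qqq.1D | colrm 1 42` )
  # a positive product means both bounds have the same sign, so the CI excludes 0
  set ab = `ccalc -form rint "1000 * $aabb[1] * $aabb[2]"`
  echo $fred $ab
  if( $ab > 1 )then
    1dplot -one -noline -x qqq.1D'[0]' -xaxis -1:1:20:5 -yaxis -1:1:20:5 \
           -DAFNI_1DPLOT_BOXSIZE=0.012 \
           -plabel "N=$npt trial#=$fred \alpha=${alp}% => r\in[$aabb[1],$aabb[2]]" \
           qqq.1D'[1]'
    break
  endif
end
\rm qqq.1D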
----------------------------------------------------------------------
*** Written by RWCox (AKA Zhark the Mad Correlator) -- 19 May 2011 ***
++ Compile date = Oct 31 2024 {AFNI_24.3.06:linux_ubuntu_24_64}