- Each input 1D column is a collection of data points.
- The correlation coefficient between each column pair is computed, along with its confidence interval (via a bias-corrected bootstrap procedure).
- The minimum sensible column length is 7.
- At least 2 columns are needed [in 1 or more .1D files].
- If there are N input columns, there will be N*(N-1)/2 output rows.
- Output appears on stdout; redirect (‘>’ or ‘>>’) as needed.
- Only one correlation method can be used in one run of this program.
- This program is basically the bastard offspring of program 1ddot.
- Also see http://en.wikipedia.org/wiki/Confidence_interval
- -Pearson = Pearson correlation [the default method]
- -Spearman = Spearman (rank) correlation [more robust vs. outliers]
- -Quadrant = Quadrant (binarized) correlation [most robust, but weaker]
- -Ktaub = Kendall’s tau_b ‘correlation’ [popular somewhere, maybe]
- -nboot B = Set the number of bootstrap replicates to ‘B’.
  - The default value of B is 4000.
  - A larger number will give somewhat more accurate confidence intervals, at the cost of more CPU time.
- -alpha A = Set the 2-sided confidence interval width to ‘100-A’ percent.
  - The default value of A is 5, giving the 2.5%..97.5% interval.
  - The smallest allowed A is 1 (0.5%..99.5%) and the largest allowed A is 20 (10%..90%).
  - If you are interested in assessing whether the ‘p-value’ of a correlation is smaller than 5% (say), use ‘-alpha 10’ and see if the confidence interval includes 0. A 5%..95% interval that excludes 0 corresponds to a one-sided test at the 5% level.
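The mapping from A to the interval endpoints, and the ‘does the CI include 0’ check, are small enough to spell out in Python. The helper names are hypothetical, and rejecting an out-of-range A is this sketch's own choice (the text above only states the allowed range). The example values are the bootstrap (-0.03826..+0.86306) and normal-theory (+0.10265..+0.83353) Pearson intervals from the sample output in this section.

```python
def percentile_bounds(alpha=5.0):
    """Lower/upper percentiles of the 2-sided (100 - alpha)% interval."""
    if not 1 <= alpha <= 20:  # allowed range stated in the text
        raise ValueError("alpha must be between 1 and 20")
    return alpha / 2, 100 - alpha / 2


def ci_excludes_zero(lower, upper):
    """True when 0 lies outside [lower, upper], i.e. 'significant' at this level."""
    return lower > 0 or upper < 0


# -alpha 10 gives the 5%..95% interval.
print(percentile_bounds(10))  # (5.0, 95.0)
print(ci_excludes_zero(-0.03826, 0.86306), ci_excludes_zero(0.10265, 0.83353))  # False True
```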
- -block (or -blk) = Attempt to allow for serial correlation in the data by doing variable-length block resampling, rather than completely random resampling as in the usual bootstrap.
  - You should NOT do this unless you believe that serial correlation (along each column) is present and significant.
  - Block resampling requires at least 20 data points in each input column. Fewer than 20 will turn off this option.
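The text does not say which variable-length scheme 1dCorrelate uses. One standard choice is the stationary bootstrap of Politis and Romano, in which blocks start at random positions, have geometrically distributed lengths, and wrap around the end of the column. The Python sketch below generates resampling indices for that scheme; `mean_block` is a hypothetical tuning parameter, and the fallback to ordinary resampling below 20 points mirrors the rule stated above.

```python
import random


def stationary_bootstrap_indices(n, mean_block=5.0, rng=None):
    """Indices for one variable-length block resample of a length-n column.

    Stationary bootstrap: each step starts a new block (at a uniformly
    random position) with probability 1/mean_block, otherwise it continues
    the current block; indices wrap around the end of the column.
    """
    if mean_block < 1:
        raise ValueError("mean_block must be at least 1")
    if rng is None:
        rng = random.Random()
    if n < 20:
        # Mirrors the documented rule: too few points, so use ordinary resampling.
        return [rng.randrange(n) for _ in range(n)]
    p_new = 1.0 / mean_block
    idx = [rng.randrange(n)]
    while len(idx) < n:
        if rng.random() < p_new:
            idx.append(rng.randrange(n))  # start a new block
        else:
            idx.append((idx[-1] + 1) % n)  # extend the current block
    return idx
```

Apply the same index list to both columns of a pair, e.g. `[x[i] for i in idx]` and `[y[i] for i in idx]`, so that the pairing of data points is preserved, then compute the correlation as usual.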
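Sample output from a run on two one-column files, A2.1D and B2.1D (the 5.00% and 95.00% columns correspond to ‘-alpha 10’):

    # Pearson correlation [n=12 #col=2]
    # Name      Name      Value    BiasCorr 5.00%    95.00%   N: 5.00% N:95.00%
    # --------  --------  -------- -------- -------- -------- -------- --------
      A2.1D[0]  B2.1D[0]  +0.57254 +0.57225 -0.03826 +0.86306 +0.10265 +0.83353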
Bias correction of the correlation had little effect; this is very common. To be clear, the bootstrap bias correction is meant to allow for potential bias in the statistical estimate of correlation when the sample size is small. It cannot correct for biases that result from faulty data (or faulty assumptions about the data).
The correlation is NOT significant at this level, since the CI (confidence interval) includes 0 in its range.
For the Pearson method ONLY, the last two columns (‘N:’, as above) also show the widely used asymptotic normal theory confidence interval. As in the example, the bootstrap interval is often (but not always) wider than the theoretical interval.
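The usual asymptotic normal-theory interval for Pearson r goes through Fisher's z transform: atanh(r) is roughly normal with standard error 1/sqrt(n-3). It is an assumption that the ‘N:’ columns use exactly this formula, but plugging in the sample values (r = +0.57254, n = 12, A = 10) into the Python sketch below gives an interval that agrees with +0.10265..+0.83353 to within rounding. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_interval(r, n, alpha=5.0):
    """Asymptotic 2-sided (100 - alpha)% CI for Pearson r via Fisher's z."""
    if n <= 3:
        raise ValueError("need n > 3")
    z = math.atanh(r)  # approximately normal for bivariate-normal data
    half_width = NormalDist().inv_cdf(1 - alpha / 200) / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


lo, hi = fisher_z_interval(0.57254, 12, alpha=10)
print(f"{lo:+.5f} {hi:+.5f}")
```

Because this interval leans on a normality approximation, it can be noticeably narrower than the bootstrap interval for short columns, which is exactly the comparison the table makes.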
In the example, the normal theory might indicate that the correlation is significant (its interval, +0.10265 to +0.83353, excludes 0), but the bootstrap CI shows that is not a reasonable statistical conclusion.
The principal reason that I wrote this program was to make it easy to check if the normal (Gaussian) theory for correlation significance testing is reasonable in any given case: for small samples, it often is NOT reasonable!
Using the same data with the ‘-S’ (i.e., ‘-Spearman’) option gives the table below, again indicating that there is no significant correlation between the columns (note also the lack of the ‘N:’ results for Spearman correlation):
    # Spearman correlation [n=12 #col=2]
    # Name      Name      Value    BiasCorr 5.00%    95.00%
    # --------  --------  -------- -------- -------- --------
      A2.1D[0]  B2.1D[0]  +0.46154 +0.42756 -0.23063 +0.86078
* Written by RWCox (AKA Zhark the Mad Correlator) – 19 May 2011 *
++ Compile date = Dec 16 2015