:orphan:

.. _ahelp_1dCorrelate:

***********
1dCorrelate
***********

.. contents:: :local:


| 

.. code-block:: none

    Usage: 1dCorrelate [options] 1Dfile 1Dfile ...
    ------
     * Each input 1D column is a collection of data points.
     * The correlation coefficient between each column pair is computed, along
       with its confidence interval (via a bias-corrected bootstrap procedure).
     * The minimum sensible column length is 7.
     * At least 2 columns are needed [in 1 or more .1D files].
     * If there are N input columns, there will be N*(N-1)/2 output rows.
     * Output appears on stdout; redirect ('>' or '>>') as needed.
     * Only one correlation method can be used in one run of this program.
     * This program is basically the basterd offspring of program 1ddot.
     * Also see http://en.wikipedia.org/wiki/Confidence_interval
    
    -------
    Methods   [actually, only the first letter is needed to choose a method]
    -------   [and the case doesn't matter: '-P' and '-p' both = '-Pearson']
    
     -Pearson  = Pearson correlation                    [the default method]
     -Spearman = Spearman (rank) correlation      [more robust vs. outliers]
     -Quadrant = Quadrant (binarized) correlation  [most robust, but weaker]
     -Ktaub    = Kendall's tau_b 'correlation'    [popular somewhere, maybe]
    
    -------------
    Other Options  [these options cannot be abbreviated!]
    -------------
    
     -nboot B  = Set the number of bootstrap replicates to 'B'.
                 * The default value of B is 4000.
                 * A larger number will give somewhat more accurate
                   confidence intervals, at the cost of more CPU time.
    
     -alpha A  = Set the 2-sided confidence interval width to '100-A' percent.
                 * The default value of A is 5, giving the 2.5..97.5% interval.
                 * The smallest allowed A is 1 (0.5%..99.5%) and the largest
                   allowed value of A is 20 (10%..90%).
                 * If you are interested assessing if the 'p-value' of a
                   correlation is smaller than 5% (say), then you should use
                   '-alpha 10' and see if the confidence interval includes 0.
    
     -block    = Attempt to allow for serial correlation in the data by doing
       *OR*      variable-length block resampling, rather than completely
     -blk        random resampling as in the usual bootstrap.
                 * You should NOT do this unless you believe that serial
                   correlation (along each column) is present and significant.
                 * Block resampling requires at least 20 data points in each
                   input column.  Fewer than 20 will turn off this option.
    -----
    Notes
    -----
    * For each pair of columns, the output include the correlation value
      as directly calculated, plus the bias-corrected bootstrap value, and
      the desired (100-A)% confidence interval [also via bootstrap].
    
    * The primary purpose of this program is to provide an easy way to get
      the bootstrap confidence intervals, since people almost always seem to use
      the asymptotic normal theory to decide if a correlation is 'significant',
      and this often seems misleading to me [especially for short columns].
    
    * Bootstrapping confidence intervals for the inverse correlations matrix
      (i.e., partial correlations) would be interesting -- anyone out there
      need this ability?
    
    -------------
    Sample output  [command was '1dCorrelate -alpha 10 A2.1D B2.1D']
    -------------
    # Pearson correlation [n=12 #col=2]
    # Name      Name       Value   BiasCorr   5.00%   95.00%  N: 5.00% N:95.00%
    # --------  --------  -------- -------- -------- -------- -------- --------
      A2.1D[0]  B2.1D[0]  +0.57254 +0.57225 -0.03826 +0.86306 +0.10265 +0.83353
    
    * Bias correction of the correlation had little effect; this is very common.
      ++ To be clear, the bootstrap bias correction is to allow for potential bias
         in the statistical estimate of correlation when the sample size is small.
      ++ It cannot correct for biases that result from faulty data (or faulty
         assumptions about the data).
    
    * The correlation is NOT significant at this level, since the CI (confidence
      interval) includes 0 in its range.
    
    * For the Pearson method ONLY, the last two columns ('N:', as above) also
      show the widely used asymptotic normal theory confidence interval.  As in
      the example, the bootstrap interval is often (but not always) wider than
      the theoretical interval.
    
    * In the example, the normal theory might indicate that the correlation is
      significant (less than a 5% chance that the CI includes 0), but the
      bootstrap CI shows that is not a reasonable statistical conclusion.
      ++ The principal reason that I wrote this program was to make it easy
         to check if the normal (Gaussian) theory for correlation significance
         testing is reasonable in any given case -- for small samples, it often
         is NOT reasonable!
    
    * Using the same data with the '-S' option gives the table below, again
      indicating that there is no significant correlation between the columns
      (note also the lack of the 'N:' results for Spearman correlation):
    
    # Spearman correlation [n=12 #col=2]
    # Name      Name       Value   BiasCorr   5.00%   95.00%
    # --------  --------  -------- -------- -------- --------
      A2.1D[0]  B2.1D[0]  +0.46154 +0.42756 -0.23063 +0.86078
    
    -------------
    SAMPLE SCRIPT
    -------------
    This script generates random data and correlates it until it is
    statistically significant at some level (default=2%).  Then it
    plots the data that looks correlated.  The point is to show what
    purely random stuff that appears correlated can look like.
    (Like most AFNI scripts, this is written in tcsh, not bash.)
    
    #!/bin/tcsh
    set npt = 20
    set alp = 2
    foreach fred ( `count -dig 1 1 1000` )
      1dcat jrandom1D:${npt},2 > qqq.1D
      set aabb = ( `1dCorrelate -spearman -alpha $alp qqq.1D  | grep qqq.1D | colrm 1 42` )
      set ab = `ccalc -form rint "1000 * $aabb[1] * $aabb[2]"`
      echo $fred $ab
      if( $ab > 1 )then
        1dplot -one -noline -x qqq.1D'[0]' -xaxis -1:1:20:5 -yaxis -1:1:20:5            \
               -DAFNI_1DPLOT_BOXSIZE=0.012                                              \
               -plabel "N=$npt trial#=$fred \alpha=${alp}% => r\in[$aabb[1],$aabb[2]]"  \
               qqq.1D'[1]'
        break
      endif
    end
    \rm qqq.1D
    
    ----------------------------------------------------------------------
    *** Written by RWCox (AKA Zhark the Mad Correlator) -- 19 May 2011 ***
    
    ++ Compile date = Oct 13 2022 {AFNI_22.3.03:linux_ubuntu_16_64}