###########################################################################

  FATCAT-MVM description and test demo.
  Aug 2014, by PA Taylor and G Chen.

  v1.1

  This directory contains an example data set for taking a group's
  structural connectivity output from 3dTrackID (*.grid files),
  combining it in a simply-formatted table with subject information
  from a CSV file (e.g., dumped from a spreadsheet), and building a
  script to run 3dMVM for statistical modeling.  Importantly, this
  provides a way to combine a network's worth of data (so it is
  something like a set of repeated measures) for investigating
  statistical relations between structural scan info (for example, FA)
  and other measures, observations, test scores, etc.

  This procedure also works directly for investigating functional
  connectivity with *.netcc files output by 3dNetCorr.  For all steps
  described in this demo set, one could equivalently replace *.grid
  file procedures with *.netcc ones.

  Below is a description of: 
      + what data needs to be input (and how formatted);
      + what each program does;
      + a brief description of some options; and
      + what the outputs are.

  For questions about the 3dTrackID/3dNetCorr side of things, contact:
      PA Taylor:  neon.taylor@gmail.com
  For questions about the 3dMVM/modeling side of things, contact:
      G Chen:  gangchen@mail.nih.gov

  Requires:  
           R 3.0 or above.
           Python 2.7 or above.
           AFNI compile date Aug. 11, 2014 or later.

##########################################################################

  This program is part of AFNI-FATCAT:
    Taylor PA, Saad ZS (2013). FATCAT: (An Efficient) Functional And
    Tractographic Connectivity Analysis Toolbox. Brain Connectivity.

  For citing the statistical approach, please use the following:
    Chen, G., Adleman, N.E., Saad, Z.S., Leibenluft, E., Cox, R.W. (2014).
    Applications of Multivariate Modeling to Neuroimaging Group Analysis:
    A Comprehensive Alternative to Univariate General Linear Model. 
    NeuroImage 99:571-588.
    http://afni.nimh.nih.gov/pub/dist/HBM2014/Chen_in_press.pdf

  (And hopefully, the paper applying this method in a group study will be 
    available in the near future.)

###########################################################################

+ OVERVIEW

  We want to combine A) a group's set of *.grid files for a given network
  with B) other subject information of interest for statistical modeling.

  Step 0 (may not be necessary, depending on how up to date your
  3dTrackID version is), using 'fat_mvm_gridconv.py':

       Read in old *.grid files,

       -> produce: new, fat_mvm_prep.py-readable ones (*_MOD.grid)
       
       If your 3dTrackID grid file does not have lots of information
       commented with '#' characters throughout it, providing names
       and numbers of parameters, as well as the list of ROIs, then
       this is an 'old school' file that needs updating.  If the first
       character of your grid file is a '#', then you don't need to
       perform this step.
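
       The first-character check described above can be sketched in
       Python as follows.  This is only an illustration: the helper
       names 'needs_gridconv' and 'write_temp_grid' and the file
       contents are made up here, and this is not how
       fat_mvm_gridconv.py itself is implemented.

```python
import os
import tempfile


def needs_gridconv(grid_path):
    """Return True if the file looks 'old school', i.e. its first
    character is not '#'.  (Hypothetical helper, not part of FATCAT.)"""
    with open(grid_path, "r") as f:
        first_char = f.read(1)   # empty string for an empty file
    return first_char != "#"


def write_temp_grid(text):
    """Write text to a temporary *.grid file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".grid")
    with os.fdopen(fd, "w") as f:
        f.write(text)
    return path


# Made-up file contents, for illustration only.
new_style = write_temp_grid("# some commented header\n1 2 3\n")
old_style = write_temp_grid("1 2 3\n")
print(needs_gridconv(new_style))   # False
print(needs_gridconv(old_style))   # True
os.remove(new_style)
os.remove(old_style)
```

       Files flagged True here would be the ones to pass through
       Step 0 before running Step 1.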


  Step 1, using 'fat_mvm_prep.py':

       Read in *.csv file,

       Read in *.grid files,
            o find matrix elements (i.e., ROIs) that have nonzero values
              for *all* subjects-- currently, no missing data modeling
              possible.

       -> produce: 
                 - a data table (*_MVMtbl.txt) readable by 3dMVM, with one
                   row per subject and whose columns contain both CSV- and
                   *.grid-data;
                 - a log file (*_MVMprep.log) containing a record of the
                   *.grid file and CSV subject matching, a list of matrix
                   element (i.e., ROI) locations found for further 
                   analysis, and a list of parameter names (i.e., which
                   matrices were in the *.grid files).
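
       The selection of matrix elements that are nonzero for *all*
       subjects amounts to a set intersection.  Below is a minimal
       sketch, assuming each subject's matrix is already loaded as a
       list of rows; the function name and the toy matrices are
       hypothetical, and the real fat_mvm_prep.py may differ in its
       details (e.g., in which matrix it checks and how it treats the
       diagonal).

```python
def common_nonzero_elements(matrices):
    """Return the sorted (row, col) index pairs whose value is nonzero
    in every subject's matrix.  'matrices' holds one matrix (a list of
    rows) per subject.  Hypothetical helper, for illustration only."""
    common = None
    for mat in matrices:
        nonzero = {(i, j)
                   for i, row in enumerate(mat)
                   for j, val in enumerate(row)
                   if val != 0}
        common = nonzero if common is None else common & nonzero
    return sorted(common or [])


# Toy 3x3 matrices for two made-up subjects.
subj_a = [[5, 2, 0],
          [2, 7, 3],
          [0, 3, 4]]
subj_b = [[6, 0, 0],
          [0, 8, 1],
          [0, 1, 9]]

print(common_nonzero_elements([subj_a, subj_b]))
# [(0, 0), (1, 1), (1, 2), (2, 1), (2, 2)]
```

       Element (0, 1) is dropped because it is zero for the second
       subject, even though the first subject has a value there.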
                   

  Step 2, using 'fat_mvm_scripter.py':

       Read in the list of ROIs for statistical modeling 
            - can be done by providing the *_MVMprep.log file or an explicit
              commandline list (the matrix parameters will be run in
              individual models-- i.e., one for FA, one for MD, etc.),

       Read in the list of variables for the statistical modeling
            - can be done by providing simple text file (one variable
              per line) or an explicit list in the commandline.
              Variable names must match column names from original CSV
              file (now stored in *_MVMtbl.txt file).  (The
              interaction of variables is not currently parsed for in
              the Python script, but watch this space...  NB:
              interaction of variables *is* allowed in the '-bsVars'
              entries of 3dMVM, so in the meantime you can edit the
              created script to include this; see '3dMVM -help' for
              more description.)

       Read in name of table file (*_MVMtbl.txt),
       -> produce: a script for running 3dMVM. **This is a basic
                   starter script which you can modify if you want.**
                   If you have P parameters (i.e., matrices in a *grid
                   file), N variables selected to model from your CSV
                   data and R ROIs per matrix, then you will be
                   set up to run P multivariate tests with N+1 effects
                   (plus-one because of the intercept) and at least
                   P*N*R post hoc tests (for each parameter, there is
                   a set of N*R posthoc tests-- if any variables are
                   categorical, then there are posthoc tests run for
                   each category, hence the 'at least', above).
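
       The bookkeeping above can be written out as a short Python
       sketch.  The function name and the example numbers are
       hypothetical, and the 'posthoc_tests' count assumes one test
       per category for each categorical variable, which is one
       reading of the 'at least' remark; check the generated script
       for the actual count.

```python
def count_tests(n_params, n_vars, n_rois, n_categories=()):
    """Count the tests set up for P parameters, N variables and R ROIs.

    n_categories: number of categories of each categorical variable
    (assumed input); all remaining variables are taken as quantitative.
    """
    n_quant = n_vars - len(n_categories)
    per_roi = n_quant + sum(n_categories)   # post hoc tests per ROI
    return {
        "multivariate_tests": n_params,                  # P
        "effects_per_test": n_vars + 1,                  # N+1 (intercept)
        "min_posthoc_tests": n_params * n_vars * n_rois, # P*N*R
        "posthoc_tests": n_params * n_rois * per_roi,
    }


# Made-up sizes: 3 parameters, 4 variables, 10 ROIs.
print(count_tests(3, 4, 10)["min_posthoc_tests"])       # 120
# Same sizes, with two of the variables having 2 categories each.
print(count_tests(3, 4, 10, (2, 2))["posthoc_tests"])   # 180
```

       The rapid growth of the post hoc count with R is the reason
       one may prefer to restrict the ROI list for larger networks.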
                   
       The idea with the tests is to investigate first whether there
       are any significant relationships between your *whole network*
       of parameters (i.e., the repeated measures) and any of your
       variables.  You can then follow up within a network that shows
       significant association, using the posthoc tests to see what
       regions appear to be the strongest drivers of the relation.


  Step 3, using the created script: 
       Run the script (and feel free to edit it more beforehand if you
       want-- these tools are just to help get you started, and to
       help you make the table file easily).

########################################################################

DEMO EXAMPLE SET

For any of the three programs (fat_mvm*.py) discussed here, please use
the humble '-h' option to see the help description for more
information.

The data in this FAT_MVM_DEMO/ directory is simply:
    + CSV file of 20 subjects' data:     all_subj_info.csv
    + group of 20 *.grid files:          SUBJ_TR_GRID/*.grid

1) Combine the *.grid files and the CSV file of subject data. The
   matching of *.grid file to CSV file 'Subj ID' is done by assuming
   that the 'Subj ID' is somewhere contained explicitly in the path to
   each *.grid file.  If this is not the case, you can provide a two
   column list matching 'Subj ID' to *.grid file (see 'fat_mvm_prep.py
   -h' for more info):

   $ fat_mvm_prep.py -p DEMO -c all_subj_info.csv     \
                     -m 'SUBJ_TR_GRID/*_000_MOD.grid'
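
   The path-based matching described above can be illustrated with a
   small Python sketch.  The function name, subject IDs and file
   paths below are all made up for the example; this is not the
   actual fat_mvm_prep.py code, only the general idea of looking for
   each ID as a substring of each path.

```python
def match_ids_to_grids(subj_ids, grid_paths):
    """Pair each subject ID with the single path that contains it.

    Returns (matches, unmatched): a dict of ID -> path, and a list of
    IDs that matched zero or several paths.  Hypothetical helper.
    """
    matches = {}
    unmatched = []
    for sid in subj_ids:
        hits = [p for p in grid_paths if sid in p]
        if len(hits) == 1:
            matches[sid] = hits[0]
        else:
            unmatched.append(sid)   # missing or ambiguous
    return matches, unmatched


# Made-up IDs and paths, for illustration only.
ids = ["subj001", "subj002", "subj003"]
paths = ["SUBJ_TR_GRID/subj001_000_MOD.grid",
         "SUBJ_TR_GRID/subj002_000_MOD.grid"]

matches, unmatched = match_ids_to_grids(ids, paths)
print(matches["subj001"])   # SUBJ_TR_GRID/subj001_000_MOD.grid
print(unmatched)            # ['subj003']
```

   Note that substring matching becomes ambiguous when one ID is
   contained in another (e.g., 's1' and 's10'); that is the kind of
   case where an explicit two column list is the safer route.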

   Check and see how the produced DEMO_MVMprep.log and DEMO_MVMtbl.txt
   files look. For the table file: each row contains data for one
   subject; the first column is 'Subj' and contains the ID; the next
   columns contain all the data from the CSV file; then comes the
   'ROI' column, with labels 'A_B' based on the tracked pair of
   targets, A and B; the penultimate column, 'matrPar' has the matrix
   parameter names; and finally, the 'Ausgang_val' (output value) from
   3dTrackID for that ROI and parameter.  
   
   The log file contains some commented information, as well as the
   matching that fat_mvm_prep.py determined for the *.grid file names
   and the CSV 'Subj' IDs. *Check these visually*: if they are not
   correct, then you will likely have to provide the matching
   explicitly (see 'fat_mvm_prep.py -h' for help). Hint: your input
   for that would look pretty much like this part of the log file.

   The ROI_list contains the list of matrix elements which had no
   missing values across all subjects.  For tracking, this is
   determined by matrix locations in the 'NT' (=number of tracks)
   matrix with value >0.  For functional matrices, all elements are
   taken.  If you want a (further) subset of ROIs later, you can
   specify it in 'fat_mvm_scripter.py'.

   The Parameter_list shows the labels of matrices that were found.
   By default, each will be used in its own model when using 3dMVM; of
   course, you don't have to look at or use all of them.


2) Set up a script for 3dMVM.  Now we put all the pieces together.  As
   an example of a model, we can pick a few of the available variable
   names from the column headings.  

   $  fat_mvm_scripter.py --vars='Group TEST1 age sex' \
                          --log_file=DEMO_MVMprep.log  \
                          --table=DEMO_MVMtbl.txt      \
                          --prefix=DEMO

   You can see the script which has been built, called
   'DEMO_scri.tcsh'.  You are welcome to modify this if you want.  Due
   to the combinatorial nature of how the posthoc tests are done by
   default, you may want to not include them in the script (see
   'fat_mvm_scripter.py -h' for how).  But let's assume you're happy
   with them for the moment. 

   See how the categorical and quantitative variables are treated a
   bit differently-- the nature of the variable is determined
   internally by whether a column contains all numerical characters
   (-> quantitative) or not (-> categorical).  So, please be kind in
   how you make your spreadsheets, e.g., 'M and F', not '1 and 2'.
   The gltLabels are made as follows: for quantitative variables,
   'ROI-var' shows which ROI 'A_B' is paired with which variable (easy
   enough); for categorical ones, 'ROI-var^cat' has the additional
   information of which category is being modeled.
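
   As a rough illustration of the two rules above, here is a Python
   sketch.  The function names, the ROI label '001_002' and the
   column values are hypothetical, and using float() as the test for
   'numerical' is an assumption that may not match the script's
   internal check exactly.

```python
def is_quantitative(values):
    """Return True if every entry in the column parses as a number."""
    for v in values:
        try:
            float(v)
        except ValueError:
            return False
    return True


def glt_labels(roi, var, values):
    """Build labels of the form 'ROI-var' for a quantitative variable,
    or one 'ROI-var^cat' label per category for a categorical one."""
    if is_quantitative(values):
        return ["{}-{}".format(roi, var)]
    return ["{}-{}^{}".format(roi, var, cat) for cat in sorted(set(values))]


# Made-up column contents, for illustration only.
print(glt_labels("001_002", "age", ["23", "31", "27"]))
# ['001_002-age']
print(glt_labels("001_002", "sex", ["M", "F", "M"]))
# ['001_002-sex^F', '001_002-sex^M']
print(is_quantitative(["1", "2", "1"]))   # True
```

   The last line shows why coding a category as '1 and 2' is a
   problem: such a column would be treated as quantitative.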

3) You can run the script now:

   $  tcsh DEMO_scri.tcsh

   ... and see the statistical fruits of your labor.  For example, you
   can test your hypotheses for which DTI parameters would be
   significantly related to some other variable/test of interest.

   When you run the script, you may at times see warnings-- in
   general, these are ok to see, and they should not result in
   gnashing of teeth and wailing.  They are there for diagnostic
   purposes and the ever-popular 'just in case' they are useful.