gtkyd_check.py


This program is for Getting To Know Your Data (GTKYD). Provide a list of datasets, and this program will check properties of their headers (and, optionally, a few properties of the data). Properties are checked with 3dinfo, nifti_tool, and 3dBrickStat.

This program creates the following useful outputs:

  • A compiled spreadsheet-like table file, for reference, with one row per input dataset and one column per measured property. This table is made using gen_ss_review_table.py. (name: OUT.xls)

  • For each item checked, there will also be a detailed report file (N lines of data for N input datasets) (name: OUT/rep_gtkyd_detail_*.dat)

  • For each item checked, there will be a “uniqueness” report file, which will have 1 line of data for each unique value present across all input datasets. So, if there is only 1 line of data, then that property is consistent across all dsets; otherwise, there is some variability in it. (name: OUT/rep_gtkyd_unique_*.dat)

  • For each input dataset, a colon-separated dictionary of basic properties. These can be further queried with gen_ss_review_table.py. (name: OUT/dset_*.txt)
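The relationship between the per-dataset dictionaries and the "uniqueness" reports can be illustrated with a minimal Python sketch. It assumes each line of a dset_*.txt file has the form "key : value"; the exact layout of the real files may differ. The function names, subject IDs, and values below are hypothetical and are not part of gtkyd_check.py.

```python
from collections import defaultdict


def parse_colon_dict(text):
    """Parse lines of the assumed form 'key : value' into a dict."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # skip blank lines and anything without a colon separator
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        props[key.strip()] = value.strip()
    return props


def unique_values(dsets, key):
    """Map each distinct value of `key` to the dataset IDs that have it."""
    groups = defaultdict(list)
    for dset_id, props in dsets.items():
        groups[props.get(key)].append(dset_id)
    return dict(groups)


# Made-up contents standing in for two OUT/dset_*.txt files
sample = {
    "sub-001": parse_colon_dict("nv : 200\norient : RAI\n"),
    "sub-002": parse_colon_dict("nv : 200\norient : LPI\n"),
}

nv_groups = unique_values(sample, "nv")
orient_groups = unique_values(sample, "orient")

# One group means the property is consistent across all dsets;
# more than one means it varies.
print("nv groups:", len(nv_groups))
print("orient groups:", len(orient_groups))
```

In this toy case 'nv' has a single unique value and 'orient' has two, mirroring a uniqueness report with one line of data versus several.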

ver  = ${version}
auth = PA Taylor (SSCC, NIMH, NIH, USA), but no doubt also including
       the valuable insights of RC Reynolds and DR Glen

Overview

Usage

-infiles FILE1 [FILE2 FILE3 ...]
               :(req) names of one or more input files

-outdir ODIR   :(req) name of output "report directory", which holds the
                detail and uniqueness reports for each property.

-do_minmax     :include dataset min and max value info, which can be
                slow (uses '3dBrickStat -slow ...' to calculate it
                afresh)

-id_keeps_dirs N  :keep N directories (counting backward from the
                input filename) as part of the 'subject ID' field;
                default is to only keep the prefix_noext of the input
                filename (i.e., N=0). This can be useful if the paths
                encode useful information to identify subject infiles.

-overwrite     :overwrite any preexisting outdir and corresponding XLS
                file

-help, -h      :display program help file

-echo          :run very verbosely, by echoing each part of script
                before executing it

-hist          :display program history

-ver           :display program version number

-verb  VVV     :control verbosity (def: 1)

-show_valid_opts :show valid options for this program
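To make the -id_keeps_dirs option more concrete, here is a rough Python sketch of how a 'subject ID' could be built from an input path while keeping N trailing directories. This is an illustration only, not the program's actual code: the function name, the list of stripped extensions, and the '/' separator used to join the pieces are all assumptions, and the example path is made up.

```python
from pathlib import PurePosixPath

# Extensions stripped to approximate 'prefix_noext' (an assumed, simplified list)
KNOWN_EXTS = (".nii.gz", ".nii", ".HEAD", ".BRIK.gz", ".BRIK")


def subject_id(path, n_dirs=0, sep="/"):
    """Return the filename without extension, preceded by the last n_dirs directories."""
    p = PurePosixPath(path)
    name = p.name
    for ext in KNOWN_EXTS:
        if name.endswith(ext):
            name = name[: -len(ext)]
            break
    # directories leading to the file, ignoring a leading root '/'
    parents = [d for d in p.parent.parts if d != "/"]
    kept = parents[-n_dirs:] if n_dirs > 0 else []
    return sep.join(kept + [name])


example = "group_study/sub-001/func/sub-001_task-rest_bold.nii.gz"
print(subject_id(example))            # N=0: filename prefix only
print(subject_id(example, n_dirs=2))  # N=2: keep two directories
```

Keeping directories is mainly helpful when filenames alone are not unique across subjects, so that the path carries the identifying information.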

Examples

1) Basic example, running on a set of EPI datasets:
    gtkyd_check.py                                       \
        -infiles  group_study/sub*/func*/*task*.nii.gz   \
        -outdir   group_summary

2) Include (possibly slow) min/max info, and check anatomical dsets:
    gtkyd_check.py                                       \
        -infiles    group_study2/sub*/*T1w*.nii.gz       \
                    group_study2/sub*/*T1w*HEAD          \
        -do_minmax                                       \
        -outdir     group_summary2

... and any of these might be usefully followed up with
gen_ss_review_table.py (querying the dset*.txt files in the outdir),
to find subject datasets that have certain properties.  For example:

    gen_ss_review_table.py                               \
       -infiles group_summary/dset*txt                   \
       -report_outliers 'nv'     VARY                    \
       -report_outliers 'orient' VARY                    \
       -report_outliers 'ad3'    LT 3.0
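For scripted pipelines, the command lines above can also be assembled from Python. The following is a minimal sketch, assuming only the options documented in this help; the helper name build_gtkyd_cmd and the input filenames are hypothetical. It only builds and prints the command, so it does not require AFNI to be installed.

```python
import shlex


def build_gtkyd_cmd(infiles, outdir, do_minmax=False, overwrite=False):
    """Build an argument list for gtkyd_check.py from documented options."""
    if not infiles:
        raise ValueError("at least one input file is required")
    cmd = ["gtkyd_check.py", "-infiles", *infiles, "-outdir", outdir]
    if do_minmax:
        cmd.append("-do_minmax")   # adds (possibly slow) min/max info
    if overwrite:
        cmd.append("-overwrite")   # replaces a preexisting outdir and XLS file
    return cmd


cmd = build_gtkyd_cmd(
    ["sub-001_T1w.nii.gz", "sub-002_T1w.nii.gz"],  # made-up filenames
    "group_summary2",
    do_minmax=True,
)
print(shlex.join(cmd))

# To actually run it (requires AFNI on the PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when paths contain spaces or glob characters.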