```
HOW-TO #5
A Cornucopia of Statistical Goodies:
Understanding t-tests, ANOVA, and Calculating Percent Change

Analysis of variance, or ANOVA, is a statistical procedure commonly used by
researchers from a wide array of disciplines, including psychology and
neuroscience.  ANOVAs are often used in the FMRI community to make comparisons
of signal activation across subjects.  For this reason, we provide a brief
introduction to ANOVA in this how-to.  Be warned, however, that thick volumes
have been written on analysis of variance, and the information provided in this
how-to can in no way duplicate those efforts.  This how-to will simply provide a
very general overview of ANOVA (no ugly equations, I promise!).  In addition,
there will be a brief discussion regarding independent- versus paired-samples
t-tests. Finally, we will conclude with a discussion on calculating percent
change and why this normalization method is important when running ANOVA.

-----------------

I.   Independent Samples t-test
II.  Paired Samples t-test
III. One-way Analysis of variance (ANOVA)
IV.  Two-way ANOVA
V.   Parameter Normalization - Calculating Percent Change

I. t-test for INDEPENDENT SAMPLES
------------------------------

Instead of going directly into an explanation of ANOVA, it is perhaps more
beneficial to begin with ANOVA's younger sibling, the t-test.  Since ANOVA is
based on the principles of t-tests, a review of the t-test will hopefully make
ANOVA more comprehensible.

When exactly should a t-test be used?  When making simple, straightforward
comparisons between two independent samples, the independent samples t-test is
usually the statistic of choice.  For clarity, we begin with an example that
implements the independent samples t-test.

EXAMPLE of Independent Samples t-test:
-------------------------------------

Suppose a standardized vocabulary test is given to a group of students.
This sample consists of 120 high school seniors, with an equal number of
males and females.  In other words, our study consists of two independent
samples: 60 males and 60 females who are high school seniors.  We are
curious to know if there are any significant sex differences in performance
on the standardized vocabulary test. What statistical procedure can be used
to best address this question?  The Independent samples t-test.

An independent samples t-test is used to compare the means of two different
samples.  In this example, our two samples are MALES and FEMALES, which
make up our single independent variable, "sex."

An independent samples t-test is appropriate when there is one independent
variable (IV) with two levels.  For instance, a comparison of young and
older adults' performance on a memory test could be examined using an
independent samples t-test.  This type of t-test could also be used to
examine if there is significantly less sneezing in patients administered
Allergy Drug A versus other patients who are given a placebo.  In general,
if a study consists of two samples that share a variable of interest, and
group membership does not overlap, then a comparison on some dependent
variable can be made between these two groups.

The first thing to do is clearly label our independent and dependent
variables.  Recall that the independent variable (IV), also known as a
"factor," is the variable being manipulated by the experimenter, whereas the
dependent variable (DV) is the variable being measured (i.e., the result of
interest).  In this case, the IV is 'sex' with two levels: male and female
(Note: 'sex' or 'gender' is sometimes called a quasi-independent variable
because the experimenter cannot assign a sex to each subject.  The
subject already comes to the experiment as either male or female).  The
dependent variable in this example is 'vocabulary test score'.

A data spreadsheet for this example may look like this:

SEX	ID #	VOCAB. SCORE (max=100)
---	----	----------------------
F	001	    	85
F	002  		96
F	003  		72
.	 .	 	.
.	 .		.
F	060		64
M	001		56
M	002		100
M	003		82
.	 .		.
.	 .		.
M	060		45

The question of interest is whether the mean test score for the females in
our sample will differ significantly from the mean test score of the males
in our sample.  To answer this question, an independent samples t-test must
be performed.

The output from the t-test can be presented in a table and a figure,
as shown below:

Descriptive Statistics:
DV       IV        N     Mean     SD     SE Mean
--       --       ---    ----     --     -------
Vocab    Males     60    75.6    10.6     1.37
         Females   60    76.4    13.9     1.79

Independent Samples t-test:
t-value	df	p-value
-------	--	-------
1.22		118	 .224
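
A note on two of the numbers above, offered as a rough sketch rather than a
full derivation: the "SE Mean" column is the standard deviation divided by
the square root of the sample size, and the degrees of freedom (df) for an
independent samples t-test are the two sample sizes added together, minus 2.

    SE Mean (Males)   = SD / sqrt(N) = 10.6 / sqrt(60) = 1.37
    SE Mean (Females) = SD / sqrt(N) = 13.9 / sqrt(60) = 1.79

    df = N(Males) + N(Females) - 2 = 60 + 60 - 2 = 118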

The results suggest that our DV "vocabulary test score" did not differ
significantly between the two levels of our IV "sex."  That is, males and
females did not differ significantly in their performance on the vocabulary
test (t=1.22, p>.05).  The results are often made clearer if a figure is
drawn.  In fact, it is probably good practice to plot out the results to
gain a better understanding of the results, especially when examining main
effects and interactions in ANOVA.  As my old stats teacher used to say,
"If in doubt, plot it out."

With a basic understanding of the Independent samples t-test, we can quickly
cover the next topic: Paired Samples t-tests.

II. t-test for PAIRED SAMPLES
------------------------------

The paired samples t-test is usually based on groups of individuals who
experience both conditions of the variable of interest.  For instance, one
study might examine the effects of Drug A versus Drug B on a single sample of
100 diabetics.  Subjects in this sample would receive Drug A one week, and
Drug B the next.  Another study might look at a sample of older adults and their
mean reaction times to two types of stimulus conditions: incongruent color-word
pairs (e.g., the word "blue" printed in red ink) and congruent color-word pairs
(e.g., the word "blue" printed in blue ink).  In each of these cases, the same
participants are measured under both conditions.

EXAMPLE of Paired Samples t-test:
--------------------------------

Suppose our group of 120 high-school seniors was given two different
vocabulary quizzes.  The question of interest in this case is whether test
performance varies significantly, depending on the quiz given.  In other
words, do participants perform similarly or significantly differently on
the two quizzes?  In this example, our independent variable or factor of
interest is QUIZ, with two levels: Quiz 1 and Quiz 2.  Our dependent measure
will be the score on each quiz.  All 120 subjects will be administered these two
quizzes, and the paired samples t-test will be used to determine if
participants' test scores differ significantly between the two quizzes.

A data spreadsheet for this example may look like this:

SUBJECT ID #    QUIZ 1 SCORE    QUIZ 2 SCORE
------------    ------------    ------------
    001              66              72
    002              71              80
    003              80              83
     .                .               .
     .                .               .
    118              64              70
    119              56              66
    120              32              30

The question of interest is whether the mean score for Quiz 1 differs
significantly from the mean score for Quiz 2.  To answer this question, a
paired samples t-test must be performed.
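
The paired samples t-test works on each subject's difference score (here,
Quiz 1 minus Quiz 2) rather than on the two sets of scores separately.  As a
sketch, using only the six subjects listed in the spreadsheet above (the
symbol d is our own shorthand, not part of any output):

    Subject 001:  d = 66 - 72 =  -6
    Subject 002:  d = 71 - 80 =  -9
    Subject 003:  d = 80 - 83 =  -3
    Subject 118:  d = 64 - 70 =  -6
    Subject 119:  d = 56 - 66 = -10
    Subject 120:  d = 32 - 30 =   2

    t  = (mean of d) / (SD of d / sqrt(N))
    df = N - 1 = 120 - 1 = 119

With the subtraction done in this order, the t-value is negative when Quiz 1
scores are, on average, lower than Quiz 2 scores.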

The output from the t-test may look something like this:

Descriptive Statistics:
Variables    N     Correlation     p      Mean    SD     SE
---------   ---    -----------    ---     ----    ---    ---
Quiz 1      120       .673       .000     69.2    8.4    0.77
Quiz 2                                    79.0    6.2    0.57

Paired Samples t-test:
t-value		df	p-value
-------		--	-------
-2.87		119	 .005

The paired-sample analysis indicates that for the 120 participants, the
mean score on the second quiz (M=79.0) was significantly higher (p=.005)
than the mean score on the first quiz (M=69.2).  Figure 2 clearly shows
that on average, subjects scored higher on Quiz 2 than on Quiz 1.  These
results also indicate that a significant positive correlation exists
between these two variables (r=.673, p<.001), suggesting that students who
score high on one of the quizzes tend to score high on the other.
Likewise, students who score low on one quiz tend to score low on
the other.

With the basics of t-tests covered, it is now time to proceed to the ever-
elusive Analysis of Variance (ANOVA).

III. ONE-WAY ANALYSIS OF VARIANCE (ANOVA)
------------------------------------

Analysis of variance (ANOVA) is a statistical procedure used for comparing
sample means.  The results of the ANOVA can be used to infer that the means of
the corresponding population distributions also differ.  Whereas t-tests compare
only two sample distributions, ANOVA is capable of comparing many.

As a study becomes increasingly complicated, more complex statistical measures
must also be used.  In the examples above, a comparison of two means was made,
making the t-test a sufficient statistical tool.  However, if a study involves
one independent variable with more than just two levels or means, then a one-way
ANOVA will be an appropriate alternative to the t-test.

EXAMPLE of One-Way ANOVA:
------------------------

Suppose we were conducting a study on high school students' opinions
regarding the cafeteria food, and asked a sample of 100 students to rate
the quality of the cafeteria food on a scale of 1 to 10 (1 = "I'd rather
eat dog food", 10 = "Pretty darn good").  Our main objective for this study
is to determine if there is a difference in the rating of cafeteria food,
depending on a student's grade level.  As such, our independent variable or
factor would be GRADE LEVEL, consisting of 4 levels: 9th, 10th, 11th, and
12th grades.  Our dependent variable would be CAFETERIA FOOD RATING.

One may ask, could multiple independent samples t-tests be done to examine
the differences in cafeteria food ratings between class grades?  That is,
couldn't we just do several independent t-tests rather than a single one-
way ANOVA?  For example:

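Comparison of cafeteria food ratings for:

    9th  vs. 10th graders
    9th  vs. 11th graders
    9th  vs. 12th graders
    10th vs. 11th graders
    10th vs. 12th graders
    11th vs. 12th graders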

The answer is both 'yes' and 'no'.  In theory, it IS possible to do
multiple t-tests instead of a single ANOVA, but in practice, it may not be
the smartest strategy.  ANOVA offers numerous advantages that t-tests
cannot provide.  Specifically, ANOVA is a very robust measure that allows
the user to avoid the dreaded "inflated alpha" that may arise from
conducting several t-tests.

Side Note - Watch out for Inflated Alpha!
-----------------------------------------

What exactly is inflated alpha?  Suppose we run several t-tests and set our
significance level (alpha) to .05.  For any single test, this means that if
the null hypothesis is true, the probability of getting a statistically
significant result simply by chance is 5%.  However - and this is a big
"however" - conducting multiple t-tests may lead to what is known as an
"inflated alpha."  This is when the probability of erroneously rejecting
the null hypothesis in at least one of the tests climbs well above the 5%
we intended.  In this example, the null hypothesis is that no significant
differences exist in food rating between class grades.  Erroneous rejection
of the null hypothesis is also known as a Type 1 error or a false positive.

As more independent t-tests are done, the probability of getting at least
one Type 1 error increases.  In our example, there are only six possible
t-tests that can be done, so alpha inflation is less of a concern.  However,
suppose one was testing the effectiveness of ten different types of allergy
drugs.  The number of t-test comparisons that can be done in this sample
would be 45!  See below:

Drug 1 vs. Drug 2	Drug 2 vs. Drug 9	Drug 5 vs. Drug 6
Drug 1 vs. Drug 3	Drug 2 vs. Drug 10	Drug 5 vs. Drug 7
Drug 1 vs. Drug 4	Drug 3 vs. Drug 4	Drug 5 vs. Drug 8
Drug 1 vs. Drug 5	Drug 3 vs. Drug 5	Drug 5 vs. Drug 9
Drug 1 vs. Drug 6	Drug 3 vs. Drug 6	Drug 5 vs. Drug 10
Drug 1 vs. Drug 7	Drug 3 vs. Drug 7	Drug 6 vs. Drug 7
Drug 1 vs. Drug 8	Drug 3 vs. Drug 8 	Drug 6 vs. Drug 8
Drug 1 vs. Drug 9	Drug 3 vs. Drug 9	Drug 6 vs. Drug 9
Drug 1 vs. Drug 10	Drug 3 vs. Drug 10	Drug 6 vs. Drug 10
Drug 2 vs. Drug 3	Drug 4 vs. Drug 5	Drug 7 vs. Drug 8
Drug 2 vs. Drug 4	Drug 4 vs. Drug 6	Drug 7 vs. Drug 9
Drug 2 vs. Drug 5	Drug 4 vs. Drug 7	Drug 7 vs. Drug 10
Drug 2 vs. Drug 6	Drug 4 vs. Drug 8	Drug 8 vs. Drug 9
Drug 2 vs. Drug 7	Drug 4 vs. Drug 9	Drug 8 vs. Drug 10
Drug 2 vs. Drug 8	Drug 4 vs. Drug 10	Drug 9 vs. Drug 10

The above example illustrates a study that might be in danger of alpha
inflation if multiple t-tests were conducted rather than a single one-way
ANOVA.  Of course, if one insists on running multiple t-tests, then the
best way to avoid alpha inflation is to adjust the significance threshold to
a more conservative level (e.g., a Bonferroni correction, in which .05
divided by 45 comparisons gives a threshold of about p=.001).  Even with
this correction, running this many t-tests
is a bit cumbersome (and will inevitably send any editor reading the
Results section of your manuscript into a tizzy).  It is probably better to
run the ANOVA, determine if any differences exist in the first place, and
then run specific t-tests to determine where those differences lie.
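
The arithmetic behind these numbers can be sketched as follows.  The number
of pairwise comparisons among k groups is k(k-1)/2.  If we assume, for
simplicity, that the tests are independent of one another (pairwise t-tests
that share groups are not, so treat this only as an approximation), the
chance of at least one Type 1 error across m tests is 1 - (1 - alpha)^m:

    4 grade levels:   (4 x 3)/2  =  6 comparisons
    10 allergy drugs: (10 x 9)/2 = 45 comparisons

    6 tests at alpha=.05:   1 - (.95)^6   =  about .26
    45 tests at alpha=.05:  1 - (.95)^45  =  about .90

    Corrected threshold for 45 tests:  .05 / 45  =  about .0011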

Enough about Inflated Alpha.  Back to our example...

A spreadsheet for our study on high school students' rating of
cafeteria food might look like this:

GRADE    ID #	  FOOD RATING (range 1-10)
-----   ----	  -----------
9	   001	     10
9	   002        5
.	    .	      .
.	    .	      .
10	   101	      8
10	   102	      3
.	    .	      .
.	    .	      .
11	   201	      5
11	   202	      9
.	    .	      .
.	    .	      .
12	   301	      7
12	   302	      9
.	    .	      .
.	    .	      .

A one-way ANOVA can be conducted on these data, with "cafeteria food rating"
as our dependent variable, and "grade" (with four levels) as our independent
variable. The mean food ratings for each class grade are shown in the output
and figure below:

Descriptive Statistics:
IV levels     N    Mean Food Rating    SD      SE     Min   Max
---------    ---   ----------------    --      --     ---   ---
9th grade     20         8.35         1.53    0.34     6     10
10th grade    24         7.75         2.13    0.44     4     10
11th grade    45         8.04         2.26    0.34     2     10
12th grade    11         6.27         3.32    1.00     2     10
-----------------------------------------------------------------
TOTAL        100         7.84         2.29    0.23     2     10

One-way ANOVA:
DV: Food Rating
Source            Sum Squares    df    Mean Square      F       p
------            -----------    --    -----------     ---     ---
Between Groups       34.297       3       11.432      2.272   .085
Within  Groups      483.143      96        5.033
------------------------------------------------------------------
TOTAL               517.440      99

The F-score and p-value of a one-way ANOVA will indicate whether the main
effect of the independent variable "class grade" was significant.  In other
words, a significant F-statistic would tell us that class grade had a
significant effect on cafeteria food rating.  In this example, our p-value
of .085 suggests only a marginally significant difference in food ratings
among our four grade levels.
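
To see where the F-score comes from, here is the arithmetic using the values
in the ANOVA table above.  Each Mean Square is a Sum of Squares divided by
its df, and F is the between-groups Mean Square divided by the within-groups
Mean Square:

    df (Between) = number of groups - 1   = 4 - 1   =  3
    df (Within)  = N - number of groups   = 100 - 4 = 96

    Mean Square (Between) =  34.297 /  3 = 11.432
    Mean Square (Within)  = 483.143 / 96 =  5.033

    F = 11.432 / 5.033 = about 2.27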

Although the ANOVA results hint that grade-related differences may exist in
cafeteria food ratings, they do not tell us 'where' those differences lie.
Is it between 9th and 12th graders?  10th and 11th?  Additional post-hoc
tests must be done to address this issue.  The results of the post-hoc
tests are shown below:

Post Hoc Tests (note: these are independent t-tests):
Grade            Grade     Mean Diff.    Std. Error      p
-----            -----     ----------    ----------    ----
9th      vs.     10th         .60           .679       .379
9th      vs.     11th         .31           .603       .613
9th      vs.     12th        2.08*          .842       .015

10th     vs.     11th        -.29           .567       .605
10th     vs.     12th        1.48           .817       .074

11th     vs.     12th        1.77*          .755       .021

The asterisks (*) indicate there are two pairs of groups whose means differ
significantly (p<.05) from each other.  12th graders seem to dislike the
cafeteria food significantly more than both 9th and 11th graders (p=.015
and .021 respectively).  Seniors also dislike the cafeteria food marginally
more than 10th graders (p=.074).
Why did the overall ANOVA show only a marginally significant difference
(p=.085), while the pair-wise comparisons yielded two differences (i.e.,
9th vs. 12th and 11th vs. 12th) that were strongly significant?  This is
because the overall ANOVA compares all values simultaneously, thus
weakening statistical power.  The post-hoc tests are simply a series of
independent t-tests.
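
As a rough check on the post-hoc table (our own reading of the numbers, not
something stated in the output), the standard errors appear to be built from
the within-groups Mean Square of the ANOVA (5.033) and the two group sizes.
For the 9th versus 12th grade comparison:

    SE = sqrt( 5.033 x (1/20 + 1/11) ) = sqrt(.709) = .842

    t  = mean difference / SE = 2.08 / .842 = about 2.47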

IV. TWO-WAY ANALYSIS OF VARIANCE
----------------------------

A two-way (or two-factor) ANOVA is a procedure that designates a single
dependent variable and utilizes exactly two independent variables to gain an
understanding of how the IVs influence the DV.

WARNING:
When conducting ANOVA, it is relatively easy to get certain software packages
(like SPSS, SAS, SPM, AFNI, etc.) to conduct a two-way, three-way, even four-
or five-way analysis of variance.  The programs may do all the arithmetic, which
results in impressive-looking output, but it is up to the user to properly
interpret those results.  The ease of calculating analysis of variance on the
computer often masks the fact that a successful study requires many hours of
careful planning.  In addition, a one-way ANOVA is fairly straightforward and
easy to interpret, but a two-way ANOVA requires some training and frequently
involves a thorough examination of tables and figures before interpretation is
clear.  Understanding a three-way ANOVA usually requires an experienced
researcher, and interpretation of a four-way ANOVA is often nightmarish in
nature, even for the most skilled researcher.

EXAMPLE of Two-Way ANOVA:
-------------------------

Suppose we were interested in determining if performance on a math test
would differ between males and females given chocolate, vanilla, or
strawberry ice cream before taking the test (yes, this is a silly example
but just go with it).  In this example, we have two independent variables
and one dependent variable (side note: more than one DV would require a
MANOVA).  As such, the design of our experiment is set up in the following
manner:

IV1: SEX (Females, Males)
IV2: FLAVOR (Chocolate, Vanilla, Strawberry)

DV: MATH TEST SCORE

In this example, we have a 2 x 3 between-subjects design. The design is
labeled "between-subjects" because participants fall under only one level
of each independent variable.  For instance, subjects are either male or
female, and they are either given chocolate, vanilla, or strawberry ice
cream.  There is no overlap of our subject groups.  As such, our data
spreadsheet might look like this:

SEX	 FLAVOR		ID#  	MATH TEST SCORE
---	 ------		---  	---------------
F	chocolate	001	      104
F	chocolate	002	       89
.	  .	 	 .	       .
.	  .	 	 .	       .
F	vanilla		101	       98
F	vanilla		102	      100
.	  .	 	 .	       .
.	  .	 	 .	       .
F	strawberry	201	       80
F	strawberry	202	       77
.	  .	 	 .	       .
.	  .	 	 .	       .
M	chocolate	301	       98
M	chocolate	302	      106
.	  .	 	 .	       .
.	  .	 	 .	       .
M	vanilla		401	       78
M	vanilla		402	       70
.	  .		 .	       .
.	  .	 	 .	       .
M	strawberry	501	       88
M	strawberry	502	      102
.	  .		 .	       .
.	  .	 	 .	       .

By running a two-way ANOVA on the data, we hope to answer the following
questions: First, is there a main effect of SEX?  That is, do males and
females differ significantly on their performance on the math test?
Second, is there a main effect of FLAVOR, where a significant difference
in math performance can be found between chocolate, vanilla, and strawberry
ice cream eaters?  Finally, is there a significant SEX by FLAVOR
interaction?  That is, how do sex and flavor interact in their effect on
math test performance? Can any sex differences in math performance be found
between the three different ice cream flavors?

The output of our two-way ANOVA is shown below:

Descriptive Statistics:
DV = Math Score
Flavor       Sex       Mean Score    Std. Dev     N
------       ---       ----------    --------    ---
Chocolate    Female      103.95        18.14      20
             Male        106.85        13.01      13
             TOTAL       105.09        16.15      33

Vanilla      Female      100.00        12.31      26
             Male         98.46        11.82      13
             TOTAL        99.49        12.01      39

Strawberry   Female      102.83        10.68      18
             Male         90.73        21.23      15
             TOTAL        97.33        17.18      33
-----------------------------------------------------------
TOTAL        Female      102.03        13.90      64
             Male         98.29        17.20      41
             TOTAL       100.57        15.30     105

Two-way ANOVA:
Source             Sum Squares    df    Mean Square        F         p
------             -----------    --    -----------       ---       ---
Corrected Model        2350.41     5        470.08        2.12     .070
Intercept            996892.70     1     996892.70     4487.39     .000
FLAVOR                 1262.73     2        631.36        2.84     .063
SEX                     316.56     1        316.56        1.43     .235
FLAVOR * SEX            960.44     2        480.22        2.16     .121
Error                 21993.31    99        222.16
Total               1086378.00   105
Corrected Total       24343.71   104
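
The df column and two of the F-scores in this table can be reproduced with a
little arithmetic (a sketch based only on the values shown above):

    df (FLAVOR)       = 3 flavors - 1                   =  2
    df (SEX)          = 2 sexes - 1                     =  1
    df (FLAVOR * SEX) = 2 x 1                           =  2
    df (Error)        = 105 subjects - (3 x 2) groups   = 99

    F (FLAVOR)        = 631.36 / 222.16 = about 2.84
    F (FLAVOR * SEX)  = 480.22 / 222.16 = about 2.16

In each case, F is the Mean Square for that effect divided by the Mean Square
for Error.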

Main Effect of FLAVOR:
---------------------
The above results indicate that our main effect of FLAVOR was
marginally significant (F(2,99)=2.84, p=.063), suggesting that math
scores differed somewhat between the three flavor groups.  To clarify
this main effect further, it is always helpful to plot out the
results.  Figure 4 shows that collapsed across SEX, chocolate tasters
scored marginally higher on the math test than strawberry and vanilla
tasters.  Vanilla and strawberry tasters did not seem to differ on
their math test performance.

Main Effect of SEX:
------------------
Our main effect of SEX was not statistically significant
(F(1,99)=1.43, p=.235), indicating no sex-related differences in math
test performance.  Again, this result can be seen clearly by collapsing
across FLAVOR, and plotting the mean math scores for males and females
(see Figure 5).

SEX by FLAVOR Interaction:
-------------------------
Finally, there was no significant SEX by FLAVOR interaction
(F(2,99)=2.16, p=.121).  By plotting out the results (see Figure 6),
we clearly see that male and female chocolate tasters performed
virtually identically on the math test.  The same result can be seen
between male and female vanilla tasters.  However, amongst the
strawberry tasters, we see a pattern (although not statistically
significant) of females performing slightly higher on the math test
than males.

As this example illustrates, it is always a good idea to plot out the
results.  A schematic view of the data makes it much easier to decipher
where the significant main effects and interactions may lie.  Even if the
results are not statistically significant, a plot will show the pattern of
results, which can be just as informative as the statistical findings.

V. PARAMETER NORMALIZATION - Calculating Percent Change
----------------------------------------------------

Although this section may seem to deviate a bit from the previous sections on
t-tests and ANOVAs, the discussion of calculating percent change is actually
quite relevant to statistical testing.  Primarily, the calculation of percent
signal change is a type of normalization of the data, which is necessary when
comparing groups whose baselines differ.  When making comparisons across
subjects or groups of subjects, normalization of the data becomes an important
issue.  If certain parameters of interest are not normalized before group
comparisons are made, then the results may be misleading, suggesting there are
differences between groups when in fact there are not.  To fully understand this
issue, an example is in order:

EXAMPLE of Percent Change:
-------------------------

Suppose we tested a sample of young adults and a sample of older adults on
a reaction time task.  The reaction time task has two conditions: the first
one involves color blocks presented on the computer screen.  Each
participant must identify the color the block is painted in as quickly as
possible.  The second condition consists of color words (e.g., "BLUE")
printed in an incongruent ink color (e.g., red).  The goal is to ignore the
word and name the ink color.  This second condition is often referred to as
the "Stroop task."  It is more demanding than the color blocks condition
because participants must first suppress the urge to read the word, and
instead, name the ink color.  As such, the Stroop condition should result
in significantly longer response times than the color blocks condition for
both age groups.  Our 2 x 2 mixed factorial design is composed of the
following independent and dependent variables:

IV1-AGE:          1. Young
                  2. Older

IV2-CONDITION:    1. Color blocks
                  2. Stroop task

DV - RESPONSE TIME  (measured in msec)

In this case, AGE is a between-subjects variable because subjects are
either young or older, but not both.  In contrast, CONDITION is a
within-subjects variable because participants are exposed to both the color
blocks
and the Stroop task.  As such, our design is identified as "mixed" because
there are both between- and within-subject variables in our experiment.

The following hypotheses can be formulated for this experiment:

First, do we expect to find a main effect of AGE?  If so, we might
predict that response speed declines with age, which results in
significantly slower response times for older adults than young
adults.

Second, do we expect to find a main effect of CONDITION?  We predict
that the Stroop condition will result in significantly slower response
times for both age groups than the colored blocks condition.  This is
because reading is more automatic than color naming.  As such,
subjects must take the time to suppress the reading response in order
to correctly name the ink color.

Finally - and most importantly - do we expect to find a significant
AGE x CONDITION interaction?  In other words, do older adults show
more slowing in the Stroop condition relative to the color blocks
condition than do young adults?  If so, we might conclude that with
age, not only does response time increase, it also becomes more
difficult to suppress irrelevant information, making the Stroop task
more difficult for older than young adults.  One reason for such a
finding may be that the ability to block out irrelevant information
becomes compromised with age.

The output for this study may look something like this:

Descriptive Statistics:
DV = Mean Response time (in milliseconds)

Condition            Young Adults    Older Adults
---------            ------------    ------------
A) Color Blocks         500 ms         1000 ms

B) Stroop Task          550 ms         1100 ms
-------------------------------------------------------
Difference (B-A):        50 ms          100 ms

By examining the absolute differences between the Stroop task and the Color
Blocks, we find a difference of 50 ms for young adults and 100 ms for older
adults.  One might be tempted to conclude: "Older adults are
twice as slow as young adults when they are presented with the Stroop task.
It is twice as hard for older adults to deal with the Stroop task, which
involves the simultaneous presentation of two pieces of incongruent
information.  This must mean that with age, older adults develop inhibitory
deficits that deter them from suppressing irrelevant information."

However, the interpretation of these results may be inaccurate, because it
is not accounting for the fact that the baselines for young and older
adults are different.  If the color blocks condition is our
baseline/neutral condition, we see that on average, young adults respond at
around 500 milliseconds.  On the other hand, older adults' baseline is much
slower, at 1000 milliseconds.  This finding indicates that in general,
older adults have slower response times than young adults.  As such, we
must account for these differences in baseline, by calculating the percent
change in response time between the baseline condition and the Stroop
condition.  That is, how much slower is each group in the Stroop condition
relative to their baseline?  Is it 5 percent for young and 10 percent for
older adults? Is it 10 percent for young and 12 percent for older?
Converting our absolute response times to percent change will allow us to
compare the two age groups on an equal footing.

One way to calculate the percent change is by using the following
simple equation:

A = Baseline Condition (e.g., color blocks response time)
B = Stimulus Condition (e.g., Stroop task response time)

((B - A)/A) * 100 = percent change

We can apply this formula to each of our age groups:

Young adults:  ((550 - 500)/500) * 100 = 10%

Older adults:  ((1100 - 1000)/1000) * 100 = 10%

When the data are normalized, we realize that there really aren't any age-
related differences in Stroop interference between young and older adults.
For both age groups, the Stroop condition slowed participants by 10%
compared to the baseline condition.  Although older adults are slower in
general than young adults, both age groups experience the same amount of
interference from the Stroop condition.  Therefore, these results do not
suggest that inhibitory deficits increase with age.  As such, once the
response times are expressed as percent change, the AGE x CONDITION
interaction should not be statistically significant.  A plot of the
interaction (Figure 7) verifies this conclusion.

FMRI and Calculating Percent Signal Change:
------------------------------------------

The above example showed the importance of normalization of data in a
psychological study.  This normalization process is just as important in FMRI
research.  When comparing parameters that quantify activation across subjects
(e.g., when an ANOVA will be run to examine activation levels across subjects
in response to a stimulus), these parameters should be normalized by
calculating the percent signal change.  This is because FMRI signal amplitude
varies for different subjects, runs, scanning sessions, regressors, image
reconstruction software, and modeling strategies.  Amplitude measures (i.e.,
regression coefficients) can be converted to percent signal change from the
baseline.  For an
FMRI example of how the percent signal change is calculated using AFNI, see the
AFNI_howto section of this tutorial.
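
As a purely hypothetical illustration (the numbers below are invented for
this sketch and do not come from any real dataset), suppose a voxel has a
baseline signal of 1000 and a regression coefficient of 15 for the stimulus.
Applying the percent change formula from the previous example, with the
baseline as A and the coefficient as the change from baseline (B - A):

    (15 / 1000) * 100 = 1.5% signal change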

CONCLUSION
----------

The goal of this review was to take some anguish and anxiety out of statistics.
Since we cannot take the statistics out of research, we must conquer and embrace
it.  Hopefully, any questions you may have had regarding t-tests, ANOVAs, and
percent change have been answered.  Although the examples
shown here come from behavioral research (I'm a psychologist, this is what I
know), the statistics can be applied to FMRI research as well, on a voxel-by-
voxel basis.  Although the BOLD signal is a somewhat more complicated dependent
variable than say, a reaction time or a test score, the statistics are the same.

REFERENCES
----------

If this review has whetted your appetite for more statistics, the following
references selected by members of the Scientific and Statistical Computing Core
at the NIMH (http://afni.nimh.nih.gov/sscc) may be of interest to
you.  Good luck and happy reading!

Picks by Dr. Cox:
----------------
Bickel, P. J. & Doksum, K. A. (1977). Mathematical Statistics. Holden-Day.

Casella, G. & Berger, R. L. (1990). Statistical Inference. Brooks/Cole.

Picks by Dr. Chen:
-----------------
Kutner, M.H., Nachtsheim, C.J., Wasserman, W., & Neter, J. (1996). Applied
Linear Statistical Models, 4th edition. McGraw-Hill/Irwin.

Neter, J., Wasserman, W., & Kutner, M.H. (1990). Applied Linear Statistical
Models: Regression, Analysis of Variance, and Experimental Designs,
3rd edition. Richard D. Irwin Publishing.

Picks by Dr. Christidis:
-----------------------
Howell, D. C. (2003). Fundamental Statistics for the Behavioral Sciences,
5th edition. International Thomson Publishing.

Hays, W. L. (1994). Statistics, 5th edition. International Thomson Publishing.