The disadvantage of your method 1 is quite fatal: it does not allow you to test (Stimtype A correct – Stimtype B correct).
The weighting strategy in method 3 seems a little problematic to me: is there any evidence for the proposition that the BOLD signal is proportional to the rate of correct responses?
I don't see anything wrong with modeling 6 separate conditions (method 2). In the case of (Stimtype A all – Stimtype B all), it is really a general linear test rather than a contrast (in which all coefficients add up to 0), and the unequal number of 1's and -1's in the test is not a problem at all. So rest assured: it is essentially a reduced model vs. full model comparison, as you can see if you look into the details of option -glt in 3dDeconvolve.
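To illustrate the reduced model vs. full model point, here is a minimal NumPy sketch on simulated data. It is not how 3dDeconvolve is implemented; the function name glt_t_and_f, the design matrix X, the data y, and the weight vector c are all made up for illustration. It only shows that a single-row linear test whose weights do not sum to 0 still gives a t-statistic whose square equals the F-statistic from comparing the full model against the model constrained by c'beta = 0.

```python
import numpy as np

def glt_t_and_f(X, y, c):
    """Return (t, F) for the single-row linear test c'beta = 0.

    t comes from the usual formula on the full model; F comes from
    comparing the full model with the reduced model that enforces
    c'beta = 0. Hypothetical helper, for illustration only.
    """
    n, p = X.shape

    # Full model: ordinary least squares fit.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid_full = y - X @ beta
    rss_full = resid_full @ resid_full
    dof = n - p
    sigma2 = rss_full / dof

    # t-statistic for the linear combination c'beta.
    xtx_inv = np.linalg.inv(X.T @ X)
    t = (c @ beta) / np.sqrt(sigma2 * (c @ xtx_inv @ c))

    # Reduced model: restrict beta to the null space of c, so c'beta = 0.
    _, _, vt = np.linalg.svd(c.reshape(1, -1))
    null_basis = vt[1:].T              # p x (p - 1), columns orthogonal to c
    X_red = X @ null_basis
    gamma, *_ = np.linalg.lstsq(X_red, y, rcond=None)
    resid_red = y - X_red @ gamma
    rss_red = resid_red @ resid_red

    # One constraint, so the numerator has 1 degree of freedom.
    F = (rss_red - rss_full) / (rss_full / dof)
    return t, F

# Simulated example (assumed data): baseline plus 5 condition regressors.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 5))])
true_beta = np.array([1.0, 0.5, 0.2, 0.8, 0.1, 0.3])
y = X @ true_beta + rng.normal(size=n)

# Three +1's against two -1's: the weights sum to 1, not 0.
c = np.array([0.0, 1.0, 1.0, 1.0, -1.0, -1.0])

t, F = glt_t_and_f(X, y, c)
print("sum of weights:", c.sum())
print("t^2 =", t**2, " F =", F)
```

The two printed statistics agree up to floating-point error, which is the sense in which the unequal number of 1's and -1's does not invalidate the test.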
Gang