Hi, Shankar-
Cool, glad that is working well.
I think with multithreading there are diminishing returns after a while: doubling the number of threads doesn't halve the runtime, because 1) some part of the processing is not parallelized, and 2) multithreading has its own computational cost (organizing jobs, splitting and rejoining data, etc.). Eventually, adding more CPUs doesn't add any efficiency, because that second cost grows large enough to cancel out the gain. It might be that things top out around 12 threads or so in this case (that actually rings a bell with what Bob has mentioned previously, I think). But if running with 12 or 48 threads doesn't make much difference for you, you can run 4 jobs simultaneously, each with 12 threads, so "parallelizing" your analysis further in that sense still works.
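To make that concrete, here is a rough toy model: Amdahl's law (the serial part, point 1) plus an overhead term that grows with the thread count (point 2). The numbers are made up for illustration, not measured from your runs, and the linear overhead and the "concurrent jobs don't slow each other down" assumption are simplifications I'm assuming here. Real jobs sharing a node can compete for memory bandwidth or disk.

```python
import math

def runtime(n_threads, serial_frac=0.05, overhead_per_thread=0.004):
    """Relative runtime of ONE job on n_threads.

    Amdahl's law (serial part + parallel part / n) plus a simple linear
    threading-overhead term. Default parameters are illustrative guesses.
    """
    parallel_frac = 1.0 - serial_frac
    return serial_frac + parallel_frac / n_threads + overhead_per_thread * n_threads

def speedup(n_threads, **kw):
    """Speedup of one job relative to running it on a single thread."""
    return runtime(1, **kw) / runtime(n_threads, **kw)

def batch_time(n_jobs, total_cores, threads_per_job, **kw):
    """Wall time to finish n_jobs that share total_cores.

    Jobs run side by side in 'waves' of total_cores // threads_per_job.
    Assumes concurrent jobs don't interfere with each other.
    """
    concurrent = max(1, total_cores // threads_per_job)
    waves = math.ceil(n_jobs / concurrent)
    return waves * runtime(threads_per_job, **kw)

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 12, 16, 24, 48):
        print(f"{n:>2} threads: speedup {speedup(n):.2f}")
    print("4 jobs, 1 x 48 threads each, one at a time:", round(batch_time(4, 48, 48), 3))
    print("4 jobs, 4 x 12 threads side by side:       ", round(batch_time(4, 48, 12), 3))
```

With these toy numbers, 2 threads give a bit less than 2x, the speedup peaks in the low teens of threads and then falls off, and running 4 jobs side by side at 12 threads each finishes the batch much sooner than running them one after another at 48 threads.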
--pt