This looks mostly pretty reasonable. I will add a few comments about some of the steps.
1. Averaging is everyone's, including mine, first instinct as a base. This works well for very well-behaved subjects, but it's not terribly robust to motion. We generally will recommend a minimum outlier volume or sub-bricks 0 or 1 for difficult cases. A reliable individual sub-bricks generally works better, but if you have a very still subject, then averaging is fine.
2. We do have B0 correction software now - epi_b0_correct.py.
4. Nonlinear warp of EPI to anat. So I think this is a good idea, and one that has been on my list of things to try for quite some time in combination particularly with the similar blip-up/down correction. DR.BUDDI in the Tortoise package also does this. This would be limited to a single, specified EPI sub-brick volume and using a limited range of distortion, as you said.
5. Concatenating is possible with 3dNwarpApply. See unWarpEPI.py for an example of how something similar was done with blip-up/down. Going to standard space adds more transformations.
Adding -nopenalty or specifying a small penalty with -penfac with 3dQwarp will allow more distortion, but it might not be enough. I'm surprised you would get that much distortion between EPI and anat though; I would expect only relatively small amounts left, especially after the B0 correction. The initial levels will allow larger displacements. Include -workhard too. Separating the analysis into two sections - cerebrum, and then cerebellum and brainstem might work better.