Well, to me it's more an issue of maintainability. BEET mode is more akin to transport/tunnel mode than to AH/ESP/IPComp. As such, its implementation would be most at home where the existing encapsulation and decapsulation for transport/tunnel mode are done. That is, in xfrm[46]_input.c and xfrm[46]_output.c.
For instance, the only reason the current patch has to touch esp4.c at all is that the change to xfrm4_output.c isn't right. It should do what the comment says and set skb->h to the start of the payload, not the start of the ESP header. If it did that, esp_output wouldn't have to care about BEET at all.
Also, the outer header generation should be done before x->type->output is called, not after. That way, the AH semantics fall out quite naturally, since the outer header already exists by the time AH processes the packet.
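
To make the ordering concrete, here is a toy userspace model of what I mean. It is not kernel code: toy_pkt, toy_mode_output and toy_type_output are names made up for this sketch, the "outer header" is just four filler bytes, and the byte sum stands in for an AH-style integrity check. The point is only that the mode step builds the outer header and leaves h at the payload, so the type step needs no mode-specific branches.

```c
#include <stddef.h>
#include <string.h>

enum { TOY_OUTER_LEN = 4, TOY_TYPE_HDR_LEN = 2, TOY_BUF_LEN = 64 };

/* Hypothetical stand-in for an skb: offsets instead of pointers. */
struct toy_pkt {
    unsigned char buf[TOY_BUF_LEN];
    size_t nh;   /* offset of the outer header */
    size_t h;    /* offset of the payload the type step processes */
    size_t end;  /* one past the last payload byte */
};

/*
 * Mode step (transport/tunnel/BEET would live here): lay out the packet,
 * build the outer header, leave a gap for the type header, and point h
 * at the start of the payload. Returns 0 on success, -1 if too long.
 */
int toy_mode_output(struct toy_pkt *p, const unsigned char *payload, size_t len)
{
    size_t payload_off = TOY_OUTER_LEN + TOY_TYPE_HDR_LEN;

    if (len > TOY_BUF_LEN - payload_off)
        return -1;

    memset(p->buf, 0, sizeof(p->buf));
    p->nh = 0;
    p->h = payload_off;
    p->end = payload_off + len;
    memcpy(p->buf + p->h, payload, len);
    memset(p->buf + p->nh, 0xAA, TOY_OUTER_LEN);  /* stand-in outer header */
    return 0;
}

/*
 * Type step (AH/ESP-like): knows nothing about the mode. It fills the gap
 * after the outer header and sums every byte from nh to end, so the
 * already-built outer header is covered, as AH would require.
 */
unsigned toy_type_output(struct toy_pkt *p)
{
    unsigned char *hdr = p->buf + p->nh + TOY_OUTER_LEN;
    unsigned sum = 0;
    size_t i;

    hdr[0] = 0x32;                              /* arbitrary marker byte */
    hdr[1] = (unsigned char)(p->end - p->h);    /* payload length */

    for (i = p->nh; i < p->end; i++)
        sum += p->buf[i];
    return sum;
}
```

If the mode step ran after the type step instead, the sum could not cover the outer header, and the type step would have to learn where each mode puts it.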
--Diego