Rene Gollent wrote:
> On Mon, Jul 2, 2012 at 7:22 PM, Rene Gollent <anevilyak@xxxxxxxxx> wrote:
> > Fair enough on the reorder, but since the rep / mov variation is
> > apparently out of the question, I'm not really sure as to a more
> > compact way to write it. Thoughts?
>
> Actually, reordering them currently isn't feasible either, since we're
> relying on movsb to increment the source/destination pointers. As
> such, if movsb doesn't come first, cmpb is looking at the byte that
> precedes the beginning of the source string, so if that happens to be
> 0 we don't copy data at all.

What I was thinking of is: "cmpb $0, (%esi); movsb; je ...". Since movsb
doesn't touch ZF, the (single-threaded) semantics would be the same, but
with another instruction between the load and the jump, it might help
the processor parallelize execution. But maybe not, since the movsb
loads from the same address anyway, and today's processors probably
crunch optimizations like this for breakfast. ;-)

I'm afraid I'm really not up to date on optimization, so don't mind me.
In any case, I'd go with the safe version I proposed, even if it is
slower.

CU, Ingo
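
To illustrate the two orderings discussed above, here is a minimal sketch in plain C rather than inline assembly. It is not the actual code under review: the function names are hypothetical, the "move first" variant assumes the current loop has the shape "movsb; cmpb $0, -1(%esi); je ...", and the length limit that a real bounded copy would need is left out. The local variable zf stands in for the zero flag.

```c
#include <stddef.h>

/* Hypothetical model of "movsb; cmpb $0, -1(%esi); je done".
 * Assumption: this is the shape of the loop currently in use. */
size_t copy_move_then_compare(char *dst, const char *src)
{
    size_t copied = 0;
    for (;;) {
        *dst++ = *src++;          /* movsb: copy one byte, advance both pointers */
        copied++;
        int zf = (src[-1] == 0);  /* cmpb $0, -1(%esi): test the byte just copied */
        if (zf)                   /* je done */
            break;
    }
    return copied;                /* bytes copied, including the terminating 0 */
}

/* Hypothetical model of "cmpb $0, (%esi); movsb; je done". */
size_t copy_compare_then_move(char *dst, const char *src)
{
    size_t copied = 0;
    for (;;) {
        int zf = (*src == 0);     /* cmpb $0, (%esi): test the byte about to be copied */
        *dst++ = *src++;          /* movsb: modifies no flags, so zf is still valid */
        copied++;
        if (zf)                   /* je done */
            break;
    }
    return copied;                /* bytes copied, including the terminating 0 */
}
```

Both variants copy the same bytes and stop after the terminating 0. The only difference is whether the comparison reads the byte before or after the pointers have been advanced, which is why the second form needs no -1 offset.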