On Mon, Jul 2, 2012 at 5:59 PM, Ingo Weinhold <ingo_weinhold@xxxxxx> wrote: > Ugh, I wasn't aware of that. Sometimes it helps to read the specs. :-) > I was more surprised that the assembler allowed it without complaining. > Not that performance matters in this implementation anyway, but the cmp can > be omitted. dec already updates ZF. Good to know, thanks, like I said x86 isn't one I've had much past experience with. > 4 instructions per byte is rather suboptimal (reordering movsb and cmpb might > be slightly better). > Fair enough on the reorder, but since the rep / mov variation is apparently out of the question, I'm not really sure as to a more compact way to write it. Thoughts? > A problem I notice now (and which the previous implementation had, too) is > that the source memory location is accessed twice. If that is user memory it > could actually change in the meantime, thus possibly resulting in an > unterminated destination buffer. A safe (though even slower) implementation > would be: > Wouldn't that imply that the source program is doing something not thread-safe? I didn't think strlcpy and friends guaranteed thread safety, especially not concurrent modification of the string being copied. > That looks unnecessary. When loop falls through, ecx is always 0. When > jumping out of the loop ecx is never 0. So it could jump directly to > .L_user_strlcpy_source_done. > Noted, will deal with that and the dec. > For performance/code density reasons clearing a register is usually done with > xor or sub and testing for 0 via test, both arguments being the register. > Thanks! Will revise shortly. While we're on the subject, do you think it'd be worth reworking arch_cpu_user_memcpy() and arch_cpu_user_memset() to operate a dword at a time for as much of the data as possible and then copying the remaining 1-3 bytes if needed? Regards, Rene