[haiku-commits] Re: haiku: hrev44282 - src/system/kernel/arch/x86

  • From: Rene Gollent <anevilyak@xxxxxxxxx>
  • To: haiku-commits@xxxxxxxxxxxxx
  • Date: Mon, 2 Jul 2012 19:22:17 -0400

On Mon, Jul 2, 2012 at 5:59 PM, Ingo Weinhold <ingo_weinhold@xxxxxx> wrote:
> Ugh, I wasn't aware of that. Sometimes it helps to read the specs. :-)
>

I was more surprised that the assembler allowed it without complaining.

> Not that performance matters in this implementation anyway, but the cmp can 
> be omitted. dec already updates ZF.

Good to know, thanks, like I said x86 isn't one I've had much past
experience with.
> 4 instructions per byte is rather suboptimal (reordering movsb and cmpb might 
> be slightly better).
>

Fair enough on the reorder, but since the rep / mov variation is
apparently out of the question, I'm not really sure as to a more
compact way to write it. Thoughts?

> A problem I notice now (and which the previous implementation had, too) is 
> that the source memory location is accessed twice. If that is user memory it 
> could actually change in the meantime, thus possibly resulting in an 
> unterminated destination buffer. A safe (though even slower) implementation 
> would be:
>

Wouldn't that imply that the source program is doing something not
thread-safe? I didn't think strlcpy and friends guaranteed thread
safety, especially not concurrent modification of the string being
copied.

> That looks unnecessary. When loop falls through, ecx is always 0. When 
> jumping out of the loop ecx is never 0. So it could jump directly to 
> .L_user_strlcpy_source_done.
>

Noted, will deal with that and the dec.

> For performance/code density reasons clearing a register is usually done with 
> xor or sub and testing for 0 via test, both arguments being the register.
>

Thanks! Will revise shortly. While we're on the subject, do you think
it'd be worth reworking arch_cpu_user_memcpy() and
arch_cpu_user_memset() to operate a dword at a time for as much of the
data as possible and then copying the remaining 1-3 bytes if needed?

Regards,

Rene

Other related posts: