[muscle] Re: Muscle on Solaris

  • From: Lior Okman <lior.okman@xxxxxxxxxxxxxxxxxxxxxxxx>
  • To: muscle@xxxxxxxxxxxxx
  • Date: Mon, 03 Jul 2006 12:41:27 +0300

Jeremy Friesner wrote:
> Hi Lior,
>
> A stack crawl of the crash would be helpful, especially if you can determine 
> the exact line at which the program crashed.  (if the debugger is being 
> unhelpful, I sometimes even resort to determining the crash location 'the 
> hard way', by putting in lots of fprintf(stderr, "got to line %i\n", 
> __LINE__) type of statements and seeing which one gets printed last before 
> the crash... MuscleSupport.h even defines a MCHECKPOINT macro for that 
> purpose).
>   
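
For anyone following along, the checkpoint technique described above can be
sketched roughly as follows. This is only an illustration: TRACE_CHECKPOINT,
ExampleNode and MakeExampleNode are hypothetical names made up for this sketch,
and the macro is not the actual MCHECKPOINT definition from MuscleSupport.h.

```cpp
#include <cstdio>

// Hypothetical stand-in for a checkpoint macro (NOT the real MCHECKPOINT).
// It prints the current file and line to stderr and flushes, so the last
// line printed before a crash narrows down where the crash happened.
#define TRACE_CHECKPOINT() \
   do { std::fprintf(stderr, "got to %s:%i\n", __FILE__, __LINE__); std::fflush(stderr); } while (0)

// Hypothetical class whose constructor only initializes its members.
struct ExampleNode
{
   ExampleNode() : _a(0), _b(0) { TRACE_CHECKPOINT(); }

   int _a;
   int _b;
};

// Checkpoints placed before and after the construction bracket the suspect code.
int MakeExampleNode()
{
   TRACE_CHECKPOINT();
   ExampleNode n;
   TRACE_CHECKPOINT();
   return n._a + n._b;
}
```

If the first checkpoint prints but the one inside the constructor body does
not, the crash is somewhere in the member initializers.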

Here's the backtrace. Note that because the crash only happens when the
code is compiled with a "-O" switch, the backtrace is a bit muddled.

(gdb) bt
#0  0x00069300 in muscle::PulseNode::PulseNode() (this=0xf840c) at
../util/PulseNode.cpp:8
#1  0x0006d8b0 in muscle::ServerComponent::ServerComponent()
(this=0xf83fc) at ../reflector/ServerComponent.cpp:10
#2  0x00054df0 in
muscle::AbstractReflectSession::AbstractReflectSession()
(this=0xff1405a8) at ../reflector/AbstractReflectSession.cpp:17
#3  0x00056108 in muscle::DumbReflectSession::DumbReflectSession()
(this=0xf83fc) at ../reflector/DumbReflectSession.cpp:18
#4  0x000568c4 in muscle::StorageReflectSession::StorageReflectSession()
(this=0xf83fc) at ../reflector/StorageReflectSession.cpp:62
#5  0x00056604 in
muscle::StorageReflectSessionFactory::CreateSession(muscle::String
const&) (this=0xffbff6b8) at ../reflector/StorageReflectSession.cpp:31
#6  0x0006af64 in
muscle::FilterSessionFactory::CreateSession(muscle::String const&)
(this=0xffbff598, clientHostIP=@0xffbff228) at ../util/RefCount.h:173
#7  0x0006203c in muscle::ReflectServer::DoAccept(unsigned short, int,
muscle::ReflectSessionFactory*) (this=0xffbff7a8, port=2960, acceptSocket=3,
    optFactory=0xffbff598) at ../util/RefCount.h:94
#8  0x0006191c in muscle::ReflectServer::ServerProcessLoop()
(this=0xffbff7a8) at ../reflector/ReflectServer.cpp:614
#9  0x000648e4 in std::moneypunct<char, false>::do_curr_symbol() const
() at muscled.cpp:273
#10 0x00065550 in std::numeric_limits<unsigned long long>::min_exponent ()

Line 8 of PulseNode.cpp is the constructor, and the only thing that
happens there is member initialization.

> That said, the behaviour you describe reminds me of two previous issues I've 
> seen... whether your problem is related or not, I have no idea, but they 
> might provide clues:
>
> 1) I stumbled across a bug in gcc 3.x that would cause new (nothrow) to 
> return an invalid pointer (0x04, instead of 0x00) on memory failure when you 
> used it to try to allocate an array.  I put in a hack-around, as shown on 
> lines 289-303 of support/MuscleSupport.h, but perhaps the hack-around doesn't 
> work (or worse, is causing problems) under Solaris.
>   
I disabled this workaround and recompiled the server, and the problem
still happens, so the workaround is not what is causing it.
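
For comparison, this is roughly what well-behaved new (nothrow) array
allocation looks like from the caller's side: a failed allocation should
yield a null pointer, which is what the caller tests for. AllocateInts is a
hypothetical helper written for this sketch; it does not reproduce the
workaround in MuscleSupport.h.

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper: returns a zero-filled array of (count) ints,
// or NULL if the allocation failed.  The caller must delete [] the result.
int * AllocateInts(std::size_t count)
{
   // With std::nothrow, an allocation failure is reported by returning
   // a null pointer instead of throwing std::bad_alloc.
   int * p = new (std::nothrow) int[count];
   if (p == NULL) return NULL;

   for (std::size_t i = 0; i < count; i++) p[i] = 0;
   return p;
}
```
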
> 2) On some CPUs (I believe SPARC is one of them), accessing a multibyte value 
> (e.g. int32 or float) on a non-word-aligned memory address will cause the CPU 
> to throw an exception.  At one point I went through muscle's 
> flatten/unflatten code to handle this problem:  the muscleCopyIn() and 
> muscleCopyOut() templated inline functions (also declared in 
> support/MuscleSupport.h) are implemented to call memcpy() to access unaligned 
> values if MUSCLE_CPU_REQUIRES_DATA_ALIGNMENT is #defined; otherwise they just 
> do a normal copy using the assignment operator.  So it is possible either 
> that MUSCLE_CPU_REQUIRES_DATA_ALIGNMENT is not being #defined on your system, 
> and should be (see line 69 of MuscleSupport.h), or perhaps some new code 
> snuck into the codebase that accesses unaligned values without using 
> muscleCopyIn()/muscleCopyOut(), and thus causes the crash (entirely possible, 
> since I don't test the code on SPARC CPUs, so I might have done that without 
> thinking about it, and wouldn't see any symptoms under PPC or Intel).
>
>   
SPARC will crash on a non-word-aligned memory access, but I made sure
that MUSCLE_CPU_REQUIRES_DATA_ALIGNMENT is set, and the issue still
happens.
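
To illustrate the alignment-safe access pattern being discussed, here is a
minimal sketch that goes through memcpy() instead of dereferencing a cast
pointer. CopyOutUnaligned, CopyInUnaligned and RoundTripAtOddOffset are
hypothetical names for this example, not muscle's muscleCopyIn() and
muscleCopyOut(), and the sketch assumes a compiler that provides <cstdint>.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers (not muscle's actual implementation).
// memcpy() copies byte by byte as far as the language is concerned, so
// neither pointer needs to be aligned for type T.
template <typename T> inline void CopyOutUnaligned(void * dest, const T & source)
{
   std::memcpy(dest, &source, sizeof(T));
}

template <typename T> inline void CopyInUnaligned(T & dest, const void * source)
{
   std::memcpy(&dest, source, sizeof(T));
}

// Writes a 32-bit value at an odd offset in a byte buffer and reads it back.
// Doing the same through a cast pointer, e.g. *((std::uint32_t *)(buf+1)),
// is the kind of access that can trap on CPUs requiring aligned loads.
std::uint32_t RoundTripAtOddOffset(std::uint32_t value)
{
   unsigned char buf[sizeof(std::uint32_t) + 1] = {0};
   CopyOutUnaligned(buf + 1, value);

   std::uint32_t result = 0;
   CopyInUnaligned(result, buf + 1);
   return result;
}
```
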

> Cheers,
> Jeremy
>
>   
Regards,
Lior

