[rmaexpress_help] Re: RMAExpress still crashes after 1.0 beta 8

I'm not sure what you mean by it missing the ".cel" files and only
seeing the ".CEL" ones. At least on my local build, it has no problem
showing both (unless I am misunderstanding what you mean by "auto file
listing").

Ben

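On the case-sensitivity question, a minimal sketch in C of matching the
extension without regard to case. This is not taken from the RMAExpress
source: `has_cel_extension` is a hypothetical helper, and how the GUI
actually filters its file list is not shown in this thread.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of RMAExpress: returns 1 if `name`
   ends in ".cel" in any mix of upper and lower case, 0 otherwise. */
static int has_cel_extension(const char *name)
{
    static const char ext[] = ".cel";
    size_t ext_len = sizeof ext - 1;
    size_t name_len, i;

    if (name == NULL)
        return 0;

    name_len = strlen(name);
    if (name_len < ext_len)
        return 0;

    /* Compare the last four characters, lower-casing each one first. */
    for (i = 0; i < ext_len; i++) {
        if (tolower((unsigned char)name[name_len - ext_len + i]) != ext[i])
            return 0;
    }
    return 1;
}
```

With a check of this kind, names such as GSM260954.cel and
GSM226522_S17_3.CEL would both be accepted.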

On Thu, 2008-03-06 at 14:32 -0800, Alex Feltus wrote:
> Ben:
> 
> I just wanted to do a final follow-up.  I found a few
> more corrupt .cel files and was able to RMA normalize
> 2852 arrays.  Woo-hoo!
> 
> Of course, a .cel check would be a great addition to
> your code as it works wonderfully when the data is
> sound.  One other thing I noticed is that your auto
> file listing in the GUI mode appears to be case
> sensitive (CAPS), so it misses .cel files.
> 
> Thanks for the tech support!
> 
> Alex
> 
> 
> --- Ben Bolstad <bmb@xxxxxxxxxxxxx> wrote:
> 
> > Well, really this should be more properly handled by the parsing code
> > (i.e., my responsibility). But it depends on the failure mode as to whether
> > you can detect it manually yourself. In the case of the two files below,
> > the corruption is with the data stored in the file itself rather than
> > something fairly easy to detect by just looking at file sizes etc.
> > 
> > That said, one way to do it would be to use the Raw Data Visualizer
> > option, and more specifically the "Individual Density Plots" option.
> > Scrolling through these should show the corrupt files (the plot will
> > most likely be nonexistent). Doing this at this end shows these are all
> > potentially corrupt:
> > 
> > GSM128656.CEL
> > GSM260885.CEL
> > GSM133917.CEL
> > GSM133941.CEL
> > GSM133946.CEL
> > GSM133954.CEL
> > GSM133956.CEL
> > GSM133972.CEL
> > GSM133982.CEL
> > GSM133990.CEL
> > GSM134356.CEL
> > GSM134368.CEL
> > GSM134372.CEL
> > GSM134393.CEL
> > GSM134407.CEL
> > GSM134420.CEL
> > GSM134442.CEL
> > GSM134453.CEL
> > GSM134460.CEL
> > GSM134461.CEL
> > GSM142607.CEL
> > GSM142791.CEL
> > GSM157302.CEL
> > GSM157308.CEL
> > GSM157313.CEL
> > GSM157320.CEL
> > GSM183517.CEL
> > 
> > Although that seems to be a lot of files, out of approx. 2800 it is not
> > that many (about 1% of files). I can't guarantee that list is
> > comprehensive or that I did not accidentally mistype one of the names
> > above. Also, it is possible that the set of corruptions in the data I
> > have differs from the set in the data you have.
> > 
> > 
> > Ben
> > 
> > 
> > > On Wed, 2008-03-05 at 06:22 -0800, Alex Feltus wrote:
> > > Is there any specificity to the corruption?  Can I
> > > prescreen files in some way? 
> > > 
> > > Alex
> > > 
> > > --- Ben Bolstad <bmb@xxxxxxxxxxxxx> wrote:
> > > 
> > > > > I am guessing that there are further corrupted CEL files in that
> > > > > dataset. I did not have the energy to further examine that possibility
> > > > > yesterday.
> > > > 
> > > > Ben
> > > > 
> > > > 
> > > > > On Wed, 2008-03-05 at 06:03 -0800, Alex Feltus wrote:
> > > > > Ben:
> > > > > 
> > > > > > Thanks for your help with this.  I removed those two
> > > > > > .CEL files, which allowed me to get to the BA stage
> > > > > > (further than before).  However, I still get a
> > > > > > segmentation fault.
> > > > > 
> > > > > Alex
> > > > > 
> > > > > > Here is the gdb output (appears to break as before):
> > > > > 
> > > > > Starting program: /usr/local/bin/RMAExpress 
> > > > > [Thread debugging using libthread_db enabled]
> > > > > [New Thread 47990231476192 (LWP 8626)]
> > > > > [New Thread 1082132816 (LWP 24051)]
> > > > > [New Thread 1090525520 (LWP 24052)]
> > > > > [Thread 1090525520 (LWP 24052) exited]
> > > > > [Thread 1082132816 (LWP 24051) exited]
> > > > > [New Thread 1082132816 (LWP 24053)]
> > > > > [New Thread 1090525520 (LWP 24054)]
> > > > > [New Thread 1098918224 (LWP 24055)]
> > > > > [Thread 1090525520 (LWP 24054) exited]
> > > > > [Thread 1098918224 (LWP 24055) exited]
> > > > > [Thread 1082132816 (LWP 24053) exited]
> > > > > [New Thread 1082132816 (LWP 24056)]
> > > > > [New Thread 1098918224 (LWP 24057)]
> > > > > [Thread 1098918224 (LWP 24057) exited]
> > > > > [Thread 1082132816 (LWP 24056) exited]
> > > > > 
> > > > > > Program received signal SIGSEGV, Segmentation fault.
> > > > > > [Switching to Thread 47990231476192 (LWP 8626)]
> > > > > > 0x0000000000427480 in max_density (z=0x378a8440, rows=0, cols=<value optimized out>, column=0) at Preprocess/rma_background3.c:301
> > > > > > 301         if (dens_y[i] == max_y)
> > > > > 
> > > > > --- Ben Bolstad <bmb@xxxxxxxxxxxxx> wrote:
> > > > > 
> > > > > > > I have investigated this issue further by first downloading all
> > > > > > > available ATH1 arrays (GPL198) from GEO. I can duplicate this error, but
> > > > > > > it is not due to a dataset size problem. Instead it is because one of the
> > > > > > > CEL files is corrupted (GSM260954.cel for the record). There is actually
> > > > > > > also another file in that set that is corrupted, at least at my end
> > > > > > > (GSM226522_S17_3.CEL), though the corruption mode is different. Fixes for
> > > > > > > detecting these corruption types during the parsing phase are imminent
> > > > > > > for the next beta release. I'm not sure of the timing on this next beta,
> > > > > > > though it will probably be within the next couple of weeks.
> > > > > > 
> > > > > > > I don't expect these situations to be particularly common so I don't
> > > > > > > believe that 1.0 beta 8 is critically damaged, and it is still a
> > > > > > > definite improvement over 1.0 beta 7 and earlier versions for (super)
> > > > > > > large datasets.
> > > > > > 
> > > > > > Best,
> > > > > > 
> > > > > > Ben 
> > > > > > 
> > > > > > 
> > > > > > > On Mon, 2008-03-03 at 10:04 -0800, Alex Feltus wrote:
> > > > > > > Ben:
> > > > > > > 
> > > > > > > > I can RMA 31 arrays of this type no problem, so the
> > > > > > > > data is good.  I have ~300GB of disk space that can be
> > > > > > > > used for temp files.  I also never even hit swap since
> > > > > > > > I have 16GB RAM.  I am using the default buffer
> > > > > > > > settings: 150 arrays/50,000 probe sets.
> > > > > > > 
> > > > > > > Here is GDB output for 460 GPL198 arrays:
> > > > > > > 
> > > > > > > GNU gdb 6.6-debian
> > 
> === message truncated ===

