Ben:

I just wanted to do a final follow-up. I found a few more corrupt .cel files and was able to RMA normalize 2852 arrays. Woo-hoo! Of course, a .cel check would be a great addition to your code as it works wonderfully when the data is sound.

One other thing I noticed is that your auto file listing in the GUI mode appears to be case sensitive (CAPS), so it misses .cel files.

Thanks for the tech support!

Alex

--- Ben Bolstad <bmb@xxxxxxxxxxxxx> wrote:

> Well, really this should be more properly handled by the parsing code
> (ie my responsibility). But it depends on the failure mode as to whether
> you can detect it manually yourself. In the case of the two files below,
> the corruption is with the data stored in the file itself rather than
> something fairly easy to detect by just looking at file sizes etc.
>
> That said, one way to do it would be to use the Raw Data Visualizer
> option, and more specifically the "Individual Density Plots" option.
> Scrolling through these should show the corrupt files (the plot will
> most likely be non existent). Doing this at this end shows these are all
> potentially corrupt:
>
> GSM128656.CEL
> GSM260885.CEL
> GSM133917.CEL
> GSM133941.CEL
> GSM133946.CEL
> GSM133954.CEL
> GSM133956.CEL
> GSM133972.CEL
> GSM133982.CEL
> GSM133990.CEL
> GSM134356.CEL
> GSM134368.CEL
> GSM134372.CEL
> GSM134393.CEL
> GSM134407.CEL
> GSM134420.CEL
> GSM134442.CEL
> GSM134453.CEL
> GSM134460.CEL
> GSM134461.CEL
> GSM142607.CEL
> GSM142791.CEL
> GSM157302.CEL
> GSM157308.CEL
> GSM157313.CEL
> GSM157320.CEL
> GSM183517.CEL
>
> Although that seems to be a lot of files out of approx 2800 it is not
> that many (about 1% of files). I can't guarantee that list is
> comprehensive or that I did not accidentally mistype one of the names
> above. Also it is potentially possible there are a different set of
> corruptions in the data I have and you have.
>
>
> Ben
>
>
> On Wed, 2008-03-05 at 06:22 -0800, Alex Feltus wrote:
> > Is there any specificity to the corruption? Can I
> > prescreen files in some way?
> >
> > Alex
> >
> > --- Ben Bolstad <bmb@xxxxxxxxxxxxx> wrote:
> >
> > > I am guessing that there are further corrupted CEL files in that
> > > dataset. I did not have the energy to further examine that possibility
> > > yesterday.
> > >
> > > Ben
> > >
> > >
> > > On Wed, 2008-03-05 at 06:03 -0800, Alex Feltus wrote:
> > > > Ben:
> > > >
> > > > Thanks for your help with this. I removed those two
> > > > .CEL files which allowed me to get to the BA stage
> > > > (further than before). However, I still get a
> > > > segmentation fault.
> > > >
> > > > Alex
> > > >
> > > > Here is the gdb output (appears to break as before):
> > > >
> > > > Starting program: /usr/local/bin/RMAExpress
> > > > [Thread debugging using libthread_db enabled]
> > > > [New Thread 47990231476192 (LWP 8626)]
> > > > [New Thread 1082132816 (LWP 24051)]
> > > > [New Thread 1090525520 (LWP 24052)]
> > > > [Thread 1090525520 (LWP 24052) exited]
> > > > [Thread 1082132816 (LWP 24051) exited]
> > > > [New Thread 1082132816 (LWP 24053)]
> > > > [New Thread 1090525520 (LWP 24054)]
> > > > [New Thread 1098918224 (LWP 24055)]
> > > > [Thread 1090525520 (LWP 24054) exited]
> > > > [Thread 1098918224 (LWP 24055) exited]
> > > > [Thread 1082132816 (LWP 24053) exited]
> > > > [New Thread 1082132816 (LWP 24056)]
> > > > [New Thread 1098918224 (LWP 24057)]
> > > > [Thread 1098918224 (LWP 24057) exited]
> > > > [Thread 1082132816 (LWP 24056) exited]
> > > >
> > > > Program received signal SIGSEGV, Segmentation fault.
> > > > [Switching to Thread 47990231476192 (LWP 8626)]
> > > > 0x0000000000427480 in max_density (z=0x378a8440, rows=0, cols=<value optimized out>, column=0) at Preprocess/rma_background3.c:301
> > > > 301         if (dens_y[i] == max_y)
> > > >
> > > > --- Ben Bolstad <bmb@xxxxxxxxxxxxx> wrote:
> > > >
> > > > > I have investigated this issue further by first downloading all
> > > > > available ATH1 arrays (GPL198) from GEO. I can duplicate this error, but
> > > > > it is not due to a dataset size problem. Instead it is because one of
> > > > > the CEL files is corrupted (GSM260954.cel for the record). There is
> > > > > actually also another file in that set that is corrupted, at least at
> > > > > my end (GSM226522_S17_3.CEL), though the corruption mode is different.
> > > > > Fixes for detecting these corruption types during the parsing phase are
> > > > > imminent for the next beta release. I'm not sure of the timing on this
> > > > > next beta, though it will probably be within the next couple of weeks.
> > > > >
> > > > > I don't expect these situations to be particularly common so I don't
> > > > > believe that 1.0 beta 8 is critically damaged, and it is still a
> > > > > definite improvement over 1.0 beta 7 and earlier versions for (super)
> > > > > large datasets.
> > > > >
> > > > > Best,
> > > > >
> > > > > Ben
> > > > >
> > > > >
> > > > > On Mon, 2008-03-03 at 10:04 -0800, Alex Feltus wrote:
> > > > > > Ben:
> > > > > >
> > > > > > I can RMA 31 arrays of this type no problem, so the
> > > > > > data is good. I have ~300GB of disk space that can be
> > > > > > used for temp files. I also never even hit swap since
> > > > > > I have 16GB RAM.
> > > > > > I am using the default buffer
> > > > > > settings: 150 arrays/50,000 probe sets.
> > > > > >
> > > > > > Here is GDB output for 460 GPL198 arrays:
> > > > > >
> > > > > > GNU gdb 6.6-debian
>
=== message truncated ===
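
On the case-sensitive file listing mentioned at the top of this thread: matching the extension without regard to letter case would pick up .CEL, .cel and .Cel files alike. The following is only a minimal Python sketch of that idea, not RMAExpress code; the function name list_cel_files is hypothetical and the directory layout is assumed. It checks file names only and does nothing to detect corruption inside a file.

```python
from pathlib import Path


def list_cel_files(directory):
    """Return files in `directory` whose extension is .cel in any letter case.

    Hypothetical helper for illustration; it only inspects file names
    and does not open or validate the CEL files themselves.
    """
    return sorted(
        p for p in Path(directory).iterdir()
        # Lower-case the suffix so .CEL, .cel and .Cel all match.
        if p.is_file() and p.suffix.lower() == ".cel"
    )
```

Calling list_cel_files on a folder of downloaded arrays would then return every CEL file regardless of how the extension was capitalized.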