On Apr 18, 2011, at 23:07, Egon Ozer wrote:
> I must be doing something wrong...

Or not ...

> I'm trying to do a hybrid assembly project using 2,008,245 paired 454 reads
> and 5,722,894 PE Illumina reads. The Illumina reads are 53 bp (short, I
> know, but that's what I have to work with) and corresponds to about 45x
> coverage of my organism.

When combined with longer reads such as 454 or Sanger, the read length is less important than in de novo assemblies of Solexa data only. I have a testbed at home with 800k 454 FLX reads and 4m Solexa 36 bp reads, and adding those Solexa reads works true miracles for the assembly.

> The program runs 6 and a half hours, then seg faults. When I check the log
> output, I see that it was working on contig # 24994 (!) when it died.

OK, so two problems. First one: MIRA should not segfault. Never. Ever. At. All. Period.

> The seg fault is one problem which comes and goes. I have been able to run
> Mira to completion before on version 3.2.1.11 (after just restarting the run
> after the segmentation fault), but the bigger problem is that with that run I
> produced 80126 contigs, the largest of which was 1526 bases with an N50 of
> 97.

That is actually problem number two: something is not right with the data, I suppose. If I had to guess: the preprocessing of the SFF files went awfully wrong. Did you use the right linker sequences? Or did some non-standard adaptor remain unclipped?

Could you send me, say, the first 2000 lines of the "log_assembly.txt" file? I'd like to check a couple of things.

> Please give me a hand. I have been really counting on hybrid assembly to
> help me get as complete and accurate a sequence for my bacterium as possible,
> but I seem to be missing the mark on something.

If everything else fails ... do you think you could make the data available to me so that I can have a more detailed look?

B.

--
You have received this mail because you are subscribed to the mira_talk mailing list.
For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html