I must be doing something wrong... I'm trying to do a hybrid assembly project using 2,008,245 paired 454 reads and 5,722,894 PE Illumina reads. The Illumina reads are 53 bp (short, I know, but that's what I have to work with) and correspond to about 45x coverage of my organism.

The program runs for six and a half hours, then seg faults. When I check the log output, I see that it was working on contig #24994 (!) when it died. The seg fault is one problem, and it comes and goes: I have been able to run Mira to completion before on version 3.2.1.11 (by simply restarting the run after the segmentation fault). The bigger problem is that the completed run produced 80,126 contigs, the largest of which was 1,526 bases, with an N50 of 97.

What am I doing wrong? Why does assembly of the 454 reads alone in Celera Assembler produce 49 contigs with an average size of 137,888 bases, and assembly of the Illumina reads alone in Velvet produce 595 contigs with an N50 of 122,385, while my Mira output is so miserable (when it actually finishes)?

Please give me a hand. I have been really counting on hybrid assembly to help me get as complete and accurate a sequence for my bacterium as possible, but I seem to be missing the mark on something.

Thanks.
- E

Supplemental information:

My most recent attempt at this assembly was on Mira version 3.12.1.15_dev_darwin10.6.0_x86_64_static (but I've had this same problem with versions 3.2.1.5, 3.2.1.7, and 3.2.1.11 as well).
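As a rough sanity check on the numbers quoted above, the Illumina read count, read length, and stated coverage imply a genome size that can be compared with the total span of the Celera contigs. This is only a back-of-the-envelope sketch: it assumes that 5,722,894 counts individual reads rather than pairs, that every read is the full 53 bp, and that contig count times average contig size approximates the Celera assembly span. The function name is made up for illustration.

```python
def implied_genome_size(num_reads, read_len, coverage):
    """Genome size implied by coverage = (num_reads * read_len) / genome_size."""
    return num_reads * read_len / coverage

# Figures taken from the post; treating the read count as individual reads
# (not pairs) is an assumption.
illumina_bases = 5_722_894 * 53
genome_estimate = implied_genome_size(5_722_894, 53, 45)

# Celera result from the post: 49 contigs averaging 137,888 bases.
celera_total = 49 * 137_888

print(f"Illumina bases:        {illumina_bases:,}")
print(f"Implied genome size:   {genome_estimate:,.0f} bp")
print(f"Celera assembly span:  {celera_total:,} bp")
```

Both estimates come out near 6.7 Mbp, so the read counts, the stated coverage, and the Celera result at least look mutually consistent.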
Here's how I prepare my files for Mira.

For the 454 reads:

    sff_extract_0_2_8.py -s out.fasta -q out.qual -x out.xml -l linkers.fa -i "insert_size:3000,insert_stdev:900" in1.sff in2.sff
    ln -s out.fasta proj_in.454.fasta
    ln -s out.qual proj_in.454.fasta.qual
    ln -s out.xml proj_traceinfo_in.454.xml

For the Illumina reads:

    cat s_8_1_sequence.txt s_8_2_sequence.txt > combined.fastq
    ln -s combined.fastq proj_in.solexa.fastq

My command line:

    mira --project=proj --job=denovo,genome,accurate,454,solexa -GE:not=16 SOLEXA_SETTINGS -GE:tismin=150:tismax=350 454_SETTINGS -DP:ure=1 -CL:emrc=1 >&log_assembly.txt

I'm using a Mac Pro running Snow Leopard with 64 GB of RAM and 1.65 TB of free hard drive space.

Here's the last little bit of my log file right before the fault:

    -------------- Contig statistics ----------------
    Contig id: 24994
    Contig length: 86

                          Sanger  454     PacBio  Solexa  Solid
    Num. reads            0       0       0       27      0
    100% merged reads     -       -       -       0       0
    Avg. read len         0       0       0       51      0
    Max. coverage         0       0       0       27      0
    Avg. coverage         0.000   0.000   0.000   16.023  0.000

    Max. contig coverage: 27
    Avg. contig coverage: 16.023

    Consensus contains: A: 14 C: 33 G: 27 T: 12 N: 0 IUPAC: 0 Funny: 0 *: 0
    GC content: 69.767%
    -------------------------------------------------
    Timing BFC cout constats: 228
    Localtime: Mon Apr 18 15:00:16 2011
    bfc 10/0
    Timing BFC edit tricky1: 1
    Marking possibly misassembled repeats:
    [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
    done step 1, starting step 2:done.
    Found none.
    Timing BFC mark reps: 515
    bfc 11/0
    bfc 12/0
    Timing BFC delPSHP: 1
    bfc 13/0
    bfc 14/0
    bfc 15/0
    bfc 16/0
    Transfering reads to readpool.
    Timing BFC rp transfer: 102
    Done.
    bfc 17/0
    bfc 19
    Storing contig ... 10Searching for: SROs UNSs IUPACs, preparing needed data: sorting tags ... fetching consensus for strain0 ...done.
    Starting search: done with search
    Transfering tags to readpool.
    Saving temp CAF ... done.
    done.
    Timing BFC store con: 1250
    Timing BFC loop total: 11861
    bfc 1
    Localtime: Mon Apr 18 15:00:16 2011
    Timing BFC unused: 32509
    Unused: 2326118
    AS_used_ids.size(): 7731139
    bfc 2
    Timing BFC prelim1: 7
    bfc 3
    bfc 4
    bfc 5
    Timing BFC setup AS_used_ids: 1
    bfc 6/0
    Timing BFC discard con: 3
    bfc 7/0
    Building new contig 24995
    Localtime: Mon Apr 18 15:00:16 2011
    Unused reads: 2326118
    bfc 8/0
    assemblymode_mapping: 0
    use genomic pathfinder: 1
    Timing n4_basicCSBSSetup cleararrays: 1522
    Timing n4_basicCSBSSetup init pf_banned: 0
    Timing n4_basicCSBSSetup total: 1530
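To get a feel for contig sizes from a run that died partway, the "Contig length:" lines in a log like the one above can be pulled out and summarized. This is a minimal sketch, assuming the log keeps the "Contig length: <n>" format shown in the excerpt; the function names are illustrative, not part of MIRA. The assembler may report a contig in more than one pass, so treat the result as a rough picture rather than its final statistics. N50 here uses the usual definition: the contig length at which the running total, taken longest first, reaches half of the summed length.

```python
import re

# Matches the "Contig length: 86" entries seen in the log excerpt.
CONTIG_LENGTH = re.compile(r"Contig length:\s*(\d+)")

def contig_lengths(log_text):
    """Return every contig length reported in the log text, in order."""
    return [int(n) for n in CONTIG_LENGTH.findall(log_text)]

def n50(lengths):
    """Length of the contig at which the cumulative sum (longest first)
    reaches half of the total; 0 for an empty list."""
    if not lengths:
        return 0
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length

excerpt = "Contig id: 24994 Contig length: 86 Sanger 454 PacBio Solexa Solid"
print(contig_lengths(excerpt))           # [86]
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))     # 8
```

For the real file, something like n50(contig_lengths(open("log_assembly.txt").read())) would apply the same two steps to the whole log.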