Hi Bastien,

many thanks. You guys are great. Everything worked perfectly fine.

I would like to ask how you assess the quality of your assembly. I was mapping against a reference, which of course helped, but then I found out that the quality of my reference, a closed genome and all that, is not as good as one might have thought. Should I assemble de novo and then compare? How do you differentiate among artifacts, missing data, etc.?

I attach a couple of info .txt files I got from sub-sampled data.

Thanks,
Celia

________________________________________
From: mira_talk-bounce@xxxxxxxxxxxxx [mira_talk-bounce@xxxxxxxxxxxxx] on behalf of Bastien Chevreux [bach@xxxxxxxxxxxx]
Sent: Wednesday, June 04, 2014 1:34 PM
To: mira_talk@xxxxxxxxxxxxx
Subject: [mira_talk] Re: Failure, wrapped MIRA process aborted

On 04 Jun 2014, at 18:37 , Peter Cock <p.j.a.cock@xxxxxxxxxxxxxx> wrote:
> […]
> Personally I would tell MIRA to ignore the long read names.

Or use “rename_prefix” in the manifest file to have on-the-fly renaming of reads. In your case

rename_prefix=HWI-ST330:422:C4AVHACXX clostraur

should do the trick.

In other news:
- mapping in draft mode is not that much faster than in accurate mode, I recommend accurate.
- you are mapping almost 20m reads. If the reference is a bacterium, it’s almost sure MIRA will tell you that this is not a good idea … and tell you what to do :-)

B.

Assembly information:
=====================

Localtime: Tue Jun 10 16:32:10 2014
MIRA version: 4.0.2

Num. reads assembled: 325198
Num. singlets: 0

Coverage assessment (calculated from contigs >= 5000 with coverage >= 0):
=========================================================
  Avg. total coverage: 8.33
  Avg. coverage per sequencing technology
    Sanger:  0.00
    454:     0.00
    IonTor:  0.00
    PcBioHQ: 0.00
    PcBioLQ: 0.00
    Text:    3.00
    Solexa:  5.33
    Solid:   0.00

Large contigs (makes less sense for EST assemblies):
====================================================
With Contig size >= 500
  AND (Total avg. Cov >= 3
       OR Cov(san) >= 0
       OR Cov(454) >= 0
       OR Cov(ion) >= 0
       OR Cov(pbh) >= 0
       OR Cov(pbl) >= 0
       OR Cov(txt) >= 1
       OR Cov(sxa) >= 2
       OR Cov(sid) >= 0
      )

  Length assessment:
  ------------------
  Number of contigs: 1
  Total consensus:   6001982
  Largest contig:    6001982
  N50 contig size:   6001982
  N90 contig size:   6001982
  N95 contig size:   6001982

  Coverage assessment:
  --------------------
  Max coverage (total): 1656
  Max coverage per sequencing technology
    Sanger:  0
    454:     0
    IonTor:  0
    PcBioHQ: 0
    PcBioLQ: 0
    Text:    3
    Solexa:  1653
    Solid:   0

  Quality assessment:
  -------------------
  Average consensus quality: 38
  Consensus bases with IUPAC: 17052 (you might want to check these)
  Strong unresolved repeat positions (SRMc): 1407 (you might want to check these)
  Weak unresolved repeat positions (WRMc): 109 (you might want to check these)
  Sequencing Type Mismatch Unsolved (STMU): 0 (excellent)
  Contigs having only reads wo qual: 0 (excellent)
  Contigs with reads wo qual values: 1 (you might want to check these)

All contigs:
============
  Length assessment:
  ------------------
  Number of contigs: 1
  Total consensus:   6001982
  Largest contig:    6001982
  N50 contig size:   6001982
  N90 contig size:   6001982
  N95 contig size:   6001982

  Coverage assessment:
  --------------------
  Max coverage (total): 1656
  Max coverage per sequencing technology
    Sanger:  0
    454:     0
    IonTor:  0
    PcBioHQ: 0
    PcBioLQ: 0
    Text:    3
    Solexa:  1653
    Solid:   0

  Quality assessment:
  -------------------
  Average consensus quality: 38
  Consensus bases with IUPAC: 17052 (you might want to check these)
  Strong unresolved repeat positions (SRMc): 1407 (you might want to check these)
  Weak unresolved repeat positions (WRMc): 109 (you might want to check these)
  Sequencing Type Mismatch Unsolved (STMU): 0 (excellent)
  Contigs having only reads wo qual: 0 (excellent)
  Contigs with reads wo qual values: 1 (you might want to check these)

Assembly information:
=====================

Localtime: Tue Jul 1 04:20:09 2014
MIRA version: 4.0.2

Num. reads assembled: 865156
Num. singlets: 0

Coverage assessment (calculated from contigs >= 5000 with coverage >= 0):
=========================================================
  Avg. total coverage: 17.22
  Avg. coverage per sequencing technology
    Sanger:  0.00
    454:     0.00
    IonTor:  0.00
    PcBioHQ: 0.00
    PcBioLQ: 0.00
    Text:    3.00
    Solexa:  14.22
    Solid:   0.00

Large contigs (makes less sense for EST assemblies):
====================================================
With Contig size >= 500
  AND (Total avg. Cov >= 6
       OR Cov(san) >= 0
       OR Cov(454) >= 0
       OR Cov(ion) >= 0
       OR Cov(pbh) >= 0
       OR Cov(pbl) >= 0
       OR Cov(txt) >= 1
       OR Cov(sxa) >= 5
       OR Cov(sid) >= 0
      )

  Length assessment:
  ------------------
  Number of contigs: 1
  Total consensus:   6002288
  Largest contig:    6002288
  N50 contig size:   6002288
  N90 contig size:   6002288
  N95 contig size:   6002288

  Coverage assessment:
  --------------------
  Max coverage (total): 4207
  Max coverage per sequencing technology
    Sanger:  0
    454:     0
    IonTor:  0
    PcBioHQ: 0
    PcBioLQ: 0
    Text:    3
    Solexa:  4204
    Solid:   0

  Quality assessment:
  -------------------
  Average consensus quality: 42
  Consensus bases with IUPAC: 14309 (you might want to check these)
  Strong unresolved repeat positions (SRMc): 2374 (you might want to check these)
  Weak unresolved repeat positions (WRMc): 167 (you might want to check these)
  Sequencing Type Mismatch Unsolved (STMU): 0 (excellent)
  Contigs having only reads wo qual: 0 (excellent)
  Contigs with reads wo qual values: 1 (you might want to check these)

All contigs:
============
  Length assessment:
  ------------------
  Number of contigs: 1
  Total consensus:   6002288
  Largest contig:    6002288
  N50 contig size:   6002288
  N90 contig size:   6002288
  N95 contig size:   6002288

  Coverage assessment:
  --------------------
  Max coverage (total): 4207
  Max coverage per sequencing technology
    Sanger:  0
    454:     0
    IonTor:  0
    PcBioHQ: 0
    PcBioLQ: 0
    Text:    3
    Solexa:  4204
    Solid:   0

  Quality assessment:
  -------------------
  Average consensus quality: 42
  Consensus bases with IUPAC: 14309 (you might want to check these)
  Strong unresolved repeat positions (SRMc): 2374 (you might want to check these)
  Weak unresolved repeat positions (WRMc): 167 (you might want to check these)
  Sequencing Type Mismatch Unsolved (STMU): 0 (excellent)
  Contigs having only reads wo qual: 0 (excellent)
  Contigs with reads wo qual values: 1 (you might want to check these)
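
To put the two info files side by side, one option is to pull out the counts and normalise them per kilobase of consensus. Below is a minimal sketch, assuming the info text keeps the "Label: value" wording shown in the attachments; parse_info and per_kb are hypothetical helper names, not part of MIRA, and the short strings only repeat numbers from the two runs above.

```python
import re

# Labels copied from the "Length assessment" and "Quality assessment" blocks.
# Assumption: the wording of these labels matches the info file being parsed.
FIELDS = {
    "total_consensus": r"Total consensus:\s*(\d+)",
    "avg_quality": r"Average consensus quality:\s*(\d+)",
    "iupac": r"Consensus bases with IUPAC:\s*(\d+)",
    "srmc": r"Strong unresolved repeat positions \(SRMc\):\s*(\d+)",
    "wrmc": r"Weak unresolved repeat positions \(WRMc\):\s*(\d+)",
}


def parse_info(text):
    """Return the first occurrence of each known field as an int."""
    stats = {}
    for key, pattern in FIELDS.items():
        match = re.search(pattern, text)
        if match:
            stats[key] = int(match.group(1))
    return stats


def per_kb(stats, key):
    """Count for `key` per 1000 consensus bases."""
    return 1000.0 * stats[key] / stats["total_consensus"]


if __name__ == "__main__":
    # Values taken from the two attachments (June and July runs).
    june = ("Total consensus: 6001982 Average consensus quality: 38 "
            "Consensus bases with IUPAC: 17052 "
            "Strong unresolved repeat positions (SRMc): 1407 "
            "Weak unresolved repeat positions (WRMc): 109")
    july = ("Total consensus: 6002288 Average consensus quality: 42 "
            "Consensus bases with IUPAC: 14309 "
            "Strong unresolved repeat positions (SRMc): 2374 "
            "Weak unresolved repeat positions (WRMc): 167")
    for name, text in (("june", june), ("july", july)):
        stats = parse_info(text)
        rates = {k: round(per_kb(stats, k), 3) for k in ("iupac", "srmc", "wrmc")}
        print(name, stats["avg_quality"], rates)
```

Because re.search returns the first match, the values come from the "Large contigs" block when both blocks are present; in these two files both blocks report the same numbers.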