Hi all,

I made a reference assembly of my 454 bacterial data with a closely related strain as the backbone. I got a single contig roughly the size of the reference genome. I also made a de novo assembly of the same data and got around 60 good-quality contigs with sufficient length and coverage. I have a few questions about these two assemblies. Any help would be greatly appreciated :)

De novo assembly

All contigs:
============
Length assessment:
------------------
Number of contigs: 72
Total consensus: 4510449
Largest contig: 691407
N50 contig size: 150832
N90 contig size: 50418
N95 contig size: 27370

Coverage assessment:
--------------------
Max coverage (total): 215
Max coverage per sequencing technology
    Sanger: 0
    454: 262
    IonTor: 0
    PacBio: 0
    Solexa: 0
    Solid: 0

Quality assessment:
-------------------
Average consensus quality: 85
Consensus bases with IUPAC: 4 (you might want to check these)
Strong unresolved repeat positions (SRMc): 0 (excellent)
Weak unresolved repeat positions (WRMc): 0 (excellent)
Sequencing Type Mismatch Unsolved (STMU): 0 (excellent)
Contigs having only reads wo qual: 0 (excellent)
Contigs with reads wo qual values: 0 (excellent)

Reference assembly

All contigs:
============
Length assessment:
------------------
Number of contigs: 1
Total consensus: 4909964
Largest contig: 4909964
N50 contig size: 4909964
N90 contig size: 4909964
N95 contig size: 4909964

Coverage assessment:
--------------------
Max coverage (total): 272
Max coverage per sequencing technology
    Sanger: 3
    454: 269
    IonTor: 0
    PacBio: 0
    Solexa: 0
    Solid: 0

Quality assessment:
-------------------
Average consensus quality: 79
Consensus bases with IUPAC: 10440 (you might want to check these)
Strong unresolved repeat positions (SRMc): 1592 (you might want to check these)
Weak unresolved repeat positions (WRMc): 0 (excellent)
Sequencing Type Mismatch Unsolved (STMU): 0 (excellent)
Contigs having only reads wo qual: 0 (excellent)
Contigs with reads wo qual values: 1 (you might want to check these)

1. When I visualize the reference assembly with Tablet, I see regions that are not spanned by any reads at all, only by the template. How is this acceptable? It appears as though MIRA fills those parts of the assembly with the template sequence, which may or may not be present in the sequenced genome. So how far can this assembly be trusted?

2. Wasn't the reference assembly feature of MIRA developed to identify SNPs and other genomic changes in previously sequenced genomes? If so, is it technically correct to assemble against a closely related organism?

3. If I were to accept the reference assembly that MIRA has output, what kind of validation tests are essential before annotation?

Many thanks in advance.

Cheers :)

Shankar Manoharan
Graduate Student
Department of Genetics
Madurai Kamaraj University
Ph. +919790167534

I strongly believe in doing my best and leaving the rest to God