On May 11, 2013, at 23:02, David Coil <coil.david@xxxxxxxxx> wrote:
> But I'm wondering if the results of a mapping assembly (using the
> -SB:abnc-yes flag) count as a "genome sequence".

*sigh* The -SB:abnc flag. Combines the worst and the best of the worlds of de-novo and mapping assemblies. Looks like way more people are using it than I'd initially thought, maybe I should reconsider its removal in the 3.9 line.

> Basically you get one big contig (the mapping) and then a number of small
> ones assembled from the leftovers. But the big mapped contig has gaps even
> though it's called a "contig". You'd have to break the contig on those gaps
> to say, submit the genome to NCBI even though many of them are very small.
> One could of course use the unpadded result, but then you may be joining
> things together over large gaps that really should be considered separate
> contigs.

Careful there. Uncovered areas of a backbone (reference sequence) are NOT deleted from the "unpadded" results, only "gap" columns are (i.e., columns created by a spurious insertion base in one (or very few) reads at a given place).

If you take the current default FASTA output from a mapping assembly in MIRA you get something many people do not expect: an amalgam of the data from your mapped strain and, in coverage holes, the data from the reference. I thought this to be a good idea, but I'm not so sure anymore.

What one should do with the results from mapping: use convert_project to extract the clean "by strain" data. Like this:

  convert_project -f maf -t fasta mira_out.maf mynewresults

Uncovered areas of the backbone are then represented by a string of "N" characters in these new results.

> So it seems to me that mapping assemblies are great for answering biological
> questions... but insufficient to say publish the genome sequence of an
> isolate. Would people agree or disagree with that idea? Am I thinking
> about the mapping assemblies incorrectly?

Agree, you are thinking about these assemblies totally correctly. I wrote the mapping modes of MIRA to answer biological questions and that's what it does :-)

The following is just my 2 cents, other feedback welcome.

For *very* related strains, however, one can think about reworking the mapping output slightly in an assembly editor (gap4/gap5) and then using this for publishing. "Very related" means in this case: it depends on whatever you feel comfortable finishing by hand … and on the importance you attach to having the genome "completed." For strains having just a couple of SNPs, short indels and maybe two or three larger indels or genome reorganisation breakpoints it's a no-brainer; for more you have to decide.

Hope that helps,
  Bastien
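
P.S. To illustrate the "break the contig on those gaps" step from the question: once the by-strain FASTA marks uncovered backbone areas with runs of "N", a small script can cut the sequence at those runs. What follows is only a rough Python sketch, not anything shipped with MIRA. The function name split_on_n_runs and the min_run threshold are made up for illustration, and how long an "N" run must be before it counts as a real contig break is a judgment call left to you.

```python
import re


def split_on_n_runs(sequence, min_run=1):
    """Split a sequence at every run of at least `min_run` N characters.

    Hypothetical helper, not part of MIRA. N runs shorter than `min_run`
    stay inside the returned pieces; empty pieces (from leading or
    trailing N runs) are dropped.
    """
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    n_run = re.compile(r"[Nn]{%d,}" % min_run)
    return [piece for piece in n_run.split(sequence) if piece]


if __name__ == "__main__":
    # Toy sequence: one long uncovered stretch and one single N.
    contig = "ACGTNNNNNACGTNAC"
    for i, piece in enumerate(split_on_n_runs(contig, min_run=3), start=1):
        print(">piece_%d" % i)
        print(piece)
```

With min_run=3 the single "N" in the toy sequence is kept inside its piece, so only the long uncovered stretch causes a break.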