What is the purpose of your endeavor?

Sincerely,
Adrian

> On Sep 25, 2014, at 8:39 PM, Said Muñoz Montero <said3427@xxxxxxxxx> wrote:
>
> Hello Mira experts,
>
> I am doing a mapping assembly with 9 different parasite isolates
> simultaneously, using a reference genome from the same species. The genome
> variability between samples is low, except for copy number variation. The
> total coverage after combining all samples is 360X, so I changed the
> -NW:cac=stop parameter.
>
> I have read the warnings about similar tasks in the MIRA mailing list, but
> these refer to de novo assembly. Setting aside the computational resources
> needed, what do you think about this strategy?
>
> I would really appreciate any advice!
>
> Here are the warnings given by Bastien in the Mira Guide:
>
> "With todays' sequencing technologies (especially Illumina, but also Ion
> Torrent and 454), many people simply take everything they get and throw it
> into an assembly. Which, in the case of Illumina and Ion, can mean they try
> to assemble their organism with a coverage of 100x, 200x and more (I've
> seen trials with more than 1000x).
>
> This is not good. Not. At. All! For two reasons (well, three to be precise).
> The first reason is that, usually, one does not sequence a single cell but
> a population of cells. If this population is not clonal (i.e., it contains
> subpopulations with genomic differences with each other), assemblers will
> be able to pick up these differences in the DNA once a certain sequence
> count is reached and they will try reconstruct a genome containing all
> clonal variations, treating these variations as potential repeats with
> slightly different sequences. Which, of course, will be wrong and I am
> pretty sure you do not want that.
>
> The second and way more important reason is that none of the current
> sequencing technologies is completely error free. Even more problematic,
> they contain both random and non-random sequencing errors.
> Especially the latter can become a big hurdle if these non-random errors
> are so prevalent that they suddenly appear to be valid sequence to an
> assembler. This in turn leads to false repeat detection, hence possibly
> contig breaks or even wrong consensus sequence. You don't want that, do you?
>
> The last reason is that overlap based assemblers (like MIRA is) need
> *exponentially* more time and memory when the coverage increases. So
> keeping the coverage comparatively low helps you there."
>
> THANKS!!!