On Wednesday 02 December 2009 Andrzej N wrote:
> I need some help... I did *de novo* assembly of several plant mitochondrial
> genome sequences (454 Titanium, single-end reads). About 200000 reads were
> used for the assembly, which should give me about 100x coverage. Yes, I know,
> overkill, but... MIRA created about 160 contigs with a quality score of
> around 78 (what is that exactly?), out of a total of about 5,000 contigs
> that include smaller ones that don't help much (i.e., "junk"). These
> contigs don't go together to create one big consensus contig.

Hello Andrzej,

100x is not only overkill, it is also a bit dangerous for many assemblers
(including MIRA), as ultra-high coverage has some unwanted side effects.
One of them: sequencing errors are not totally random, so they tend to
accumulate at certain positions. With very high coverage, enough reads share
the same error that it gets recognised as a valid variant, and those reads
are split off into other contigs.

Plus you've got plant mitochondrial genomes, and these I've come to fear a
bit. The 454 data from those I've seen so far show pretty uneven coverage,
which can cause problems for MIRA when uniform read distribution is used:
it may mistakenly recognise some parts as repeats when they're not.

> I also did a reference assembly against an already finished and assembled
> sequence. MIRA covers all of this reference sequence with only one small
> break (so I get two huge contigs of about 200000 bp each).
>
> Now comes the interesting part. When I take these contigs from the
> *de novo* assembly and BLAST them against the ones generated by the
> reference assembly, they cover the entire sequence very nicely... So, my
> question is why MIRA is not creating larger contigs during *de novo*
> assembly. These contigs are next to each other and show a certain amount
> of sequence overlap (I set up BLAST on my computer to blast them against
> each other), but MIRA is not seeing this and combining them.
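To illustrate the earlier point about sequencing errors accumulating at high coverage, here is a minimal Python sketch. It is a toy binomial model, not MIRA's actual variant or repeat detection: the 5% systematic error rate at one position and the rule that 4 agreeing reads make a "variant" are hypothetical numbers chosen only for illustration.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance that at least k of the
    n reads covering a position carry the same systematic error."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical numbers for illustration only (not MIRA internals):
# the same wrong base shows up in 5% of the reads at one position, and a
# toy rule treats it as a real variant once at least 4 reads agree on it.
ERROR_RATE = 0.05
MIN_READS_FOR_VARIANT = 4

if __name__ == "__main__":
    for coverage in (10, 20, 50, 100):
        expected = coverage * ERROR_RATE
        p_split = prob_at_least(MIN_READS_FOR_VARIANT, coverage, ERROR_RATE)
        print(f"{coverage:>3}x: expected erroneous reads = {expected:4.1f}, "
              f"P(>= {MIN_READS_FOR_VARIANT} agree) = {p_split:.3f}")
```

With these assumed numbers, a shared error is very unlikely to reach the threshold at 10x but likely to reach it at 100x, which is why very deep coverage can turn a recurring sequencing error into an apparent variant.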
Oh, MIRA is probably seeing them, but refuses to join them because the ends
contain too many sequencing errors (mistakenly recognised as valid variants)
or because the ends lie in regions with exceptionally high coverage
(mistakenly recognised as repeats).

> What parameters in MIRA need to be changed to help build larger contigs?
> My adjustments to date have not helped much beyond your default settings
> for "fast assembly".

Umm ... the 'draft' options are really just that: for drafts. And if you've
got 60 kb chunks, that's not too bad already. But use at least 'normal' or
'accurate' mode.

Now, other things you probably want to do:

1) Decrease the sensitivity of repeat marker base recognition. I'd suggest
   adding
     454_SETTINGS -CO:mrpg=12
   and seeing what happens then.

2) Possibly assemble without uniform read distribution
     -AS:urd=no
   and loosen the repeat detection thresholds
     454_SETTINGS -AS:ardct=3:mrl=800
   or switch off repeat detection altogether
     -AS:ard=no

If everything else fails: join the large contigs by hand in 'gap4'; that
just takes a couple of minutes for a plant mitochondrion :-)

Hope that helps,
  Bastien

-- 
You have received this mail because you are subscribed to the mira_talk
mailing list. For information on how to subscribe or unsubscribe, please
visit http://www.chevreux.org/mira_mailinglists.html