[mira_talk] Re: Reference vs. De novo assembly.

  • From: Andrzej N <andrzej.k.n@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 3 Dec 2009 14:24:32 -0600

Yes, I tried that, and I do see some VERY small regions of similarity.

But now the question is how I can actually use this information from the dot
plot to build a consensus from those two contigs, i.e. "slide" them against
each other. I have about 160 contigs of "better quality" and about 5000
contigs in total (an insane number to start doing this "by hand").

Thank you,
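
One way to avoid checking 5000 contigs "by hand" is to let a script shortlist the pairs worth opening in the Join Editor. The sketch below is only an illustration, not part of MIRA or gap4. It assumes the contigs were compared all-against-all with BLAST in the standard 12-column tabular output, and that contig lengths are known. The function name `end_overlaps` and the thresholds `slack`, `min_len` and `min_ident` are made up for this example and would need tuning. It keeps only hits that touch an end of both contigs, which is a simplification of a proper overlap test.

```python
import csv
import io


def end_overlaps(blast_tab, lengths, slack=50, min_len=40, min_ident=95.0):
    """Shortlist BLAST hits that touch an end of both contigs.

    blast_tab: text in BLAST 12-column tabular format
               (qseqid sseqid pident length mismatch gapopen
                qstart qend sstart send evalue bitscore).
    lengths:   dict mapping contig name to contig length in bases.
    The thresholds are illustrative assumptions, not recommended values.
    """
    candidates = []
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        q, s = row[0], row[1]
        ident, aln_len = float(row[2]), int(row[3])
        qstart, qend, sstart, send = (int(x) for x in row[6:10])
        if q == s or aln_len < min_len or ident < min_ident:
            continue  # self hits, short hits and weak hits are ignored
        # BLAST reports minus-strand hits with sstart > send.
        s_lo, s_hi = min(sstart, send), max(sstart, send)
        q_at_end = qstart <= slack or qend > lengths[q] - slack
        s_at_end = s_lo <= slack or s_hi > lengths[s] - slack
        if q_at_end and s_at_end:
            strand = "+" if sstart <= send else "-"
            candidates.append((q, s, strand, aln_len, ident))
    return candidates


# Tiny made-up example: c1/c2 overlap at their ends, c1/c3 match internally.
example = (
    "c1\tc2\t98.5\t120\t1\t0\t881\t1000\t1\t120\t1e-50\t200\n"
    "c1\tc3\t99.0\t150\t0\t0\t400\t549\t300\t449\t1e-60\t250\n"
)
example_lengths = {"c1": 1000, "c2": 2000, "c3": 1500}
for hit in end_overlaps(example, example_lengths):
    print(hit)
```

Pairs that survive this filter are only candidates. Each one still has to be inspected and joined in gap4, and contigs without a real overlap cannot be joined at all.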


On Thu, Dec 3, 2009 at 2:19 PM, Sven Klages <sir.svencelot@xxxxxxxxxxxxxx> wrote:

> You might want to join contigs via "Find internal joins" (dot plot) or
> directly in the "Join Editor".
> But keep in mind, you cannot join contigs if they don't overlap. You can
> only change the contig layout accordingly (a kind of manual scaffolding).
> cheers,
> Sven
> 2009/12/3 Andrzej N <andrzej.k.n@xxxxxxxxx>
>> It's a very basic question! Can ANYBODY tell me how to join contigs in GAP4?
>> Yes, I did set up EVERYTHING and I can see the contigs (they look really
>> accurate, not many errors), but I can't make this work for me. Nobody at my
>> place has any idea how to do this.
>> How can I join contigs if they don't overlap?
>> Do you know of any kind of manual?
>> I always hear "by hand"... HOW?
>> Andrzej
>> PS. I will try the settings you provided.
>> On Thu, Dec 3, 2009 at 1:48 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
>>> On Wednesday, 02 December 2009 Andrzej N wrote:
>>> > I need some help... I did a *de novo* assembly of several plant
>>> > mitochondrial genome sequences (454, Titanium, single-end reads). About
>>> > 200000 reads were used for the assembly, which should give me about 100x
>>> > coverage. Yes, I know, overkill, but... MIRA created about 160 contigs
>>> > with a quality score around 78 (what is it exactly?). The total number
>>> > of contigs is about 5,000, but that includes smaller ones that don't
>>> > help much, i.e., "junk". These contigs don't go together to create one
>>> > big consensus contig.
>>> Hello Andrzej,
>>> 100x is not only overkill, it is also a bit dangerous for many assemblers
>>> (including MIRA), as ultra-high coverage has some unwanted side effects.
>>> One of them: as sequencing errors are not totally random, they tend to
>>> accumulate at certain points. With very high coverage, these sequencing
>>> errors will be recognised as valid variants and hence split off into
>>> other contigs.
>>> Plus you've got plant mitochondrial genomes, and these I've come to fear
>>> a bit. The 454 data from those that I've seen so far suggest pretty uneven
>>> coverage, which might lead MIRA to have problems if the uniform read
>>> distribution is used, mistakenly recognising some parts as repeats when
>>> they're not.
>>> > I also did a reference assembly against an already finished and
>>> > assembled sequence. MIRA covers all of this reference sequence with just
>>> > one small break (so I get two huge contigs of about 200000bp each).
>>> >
>>> > Now the interesting part. When I take the contigs from the *de novo*
>>> > assembly and BLAST them against the ones generated by the reference
>>> > assembly, they cover the entire sequence very nicely... So, my question
>>> > is why MIRA is not creating larger contigs during *de novo* assembly.
>>> > These contigs are next to each other and show a certain amount of
>>> > sequence overlap (I set up BLAST on my computer to blast them against
>>> > each other), but MIRA is not seeing this and combining them.
>>> Oh, MIRA is probably seeing them, but refuses to join because the ends
>>> contain too many sequencing errors (mistakenly recognised as valid
>>> variants) or because the ends lie in regions with exceptionally high
>>> coverage (mistakenly recognised as repeats).
>>> > What parameters in MIRA need to be changed to help build larger contigs?
>>> > My adjustments to date have not helped much more than your default
>>> > settings for "fast assembly".
>>> Umm ... the 'draft' options are really just that: for drafts. And if
>>> you've got 60kb chunks it's not too bad already. But use at least 'normal'
>>> or 'accurate' mode.
>>> Now, other things you probably want to do:
>>> 1) decrease the sensitivity of repeat marker base recognition. I'd
>>>   suggest adding
>>>     454_SETTINGS -CO:mrpg=12
>>>   and see what happens then
>>> 2) possibly assemble without uniform read distribution
>>>     -AS:urd=no
>>>   and loosen the repeat detection thresholds
>>>     454_SETTINGS -AS:ardct=3:mrl=800
>>>   or switch off repeat detection altogether
>>>     -AS:ard=no
>>> If everything else fails: join the large contigs by hand in 'gap4', it
>>> just takes a couple of minutes for a plant mitochondrion :-)
>>> Hope that helps,
>>>  Bastien
>>> --
>>> You have received this mail because you are subscribed to the mira_talk
>>> mailing list. For information on how to subscribe or unsubscribe, please
>>> visit http://www.chevreux.org/mira_mailinglists.html
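
The coverage figure discussed in the thread follows from simple arithmetic: coverage is the number of reads times the mean read length, divided by the genome size. The sketch below redoes that calculation with the numbers from the posts (about 200000 reads, two reference contigs of about 200000bp each). The mean read length is not stated anywhere in the thread, so the code only works out which read length the quoted 100x would imply. The function names are invented for this illustration.

```python
def coverage(n_reads, mean_read_len, genome_size):
    """Average coverage: total sequenced bases divided by genome size."""
    return n_reads * mean_read_len / genome_size


def implied_read_length(n_reads, target_coverage, genome_size):
    """Mean read length needed to reach target_coverage with n_reads."""
    return target_coverage * genome_size / n_reads


# Numbers taken from the thread; the genome size is approximated by the
# two reference-guided contigs of about 200000bp each.
genome_size = 2 * 200_000
n_reads = 200_000

# Mean read length that the quoted 100x coverage would imply.
read_len = implied_read_length(n_reads, 100, genome_size)
print(read_len)                                   # 200.0
print(coverage(n_reads, read_len, genome_size))   # 100.0
```

If the real mean read length is longer than this, the actual coverage is proportionally higher than 100x.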
