[mira_talk] Re: Reference vs. De novo assembly.

  • From: Sven Klages <sir.svencelot@xxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 3 Dec 2009 21:19:55 +0100

You might want to join contigs via "Find internal joins" (dot plot) or
directly in the "Join Editor".

But keep in mind that you cannot join contigs if they don't overlap. In that
case you can only change the contig layout accordingly (a kind of manual
scaffolding).
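
To make the overlap requirement concrete, here is a minimal Python sketch
that looks for an exact match between the end of one contig and the start of
the next. It is only an illustration, not how gap4 finds joins: real contig
ends contain mismatches and indels, so an alignment-based search is needed in
practice. The function name and the toy sequences are made up for this
example.

```python
def end_overlap(left: str, right: str, min_len: int = 20) -> int:
    """Return the length of the longest exact match between the end of
    `left` and the start of `right`, or 0 if it is shorter than `min_len`."""
    max_len = min(len(left), len(right))
    # Try the longest possible overlap first and shrink until one matches.
    for size in range(max_len, min_len - 1, -1):
        if left[-size:] == right[:size]:
            return size
    return 0


# Toy sequences (hypothetical): contig_a ends with the 11 bases that
# contig_b starts with, contig_c shares no end with contig_a.
contig_a = "ACGTTGCAAGGCTTAGCCTA"
contig_b = "GGCTTAGCCTATTGACCGTA"
contig_c = "TTTTCCCCGGGGAAAATTTT"

print(end_overlap(contig_a, contig_b, min_len=8))  # joinable: overlap found
print(end_overlap(contig_a, contig_c, min_len=8))  # no overlap: layout only
```

A result of 0 corresponds to the second case above: the contigs can be
ordered next to each other, but not joined.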

cheers,
Sven

2009/12/3 Andrzej N <andrzej.k.n@xxxxxxxxx>

> THANK YOU VERY MUCH FOR THE ANSWER :).
>
> It's a very basic question! Can ANYBODY tell me how to join contigs in GAP4?
> Yes, I did set up EVERYTHING, I can see them etc. (the contigs look really
> accurate, not many errors etc.), but I can't make this stuff work for me...
> At my place nobody has any idea how to do this stuff...
>
> How can I join contigs if they don't overlap?
>
> Do you know of any kind of manual...
>
> I always hear "by hand"... HOW?
>
> THANK YOU!
>
> Andrzej
>
> P.S. I will try the settings you provided.
>
> On Thu, Dec 3, 2009 at 1:48 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
>
>> On Wednesday 02 December 2009 Andrzej N wrote:
>> > I need some help... I did a *de novo* assembly of several plant
>> > mitochondrial genome sequences (454, Titanium, single-end reads), with
>> > about 200000 reads used for assembly (this should give me about... 100x
>> > coverage). Yes, I know, overkill, but... MIRA created about 160 contigs
>> > with a quality score around 78 (what is it exactly?) (the total number
>> > of contigs is about 5,000, but that includes smaller ones that don't
>> > help much, i.e., "junk"). These contigs don't go together to create one
>> > big consensus contig.
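
A quick back-of-the-envelope check of the coverage figure, as a Python
sketch. The genome size of about 400 kb is an assumption taken from the two
reference contigs of roughly 200000 bp each mentioned later in the thread,
and both mean read lengths are assumed values, not numbers from the post.

```python
def mean_coverage(n_reads: int, mean_read_len: float, genome_size: int) -> float:
    """Average coverage = total sequenced bases / genome size."""
    return n_reads * mean_read_len / genome_size


n_reads = 200_000        # from the post
genome_size = 400_000    # assumption: two reference contigs of ~200000 bp

# Assumed mean read lengths in bases; neither value is given in the post.
for read_len in (200, 400):
    cov = mean_coverage(n_reads, read_len, genome_size)
    print(f"mean read length {read_len} -> about {cov:.0f}x coverage")
```

Under these assumptions the quoted 100x corresponds to a mean read length of
about 200 bases; longer reads would push the coverage even higher.
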
>>
>> Hello Andrzej,
>>
>> 100x is not only overkill, it is also a bit dangerous for many assemblers
>> (including MIRA), as there are some unwanted side-effects of ultra-high
>> coverage. One of them: as sequencing errors are not totally random, they
>> tend to accumulate at certain points. If you now have very high coverage,
>> these sequencing errors will be recognised as valid variants and the
>> affected reads hence split off into other contigs.
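
The coverage effect described here can be illustrated with a small binomial
calculation in Python. This is a rough sketch, not MIRA's actual algorithm:
it assumes a hypothetical error-prone position where 10% of the reads show
the same wrong base, and asks how likely it is that at least `min_reads`
reads carry that error. The thresholds 6 and 12 are illustrative values; 12
merely echoes the -CO:mrpg=12 suggestion further down.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


# Hypothetical rate of one recurring (non-random) error at a single position.
site_error_rate = 0.10

for coverage in (25, 100):
    for min_reads in (6, 12):
        p = prob_at_least(min_reads, coverage, site_error_rate)
        print(f"{coverage:>3}x coverage, >= {min_reads:>2} reads "
              f"with the same error: p = {p:.4f}")
```

The probabilities rise sharply with coverage and drop again when the
threshold is raised, which is the reasoning behind both warnings: avoid
ultra-high coverage, or demand more reads before calling a variant.
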
>>
>> Plus you've got plant mitochondrial genomes, and these I've come to fear a
>> bit. The 454 data I've seen from them so far suggest pretty uneven
>> coverage, which might lead MIRA to have problems if the uniform read
>> distribution is used, mistakenly recognising some parts as repeats when
>> they're not.
>>
>> > I also did a reference assembly against an already finished and
>> > assembled sequence. MIRA covers all of this reference sequence with
>> > only one small break (so I get two huge contigs of about 200000bp each).
>> >
>> > Now comes the interesting part. When I take the contigs from the
>> > *de novo* assembly and BLAST them against the ones generated by the
>> > reference assembly, they cover the entire sequence very nicely... So, my
>> > question is why MIRA is not creating larger contigs during *de novo*
>> > assembly. These contigs are next to each other and show a certain amount
>> > of sequence overlap (I set up BLAST on my computer to blast them against
>> > each other), but MIRA is not seeing this and combining them.
>>
>> Oh, MIRA is probably seeing them, but refuses to join because the ends
>> contain too many sequencing errors (mistakenly recognised as valid
>> variants) or because the ends lie in regions with exceptionally high
>> coverage (mistakenly recognised as repeats).
>>
>> > What parameters in MIRA need to be changed to help build larger
>> > contigs? My adjustments to date have not helped much more than your
>> > default settings for "fast assembly".
>>
>> Umm ... the 'draft' options are really just that: for drafts. And if
>> you've got 60kb chunks it's not too bad already. But use at least 'normal'
>> or 'accurate' mode.
>>
>> Now, other things you probably want to do:
>> 1) decrease the sensitivity of repeat marker base recognition. I'd suggest
>>   adding
>>     454_SETTINGS -CO:mrpg=12
>>   and see what happens then
>> 2) possibly assemble without uniform read distribution
>>     -AS:urd=no
>>   and loosen the repeat detection thresholds
>>     454_SETTINGS -AS:ardct=3:mrl=800
>>   or switch off repeat detection altogether
>>     -AS:ard=no
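
To show how such options might be combined on one command line, here is a
small Python sketch that only assembles and prints the argument list; it
does not run MIRA. The --project/--job form and the job string are
assumptions based on the MIRA 3.x manual, the project name is hypothetical,
and the helper function is invented for this example. Only -AS:urd=no and
454_SETTINGS -CO:mrpg=12 come from the suggestions above, placed in the
order they are written there (general options first, then the 454 section).

```python
import shlex


def build_mira_args(project, job, common=(), per_technology=None):
    """Assemble a MIRA-style argument list: general options first, then
    each technology section keyword followed by its own options."""
    args = ["mira", f"--project={project}", f"--job={job}"]
    args.extend(common)
    for section, params in (per_technology or {}).items():
        args.append(section)   # e.g. the 454_SETTINGS keyword
        args.extend(params)
    return args


args = build_mira_args(
    project="plantmito",                 # hypothetical project name
    job="denovo,genome,accurate,454",    # assumed job string, 'accurate' mode
    common=["-AS:urd=no"],               # suggestion 2: no uniform read distribution
    per_technology={"454_SETTINGS": ["-CO:mrpg=12"]},  # suggestion 1
)
print(shlex.join(args))
```

Check the resulting line against the manual of the installed MIRA version
before running it, since parameter names and defaults change between
releases.
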
>>
>> If everything else fails: join the large contigs by hand in 'gap4', just
>> takes a couple of minutes for a plant mitochondrion :-)
>>
>> Hope that helps,
>>  Bastien
