[mira_talk] Re: Reference vs. De novo assembly.

  • From: Sven Klages <sir.svencelot@xxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 3 Dec 2009 21:33:12 +0100

2009/12/3 Andrzej N <andrzej.k.n@xxxxxxxxx>

> Yes, I tried that, I do have some VERY small regions of similarities.
>
> But now the question is: how can I actually use this information from the
> dot plot to build a consensus from those two contigs, i.e. "slide them"
> against each other? I have about 160 contigs (of "better quality") and the
> total is about 5000 contigs (an insane number to start doing this "by hand").
>
>
Well, I totally agree, but you wouldn't trust an automated tool that joins
"VERY small regions of similarities" for you, would you?
At least for the "better quality" contigs you should do this manually. If I
understood correctly, all the smaller contigs are of less
interest/help.

cheers,
Sven
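
For the handful of larger contigs, a quick scripted check can help decide
which dot-plot hits are worth opening in the Join Editor. What follows is only
a minimal sketch and not something gap4 or MIRA provides: the function name
`best_end_overlap` and the thresholds `min_len` and `max_mismatch_rate` are
made up for illustration. It looks for the longest suffix of one contig that
matches a prefix of another without gaps, so it ignores indels and
reverse-complemented contigs; a BLAST comparison handles both.

```python
def best_end_overlap(left, right, min_len=40, max_mismatch_rate=0.02):
    """Find the longest suffix of `left` that matches a prefix of `right`.

    Returns (overlap_length, mismatches), or None if no ungapped overlap of
    at least `min_len` bases stays within the allowed mismatch rate.
    All names and default thresholds here are illustrative assumptions.
    """
    left = left.upper()
    right = right.upper()
    longest = min(len(left), len(right))
    # Try the longest candidate overlap first and shrink from there.
    for length in range(longest, min_len - 1, -1):
        suffix = left[-length:]
        prefix = right[:length]
        mismatches = sum(1 for a, b in zip(suffix, prefix) if a != b)
        if mismatches <= max_mismatch_rate * length:
            return length, mismatches
    return None


if __name__ == "__main__":
    shared = "ACGTACGGTCAGCTAGGCTA"      # 20 bases present at both contig ends
    contig_a = "T" * 10 + shared
    contig_b = shared + "G" * 10
    print(best_end_overlap(contig_a, contig_b, min_len=10))
```

With the toy sequences above the call reports a 20-base overlap with no
mismatches. A short or error-rich overlap at real contig ends is exactly the
case where a manual look in gap4 is safer than trusting the number.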


> Thank you,
>
> Andrzej
>
>
> On Thu, Dec 3, 2009 at 2:19 PM, Sven Klages
> <sir.svencelot@xxxxxxxxxxxxxx> wrote:
>
>> You might want to join contigs via "Find internal joins" (dot plot) or
>> directly in the "Join Editor".
>>
>> But keep in mind, you cannot join contigs if they don't overlap. You can
>> just change contig layout
>> accordingly (kind of manual scaffolding).
>>
>> cheers,
>> Sven
>>
>> 2009/12/3 Andrzej N <andrzej.k.n@xxxxxxxxx>
>>
>>> THANK YOU VERY MUCH FOR THE ANSWER :).
>>>
>>> It's a very basic question! Can ANYBODY tell me how to join contigs in
>>> GAP4? Yes, I did set up EVERYTHING, I can see them etc. (the contigs look
>>> really accurate, not many errors etc.) but I can't make this stuff work for
>>> me... At my place nobody has any idea how to do this stuff...
>>>
>>> How can I join contigs if they don't overlap?
>>>
>>> Do you know of any kind of manual...?
>>>
>>> I always hear "by hand"... HOW?
>>>
>>> THANK YOU!
>>>
>>> Andrzej
>>>
>>> P.S. I will try the settings you provided.
>>>
>>> On Thu, Dec 3, 2009 at 1:48 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
>>>
>>>> On Wednesday 02 December 2009 Andrzej N wrote:
>>>> > I need some help... I did *de novo* assembly of several plant
>>>> > mitochondrial genome sequences (454, Titanium, single-end reads), about
>>>> > 200000 reads used for assembly (this should give me about... 100x
>>>> > coverage). Yes, I know, overkill, but... MIRA created about 160 contigs
>>>> > around 78 quality score (what is it exactly?) (total number of contigs
>>>> > like 5,000, but including smaller ones that don't help much, i.e.,
>>>> > "junk"). These contigs don't go together to create one big consensus
>>>> > contig.
>>>>
>>>> Hello Andrzej,
>>>>
>>>> 100x is not only overkill, it is also a bit dangerous for many assemblers
>>>> (including MIRA), as there are some unwanted side effects of ultra-high
>>>> coverage. One of them: as sequencing errors are not totally random, they
>>>> tend to accumulate at certain points. If you now have very high coverage,
>>>> these sequencing errors will be recognised as valid variants and hence
>>>> split off into other contigs.
>>>>
>>>> Plus you've got plant mitochondrial genomes, and these I've come to fear a
>>>> bit. The 454 data from those I've seen so far suggests pretty uneven
>>>> coverage, which might lead MIRA to have problems if the uniform read
>>>> distribution is used, mistakenly recognising some parts as repeats when
>>>> they're not.
>>>>
>>>> > I also did a reference assembly, to an already finished and assembled
>>>> > sequence. MIRA is covering all of this reference sequence with only
>>>> > one small break (so I get two huge contigs of about 200000bp each).
>>>> >
>>>> > Now is the interesting part. When I take these contigs from the *de novo*
>>>> > assembly and blast them against the ones generated by the reference
>>>> > assembly, they cover the entire sequence very nicely... So, my question
>>>> > is why MIRA is not creating larger contigs during *de novo* assembly.
>>>> > These contigs are next to each other and show a certain amount of
>>>> > sequence overlap (I set up BLAST on my computer to blast them against
>>>> > each other) but MIRA is not seeing this and combining them.
>>>>
>>>> Oh, MIRA is probably seeing them, but refuses to join because the ends
>>>> contain too many sequencing errors (mistakenly recognised as valid
>>>> variants) or because the ends lie in regions with exceptionally high
>>>> coverage (mistakenly recognised as repeats).
>>>>
>>>> > What parameters in MIRA need to be changed to help build larger contigs?
>>>> > My adjustments to date have not helped much more than your default
>>>> > settings for "fast assembly".
>>>>
>>>> Umm ... the 'draft' options are really just that: for drafts. And if
>>>> you've got 60kb chunks it's not too bad already. But use at least 'normal'
>>>> or 'accurate' mode.
>>>>
>>>> Now, other things you probably want to do:
>>>> 1) decrease the sensitivity of repeat marker base recognition. I'd suggest
>>>>   adding
>>>>     454_SETTINGS -CO:mrpg=12
>>>>   and seeing what happens then
>>>> 2) possibly assemble without uniform read distribution
>>>>     -AS:urd=no
>>>>   and loosen the repeat detection thresholds
>>>>     454_SETTINGS -AS:ardct=3:mrl=800
>>>>   or switch off repeat detection altogether
>>>>     -AS:ard=no
>>>>
>>>> If everything else fails: join the large contigs by hand in 'gap4'; it just
>>>> takes a couple of minutes for a plant mitochondrion :-)
>>>>
>>>> Hope that helps,
>>>>  Bastien
>>>>
>>>
>>>
>>
>
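
The "about 100x" estimate in the original question can be sanity-checked with
the usual rule of thumb, coverage = number of reads x mean read length /
genome length. A minimal sketch follows. The genome length of about 400000 bp
is an assumption taken from the two reference contigs of about 200000 bp each,
and the thread never states the mean read length, so the sketch solves for the
read length that 100x would imply rather than guessing one. The function names
are illustrative only.

```python
def mean_coverage(n_reads, mean_read_len, genome_len):
    """Average depth of coverage: total sequenced bases / genome length."""
    return n_reads * mean_read_len / genome_len


def implied_read_length(n_reads, coverage, genome_len):
    """Mean read length needed to reach `coverage` with `n_reads` reads."""
    return coverage * genome_len / n_reads


if __name__ == "__main__":
    n_reads = 200_000            # "about 200000 reads" from the question
    genome_len = 2 * 200_000     # ASSUMPTION: two reference contigs of ~200000 bp
    claimed_coverage = 100       # "about... 100x" from the question

    read_len = implied_read_length(n_reads, claimed_coverage, genome_len)
    print(read_len)                                      # 200.0
    print(mean_coverage(n_reads, read_len, genome_len))  # 100.0
```

Under these assumptions, 100x corresponds to a mean usable read length of
about 200 bases; longer reads after clipping would push the real coverage
higher, which only strengthens the point about ultra-high coverage.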
