[mira_talk] Re: Reference vs. De novo assembly.

  • From: Andrzej N <andrzej.k.n@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 3 Dec 2009 14:01:32 -0600

THANK YOU VERY MUCH FOR THE ANSWER :).

It's a very basic question! Can ANYBODY tell me how to join contigs in GAP4?
Yes, I did set up EVERYTHING, I can see them etc. (the contigs look really
accurate, not many errors etc.), but I can't make this stuff work for me...
At my place nobody has any idea how to do this stuff...

How can I join contigs if they don't overlap?

Do you know of any kind of manual?

I always hear "by hand"... HOW?

THANK YOU!

Andrzej

P.S. I will try the settings you provided.
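
To keep track of the variants suggested in the quoted reply below, here is a minimal sketch that only prints candidate command lines and does not run anything. The parameter strings (-CO:mrpg, -AS:urd, -AS:ardct, -AS:mrl, -AS:ard) are copied verbatim from that reply. The project name "plantmito" is made up, and the --project/--job syntax is my assumption about MIRA 3 style invocation, so everything should be checked against the manual of the installed MIRA version before use.

```python
# Hypothetical project name; not taken from the thread.
PROJECT = "plantmito"


def mira_command(quality="accurate", general=(), settings_454=()):
    """Build an argument list for one assembly variant.

    The --project/--job form is an assumption about MIRA 3 style syntax.
    """
    cmd = ["mira", f"--project={PROJECT}",
           f"--job=denovo,genome,{quality},454"]
    cmd += list(general)
    if settings_454:
        # Technology-specific parameters follow the 454_SETTINGS switch.
        cmd += ["454_SETTINGS", *settings_454]
    return cmd


# Parameter strings copied verbatim from the quoted reply; verify each
# one (in particular 'mrl') against the manual before running.
variants = {
    "1_less_sensitive_repeat_markers": mira_command(
        settings_454=["-CO:mrpg=12"]),
    "2a_no_uniform_read_distribution": mira_command(
        general=["-AS:urd=no"],
        settings_454=["-AS:ardct=3:mrl=800"]),
    "2b_no_repeat_detection": mira_command(
        general=["-AS:urd=no", "-AS:ard=no"]),
}

for name, cmd in variants.items():
    print(f"{name}: {' '.join(cmd)}")
```

Each variant would go into its own working directory so the results can be compared contig by contig.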

On Thu, Dec 3, 2009 at 1:48 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Wednesday 02 December 2009 Andrzej N wrote:
> > I need some help... I did a *de novo* assembly of several plant
> > mitochondrial genome sequences (454, Titanium, one-end reads; about
> > 200000 reads used for assembly, which should give me about... 100x
> > coverage). Yes, I know, overkill, but... MIRA created about 160 contigs
> > with a quality score around 78 (what is it exactly?) (the total number
> > of contigs is about 5,000, but that includes smaller ones that don't
> > help much, i.e., "junk"). These contigs don't go together to create
> > one big consensus contig.
>
> Hello Andrzej,
>
> 100x is not only overkill, it is also a bit dangerous for many assemblers
> (including MIRA), as there are some unwanted side-effects of ultra-high
> coverage. One of them: as sequencing errors are not totally random, they
> tend to accumulate at certain points. If you now have very high coverage,
> these sequencing errors will be recognised as valid variants and hence
> split off into other contigs.
>
> Plus you've got plant mitochondrial genomes, and these I've come to fear a
> bit. 454 data from those I've seen so far suggest pretty uneven coverage,
> which might lead MIRA to have problems if the uniform read distribution is
> used, mistakenly recognising some parts as repeats when they're not.
>
> > I also did a reference assembly, to an already finished and assembled
> > sequence. MIRA is covering all of this reference sequence with only
> > one small break (so I get two huge contigs of about 200000bp each).
> >
> > Now the interesting part. When I take the contigs from the *de novo*
> > assembly and blast them against the ones generated by the reference
> > assembly, they cover the entire sequence very nicely... So, my question
> > is why MIRA is not creating larger contigs during *de novo* assembly.
> > These contigs are next to each other and show a certain amount of
> > sequence overlap (I set up BLAST on my computer to blast them against
> > each other), but MIRA is not seeing this and combining them.
>
> Oh, MIRA is probably seeing them, but refuses to join because the ends
> contain too many sequencing errors (mistakenly recognised as valid
> variants) or because the ends lie in regions with exceptionally high
> coverage (mistakenly recognised as repeats).
>
> > What parameters in MIRA need to be changed to help build larger contigs?
> >  My adjustments to date have not helped much more than your default
> >  settings for "fast assembly".
>
> Umm ... the 'draft' options are really just that: for drafts. And if you've
> got 60kb chunks it's not too bad already. But use at least 'normal' or
> 'accurate' mode.
>
> Now, other things you probably want to do:
> 1) decrease sensitivity of repeat marker base recognition. I'd suggest
>   adding
>     454_SETTINGS -CO:mrpg=12
>   and see what happens then
> 2) possibly assemble without uniform read distribution
>     -AS:urd=no
>   and loosen the repeat detection thresholds
>     454_SETTINGS -AS:ardct=3:mrl=800
>   or switch off repeat detection altogether
>     -AS:ard=no
>
> If everything else fails: join the large contigs by hand in 'gap4', it
> just takes a couple of minutes for a plant mitochondrion :-)
>
> Hope that helps,
>  Bastien
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
>
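
A rough sanity check of the coverage figure discussed above, as a sketch only. The genome size is taken as the sum of the two reference-guided contigs (about 200000 bp each), which is an approximation. The mean read length is not stated anywhere in the thread, so it is back-calculated from the quoted 100x and should be treated as an assumption. The function names are illustrative.

```python
def mean_coverage(n_reads, mean_read_len, genome_size):
    """Average coverage = total sequenced bases / genome size."""
    return n_reads * mean_read_len / genome_size


def implied_read_length(n_reads, coverage, genome_size):
    """Mean read length that a given coverage figure would imply."""
    return coverage * genome_size / n_reads


n_reads = 200_000            # reads used for assembly (from the thread)
genome_size = 2 * 200_000    # two contigs of about 200000 bp (approximate)

# Read length implied by the quoted ~100x coverage.
read_len = implied_read_length(n_reads, 100, genome_size)
print(read_len)                                        # 200.0

# Round trip: the same numbers give back the quoted coverage.
print(mean_coverage(n_reads, read_len, genome_size))   # 100.0
```

If the actual mean read length is longer than the implied 200 bases, the real coverage is correspondingly higher than 100x, which would make the high-coverage side effects described above more likely.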
