[mira_talk] Re: MIRA: I am doing it wrong.

  • From: Alessandro Riccombeni <rikkomba@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Fri, 4 Dec 2009 11:50:38 +0000

Hi,

thanks for the help.
Yes, actually I think it was mentioned that we paid half the usual price for
this sequencing, so it was probably done at a lower coverage from the start.
The sequencing service provided a preassembled dataset as well: 39 scaffolds
and 933 contigs.
Some info: 13 Mb, 2.38% Ns, 36% GC; the largest scaffold is 1.9 Mb and the
smallest is 3 kb.
After my first MIRA run with the 454 reads only (they shouldn't have used any
non-454 reads either), I was quite clueless about what strategy they used to
get 39 scaffolds where I got 11000 contigs. As I wrote, they didn't use any
custom adaptor, so I don't know what preprocessing I should do...

On Thu, Dec 3, 2009 at 8:18 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Donnerstag 03 Dezember 2009 Alessandro Riccombeni wrote:
> > I guess I am using MIRA in the wrong way.
> > My dataset is composed of 512,000 454 Titanium reads, without any custom
> > vector from the sequencing service. I have to assemble a fungal genome of
> > around 13 Mb.
> > [...]
>
> Hi Alessandro,
>
> regardless of what sequencing providers or Roche may tell you ... I don't
> recommend coverages below 20x in 454 sequencing. 30x is a reasonably good
> coverage to get in balance between costs and number of contigs.
>
> Your numbers tell me you have *at most* a theoretical coverage of 16x,
> probably more in the region of 14x. Note that 'miramem' still uses the mean
> read length Roche gave in the early days (475 bases); having seen a few
> Titanium sets now, I suspect 400 bases is a more accurate mean length.
> [Note to self: change that ASAP in the miramem estimator]
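
The theoretical coverage mentioned above is simply the total number of sequenced bases divided by the genome size. A minimal sketch, using the read count and genome size from this thread and the two mean read lengths mentioned (475 and 400 bases); the helper name is hypothetical, and it ignores quality and adaptor trimming, so it is an upper bound on usable coverage.

```python
def theoretical_coverage(num_reads: int, mean_read_length: float, genome_size: float) -> float:
    """Expected fold coverage: total sequenced bases divided by genome size."""
    return num_reads * mean_read_length / genome_size


READS = 512_000           # Titanium reads in the dataset (from the thread)
GENOME_SIZE = 13_000_000  # assumed genome size, ~13 Mb (from the thread)

# 475 = Roche's early mean length used by miramem; 400 = the length suspected above
for mean_len in (475, 400):
    cov = theoretical_coverage(READS, mean_len, GENOME_SIZE)
    print(f"mean read length {mean_len} bp -> {cov:.1f}x")
```

With 400-base reads this comes to roughly 16x before any trimming, and even the optimistic 475-base figure stays below the 20x to 30x recommended above.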
>
> Even worse, the numbers you appended show that MIRA estimates the average
> coverage to be more like 8x or 9x at most, which is somewhat disastrous.
>
> > [...]
> > I got around 10500-11000 large contigs, trying normal, draft and accurate.
> > I am adding info from the output at the end of this message.
>
> Something's wrong with your data, at least that's my impression. In the
> simplest case, you underestimated the genome size and it's not 13 MB but
> more like 26 MB.
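
The 26 MB figure follows from turning the coverage calculation around: if the reads really yield only 8x to 9x, the genome size that would explain this is total bases divided by observed coverage. A minimal sketch, assuming a 400-base mean read length and attributing the whole shortfall to genome size (ignoring the contamination, repeat and sequencing-kit explanations discussed below); the helper name is hypothetical.

```python
def implied_genome_size(total_bases: float, observed_coverage: float) -> float:
    """Genome size that would explain a given observed coverage."""
    return total_bases / observed_coverage


TOTAL_BASES = 512_000 * 400  # reads x assumed mean length of 400 bp

# MIRA's own estimate of the average coverage was roughly 8x to 9x
for observed in (8, 9):
    size_mb = implied_genome_size(TOTAL_BASES, observed) / 1e6
    print(f"{observed}x observed -> genome of about {size_mb:.1f} Mb")
```

At 8x this gives about 25.6 Mb, i.e. roughly twice the assumed 13 Mb.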
>
> Then there may be the possibility of contamination: was it really just one
> organism which was sequenced?
>
> Also, your bug might be highly repetitive, which is another possible cause
> of a large number of contigs.
>
> And last but not least ... it may be a problem with the sequencing kit used.
> If your organism has high GC (>=60%) and the Titanium data was generated
> with a sequencing kit delivered in the first 7 to 8 months of this year,
> then you need to talk to your provider, as they need to talk to Roche.
>
> > I also tried doing a hybrid assembly with 4800 paired Sanger reads, getting
> > around 9700 large contigs. I found out that there are vectors for the
> > Forward and Reverse Sanger reads, so this is surely creating problems.
> > Nonetheless, I expected a much better result from my 454 reads.
>
> The 5k paired Sanger reads won't help with an assembly as fragmented as the
> 454 data suggests. Even if you trimmed away the sequencing vectors a bit
> better: with 5k paired ends you can't scaffold 10k contigs.
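
A rough way to see why: joining n contigs into one scaffold needs n - 1 links, and each link needs at least one (in practice usually more) read pair spanning that gap. A minimal sketch of this upper bound; the helper name and the 1 and 2 pairs-per-link thresholds are illustrative assumptions, and the best case assumed here (every pair spans a different gap) is far more generous than reality, since many pairs fall inside a single contig.

```python
def max_bridgeable_gaps(read_pairs: int, pairs_per_link: int) -> int:
    """Upper bound on the number of contig gaps a set of read pairs can bridge.

    Best case: every pair spans a different gap, and `pairs_per_link`
    independent pairs are required before a join is trusted.
    """
    return read_pairs // pairs_per_link


CONTIGS = 10_000
GAPS = CONTIGS - 1  # joining n contigs into one scaffold needs n - 1 links

# 4800 Sanger reads read as 2400 mate pairs, or generously as 4800 pairs
for pairs in (2_400, 4_800):
    for per_link in (1, 2):
        bridgeable = max_bridgeable_gaps(pairs, per_link)
        print(f"{pairs} pairs, {per_link} per link: at most {bridgeable} of {GAPS} gaps")
```

Even in the most generous case, fewer than half of the gaps could be bridged.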
>
> > My question is: is there something I am (blatantly) doing wrong? What am I
> > overlooking?
> > It's my first time assembling, so please excuse me for being annoying.
>
> Did your sequencing provider give you the result of a Newbler assembly of
> the Titanium data, and is it equally disastrous? Then there's nothing you
> can do apart from getting more sequence data, or getting the project
> resequenced if it turns out to be a bad sequencing kit.
>
> Regards,
>  Bastien
>
> PS: oh, and don't be lured into the "we could try to close with Solexa" trap
>    should the provider propose it. Accept only if *they* do the assembly and
>    deliver you the result free of charge :-)
>
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
>
