[mira_talk] Re: mira bambus mira

  • From: Davide Sassera <davide.sassera@xxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 22 Mar 2011 13:50:26 +0100

Dear Bastien,

Unfortunately the 454 data is only 20x, so I guess I should really cut most of the Solexa reads (70bp, by the way).

I have thus another question: how should I choose the solexa reads?
Random?
Choose the longer ones?

We wrote a script that selects the reads with the highest total quality, which should yield long, high-quality reads. I was wondering whether this would introduce a bias towards "easier" regions, as difficult regions may not be covered by long, high-quality reads.
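For comparison with the quality-based selection, here is a minimal sketch of the "Random?" option: a uniform random subsample drawn in one pass with reservoir sampling. It is only an illustration, not the script mentioned above and not a MIRA feature. The function names are hypothetical, and it assumes plain 4-line FASTQ records and unpaired reads.

```python
import io
import random

def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ stream."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return  # end of input
        yield tuple(line.rstrip("\n") for line in lines)

def reservoir_sample(records, k, seed=None):
    """Return k records chosen uniformly at random (all of them if fewer than k)."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)       # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < k:
                sample[j] = rec      # replace with probability k / (i + 1)
    return sample

# Tiny in-memory demo; a real run would pass an open FASTQ file instead.
demo = io.StringIO("".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(10)))
subset = reservoir_sample(read_fastq(demo), k=3, seed=1)
print(len(subset), "reads kept")
```

Because every read has the same chance of being kept, a subsample like this does not depend on read quality or length, which makes it a useful baseline when checking whether the quality-based selection shifts coverage.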

Any thoughts on this?

Many thanks

Davide








On Monday 21 March 2011 16:04:10 Davide Sassera wrote:

> My problem, briefly, is: can I do mira --> bambus --> mira? is it

> useful? how should I do it?

Yes, that is certainly useful.

> More in detail here is the situation I'm facing:

>

> I'm currently assembling a 3.5 Mb genome from 1 solexa lane (28M reads)

> and 3/4 454 paired gs-flx plate (3kb library).

How long are the Solexas? 36, 75, 100? Although even with 36bp, that would be way overcovered. You might want to think about using only half or a third of the reads.

A good rule of thumb I use for hybrid de-novos: have the coverage of Solexa be approximately the same (or in the range of +/- 50%) as the one from 454. I guess your 454 coverage to be around 30-40x, so having 30x to 60x Solexa would already be good.

The numbers above are not just taken out of thin air: fluctuations in coverage of one sequencing technology are often compensated (or better: attenuated) by the other technology. If, however, one technology dominates in terms of reads, it will also dominate in terms of sequence-dependent read coverage and void one of the good things a hybrid approach offers.
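The arithmetic behind the rule of thumb can be sketched as follows. This is only a back-of-the-envelope calculation, assuming a fixed read length of 70bp (the value given in the reply at the top of this thread) and the 3.5 Mb genome size from the original question; the function names are made up for illustration.

```python
def coverage(n_reads, read_len, genome_size):
    """Average coverage: total sequenced bases divided by genome size."""
    return n_reads * read_len / genome_size

def reads_for_coverage(target_cov, read_len, genome_size):
    """Number of reads needed to reach a target average coverage."""
    return round(target_cov * genome_size / read_len)

GENOME_SIZE = 3_500_000   # 3.5 Mb, from the thread
READ_LEN = 70             # assumed fixed Solexa read length

# Coverage of the full lane of 28M reads
full_lane = coverage(28_000_000, READ_LEN, GENOME_SIZE)

# Reads needed for the suggested 30x to 60x Solexa range
low = reads_for_coverage(30, READ_LEN, GENOME_SIZE)
high = reads_for_coverage(60, READ_LEN, GENOME_SIZE)

print(f"full lane: {full_lane:.0f}x")
print(f"30x needs {low:,} reads, 60x needs {high:,} reads")
```

With these assumptions the full lane comes out at several hundred fold coverage, while the 30x to 60x range corresponds to only a few million reads.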

> I did a denovo assembly with the 454 and 5 M solexa reads (memory

> constraints).

With 75bp on Solexa that would be a ~90x coverage for Solexa ... probably too much already.

> [...]

> I now feel I could elongate the 84 contigs I have with a novel mapping,

> but I'm not sure on how to proceed:

>

> Should I use the 84 contigs as they are and hope that a novel mapping

> adds the reads that will allow me to join them?

>

> Or should I use the scaffold generated by bambus using print_scaff? This

> second option has the advantage of piloting the joins towards what

> bambus says,

I would take the second option: take the Bambus-generated scaffolds and map your 454 reads and, say, 3 to 5 million of the Solexa reads to them.

> but bambus puts a stretch of Ns between two contigs. How

> does mira handle this? can mira use the Ns to join the contigs or not?

MIRA will not really join the contigs, but will happily map reads which overhang into the N-part. For small gaps, one round of mapping may already close most of them. You will, however, need to manually realign in gap4 if the estimated length of the N-stretch from Bambus was too far off the real number. Then rinse and repeat until you are happy with the result.

E.g.: light manual correction after a mapping round:

backbone .............NNNNNNNNNN...................

mapr1 ......AAAGGTT

mapr2 AGGTT...................

The above suggests that the N-stretch was a bit too long.

backbone .............NNNNNNNNNN...................

mapr1 ......AAAGGTT

mapr2 AGGTT.......................

The above suggests that the N-stretch was a bit too short.

(Note: the examples above are really just examples: for a difference of a couple of bases as shown, MIRA would probably find the right length of the N-stretch and add padding all by itself.)
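When checking such a gap by hand, one simple test is whether the overhanging end of the left read matches the overhanging start of the right read. The sketch below computes the longest suffix/prefix overlap of two sequences. It is only an illustration of that check, not what MIRA or gap4 do internally, and the function name is hypothetical.

```python
def suffix_prefix_overlap(left, right):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for n in range(min(len(left), len(right)), 0, -1):
        if left[-n:] == right[:n]:
            return n
    return 0

# Overhanging bases of the two reads from the example above
left_overhang = "AAAGGTT"   # end of mapr1, reaching into the N-stretch
right_overhang = "AGGTT"    # start of mapr2, reaching into the N-stretch

n = suffix_prefix_overlap(left_overhang, right_overhang)
print(f"reads share {n} bases: {right_overhang[:n]}")
```

If the two reads share bases like this but are placed apart or on top of each other in the mapping, the N-stretch length is off by the difference and can be corrected accordingly.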

Hope that helps,

Bastien



--
Davide Sassera
Sezione di Patologia Generale e Parassitologia
Dipartimento di Patologia Animale, Igiene e Sanità Pubblica Veterinaria Facoltà di Veterinaria
Università degli Studi di Milano
Via Celoria 10, 20133, Milano, ITALY
Tel: +39 0250318094
Fax: +39 0250318095
