[mira_talk] Re: mira bambus mira

From: Bastien Chevreux <bach@xxxxxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Mon, 21 Mar 2011 20:13:42 +0100

On Monday 21 March 2011 16:04:10 Davide Sassera wrote:
> My problem, briefly, is: can I do mira --> bambus --> mira? is it
> useful? how should I do it?

Yes, that is certainly useful.

> More in detail here is the situation I'm facing:
> 
> I'm currently assemblying a 3.5 Mb genome from 1 solexa lane (28M reads)
> and 3/4 454 paired gs-flx plate (3kb library).

How long are the Solexa reads: 36, 75 or 100 bp? Even at 36 bp, 28M reads 
would be way too much coverage for a 3.5 Mb genome. You might want to think 
about using only half or a third of the reads.
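
As a minimal sketch of how one could take "half or a third" of the reads: the 
function below keeps each FASTQ record with a given probability. It assumes 
plain 4-line FASTQ records and unpaired reads (for paired reads both mates 
would have to be kept or dropped together); the function name and the file 
names in the usage comment are hypothetical, not part of MIRA.

```python
import random


def subsample_fastq(lines, fraction, seed=42):
    """Yield the lines of a random subset of 4-line FASTQ records.

    lines    -- iterable of FASTQ lines (e.g. an open file handle)
    fraction -- probability of keeping each record, between 0.0 and 1.0
    seed     -- fixed seed so the same subset can be reproduced
    """
    rng = random.Random(seed)
    record = []
    for line in lines:
        record.append(line)
        if len(record) == 4:          # one complete record collected
            if rng.random() < fraction:
                yield from record
            record = []


# Usage (hypothetical file names):
# with open("lane1.fastq") as fin, open("lane1_third.fastq", "w") as fout:
#     fout.writelines(subsample_fastq(fin, 1 / 3))
```

Because each record is kept independently, the output size is only 
approximately the requested fraction, which is fine for reducing coverage.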

A good rule of thumb I use for hybrid de-novos: have the coverage of Solexa be 
approximately the same (or in the range of +/- 50%) as the one from 454. I 
guess your 454 coverage to be around 30-40x, so having 30x to 60x Solexa would 
already be good.

The numbers above are not just taken out of thin air: fluctuations in coverage 
of one sequencing technology are often compensated (or better: attenuated) by 
the other technology. If, however, one technology dominates in terms of 
reads, it will also dominate in terms of sequence-dependent read coverage 
and void one of the good things a hybrid approach offers.
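
The arithmetic behind the rule of thumb can be sketched as follows. It uses 
the simple estimate coverage = reads x read length / genome size, ignores 
clipping and unmapped reads, and takes the 3.5 Mb genome and 28M reads from 
the question; the function names are made up for this illustration.

```python
import math


def coverage(num_reads, read_len, genome_size):
    """Average coverage, assuming every base of every read is used."""
    return num_reads * read_len / genome_size


def reads_for_coverage(target_cov, read_len, genome_size):
    """Number of reads needed to reach a target average coverage."""
    return math.ceil(target_cov * genome_size / read_len)


GENOME_SIZE = 3_500_000   # 3.5 Mb, from the question
LANE_READS = 28_000_000   # one Solexa lane, from the question

for read_len in (36, 75, 100):
    full = coverage(LANE_READS, read_len, GENOME_SIZE)
    low = reads_for_coverage(30, read_len, GENOME_SIZE)
    high = reads_for_coverage(60, read_len, GENOME_SIZE)
    print(f"{read_len} bp: full lane = {full:.0f}x, "
          f"30x to 60x needs {low} to {high} reads")
```

Even for 36 bp reads the full lane comes out at several hundred-fold 
coverage, far above the 30x to 60x range suggested above.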

> I did a denovo assembly with the 454 and 5 M solexa reads (memory
> constraints).

With 75bp on Solexa that would be a ~90x coverage for Solexa ... probably too 
much already.

> [...]
> I now feel I could elongate the 84 contigs I have with a novel mapping,
> but I'm not sure on how to proceed:
> 
> Should I use the 84 contigs as they are and hope that a novel mapping
> adds the reads that will allow me to join them?
> 
> Or should I use the scaffold generated by bambus using print_scaff? This
> second option has the advantage of piloting the joins towards what
> bambus says, 

I would take the second option: take the Bambus-generated scaffolds and map 
your 454 reads and, say, 3 to 5 million Solexa reads to them.

> but bambus puts a stretch on Ns between two contigs. How
> does mira handle this? can mira use the Ns to join the contigs or not?

MIRA will not really join the contigs, but will happily map reads which 
overhang into the N-part. For small gaps, one round of mapping may already 
close most of them. You will however need to manually realign in gap4 if the 
estimated length of the N-stretch from Bambus was too far off the real number. 
Then rinse and repeat until you are happy with the result.

E.g.: light manual correction after a mapping round:

backbone .............NNNNNNNNNN...................
mapr1           ......AAAGGTT
mapr2                      AGGTT...................

The above suggests that the N-stretch was a bit too long.

backbone .............NNNNNNNNNN...................
mapr1           ......AAAGGTT
mapr2                  AGGTT.......................

The above suggests that the N-stretch was a bit too short.

(Note: the examples above are really just illustrations: for a difference of 
a couple of bases like shown, MIRA would probably find the right length of the 
N-stretch and add padding all by itself.)
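
The reasoning in the sketches can be written down as a small calculation. 
This is only an illustration of the idea, not how MIRA or gap4 work 
internally: it assumes you have read off the backbone column where each 
overhanging read part starts, and that the two overhangs share some bases 
(suffix of the left one equals prefix of the right one). All names and 
coordinates are hypothetical.

```python
def longest_suffix_prefix(left, right):
    """Length of the longest suffix of `left` that is a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def n_stretch_error(left_start, left_seq, right_start, right_seq):
    """Estimate by how many bases the N-stretch is off.

    left_start / right_start -- backbone columns where the overhanging
                                parts of the two reads begin
    Returns a positive number if the N-stretch looks too long, a negative
    number if it looks too short, 0 if it fits, or None if the two
    overhangs share no bases and nothing can be said.
    """
    k = longest_suffix_prefix(left_seq, right_seq)
    if k == 0:
        return None
    # Column where the right overhang would start if the gap were correct.
    expected_start = left_start + len(left_seq) - k
    return right_start - expected_start


# Right overhang sits 3 columns further right than the shared bases allow:
print(n_stretch_error(13, "AAAGGTT", 18, "AGGTT"))   # 3  (too long)
# Right overhang sits 1 column further left:
print(n_stretch_error(13, "AAAGGTT", 14, "AGGTT"))   # -1 (too short)
```

A short shared stretch can match by chance, so a result like this is only a 
hint for where to look when realigning by hand.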

Hope that helps,
  Bastien
