[mira_talk] Re: MIRA vs Newbler (454 est's)

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 25 Jul 2011 23:02:54 +0200

On Jul 25, 2011, at 17:19 , Diogo Santos wrote:
> I have received data from an EST's project from 454. they make the assembly 
> with Newbler 2.5 and they get 5869 contigs (length>100) with coverage 11x (I 
> know it's low, but it's what I get:( ). I try to use MIRA to make some 
> testing and use the original data to reassemble, but I get a strange result 
> (19378 contigs with length >100 and coverage 4x). Can you tell me which 
> parameters I should change?

The number of contigs in an EST assembly can vary widely depending on several 
factors.

First, make absolutely sure that the data you got in the SFF is preprocessed 
correctly by the sequencing provider. Just lately I've seen a data set where a 
provider played around with the adaptors but did not tell the Roche 
post-processing pipeline about it. This led to 1/3 of the reads still having 
unclipped adaptor in the SFF ... and that is deadly.
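
A quick way to get a feeling for this is to count how many reads still contain 
the adaptor. Below is a minimal sketch in Python, assuming you have already 
converted the clipped reads from the SFF to FASTA with whatever tool you 
normally use. The function names and the adaptor string you pass in are 
placeholders of mine, not something from MIRA or the Roche pipeline; ask your 
provider for the adaptor sequences they actually used.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the read name.
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def adaptor_fraction(reads: Iterable[Tuple[str, str]],
                     adaptor: str, min_len: int = 12) -> float:
    """Fraction of reads containing the first min_len bases of the adaptor.

    Exact, forward-strand matching only (an assumption made for brevity),
    so the result is a lower bound on the real contamination.
    """
    probe = adaptor.upper()[:min_len]
    total = hits = 0
    for _name, seq in reads:
        total += 1
        if probe in seq.upper():
            hits += 1
    return hits / total if total else 0.0
```

Called as adaptor_fraction(read_fasta(open("reads.fasta")), "YOUR_ADAPTOR"), 
with a hypothetical file name, anything far above zero is a reason to go back 
to the provider before changing any assembler parameters.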

Also, be really sure that MIDs are clipped away. Again, this is something the 
provider normally should do, as they *are* responsible for delivering correct 
data that is as free as possible from sequencing artefacts.
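
If you are unsure about the MIDs, one rough check is to tally the first few 
bases of every clipped read: when MIDs are still present, a handful of 
identical prefixes will dominate. The following is only a sketch under my own 
assumptions: the function names are made up, and k=10 is a guess at the tag 
length that you should adjust to the barcodes your provider used.

```python
from collections import Counter
from typing import Iterable, Tuple


def prefix_counts(seqs: Iterable[str], k: int = 10) -> Counter:
    """Count the first k bases of each read; reads shorter than k are skipped."""
    counts = Counter()
    for seq in seqs:
        if len(seq) >= k:
            counts[seq[:k].upper()] += 1
    return counts


def top_prefix_share(seqs: Iterable[str], k: int = 10) -> Tuple[str, float]:
    """Return the most frequent k-base prefix and its share of the counted reads."""
    counts = prefix_counts(seqs, k)
    total = sum(counts.values())
    if total == 0:
        return "", 0.0
    prefix, n = counts.most_common(1)[0]
    return prefix, n / total
```

In a cleanly clipped EST data set the read starts should be fairly diverse, so 
a single prefix covering a large share of the reads deserves a closer look.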

Once you are sure your data is good, it's time to think about biological 
explanations: is your organism polyploid with many differences between alleles? 
If yes, then note that many people are taken by surprise that MIRA doesn't 
assemble those alleles together. MIRA is NOT a clusterer! It is an assembler 
and as such, it will assemble the mRNA as it was in the cell.

Hope that helps,
  Bastien
