[mira_talk] Re: mixing real and simulated 454 data

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 19 Apr 2012 16:17:35 +0200

On Apr 19, 2012, at 15:34 , Ariel Amadio wrote:
> I just tried to write a short email, not entering so many details, sorry.
> We have real unpaired 454 data for this bacterium. We assembled those reads 
> and built approx. 100 contigs (big contigs). We then took those contigs and 
> joined them together to form a "chimera" (just concatenation). This is because 
> we needed only one fasta sequence to simulate reads. I know this is not right...

Uh ... this feels so wrong that it is giving me the creeps (sorry to be so 
blunt). I am not sure whether you would gain anything from this approach as 
there are so many unknowns in what you're doing. I do not recommend doing this, 
at all.

> We then simulated paired-end reads for the chimera. Now, what I'm trying to 
> do is to see how many contigs and scaffolds we build using the real and 
> simulated data together. We want to do a paired-end 454 run, and we want to 
> have an idea of how many reads we need.
> I hope this is clearer now.

Much clearer. And I think it could be handled much more easily if you simply 
took the genome size of the bacterium and planned for 4x to 10x coverage with 
paired-end reads. Now, and this is way more important, you might also want to 
consider using different library sizes (like, one 3kb and one 7 to 10kb) to be 
able to scaffold across potentially larger rRNA stretches.
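
The arithmetic behind that suggestion can be sketched roughly as follows. This 
is only a back-of-the-envelope illustration: the genome size (5 Mb), the mean 
read length (400 bp) and the function names are assumptions made up for the 
example, not numbers from this thread. It uses the usual estimate 
coverage = reads * read length / genome size, plus the physical (span) coverage 
that a paired-end library of a given insert size contributes for scaffolding.

```python
import math


def reads_for_coverage(genome_size_bp, coverage, mean_read_len_bp):
    """Smallest read count with reads * read_len / genome_size >= coverage."""
    return math.ceil(coverage * genome_size_bp / mean_read_len_bp)


def physical_coverage(n_pairs, insert_size_bp, genome_size_bp):
    """Span (clone) coverage: how often a base lies inside a sequenced insert."""
    return n_pairs * insert_size_bp / genome_size_bp


# Hypothetical values, chosen only for illustration.
GENOME_SIZE_BP = 5_000_000   # assumed bacterial genome size
MEAN_READ_LEN_BP = 400       # assumed mean 454 read length

for cov in (4, 10):
    n_reads = reads_for_coverage(GENOME_SIZE_BP, cov, MEAN_READ_LEN_BP)
    n_pairs = n_reads // 2   # two reads per paired-end fragment
    print(f"{cov}x sequence coverage: about {n_reads} reads ({n_pairs} pairs)")
    # Compare the library sizes mentioned above (3 kb and, e.g., 8 kb).
    for insert_bp in (3_000, 8_000):
        span = physical_coverage(n_pairs, insert_bp, GENOME_SIZE_BP)
        print(f"  {insert_bp} bp library: about {span:.0f}x physical coverage")
```

Plugging in the real genome size and the read length of the planned run gives 
a first estimate of the read count. Larger inserts give more physical coverage 
per pair, which is what helps when bridging long repeats.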

Remember that sequencing itself is comparatively cheap nowadays: don't waste 
weeks or months trying to get something assembled / scaffolded with not enough 
paired-end data if you can have the same thing done right in a day with a bit 
more money thrown at the sequencing.

B.



