On Apr 19, 2012, at 15:34, Ariel Amadio wrote:
> I just tried to write a short email, not entering so many details, sorry.
> We have real unpaired 454 data for this bacteria. We assembled those reads
> and build approx. 100 contigs (big contigs). We then took those contigs and
> joined together to form a "chimera" (just concatenation). This is because we
> needed only one fasta sequence to simulate reads. I know this is not right...

Uh ... this feels so wrong that it is giving me the creeps (sorry to be so
blunt). I am not sure whether you would gain anything from this approach, as
there are so many unknowns in what you're doing. I do not recommend doing
this, at all.

> We then simulated paired-end reads for the chimera. Now, what I'm trying to
> do is to see how many contigs and scaffolds we build using the real and
> simulated data together. We want to do a paired-end 454 run, and we want to
> have an idea of how many reads we need.
> Hope to be more clear now.

Much clearer. And I think it could be handled much more easily if you simply
took the genome size of the bacterium and planned for 4x to 10x coverage with
paired-end reads.

Now, and this is way more important, you might also want to consider using
different library sizes (like, one 3kb and one 7 to 10kb) to be able to
scaffold across potentially larger rRNA stretches.

Remember that sequencing itself is comparatively cheap nowadays: don't waste
weeks or months trying to get something assembled / scaffolded with not
enough paired-end data if you can have the same thing done right in a day
with a bit more money thrown at the sequencing.

B.

--
You have received this mail because you are subscribed to the mira_talk
mailing list. For information on how to subscribe or unsubscribe, please
visit http://www.chevreux.org/mira_mailinglists.html
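
The coverage planning suggested in the reply (genome size times 4x to 10x) comes down to simple arithmetic. Below is a minimal sketch of it, not a recommendation for any particular run. The genome size (5 Mb), mean read length (400 bp) and pair count are hypothetical placeholders that do not come from this thread, and the function names are made up for illustration.

```python
import math


def reads_for_coverage(genome_size_bp: int, coverage: float, mean_read_len_bp: int) -> int:
    """Smallest number of reads such that reads * read_len / genome_size >= coverage."""
    return math.ceil(genome_size_bp * coverage / mean_read_len_bp)


def span_coverage(n_pairs: int, insert_size_bp: int, genome_size_bp: int) -> float:
    """Physical (clone) coverage: how often the genome is spanned by read pairs."""
    return n_pairs * insert_size_bp / genome_size_bp


# Hypothetical values, NOT taken from the thread: replace with your own estimates.
GENOME_SIZE_BP = 5_000_000
MEAN_READ_LEN_BP = 400

for cov in (4, 10):
    n_reads = reads_for_coverage(GENOME_SIZE_BP, cov, MEAN_READ_LEN_BP)
    print(f"{cov}x sequence coverage: about {n_reads} reads")

# Span coverage for the library sizes mentioned in the reply (3 kb and 7 to 10 kb),
# assuming a hypothetical 10,000 read pairs per library.
for insert_size in (3_000, 7_000, 10_000):
    print(f"{insert_size} bp library: {span_coverage(10_000, insert_size, GENOME_SIZE_BP):.1f}x span coverage")
```

Sequence coverage and span coverage are kept separate here because the second one is what matters for scaffolding: a larger insert size spans more of the genome per pair, which is why the larger libraries help bridge long repeats such as rRNA stretches.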