-- Hi, personally I use velveth & velvetg to assemble Solexa reads because they are much faster: it takes ~6 hours to assemble 30 million 75bp reads. Then I use MIRA to re-assemble the contigs produced by Velvet.
Laurent
--
Wachholtz, Michael wrote:
I am currently trying to do a hybrid transcriptome assembly with both 454 and Solexa reads, which will eventually feed into an RNA-Seq analysis. The research concerns 2 strains of buffalograss: one that is resistant to chinch bugs (tetraploid) and another that is susceptible to chinch bugs (hexaploid). We have 5 half-plate runs of 454 data (~400,000 reads/run) and 11 lanes of Solexa data (each lane producing 30 million 55bp reads). Our best computer is a quad-core with a 1.5 TB hard disk and 25 GB RAM. My questions concern making the best hybrid assembly with this data, and also flagging inter- and intra-organism SNPs.

I have seen two methods described for MIRA. The first is to assemble each Solexa lane separately (I think our RAM can only handle one lane at a time), then break the assembled contigs and unassembled reads into 454 pseudo-reads, combine these with the 454 reads, and assemble with 454 settings. My questions regarding this are: how would we fragment the Solexa contigs into pseudo-reads for 454? Do I just break the contigs into 500bp chunks? Do I need to adjust the quality scores, since Solexa uses a different scoring scheme? Also, since it is so computationally expensive to assemble Solexa reads with MIRA (we are currently assembling 1 lane, and it is already at the 24-hour mark... still running), is there another fast and ACCURATE Solexa assembly program that will produce contigs WITH quality scores? I've tried ABySS, but can't figure out how to get it to output a consensus quality score file for each contig.

The next method I've seen described is to assemble the 454 reads and use them as a backbone to map/assemble the Solexa reads (which would be less expensive than assembling the Solexa runs without a backbone, as in the method above). If I do this, will MIRA be able to extend/improve/join the 454 contigs/singletons I already have? Will these improved contigs show up in the output files?
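For the pseudo-read questions above, here is a minimal Python sketch, not a MIRA feature. It assumes each Solexa contig comes with per-base qualities (if the assembler emits none, some placeholder value would have to be chosen). The 500bp fragment length and 250bp step are illustrative assumptions, and the helper names (`tile_contig`, `solexa_to_phred`, `to_fasta_and_qual`) are hypothetical. Overlapping tiles let an assembler join neighbouring pseudo-reads back together. The quality conversion only applies if the values really are on the old Solexa log-odds scale; Phred-scaled qualities (e.g. Illumina 1.3+, ASCII offset 64) only need their offset adjusted.

```python
import math

# Hypothetical helpers; fragment length and step are illustrative, not MIRA settings.


def solexa_to_phred(q_solexa: float) -> int:
    """Convert a Solexa log-odds quality to the Phred scale.

    Solexa: Q = -10*log10(p / (1 - p)); Phred: Q = -10*log10(p).
    Eliminating the error probability p gives
    Q_phred = 10*log10(10**(Q_solexa / 10) + 1).
    """
    return round(10 * math.log10(10 ** (q_solexa / 10) + 1))


def tile_contig(name, seq, quals, length=500, step=250):
    """Yield overlapping (name, seq, quals) pseudo-reads covering a contig.

    With step < length, neighbouring pseudo-reads overlap by (length - step)
    bases. A final tile is anchored at the contig end so that the last bases
    are always covered.
    """
    if len(seq) != len(quals):
        raise ValueError(f"{name}: sequence and quality lengths differ")
    if len(seq) <= length:
        yield f"{name}_1", seq, list(quals)
        return
    starts = list(range(0, len(seq) - length + 1, step))
    if starts[-1] + length < len(seq):
        starts.append(len(seq) - length)
    for i, start in enumerate(starts, 1):
        yield f"{name}_{i}", seq[start:start + length], list(quals[start:start + length])


def to_fasta_and_qual(reads):
    """Format pseudo-reads as a FASTA string and a matching quality string."""
    fasta, qual = [], []
    for name, seq, quals in reads:
        fasta.append(f">{name}\n{seq}\n")
        qual.append(f">{name}\n{' '.join(str(q) for q in quals)}\n")
    return "".join(fasta), "".join(qual)


# Usage: a 1,100bp contig with made-up Solexa-scaled qualities.
solexa_quals = [-5, 0, 10, 40] * 275
phred_quals = [solexa_to_phred(q) for q in solexa_quals]  # starts 1, 3, 10, 40
pseudo_reads = list(tile_contig("contig1", "ACGT" * 275, phred_quals))
fasta_text, qual_text = to_fasta_and_qual(pseudo_reads)
```

Tiles start at 0, 250, 500 and 600 here, so the 1,100bp contig becomes four overlapping 500bp pseudo-reads.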
My plan would take an iterative approach, trying to extend/join contigs with each Solexa run. Since I have 11 Solexa datasets, I would assemble them to the backbone one at a time (as my RAM permits), but with each iteration I would want the backbone to improve and also to include the leftover (unassembled) Solexa reads from the previous iteration. The only problem I see is: will the output only include sequences composed of Solexa reads? In the next iteration I will want to include the same 454 contigs/singletons and the new Solexa contigs/unassembled reads, as well as the 454 contigs that were joined or extended. This would require merging the datasets somehow, filtering what has and has not been mapped to remove redundant sequences. Correct? I also assume this would make it more difficult to catch SNPs (which isn't a problem, because I can always use SAMtools in the RNA-Seq analysis to catch SNPs from the Solexa reads). Has anyone tried one of these methods, or does anyone prefer a particular one, and can share the details/problems?
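The merge-and-filter step described above could look roughly like the following Python sketch. It is hypothetical bookkeeping, not MIRA functionality: sequences are plain dicts of name → sequence, "redundant" is assumed to mean exact containment of a sequence (or its reverse complement) in a longer kept sequence, and the mapped-read IDs are assumed to be extractable from the assembler's output. The all-against-all check is O(n²), so at transcriptome scale a dedicated clustering tool with an identity threshold would be needed instead.

```python
# Hypothetical helpers for one backbone iteration (not part of MIRA).
# Assumes sequence names are unique within each source.

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_backbone(**sources: dict[str, str]) -> dict[str, str]:
    """Merge named sequences from several sources into a non-redundant set.

    Sequences are considered longest first; one is dropped if it, or its
    reverse complement, is an exact substring of a sequence already kept.
    Kept names are prefixed with their source label to avoid collisions.
    """
    pool = [
        (f"{label}|{name}", seq)
        for label, seqs in sources.items()
        for name, seq in seqs.items()
    ]
    pool.sort(key=lambda item: len(item[1]), reverse=True)  # stable sort
    kept: dict[str, str] = {}
    for name, seq in pool:
        rc = revcomp(seq)
        if any(seq in other or rc in other for other in kept.values()):
            continue  # redundant: contained in a longer kept sequence
        kept[name] = seq
    return kept


def leftover_reads(reads: dict[str, str], mapped_ids: set[str]) -> dict[str, str]:
    """Return the reads whose IDs were not reported as mapped/assembled."""
    return {rid: seq for rid, seq in reads.items() if rid not in mapped_ids}


# Usage with toy sequences: x is a substring of a, and y is the reverse
# complement of a, so both are dropped.
backbone = merge_backbone(
    c454={"a": "ACGTACGTAA", "b": "GGGCCC"},
    solexa={"x": "CGTACG", "y": "TTACGTACGT", "z": "TTTTAAAA"},
)
print(sorted(backbone))  # ['c454|a', 'c454|b', 'solexa|z']
```

Processing longest sequences first means joined or extended contigs naturally absorb the shorter 454 contigs and singletons they contain, which is the redundancy the iteration needs to remove.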
-- You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html