I think that is the pipeline I will use: assemble the Solexa data with
the Velvet/Oases package. I like this because it is fast and one of the
few Solexa transcriptome assemblers; everything else seems to cater to
genomic assemblies. The only thing I dislike is that no consensus
quality values are output, so if I fragment these contigs and turn them
into 454 pseudo-reads, they will have no quality scores.

I'm working with two strains, one hexaploid and the other tetraploid. I
fear I am swimming into deep, dark waters, but I am hoping that MIRA
will help me identify the majority of inter- and intra-organism SNPs. I
would also like to catch indels greater than 3 bp. Does anyone know how
to tweak the MIRA 454 parameters to help catch indels while still
dealing with the sequencing-error/homopolymer issues in 454 reads?

On Mon, Nov 1, 2010 at 3:51 PM, Laurent MANCHON <lmanchon@xxxxxxxxxxxxxx> wrote:
> Hi,
>
> Personally, I use velveth and velvetg to assemble Solexa reads because
> it is very fast: it takes ~6 hours to assemble 30 million 75 bp reads.
> Then I use MIRA to re-assemble the contigs provided by Velvet.
>
> Laurent
>
> Wachholtz, Michael wrote:
>>
>> I am currently trying to do a hybrid transcriptome assembly with both
>> 454 and Solexa reads, which will lead to an eventual RNA-Seq analysis.
>> The research concerns two strains of buffalograss, one that is
>> resistant to chinch bugs (tetraploid) and another that is susceptible
>> to chinch bugs (hexaploid). We have 5 half-plate runs of 454 data
>> (~400,000 reads/run) and 11 lanes of Solexa data (each lane producing
>> 30 million 55 bp reads). Our best computer is a quad-core with a
>> 1.5 TB hard drive and 25 GB RAM.
>> My questions are about making the best hybrid assembly with this data
>> and also flagging inter- and intra-organism SNPs.
>> I have seen two methods described with MIRA.
>> The first is that we could assemble each Solexa lane separately (I
>> think our RAM can only handle one lane assembly at a time), then break
>> the assembled contigs and unassembled reads into 454 pseudo-reads,
>> then combine these with the 454 reads and assemble with 454 settings.
>> My questions regarding this are: how would we fragment the Solexa
>> contigs into pseudo-reads for 454? Do I just break the contigs into
>> 500 bp chunks? Do I need to adjust the quality scores, since Solexa
>> uses a different scoring scheme? Also, since it is so computationally
>> expensive to assemble Solexa data with MIRA (we are assembling one
>> lane currently, and it is already at the 24 hr mark and still
>> running), is there another fast and ACCURATE Solexa assembly program
>> that will produce contigs WITH quality scores? I've tried ABySS, but
>> can't figure out how to get it to output a consensus quality score
>> file for the contigs.
>>
>> The next method I've seen described is to assemble the 454 reads and
>> use them as a backbone to map/assemble the Solexa reads (which would
>> be less expensive than assembling the Solexa runs without a backbone,
>> as in the above method). If I do this, will it be able to
>> extend/improve/join the 454 contigs/singletons I already have? Will
>> these improved contigs show up in the output files? My plan would take
>> an iterative approach, trying to extend/join contigs with each Solexa
>> run. Since I have 11 Solexa datasets, I would assemble these to the
>> backbone one at a time (what my RAM permits), but with each iteration
>> I would want the backbone to improve and also include leftover
>> (unassembled) Solexa reads from the previous iteration. The only
>> problem I see with this is that the output may only include sequences
>> made up of Solexa reads. In the next iteration I will want to include
>> the same 454 contigs/singletons and the new Solexa novel
>> contigs/unassembled reads, as well as 454 contigs that were joined or
>> extended.
>> This would require me to merge the datasets somehow and to filter
>> what has been mapped and unmapped in order to remove redundant
>> sequences. Correct? I also assume this would make it more difficult to
>> catch SNPs (which isn't a problem, because I can always use SAMtools
>> in the RNA-Seq analysis to catch SNPs through the Solexa reads).
>>
>> Has anyone tried one of these methods, or does anyone prefer a
>> particular one, and can share the details/problems?
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
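
A minimal sketch of the fragmentation step discussed in this thread:
cutting assembled contigs into overlapping pseudo-reads and writing a
placeholder quality value for every base, since Velvet/Oases contigs
carry no consensus qualities. The 500 bp read length comes from the
question above; the 250 bp overlap, the constant quality of 30, and the
function names are assumptions made for illustration only. Whether MIRA
handles such placeholder qualities well is not settled in this thread.

```python
from typing import Iterable, Iterator, Tuple


def fragment_contig(name: str, seq: str, read_len: int = 500,
                    overlap: int = 250) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, read_seq) windows of at most read_len bases.

    Consecutive windows share `overlap` bases; the last window may be
    shorter than read_len. read_len and overlap are illustrative values.
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be >= 0 and smaller than read_len")
    if not seq:
        return
    step = read_len - overlap
    start = 0
    index = 1
    while True:
        yield f"{name}_pr{index}", seq[start:start + read_len]
        if start + read_len >= len(seq):
            break  # this window already reached the end of the contig
        start += step
        index += 1


def to_fasta_and_qual(reads: Iterable[Tuple[str, str]],
                      placeholder_qual: int = 30) -> Tuple[str, str]:
    """Return FASTA text and matching .qual text for the pseudo-reads.

    Every base gets the same placeholder quality (an assumption, not a
    value recommended by any assembler).
    """
    fasta_parts = []
    qual_parts = []
    for read_name, read_seq in reads:
        fasta_parts.append(f">{read_name}\n{read_seq}\n")
        quals = " ".join([str(placeholder_qual)] * len(read_seq))
        qual_parts.append(f">{read_name}\n{quals}\n")
    return "".join(fasta_parts), "".join(qual_parts)


if __name__ == "__main__":
    # Toy contig, 1200 bp long, standing in for a real Velvet/Oases contig.
    contig = "ACGT" * 300
    pseudo_reads = list(fragment_contig("contig1", contig))
    fasta_text, qual_text = to_fasta_and_qual(pseudo_reads)
    print(len(pseudo_reads), [len(s) for _, s in pseudo_reads])
```

Overlapping windows are used here so that neighbouring pseudo-reads
can be joined again during re-assembly; plain non-overlapping 500 bp
chunks correspond to overlap=0.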