Hi Michael,

I'm very curious to hear how things go for you. I am doing a project with a
mammalian transcriptome: differential expression from 18 samples. We have
3.7 million 454 reads that I have assembled with MIRA, and very soon we
should get another 60 million reads from an Illumina run. So overall it's
very similar to what you are doing.

-Marshall Hampton

On Mon, Nov 1, 2010 at 4:35 PM, Wachholtz, Michael
<mwachholtz@xxxxxxxxxxx> wrote:
> I think that is the pipeline I will use: assemble the Solexa data using
> the Velvet/Oases package. I like this because it is fast and one of the
> few Solexa transcriptome assemblers; everything else seems geared toward
> genomic assemblies. The only thing I dislike is that no consensus
> quality values are output. So if I fragment these contigs and turn
> them into 454 pseudo-reads, they will have no quality scores. I'm
> working with 2 strains: one is hexaploid and the other tetraploid. I
> fear I am swimming into deep, dark waters, but I'm hoping that MIRA will
> help me identify the majority of inter- and intra-organism SNPs. I
> would also like to catch indels longer than 3 bp. Does anyone know how
> to tweak the MIRA 454 parameters to help catch indels but also deal
> with sequencing errors/homopolymer issues in 454 reads?
>
> On Mon, Nov 1, 2010 at 3:51 PM, Laurent MANCHON <lmanchon@xxxxxxxxxxxxxx>
> wrote:
>> Hi,
>>
>> Personally, I use velveth & velvetg to assemble Solexa reads because it
>> is much faster: it takes ~6 hours to assemble 30 million 75 bp reads.
>> Then I use MIRA to re-assemble the contigs produced by Velvet.
>>
>> Laurent
>>
>> Wachholtz, Michael wrote:
>>>
>>> I am currently trying to do a hybrid transcriptome assembly with both
>>> 454 and Solexa reads, which will eventually lead to an RNA-Seq analysis.
>>> The research concerns 2 strains of buffalograss: one that is
>>> resistant to chinch bugs (tetraploid) and another that is susceptible
>>> to chinch bugs (hexaploid).
>>> We have 5 half-plate runs of 454 data (~400,000 reads/run) and 11
>>> lanes of Solexa data (each lane producing 30 million 55 bp reads). Our
>>> best computer is a quad-core machine with a 1.5 TB hard drive and 25 GB
>>> of RAM.
>>> My questions concern making the best hybrid assembly from this data,
>>> and also flagging inter- and intra-organism SNPs.
>>> I have seen two methods described for MIRA. In the first, we would
>>> assemble each Solexa lane separately (I think our RAM can only
>>> handle one lane at a time), then break the assembled contigs and
>>> unassembled reads into 454 pseudo-reads, combine them with the 454
>>> reads, and assemble with 454 settings. My questions regarding this are:
>>> how would we fragment the Solexa contigs into pseudo-reads for 454? Do
>>> I just break the contigs into 500 bp chunks? Do I need to adjust the
>>> quality scores, since Solexa uses a different scoring scheme? Also,
>>> since it is so computationally expensive to assemble Solexa data with
>>> MIRA (we are currently assembling 1 lane, and it is already at the
>>> 24-hour mark... still running), is there another fast and ACCURATE
>>> Solexa assembly program that will produce contigs WITH quality scores?
>>> I've tried ABySS, but can't figure out how to get it to output a
>>> consensus quality score file for each contig.
>>>
>>> The next method I've seen described is to assemble the 454 reads and
>>> use them as a backbone to map/assemble the Solexa reads (which would
>>> be less expensive than assembling Solexa runs without a backbone, as
>>> in the above method). If I do this, will it be able to
>>> extend/improve/join the 454 contigs/singletons I already have? Will
>>> these improved contigs show up in the output files? My plan would take
>>> an iterative approach, trying to extend/join contigs with each Solexa
>>> run.
>>> Since I have 11 Solexa datasets, I would assemble these to the
>>> backbone one at a time (as my RAM permits), but with each iteration
>>> I would want the backbone to improve and also include the leftover
>>> (unassembled) Solexa reads from the previous iteration. The only
>>> problem I see with this: will the output only include sequences
>>> composed of Solexa reads? In the next iteration I will want to include
>>> the same 454 contigs/singletons and the new novel Solexa
>>> contigs/unassembled reads, as well as 454 contigs that were joined or
>>> extended. This would require me to merge the datasets somehow and
>>> filter what has been mapped and unmapped to remove redundant
>>> sequences. Correct? I also assume this would make it more difficult to
>>> catch SNPs (which isn't a problem, because I can always use SAMtools in
>>> the RNA-Seq analysis to catch SNPs from the Solexa reads).
>>>
>>> Has anyone tried one of these methods, or does anyone prefer a
>>> particular one and can share the details/problems?
>>>
>>
>> --
>> You have received this mail because you are subscribed to the mira_talk
>> mailing list. For information on how to subscribe or unsubscribe, please
>> visit http://www.chevreux.org/mira_mailinglists.html
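On the question of how to fragment Solexa contigs into 454 pseudo-reads: below is a minimal Python sketch (not MIRA's or Velvet's own tooling) that tiles each contig with overlapping windows and writes a FASTA file plus a matching space-separated quality file with a flat placeholder quality, since Velvet/Oases output no consensus qualities. The 500 bp window, 250 bp step, quality value 30 and the `<contig>_<start>` naming scheme are assumptions chosen for illustration, not recommended MIRA settings.

```python
import io
from typing import Iterator, TextIO

# Assumed defaults for illustration only; not MIRA or Velvet settings.
READ_LEN = 500   # length of each pseudo-read (bp)
STEP = 250       # offset between consecutive pseudo-reads (50% overlap)
FLAT_QUAL = 30   # placeholder Phred quality, since no consensus quality exists


def fragment(seq: str, read_len: int = READ_LEN, step: int = STEP) -> Iterator[tuple[int, str]]:
    """Yield (start, subsequence) windows that tile `seq` with overlap.

    The final window is anchored at the contig end so no tail bases are lost;
    contigs no longer than `read_len` are yielded whole.
    """
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    if len(seq) <= read_len:
        yield 0, seq
        return
    last_start = len(seq) - read_len
    start = 0
    while start < last_start:
        yield start, seq[start:start + read_len]
        start += step
    yield last_start, seq[last_start:]


def write_pseudo_reads(contigs: dict[str, str], fasta_out: TextIO, qual_out: TextIO,
                       read_len: int = READ_LEN, step: int = STEP,
                       qual: int = FLAT_QUAL) -> int:
    """Write pseudo-reads as FASTA plus a parallel space-separated .qual file.

    Read names are "<contig>_<1-based start>" (a hypothetical naming scheme).
    Returns the number of pseudo-reads written.
    """
    count = 0
    for name, seq in contigs.items():
        for start, sub in fragment(seq, read_len, step):
            read_id = f"{name}_{start + 1}"
            fasta_out.write(f">{read_id}\n{sub}\n")
            qual_out.write(f">{read_id}\n{' '.join([str(qual)] * len(sub))}\n")
            count += 1
    return count


if __name__ == "__main__":
    # Toy example: one 1,100 bp contig becomes reads starting at 1, 251, 501, 601.
    fasta, quals = io.StringIO(), io.StringIO()
    n = write_pseudo_reads({"contig1": "ACGT" * 275}, fasta, quals)
    print(n, "pseudo-reads written")
```

Overlapping windows, rather than abutting 500 bp chunks, mean every junction between two pseudo-reads is also covered by a third one, which gives the assembler overlaps to rejoin them on.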
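On adjusting quality scores: Solexa/Illumina FASTQ files from this period may use one of several encodings (the original Solexa log-odds scale with an ASCII offset of 64, Phred with an offset of 64 from Illumina pipeline 1.3 on, or Sanger-style Phred with an offset of 33), and a Solexa score converts to Phred as Q_phred = 10 * log10(10^(Q_solexa/10) + 1). A minimal sketch follows; the encoding labels ("sanger", "illumina1.3", "solexa") are names made up for this example, and you still need to know or check which encoding your pipeline version actually produced.

```python
import math


def phred_from_solexa(q_solexa: float) -> float:
    """Convert an old-style Solexa log-odds score to a Phred score.

    Q_phred = 10 * log10(10 ** (Q_solexa / 10) + 1); the two scales agree
    closely at high quality and diverge at low quality.
    """
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def decode_qualities(qual_string: str, encoding: str) -> list[int]:
    """Decode a FASTQ quality string into integer Phred scores.

    `encoding` is a label invented for this sketch:
      "sanger"      -> Phred, ASCII offset 33
      "illumina1.3" -> Phred, ASCII offset 64
      "solexa"      -> Solexa log-odds, ASCII offset 64 (converted to Phred)
    """
    if encoding == "sanger":
        return [ord(c) - 33 for c in qual_string]
    if encoding == "illumina1.3":
        return [ord(c) - 64 for c in qual_string]
    if encoding == "solexa":
        return [round(phred_from_solexa(ord(c) - 64)) for c in qual_string]
    raise ValueError(f"unknown encoding: {encoding!r}")


def encode_sanger(phred: list[int]) -> str:
    """Re-encode Phred scores as a Sanger-style (offset 33) quality string."""
    return "".join(chr(q + 33) for q in phred)


if __name__ == "__main__":
    # '@' and 'J' are Solexa scores 0 and 10, i.e. roughly Phred 3 and 10.
    print(decode_qualities("@J", "solexa"))  # prints [3, 10]
```

Converting everything to one Phred scale before mixing Solexa-derived and 454 reads keeps the quality values comparable across the two platforms.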
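For the iterative backbone plan, where each round should carry forward the unassembled Solexa reads and merge the contig sets without redundancy, here is a minimal sketch of the bookkeeping. It assumes you have already extracted the set of read names that ended up in contigs (how to get that from MIRA's output is not shown), and the helper names `split_reads` and `merge_backbone` are hypothetical. It treats a contig as redundant only if it, or its reverse complement, is an exact substring of a longer kept contig; near-identical contigs that differ by sequencing errors or SNPs would need an alignment-based approach instead.

```python
# Hypothetical bookkeeping between backbone iterations; not part of MIRA.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (bases other than A/C/G/T/N are not complemented)."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_reads(reads: dict[str, str],
                assembled_ids: set[str]) -> tuple[dict[str, str], dict[str, str]]:
    """Partition reads into (assembled, leftover) by read name."""
    assembled = {rid: s for rid, s in reads.items() if rid in assembled_ids}
    leftover = {rid: s for rid, s in reads.items() if rid not in assembled_ids}
    return assembled, leftover


def merge_backbone(*contig_sets: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Merge contig sets, dropping exactly contained duplicates.

    A contig is dropped if it, or its reverse complement, occurs as an exact
    substring of a longer (or equal-length, earlier) contig already kept.
    This is O(n^2) and only suitable as an illustration.
    """
    pool = [c for contigs in contig_sets for c in contigs]
    # Longest first (stable), so containers are kept before what they contain.
    pool.sort(key=lambda c: len(c[1]), reverse=True)
    kept: list[tuple[str, str]] = []
    for name, seq in pool:
        rc = revcomp(seq)
        if any(seq in k or rc in k for _, k in kept):
            continue
        kept.append((name, seq))
    return kept


if __name__ == "__main__":
    backbone = merge_backbone([("c454_1", "AAACCCGGG")],
                              [("sol_1", "CCCGG"), ("sol_2", "TTTT")])
    print([name for name, _ in backbone])  # prints ['c454_1', 'sol_2']
```

The leftover reads from `split_reads` and the merged contigs from `merge_backbone` would then form the input to the next iteration.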