I recently discovered that ABySS has a companion package, Trans-ABySS. While this program is geared toward expression analysis with a reference genome, it includes a merge.pl script. We have been on the fence about which k-mer value to use. The results are hard to interpret, but this package will run an assembly for every k-mer value from i/2 to i (where i is the read length) and merge all the contigs into a final assembly. Our computer is a quad-core machine with 25 GB of RAM. It takes ABySS less than 1 hour to assemble ~100,000,000 reads. Very fast!!

Since all of our Illumina reads were filtered to contain mostly quality scores of 30 or higher, we just run this assembly through MIRA's fasta2frag program. This outputs a quality score file for the fragments, putting in a value of 30 for each bp (which saves me the work of writing a script for this). Then we just treat the fragments as Sanger reads and do a hybrid assembly with our 454 reads in MIRA.

If anyone has done Illumina transcriptome assembly with the Velvet/Oases package instead of ABySS, I would like to hear your thoughts about the advantages or the technique you used. While ABySS seems to do a fine job of catching SNPs and logging them as "popped bubbles", I'm not sure how it handles indels and transcript variants.

Once we have a complete assembly, our goal is to do RNA-Seq analysis with the original Illumina data. While MIRA will catch a large majority of SNPs during assembly, some of the SNP/variation data will have been lost in the ABySS assembly. However, this "lost" information can easily be recovered when we map the reads using Bowtie and SAMtools (SAM/BAM files).

On Tue, Nov 16, 2010 at 2:55 PM, Sven Klages <sir.svencelot@xxxxxxxxxxxxxx> wrote:
> oh, yes. I see, .. I just wanted to use it for my own data and was quite
> astonished ;-)
> fasta output, no qualities ... not of any use for me either ..
> cheers,
> Sven
>
> 2010/11/16 Wachholtz, Michael <mwachholtz@xxxxxxxxxxx>
>>
>> I have, but the output is in fasta format with no quality scores. The
>> only advantage this program has is that it will output how many
>> identical reads there were. I prefer the fastq program in that it will
>> retain the quality score of the best sequence and will output in fastq
>> format.
>>
>> On Mon, Nov 15, 2010 at 5:18 AM, Sven Klages
>> <sir.svencelot@xxxxxxxxxxxxxx> wrote:
>> > Hi Michael,
>> >
>> > 2010/11/15 Wachholtz, Michael <mwachholtz@xxxxxxxxxxx>
>> >>
>> >> [...]
>> >>
>> >> it is safe to use such strict criteria. After that, for each lane, we
>> >> used the fastq program to collapse/remove any identical reads. This
>> >
>> > [...]
>> >
>> > just a short question. You have successfully used the FASTX-Toolkit to
>> > quality-clip your data;
>> > this tool collection also contains a program to remove duplicates from
>> > NGS data:
>> >
>> > FASTQ/A Collapser
>> > Collapsing identical sequences in a FASTQ/A file into a single sequence
>> > (while maintaining reads counts)
>> >
>> > Have you tried this for your data?
>> >
>> > cheers,
>> > Sven
>>
>> --
>> You have received this mail because you are subscribed to the mira_talk
>> mailing list. For information on how to subscribe or unsubscribe, please
>> visit http://www.chevreux.org/mira_mailinglists.html