[mira_talk] Re: large hybrid assembly w/ minimal ram

  • From: "Wachholtz, Michael" <mwachholtz@xxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 30 Nov 2010 21:14:12 -0600

I am still configuring trans-abyss, and yes, it is not user friendly.
Our Illumina reads are single end, so many of the steps in trans-abyss
are skipped. We are only using trans-abyss to merge our multi-k-mer
assembly and remove redundant contigs. I realize abyss won't catch
indels well, but we are only using it to help make our 454 assembly
better. Since we have no reference genome, we sequenced a normalized
transcriptome via 454, then did non-normalized sequencing with
Illumina. We are merely assembling the Illumina reads and hoping that
they will close some gaps & join contigs in our 454 assembly.
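Removing redundant contigs from a multi-k-mer assembly can be sketched as follows. This is a simplified illustration, not what trans-abyss's merge.pl actually does: it only drops contigs that occur as an exact substring (on either strand) of a longer contig, and it ignores near-identical contigs that differ by a SNP or indel. The contig names are made up.

```python
from __future__ import annotations

# Complement table for DNA; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (uppercased)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def remove_contained(contigs: dict[str, str]) -> dict[str, str]:
    """Drop contigs that are an exact substring, on either strand, of a longer kept contig.

    Exact duplicates collapse to the first one seen.
    """
    # Longest first, so each contig is only checked against contigs at least as long.
    ordered = sorted(contigs.items(), key=lambda kv: len(kv[1]), reverse=True)
    kept: dict[str, str] = {}
    for name, seq in ordered:
        fwd = seq.upper()
        rev = revcomp(fwd)
        if any(fwd in other or rev in other for other in kept.values()):
            continue  # redundant: fully contained in a kept contig
        kept[name] = fwd
    return kept


if __name__ == "__main__":
    contigs = {
        "k25_c1": "ACGTACGTTTGACCA",
        "k31_c7": "GTACGTTTGA",  # forward substring of k25_c1
        "k31_c9": "TCAAACGTAC",  # reverse complement of that substring
        "k25_c4": "GGGCCCAAAT",  # unrelated contig
    }
    print(sorted(remove_contained(contigs)))  # ['k25_c1', 'k25_c4']
```

The containment check compares every contig against every kept contig, so it is quadratic; an assembly built from ~100M reads would need a k-mer index or an all-vs-all aligner instead.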

On Mon, Nov 29, 2010 at 2:00 PM, Robin Kramer <kodream@xxxxxxxxx> wrote:
> Sven,
>
> The problem is that bowtie itself has only limited support for indels
> since it isn't a true SW (Smith-Waterman) aligner, and Abyss in its
> scaffolding stage doesn't support indels at all (even if bowtie generates them).
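The difference between an ungapped alignment and a true SW aligner can be illustrated with a textbook Smith-Waterman local alignment using a linear gap penalty. The scoring values (match +2, mismatch -1, gap -2) are arbitrary assumptions, and this is not how bowtie scores alignments. The read below is the reference with one base deleted; making gaps prohibitively expensive turns the same recurrence into an ungapped scan.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score of a vs b (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # a[i-1] aligned to a gap (deletion in b)
            left = cur[j - 1] + gap   # b[j-1] aligned to a gap (insertion in b)
            cur[j] = max(0, diag, up, left)  # 0 lets a local alignment restart
            best = max(best, cur[j])
        prev = cur
    return best


if __name__ == "__main__":
    ref = "ACGTTGCAAC"
    read = "ACGTGCAAC"  # ref with one T deleted
    print(smith_waterman(ref, read))             # 16: all 9 bases align around one gap
    print(smith_waterman(ref, read, gap=-1000))  # 12: gaps effectively forbidden
```

With gaps allowed, the whole read aligns (9 matches and one gap); with gaps effectively forbidden, the best hit is only the 6-base tail.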
>
>
> I am curious as to your experience with the trans package.  Did it do
> a good job?  The last I checked, it was something akin to an NxN BLAST
> search and required quite a bit of external configuration to use, and
> since it was only a Perl script, I guessed that it was itself
> quite slow.
>
>
> On 11/16/10, Wachholtz, Michael <mwachholtz@xxxxxxxxxxx> wrote:
>> I recently discovered that Abyss has a trans-abyss package. While this
>> program is geared towards expression analysis with a reference genome,
>> it has a merge.pl script. We have been on the fence about what
>> k-mer value to use. The results are hard to interpret, but this
>> package will run an assembly for every k-mer value from i/2 to i (where
>> i is the read length) and merge all the contigs into a final assembly.
>> Our computer is quad-core with 25GB RAM. It takes Abyss less than
>> 1 hour to assemble ~100,000,000 reads. Very fast!! Since all of our
>> Illumina reads were filtered to contain mostly 30+ quality scores, we
>> just run this assembly through MIRA's fasta2frag program. This will
>> output a quality score file for the fragments, putting in a value of 30
>> for each bp (saves me the work of writing a script for this). Then we
>> just treat the fragments as Sanger reads and do a hybrid assembly with
>> our 454 reads in MIRA. If anyone has done Illumina transcriptome
>> assembly with the velvet/oases package instead of abyss, I would like
>> to hear your thoughts about its advantages or the technique you used.
>> While abyss seems to do a fine job of catching SNPs and logging them as
>> "popped bubbles", I'm not sure how it handles indels & transcript
>> variants. Once we have a complete assembly, our goal is to do RNA-Seq
>> analysis with the original Illumina data. While MIRA will catch a large
>> majority of SNPs during assembly, some of the SNP/variation data will
>> have been lost in the abyss assembly. However, this "lost" information
>> can be recovered when we map the reads back using bowtie and SAM/BAM tools.
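For reference, writing a constant-quality file by hand takes only a few lines. A minimal sketch, assuming the usual FASTA-style .qual layout (a ">name" line followed by whitespace-separated integer qualities); it is a hypothetical stand-in, not MIRA's fasta2frag, which also cuts the contigs into fragments. The default of 30 mirrors the value mentioned above, and the width of 20 values per line is an arbitrary choice.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) pairs; the name is the first word after '>'."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def constant_qual(lines: Iterable[str], qual: int = 30, width: int = 20) -> str:
    """Build a FASTA-style .qual text giving every base the same quality value."""
    out = []
    for name, seq in read_fasta(lines):
        out.append(f">{name}")
        values = [str(qual)] * len(seq)
        for start in range(0, len(values), width):  # wrap at `width` values per line
            out.append(" ".join(values[start:start + width]))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # With a real file: constant_qual(open("contigs.fasta")) (hypothetical file name).
    print(constant_qual([">contig1", "ACGTACGTAC", "GTAC", ">contig2", "TTGA"], width=10), end="")
```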
>>
>> On Tue, Nov 16, 2010 at 2:55 PM, Sven Klages
>> <sir.svencelot@xxxxxxxxxxxxxx> wrote:
>>> oh, yes. I see... I just wanted to use it for my own data and was quite
>>> astonished ;-)
>>> FASTA output, no qualities ... not of any use to me either.
>>> cheers,
>>> Sven
>>>
>>> 2010/11/16 Wachholtz, Michael <mwachholtz@xxxxxxxxxxx>
>>>>
>>>> I have, but the output is in FASTA format with no quality scores. The
>>>> only advantage this program has is that it will output how many
>>>> identical reads there were. I prefer the fastq program in that it will
>>>> retain the quality score of the best sequence and will output in FASTQ
>>>> format.
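Collapsing identical reads while keeping both the best-quality copy (as described for the fastq program) and the read count (as the FASTX collapser does) can be sketched like this. Assumptions: well-formed 4-line FASTQ records; "best" means the highest summed quality string (identical sequences have equal length, so the Phred offset does not affect the ranking); and the "count=" header annotation is a hypothetical convention, not the output format of either tool.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def parse_fastq(lines: Iterable[str]) -> Iterator[tuple[str, str, str]]:
    """Yield (name, sequence, quality); assumes well-formed 4-line records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header[1:], seq, qual


def collapse(records: Iterable[tuple[str, str, str]]) -> dict[str, list]:
    """Map each distinct sequence to [name, quality, score, count] of its best copy."""
    best: dict[str, list] = {}
    for name, seq, qual in records:
        # Identical sequences have equal length, so summed character codes rank
        # copies the same way whatever the Phred offset (+33 or +64) is.
        score = sum(map(ord, qual))
        entry = best.get(seq)
        if entry is None:
            best[seq] = [name, qual, score, 1]
            continue
        entry[3] += 1
        if score > entry[2]:  # ties keep the first copy seen
            entry[0], entry[1], entry[2] = name, qual, score
    return best


def to_fastq(best: dict[str, list]) -> str:
    """Write one record per distinct sequence; 'count=' is a made-up annotation."""
    return "".join(
        f"@{name} count={count}\n{seq}\n+\n{qual}\n"
        for seq, (name, qual, _score, count) in best.items()
    )


if __name__ == "__main__":
    reads = [
        "@r1", "ACGT", "+", "!!II",
        "@r2", "ACGT", "+", "IIII",  # same sequence, higher qualities
        "@r3", "GGCC", "+", "IIII",
    ]
    print(to_fastq(collapse(parse_fastq(reads))), end="")
```

Unlike a FASTA-only collapser, this keeps one FASTQ record per distinct sequence, so the retained qualities can still be passed on downstream.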
>>>>
>>>> On Mon, Nov 15, 2010 at 5:18 AM, Sven Klages
>>>> <sir.svencelot@xxxxxxxxxxxxxx> wrote:
>>>> > Hi Michael,
>>>> >
>>>> > 2010/11/15 Wachholtz, Michael <mwachholtz@xxxxxxxxxxx>
>>>> >>
>>>> >> [...]
>>>> >>
>>>> >> it is safe to use such strict criteria. After that, for each lane, we
>>>> >> used the fastq program to collapse/remove any identical reads. This
>>>> >
>>>> > [...]
>>>> >
>>>> > just a short question. You have successfully used the FASTX-Toolkit to
>>>> > quality-clip your data;
>>>> > this tool collection also contains a program to remove duplicates from
>>>> > NGS data:
>>>> >
>>>> > FASTQ/A Collapser
>>>> > Collapsing identical sequences in a FASTQ/A file into a single sequence
>>>> > (while maintaining read counts)
>>>> >
>>>> > Have you tried this for your data?
>>>> >
>>>> > cheers,
>>>> > Sven
>>>> >
>>>> >
>>>>
>>>> --
>>>> You have received this mail because you are subscribed to the mira_talk
>>>> mailing list. For information on how to subscribe or unsubscribe, please
>>>> visit http://www.chevreux.org/mira_mailinglists.html
>>>
>>>
>>
>>
>
>

