[mira_talk] Re: large hybrid assembly w/ minimal ram

From: Sven Klages <sir.svencelot@xxxxxxxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Tue, 30 Nov 2010 20:37:56 +0100

Oh, Michael has mentioned trans ... I never used it :-)

cheers,
Sven

2010/11/29 Robin Kramer <kodream@xxxxxxxxx>

> Sven,
>
> The problem is that bowtie itself has only limited support for indels,
> since it isn't a true SW aligner, and Abyss in its scaffolding stage
> doesn't support indels whatsoever (even if bowtie generates them).
>
>
> I am curious as to your experience with the trans package.  Did it do
> a good job?  The last I checked it was something akin to an NxN blast
> search and required quite a bit of external configuration to use, and
> since it was only a perl script I was guessing that it was itself
> quite slow.
>
>
> On 11/16/10, Wachholtz, Michael <mwachholtz@xxxxxxxxxxx> wrote:
> > I recently discovered that Abyss has a trans-abyss package. While this
> > program is geared towards expression analysis with a reference genome,
> > it has a merge.pl script. We have been on the fence about what
> > k-mer value to use. The results are hard to interpret, but this
> > package will do assembly of all k-mer values from i/2 to i (where i is
> > the read length) and merge all the contigs into a final assembly. Our
> > computer is quad-core with 25GB RAM. It takes Abyss less than
> > 1 hour to assemble ~100,000,000 reads. Very fast!! Since all of our
> > Illumina reads were filtered to contain mostly 30+ quality scores, we
> > just run this assembly through MIRA's fasta2frag program. This will
> > output a quality score file for the fragments, putting in a value of 30
> > for each bp (saves me the work of writing a script for this). Then
> > just treat the fragments as sanger reads and do hybrid with our 454
> > reads in MIRA. If anyone has done Illumina transcriptome assembly with
> > the velvet/oases package instead of abyss, I would like to hear your
> > thoughts about the advantages or techniques you used. While abyss seems
> > to do a fine job of catching SNPs and logging them as "popped
> > bubbles", I'm not sure how it handles indels & transcript variants.
> > Once we have a complete assembly, our goal is to do RNA-Seq analysis
> > with the original Illumina data. While MIRA will catch a large
> > majority of SNPs during assembly, some of the SNP/variation data will
> > have been lost in the abyss assembly. However this "lost" information
> > can easily be found when we map reads using bowtie, bam/sam tools.
> >
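The constant-quality step described above (a value of 30 for every base of each fragment) can be sketched as below. This is not fasta2frag itself and does not call MIRA; it is only a minimal illustration, assuming FASTA-formatted input text. The names `read_fasta` and `constant_qual`, and the `width` parameter for line wrapping, are made up for this example.

```python
from typing import Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def constant_qual(fasta_text: str, score: int = 30, width: int = 20) -> str:
    """Return FASTA-style quality text with `score` for every base.

    Hypothetical helper: one quality value per base, `width` values per line.
    """
    out = []
    for header, seq in read_fasta(fasta_text):
        out.append(">" + header)
        values = [str(score)] * len(seq)
        for i in range(0, len(values), width):
            out.append(" ".join(values[i:i + width]))
    return "\n".join(out) + "\n"
```

Writing the returned text next to the contig FASTA gives a quality file with one score per base, which is the property the message relies on when treating the fragments as Sanger reads.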
> > On Tue, Nov 16, 2010 at 2:55 PM, Sven Klages
> > <sir.svencelot@xxxxxxxxxxxxxx> wrote:
> >> oh, yes. I see, .. I just wanted to use it for my own data and was quite
> >> astonished ;-)
> >> fasta output, no qualities ... not of any use for me either ..
> >> cheers,
> >> Sven
> >>
> >> 2010/11/16 Wachholtz, Michael <mwachholtz@xxxxxxxxxxx>
> >>>
> >>> I have, but the output is in fasta format with no quality scores. The
> >>> only advantage this program has is that it will output how many
> >>> identical reads there were. I prefer the fastq program in that it will
> >>> retain the quality score of best sequence and will output in fastq
> >>> format.
> >>>
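The behaviour preferred above (collapse identical reads, but keep the quality string of the best copy along with a count) might look roughly like this. It is a sketch only, not the actual program mentioned in the thread: "best" is assumed here to mean the highest summed quality, and `collapse_reads` and the `Record` tuple are illustrative names.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality string)


def collapse_reads(records: Iterable[Record]) -> List[Tuple[Record, int]]:
    """Collapse identical sequences, keeping the best-quality copy and a count.

    Assumption: "best" means the highest summed quality over the read.
    """
    best: Dict[str, Record] = {}
    counts: Dict[str, int] = {}
    for name, seq, qual in records:
        counts[seq] = counts.get(seq, 0) + 1
        kept = best.get(seq)
        # Identical sequences have equal length, so summed character codes
        # rank quality strings the same way under any fixed Phred offset.
        if kept is None or sum(map(ord, qual)) > sum(map(ord, kept[2])):
            best[seq] = (name, seq, qual)
    # Results come back in order of first appearance of each sequence.
    return [(best[seq], counts[seq]) for seq in best]
```

Each returned pair holds the retained record and the number of identical reads it stands for, so both the read counts and the qualities survive the collapse.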
> >>> On Mon, Nov 15, 2010 at 5:18 AM, Sven Klages
> >>> <sir.svencelot@xxxxxxxxxxxxxx> wrote:
> >>> > Hi Michael,
> >>> >
> >>> > 2010/11/15 Wachholtz, Michael <mwachholtz@xxxxxxxxxxx>
> >>> >>
> >>> >> [...]
> >>> >>
> >>> >> it is safe to use such strict criteria. After that, for each lane, we
> >>> >> used the fastq program to collapse/remove any identical reads. This
> >>> >
> >>> > [...]
> >>> >
> >>> > just a short question. You have successfully used the FASTX-Toolkit to
> >>> > quality-clip your data;
> >>> > this tool collection also contains a program to remove duplicates from
> >>> > NGS data:
> >>> >
> >>> > FASTQ/A Collapser
> >>> > Collapsing identical sequences in a FASTQ/A file into a single sequence
> >>> > (while maintaining reads counts)
> >>> >
> >>> > Have you tried this for your data?
> >>> >
> >>> > cheers,
> >>> > Sven
> >>> >
> >>> >
> >>>
> >>> --
> >>> You have received this mail because you are subscribed to the mira_talk
> >>> mailing list. For information on how to subscribe or unsubscribe, please
> >>> visit http://www.chevreux.org/mira_mailinglists.html
> >>
> >>
> >
> >
>
>

