[mira_talk] Re: Fwd: Re: Filtering (size or quality) for Ion Torrent data

Hi Nicholas,

    If you have access to a Torrent Server v2.2, it has a very good
trimming algorithm for removing low-quality data. It employs the Beverly
lab filter, which is supposed to be highly stringent. Secondly, if you would
like to use only reads longer than 150 bp, you can use the -mrl (minimum
read length) switch during assembly.
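
For the length-based split Nick asks about further down, here is a minimal Python sketch. It is not part of Torrent Server or MIRA, and it assumes plain 4-line FASTQ records. The script name split_fastq.py and the output names short.fastq/long.fastq are hypothetical. It streams the file, so a ~480 Mb dataset does not have to fit in memory.

```python
import sys
from typing import Iterator, TextIO, Tuple

# Assumption: plain 4-line FASTQ records (header, sequence, '+', quality).
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield (header, sequence, plus, quality) tuples until end of file."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        yield header, seq, plus, qual


def split_by_length(in_handle: TextIO, short_out: TextIO, long_out: TextIO,
                    min_len: int = 150) -> Tuple[int, int]:
    """Write reads shorter than min_len to short_out and the rest to long_out.

    Returns (number of short reads, number of long reads).
    """
    n_short = n_long = 0
    for header, seq, plus, qual in read_fastq(in_handle):
        text = f"{header}\n{seq}\n{plus}\n{qual}\n"
        if len(seq) >= min_len:
            long_out.write(text)
            n_long += 1
        else:
            short_out.write(text)
            n_short += 1
    return n_short, n_long


def main(argv: list) -> int:
    # Hypothetical command line: split_fastq.py READS.fastq MIN_LEN
    if len(argv) != 2:
        print("usage: split_fastq.py READS.fastq MIN_LEN", file=sys.stderr)
        return 1
    with open(argv[0]) as fin, open("short.fastq", "w") as fs, \
            open("long.fastq", "w") as fl:
        n_short, n_long = split_by_length(fin, fs, fl, int(argv[1]))
    print(f"{n_short} reads < {argv[1]} bp, {n_long} reads >= {argv[1]} bp",
          file=sys.stderr)
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running `python split_fastq.py reads.fastq 150` would write short.fastq and long.fastq to the current directory. Reads of exactly 150 bp go into the long set here. Filtering inside MIRA with the minimum read length switch avoids the extra file altogether.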

Cheers,

Shankar Manoharan
Graduate Student
Department of Genetics
Madurai Kamaraj University
Ph. +919790167534

I strongly believe in doing my best and leaving the rest to God



On Fri, Aug 3, 2012 at 2:07 PM, Benjamin Leopold
<benjamin.leopold@xxxxxxxxxxxxxxx> wrote:

> Hi Nick,
>
> We have had an identical problem here with the size of the resulting
> Torrent files.  As we often run on our local cluster, which has a time
> limit, the assemblies get stopped when that limit is reached.
>
> I've written a quick Perl script that pulls out a random subset of
> sequences from the fastq file.  You can choose the final percentage to
> reduce the coverage to, e.g. 80%.  The script is attached.
>
> I completely agree that selecting on quality would be a significant
> improvement, but that feature is still pending.
>
> There are several other tools that can filter or trim a fastq file by
> quality, such as Trim Galore.  I haven't tested it yet, but that's also
> on my ToDo list.
>     http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
>
> -Benjamin-
>
> --------------------------------
> Benjamin Leopold
> Bioinformatician
> benjamin.leopold@xxxxxxxxxxxxxxx
> Poliklinik für Parodontologie
> Universitätsklinikum Münster
> Waldeyerstraße 30, 48149 Münster
> T +49(0)251-83-49882
> --------------------------------
>
> On 08/03/2012 09:41 AM, Nicholas Heng wrote:
> >
> > Hi Bastien and other MIRA-using Folk,
> >
> > I have a dataset from an Ion Torrent 318 run that's about 480 Mb in size
> > (average read length 177 bp).  My underpowered 8 GB bioinformatics
> > machine spent 7 solid days churning away at the data using MIRA 3.4 to
> > no avail.
> >
> > Can any of you next-gen sequencing gurus please suggest a program that
> > would allow me, one with absolutely no programming skill, to filter the
> > dataset into, say, <150 bp and >150 bp subsets so that it's suitable for
> > MIRA to handle?  Alternatively, a program that would sort by sequence
> > quality... but this may be harder.
> >
> > I'm sequencing the bacterial genome de novo (no reference); it's about
> > 2-2.5 Mb in size.  A 454 run is also being done, but I'd like subsets of
> > the Ion Torrent data for a hybrid assembly with MIRA.  And no, we can't
> > afford a more accurate Illumina or SOLiD run at the present time.
> >
> > Any help is greatly appreciated.
> >
> > Cheers,
> > Nick.
> >
> >
> > ================================
> > Nicholas (Nick) C.K. Heng, Ph.D.
> > Department of Oral Sciences
> > Faculty of Dentistry
> > University of Otago
> > P.O. Box 647
> > Dunedin 9054
> > NEW ZEALAND.
> > Ph: +643 4799254
> > Fx: +643 4797078
> > ================================
> >
>
>
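
Benjamin's attached Perl script is not reproduced in this archive. The minimal Python sketch below is a stand-in for the same idea, not his script. It randomly subsamples a FASTQ file to a chosen fraction and, optionally, adds the quality selection that he and Nick both mention. Its assumptions are 4-line FASTQ records and Phred+33 quality encoding. The script name subsample_fastq.py and the mean-quality threshold are hypothetical choices. Each read is kept independently with probability FRACTION, so the output holds approximately, not exactly, that fraction of reads.

```python
import random
import sys
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, plus, quality) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        yield header, seq, plus, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def subsample(in_handle: TextIO, out_handle: TextIO, fraction: float,
              min_mean_q: float = 0.0, seed: int = 1) -> int:
    """Keep each read with probability `fraction` if its mean quality passes.

    The output holds roughly (not exactly) `fraction` of the reads; a fixed
    seed makes the subset reproducible. Returns the number of reads written.
    """
    rng = random.Random(seed)
    kept = 0
    for header, seq, plus, qual in read_fastq(in_handle):
        # rng.random() is evaluated first for every read, so the random
        # stream does not depend on which reads pass the quality check.
        if rng.random() < fraction and mean_phred(qual) >= min_mean_q:
            out_handle.write(f"{header}\n{seq}\n{plus}\n{qual}\n")
            kept += 1
    return kept


def main(argv: list) -> int:
    # Hypothetical command line:
    #   subsample_fastq.py READS.fastq FRACTION [MIN_MEAN_Q] > subset.fastq
    if len(argv) not in (2, 3):
        print("usage: subsample_fastq.py READS.fastq FRACTION [MIN_MEAN_Q]",
              file=sys.stderr)
        return 1
    min_q = float(argv[2]) if len(argv) == 3 else 0.0
    with open(argv[0]) as fin:
        kept = subsample(fin, sys.stdout, float(argv[1]), min_q)
    print(f"kept {kept} reads", file=sys.stderr)
    return 0


if __name__ == "__main__":
    main(sys.argv[1:])
```

Benjamin's 80% example would be `python subsample_fastq.py reads.fastq 0.8 > subset.fastq`. Adding a third argument such as 20 would also drop reads whose mean Phred score is below 20. A whole-read mean cut is cruder than the quality trimming done by Torrent Server or Trim Galore, which can shorten a read instead of discarding it.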
