[mira_talk] Re: Fwd: Re: Filtering (size or quality) for Ion Torrent data

  • From: Shaun Tyler <Shaun.Tyler@xxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Fri, 3 Aug 2012 08:45:11 -0500

There is also a MIRA plug-in for the Torrent Server, but again that assumes
you have the PGM and access to the server. The resources on the server
should be sufficient for your assembly (the test run is E. coli).

You might also want to check out Galaxy (http://galaxy.psu.edu/).  We run
an internal version, but the public one also has a variety of tools for
working with and manipulating NGS data.  Try FastQC to get an idea of the
quality and where things could be tweaked.
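To get a rough feel for how quality falls off along the reads before running FastQC, you can average the quality at each read position. That mean is one of the statistics FastQC's "Per base sequence quality" plot shows. This is a minimal Python sketch. It assumes unwrapped four-line FASTQ records with Phred+33 (Sanger-style) quality characters. `read_fastq` and `per_position_mean_quality` are hypothetical helper names and are not part of FastQC or Galaxy.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from unwrapped 4-line FASTQ (assumes well-formed input)."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        seq = next(it)
        next(it)  # '+' separator line
        qual = next(it)
        yield header, seq, qual


def per_position_mean_quality(records: Iterable[Record], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position, over all reads long enough to reach it."""
    sums: List[int] = []
    counts: List[int] = []
    for _, _, qual in records:
        for i, ch in enumerate(qual):
            if i == len(sums):  # first read to reach this position
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset  # assumed Phred+33 encoding
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]
```

For example, `with open("reads.fastq") as fh: print(per_position_mean_quality(read_fastq(fh)))` prints one mean per position. The file name is hypothetical.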

Shaun






From:   Shankar Manoharan <shankarmanostar@xxxxxxxxx>
To:     mira_talk@xxxxxxxxxxxxx
Date:   2012-08-03 03:52 AM
Subject:        [mira_talk] Re: Fwd: Re: Filtering (size or quality) for Ion
            Torrent data
Sent by:        mira_talk-bounce@xxxxxxxxxxxxx



Hi Nicholas...
    If you have access to Torrent Server v2.2, it has a very good trimming
algorithm that removes low-quality data. It uses the Beverly lab filter,
which is supposed to be highly stringent. Secondly, if you would like to use
only reads longer than 150 bp, you can use the -mrl (minimum read length)
switch during assembly.
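You may prefer to drop short reads before they reach MIRA, for example to shrink the input file itself. A small stand-alone filter can do that. This is a minimal Python sketch, assuming unwrapped four-line FASTQ. The helper names and the rule "keep reads of at least `min_len` bases" are illustrative assumptions and do not describe how MIRA's -mrl switch behaves internally.

```python
import sys
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from unwrapped 4-line FASTQ (assumes well-formed input)."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        seq = next(it)
        next(it)  # '+' separator line
        qual = next(it)
        yield header, seq, qual


def filter_by_length(records: Iterable[Record], min_len: int) -> Iterator[Record]:
    """Keep only reads whose sequence is at least min_len bases long."""
    for rec in records:
        if len(rec[1]) >= min_len:
            yield rec


def format_record(rec: Record) -> str:
    """Render a record back into 4-line FASTQ text."""
    header, seq, qual = rec
    return f"{header}\n{seq}\n+\n{qual}\n"


def main(min_len: int) -> None:
    # Stream stdin to stdout so the whole file never has to fit in memory.
    for rec in filter_by_length(read_fastq(sys.stdin), min_len):
        sys.stdout.write(format_record(rec))
```

You could save this as a script with `main(int(sys.argv[1]))` added at the bottom and run it as `python filter_length.py 151 < reads.fastq > long_reads.fastq`. Passing 151 keeps reads strictly longer than 150 bp. The script and file names are hypothetical.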

Cheers,

Shankar Manoharan
Graduate Student
Department of Genetics
Madurai Kamaraj University
Ph. +919790167534

I strongly believe in doing my best and leaving the rest to God




On Fri, Aug 3, 2012 at 2:07 PM, Benjamin Leopold <benjamin.leopold@xxxxxxxxxxxxxxx> wrote:
  Hi Nick,

  We have had the same problem here with the size of the Torrent output
  files. As we often run on our local cluster, which has a time limit, the
  assemblies get stopped when the time runs out.

  I've created a quick Perl script that pulls out a random subset of
  sequences from the fastq file. You can choose the final percentage to
  reduce the coverage to, e.g. 80%. The script is attached.
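Benjamin's attached Perl script is not reproduced in the thread. The following only illustrates the general idea and is not his script: a minimal Python sketch of random subsampling in which each read is kept independently with probability `fraction`. It assumes unwrapped four-line FASTQ. `subsample` and its `seed` parameter are hypothetical names.

```python
import random
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from unwrapped 4-line FASTQ (assumes well-formed input)."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        seq = next(it)
        next(it)  # '+' separator line
        qual = next(it)
        yield header, seq, qual


def subsample(records: Iterable[Record], fraction: float,
              seed: Optional[int] = None) -> Iterator[Record]:
    """Keep each read independently with probability `fraction`.

    The output holds roughly fraction * N reads (not exactly N * fraction),
    so coverage drops to roughly that fraction of the original.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # a fixed seed makes the subset reproducible
    for rec in records:
        if rng.random() < fraction:
            yield rec
```

Because each read is kept independently, `fraction=0.8` yields approximately 80% of the reads, not exactly 80%. Hitting an exact count would require knowing the total number of reads first, for example with a two-pass approach. The single-pass version never holds the file in memory.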

  I fully agree that selecting on quality would be a significant
  improvement, but that one is still pending.

  There are several other tools that can trim a fastq file based on its
  quality values, such as Trim Galore. I haven't tested it yet, but that's
  also on my to-do list.
      http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/
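Trim Galore is a wrapper around cutadapt, and cutadapt's quality trimming follows the algorithm used by BWA. This is a minimal Python sketch of that 3'-end trimming rule, assuming Phred+33 qualities. The function names and the cutoff used in the example are illustrative. This is not Trim Galore's or cutadapt's own code.

```python
from typing import Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def quality_trim_index(qual: str, cutoff: int, offset: int = 33) -> int:
    """Return the 3' cut position for BWA-style quality trimming.

    Walking in from the 3' end, accumulate (cutoff - q). Stop as soon as the
    running sum goes negative; the cut goes where the sum peaked. The read
    keeps qual[:index].
    """
    running = 0
    best = 0
    cut = len(qual)  # default: keep the whole read
    for i in range(len(qual) - 1, -1, -1):
        running += cutoff - (ord(qual[i]) - offset)  # assumed Phred+33
        if running < 0:
            break
        if running > best:
            best = running
            cut = i
    return cut


def trim_record(rec: Record, cutoff: int, offset: int = 33) -> Record:
    """Trim the low-quality 3' tail from one (header, sequence, quality) record."""
    header, seq, qual = rec
    cut = quality_trim_index(qual, cutoff, offset)
    return header, seq[:cut], qual[:cut]
```

A read whose quality stays above the cutoff all the way to its end is returned unchanged. A short dip in quality is tolerated as long as the bases after it are good enough to pull the running sum below zero again.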

  -Benjamin-

  --------------------------------
  Benjamin Leopold
  BioInformatiker
  benjamin.leopold@xxxxxxxxxxxxxxx
  Poliklinik für Parodontologie
  Universitätsklinikum Münster
  Waldeyerstraße 30, 48149 Münster
  T +49(0)251-83-49882
  --------------------------------

  On 08/03/2012 09:41 AM, Nicholas Heng wrote:
  >
  > Hi Bastien and other MIRA-using Folk,
  >
  > I have a dataset from an Ion Torrent 318 run that's about 480 Mb in
  size (average read length 177 bp). My underpowered 8 GB bioinformatics
  machine spent 7 solid days churning away at the data using MIRA 3.4, to no
  avail.
  >
  > Can any of you next-gen sequencing gurus please suggest a program that
  would allow me, someone with absolutely no programming skill, to split the
  dataset into, say, <150 bp and >150 bp subsets so that MIRA can handle it?
  Alternatively, a program that would sort by sequence quality... but this
  may be harder.
  >
  > The bacterial genome I'm sequencing de novo (no reference) is about
  2-2.5 Mb in size. A 454 run is being done, but I'd like subsets of the Ion
  Torrent data for a hybrid assembly with MIRA. And no, we can't afford a more
  accurate Illumina or SOLiD run at the present time.
  >
  > Any help is greatly appreciated.
  >
  > Cheers,
  > Nick.
  >
  >
  > ================================
  > Nicholas (Nick) C.K. Heng, Ph.D.
  > Department of Oral Sciences
  > Faculty of Dentistry
  > University of Otago
  > P.O. Box 647
  > Dunedin 9054
  > NEW ZEALAND.
  > Ph: +643 4799254
  > Fx: +643 4797078
  > ================================
  >
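For the second option Nick mentions, sorting by sequence quality, one simple and admittedly crude measure is the mean Phred score of each read. This is a minimal Python sketch, assuming unwrapped four-line FASTQ with Phred+33 qualities. All function names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield records from unwrapped 4-line FASTQ (assumes well-formed input)."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # skip blank lines between records
        seq = next(it)
        next(it)  # '+' separator line
        qual = next(it)
        yield header, seq, qual


def mean_quality(qual: str, offset: int = 33) -> float:
    """Arithmetic mean of the Phred scores in a quality string (0.0 if empty)."""
    if not qual:
        return 0.0
    return sum(ord(ch) - offset for ch in qual) / len(qual)


def sort_by_mean_quality(records: Iterable[Record], offset: int = 33) -> List[Record]:
    """Return reads from highest to lowest mean quality (holds all reads in memory)."""
    return sorted(records, key=lambda rec: mean_quality(rec[2], offset), reverse=True)


def filter_by_mean_quality(records: Iterable[Record], min_mean: float,
                           offset: int = 33) -> Iterator[Record]:
    """Streaming alternative: keep reads whose mean quality is at least min_mean."""
    for rec in records:
        if mean_quality(rec[2], offset) >= min_mean:
            yield rec
```

Sorting needs every read in memory, so on a small machine the streaming threshold filter is the safer choice. Phred scores are logarithmic, so their arithmetic mean understates the effect of a few very bad bases. Treat this as a quick ranking, not a true error estimate.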
