There is also a MIRA plugin for the Torrent Server, but again that assumes you have the PGM and access to the server. The resources on the server should be sufficient for your assembly (the test run is E. coli).

You might also want to check out Galaxy (http://galaxy.psu.edu/). We run an internal version, but the public one also has a variety of tools for working with and manipulating NGS data. Try FastQC to get an idea of the read quality and of where things could be tweaked.

Shaun

From: Shankar Manoharan <shankarmanostar@xxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: 2012-08-03 03:52 AM
Subject: [mira_talk] Re: Fwd: Re: Filtering (size or quality) for Ion Torrent data
Sent by: mira_talk-bounce@xxxxxxxxxxxxx

Hi Nicholas...

If you have access to a Torrent Server v2.2, it has a very good trimming algorithm for removing low-quality data. It uses the Beverly lab filter, which is supposed to be highly stringent.

Secondly, if you would like to use only reads longer than 150 bp, you can use the -mrl (minimum read length) switch during assembly.

Cheers,
Shankar Manoharan
Graduate Student
Department of Genetics
Madurai Kamaraj University
Ph. +919790167534
I strongly believe in doing my best and leaving the rest to God

On Fri, Aug 3, 2012 at 2:07 PM, Benjamin Leopold <benjamin.leopold@xxxxxxxxxxxxxxx> wrote:

Hi Nick,

We have had an identical problem here with the size of the Torrent result files. As we often run on our local cluster, which has a time limit, the assemblies get stopped when that limit is reached. I've created a quick Perl script that pulls out a random subset of sequences from the FASTQ file. You can select the final percentage to reduce the coverage to, e.g. 80%. The script is attached.

I completely agree that selecting on quality would be a significant improvement, but that one is still pending. There are several other tools that can trim a FASTQ file based on read quality, such as Trim Galore. I haven't tested it yet, but that's also on my to-do list.
http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/

-Benjamin-
--------------------------------
Benjamin Leopold
Bioinformatician
benjamin.leopold@xxxxxxxxxxxxxxx
Poliklinik für Parodontologie
Universitätsklinikum Münster
Waldeyerstraße 30, 48149 Münster
T +49(0)251-83-49882
--------------------------------

On 08/03/2012 09:41 AM, Nicholas Heng wrote:
>
> Hi Bastien and other MIRA-using folk,
>
> I have a dataset from an Ion Torrent 318 run that's about 480 Mb in size (average read length 177 bp). My underpowered 8 GB bioinformatics machine spent 7 solid days churning away at the data using MIRA 3.4, to no avail.
>
> Can any of you next-gen sequencing gurus please suggest a program that would allow me, someone with absolutely no programming skill, to filter the dataset into, say, <150 bp and >150 bp subsets so that it's suitable for MIRA to handle? Alternatively, a program that would sort by sequence quality... but this may be harder.
>
> The bacterial genome I'm sequencing is de novo (no reference) and about 2-2.5 Mb in size. A 454 run is being done, but I'd like subsets of the Ion Torrent data for a hybrid assembly with MIRA. And no, we can't afford a more accurate Illumina or SOLiD run at the present time.
>
> Any help is greatly appreciated.
>
> Cheers,
> Nick.
>
> ================================
> Nicholas (Nick) C.K. Heng, Ph.D.
> Department of Oral Sciences
> Faculty of Dentistry
> University of Otago
> P.O. Box 647
> Dunedin 9054
> NEW ZEALAND.
> Ph: +643 4799254
> Fx: +643 4797078
> ================================
>
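For readers landing on this thread: Nick asked for a way to split the reads at about 150 bp (or to sort them by quality), and Benjamin described pulling a random subset down to a chosen percentage. His Perl script was an attachment and is not reproduced here. Below is a minimal Python sketch of all three operations, assuming plain four-line FASTQ records (wrapped multi-line FASTQ is not handled) with Phred+33 quality encoding. The function names are hypothetical; this is not Benjamin's script, the Torrent Server trimmer, or anything built into MIRA.

```python
import random
from typing import Iterator, TextIO, Tuple

# One FASTQ record: (header, sequence, separator, quality)
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ file with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # empty string: end of file (or a trailing blank line)
            return
        seq = handle.readline().rstrip("\r\n")
        sep = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not sep.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record starting at {header!r}")
        yield header, seq, sep, qual


def write_fastq(handle: TextIO, record: Record) -> None:
    """Write one record back out in four-line form."""
    handle.write("\n".join(record) + "\n")


def mean_phred(qual: str, offset: int = 33) -> float:
    """Arithmetic mean of the Phred scores in a quality string (Phred+33 assumed)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def split_by_length(src: TextIO, short_out: TextIO, long_out: TextIO,
                    threshold: int = 150) -> Tuple[int, int]:
    """Send reads shorter than `threshold` to short_out and all others to long_out."""
    n_short = n_long = 0
    for rec in read_fastq(src):
        if len(rec[1]) < threshold:
            write_fastq(short_out, rec)
            n_short += 1
        else:
            write_fastq(long_out, rec)
            n_long += 1
    return n_short, n_long


def filter_by_mean_quality(src: TextIO, out: TextIO, min_mean: float = 20.0) -> int:
    """Keep only reads whose mean Phred score is at least `min_mean`."""
    kept = 0
    for rec in read_fastq(src):
        if mean_phred(rec[3]) >= min_mean:
            write_fastq(out, rec)
            kept += 1
    return kept


def subsample(src: TextIO, out: TextIO, fraction: float, seed: int = 0) -> int:
    """Keep each read independently with probability `fraction` (e.g. 0.8 for ~80%)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # a fixed seed makes the subset reproducible
    kept = 0
    for rec in read_fastq(src):
        if rng.random() < fraction:
            write_fastq(out, rec)
            kept += 1
    return kept
```

As a usage example (file names hypothetical), `with open("reads.fastq") as src, open("short.fastq", "w") as s, open("long.fastq", "w") as l: split_by_length(src, s, l, 150)` writes reads of exactly 150 bp to the long file. Each function streams the input one record at a time, so memory use stays flat even for a 480 Mb file. The mean of Phred scores is only a crude quality measure; a dedicated trimmer such as Trim Galore, which trims low-quality read ends rather than discarding whole reads, is likely the better choice for real quality filtering.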