[mira_talk] Re: reducing number of Illumina reads

  • From: "Goldman, Thomas" <Thomas.Goldman@xxxxxx>
  • To: <mira_talk@xxxxxxxxxxxxx>
  • Date: Thu, 21 Apr 2011 17:34:14 -0500

Thanks Egon. Yes, simple enough. Just thought I would see if there was
something better to do than just random removal of reads.

 

-Tom

 

From: mira_talk-bounce@xxxxxxxxxxxxx
[mailto:mira_talk-bounce@xxxxxxxxxxxxx] On Behalf Of Egon Ozer
Sent: Thursday, April 21, 2011 3:28 PM
To: mira_talk@xxxxxxxxxxxxx
Subject: [mira_talk] Re: reducing number of Illumina reads

 

fastqSample, distributed with the wgs-assembler package, can subset your
fastq files for you.  I haven't checked whether it picks reads at random
or in series (i.e. every 3rd or every 4th read, etc).

 

There's a little blurb about it at the bottom of this page:

http://sourceforge.net/apps/mediawiki/wgs-assembler/index.php?title=FastqToCA#fastqSample

 

Otherwise it wouldn't be too tough to write a quick perl script to
output every fifth read from your data set to another file.
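
A minimal sketch of that every-fifth-read idea, written in Python rather
than perl. It assumes plain 4-line FASTQ records (no wrapped sequence
lines); the function names and the default n=5 are illustrative
assumptions, not part of any existing tool.

```python
from itertools import islice


def fastq_records(handle):
    """Yield FASTQ records as lists of 4 lines (assumes unwrapped records)."""
    while True:
        record = list(islice(handle, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        yield record


def every_nth(in_handle, out_handle, n=5):
    """Copy the 1st, (n+1)-th, (2n+1)-th, ... record; return how many were kept."""
    kept = 0
    for index, record in enumerate(fastq_records(in_handle)):
        if index % n == 0:
            out_handle.writelines(record)
            kept += 1
    return kept


# Hypothetical usage (file names are placeholders):
#   with open("illumina.fastq") as src, open("subset.fastq", "w") as dst:
#       every_nth(src, dst, n=5)
```

Because the file is streamed one record at a time, memory use stays flat
no matter how many reads the input holds.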

 

- E

 

 

On Apr 21, 2011, at 5:08 PM, Robert Bruccoleri wrote:





However, there is a problem with taking this approach -- you could
significantly change the statistics for the determination of repetitive
regions and cause misassemblies as a result. 

Mira will generate coverage equivalent reads which can help in this
situation.

--Bob

ALLO (Alfredo Lopez De Leon) wrote: 

You can try collapsing your reads with the FASTX toolkit; this will
leave you with a set of unique reads.

This method will preserve the sequence coverage and remove the
redundancy.

http://hannonlab.cshl.edu/fastx_toolkit/
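
To make the collapsing idea concrete, here is a rough Python sketch of
what "keeping only unique reads" means. It is not the FASTX code: the
function names and the ">rank-count" header layout are assumptions made
for illustration, and it assumes plain 4-line FASTQ records. Quality
values are dropped, since identical sequences can carry different
qualities.

```python
from collections import Counter
from itertools import islice


def collapse_reads(in_handle):
    """Count how often each distinct sequence occurs in a 4-line FASTQ stream."""
    counts = Counter()
    while True:
        record = list(islice(in_handle, 4))
        if not record:
            break
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        # Line 2 of each record is the sequence itself.
        counts[record[1].strip().upper()] += 1
    return counts


def write_collapsed_fasta(counts, out_handle):
    """Write one FASTA entry per unique sequence, most frequent first."""
    for rank, (sequence, count) in enumerate(counts.most_common(), start=1):
        out_handle.write(f">{rank}-{count}\n{sequence}\n")
```

The per-sequence count is kept in the header so the original multiplicity
is not lost entirely, though a downstream assembler will only see it if
it knows how to read that header.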

 

AlLo

 

From: mira_talk-bounce@xxxxxxxxxxxxx
[mailto:mira_talk-bounce@xxxxxxxxxxxxx] On Behalf Of Goldman, Thomas
Sent: Thursday, April 21, 2011 2:27 PM
To: mira_talk@xxxxxxxxxxxxx
Subject: [mira_talk] reducing number of Illumina reads

 

Hello all,

 

I have a 24GB RHEL5 machine on which I was able to do a de novo assembly
of 454 paired-end and fragment reads (~1.5 million reads). I also have
about 6 million 36bp Illumina reads.

 

I would like to:

  1. Map the Illumina reads to the 454 backbone

  2. Include the Illumina reads with the 454 reads for a de novo assembly

But I believe I don't have enough memory to handle all the Illumina
reads. I think my VM could handle maybe 20% of the Illumina reads. What
is the best way to reduce the Illumina reads used for the mapping and/or
the de novo assemblies? Would it be to just randomly pick 20% of the
reads out of the fastq file? Is there a tool out there I could use for
this?
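
For reference, randomly picking roughly 20% of the reads can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions: single-end reads in plain 4-line FASTQ records, and a
function name, default fraction and seed that are made up here. Each
read is kept independently with the given probability, so the output
size is approximately, not exactly, the requested fraction.

```python
import random
from itertools import islice


def sample_fastq(in_handle, out_handle, fraction=0.2, seed=42):
    """Keep each 4-line FASTQ record with probability `fraction`.

    Returns (kept, total). A fixed seed makes the subset reproducible.
    """
    rng = random.Random(seed)
    kept = total = 0
    while True:
        record = list(islice(in_handle, 4))
        if not record:
            break
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of input")
        total += 1
        if rng.random() < fraction:
            out_handle.writelines(record)
            kept += 1
    return kept, total


# Hypothetical usage (file names are placeholders):
#   with open("illumina.fastq") as src, open("sample20.fastq", "w") as dst:
#       kept, total = sample_fastq(src, dst, fraction=0.2)
```

For paired-end data the same keep/skip decision would have to be applied
to both mates, which this sketch does not attempt.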

 

Thanks,

Tom

 
