[mira_talk] Re: 0.5TB not enough space?

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 15 Aug 2011 21:25:49 +0200

On Monday 15 August 2011 21:12:40 Iddo Friedberg wrote:
> Many repeats in Mycoplasma, unfortunately. Also, many HGTs, so k-mer
> frequencies may not be as uniform as in other bugs. But thanks for turning
> my attention to this, I will try running the assembly with some nasty
> repeat masking.

Nope, repeats are not your primary concern. The histogram is pretty healthy 
regarding that ... I'd expect the usual rRNA repeats and maybe a couple of 
highly repetitive short stretches which will probably break contig building at 
a couple of places, but nothing really too wild I would say.

Run the first assembly on the reduced data set without repeat masking, then 
decide what to do next based on the resulting histogram and the repeat info 
file.

> @Bastien "Heavens! Is there any valid reason you set up an assembly with a
> coverage >= 1000x?"
> 
> Not one which I would like to put down in writing ;)

He he :-)

Fortunately the remedy is simple: "head -somenumber input.fastq >output.fastq" 
will be your friend here.
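
A minimal sketch of that remedy, assuming an unwrapped FASTQ file where every read takes exactly 4 lines, so the number given to head has to be a multiple of 4 or the last read is cut in half. The function name take_reads and the read count in the usage comment are made up for illustration; pick the count from the coverage you actually want.

```shell
# take_reads N FILE: print the first N reads of an unwrapped FASTQ file.
# Hypothetical helper; assumes 4 lines per read (header, sequence, '+', qualities).
take_reads() {
    head -n "$(( $1 * 4 ))" "$2"
}

# Usage (the read count is a placeholder, not a recommendation):
#   take_reads 250000 input.fastq > output.fastq
```

For paired-end data in two files, the same read count would need to be taken from both so the mates stay in sync.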

B.
