[mira_talk] small contigs in first pass

  • From: "Reith, Michael" <Michael.Reith@xxxxxxxxxxxxxx>
  • To: <mira_talk@xxxxxxxxxxxxx>
  • Date: Tue, 12 Jan 2010 20:09:00 -0500

Hi Bastien and everyone,

I'm assembling a lower eukaryote (genome ~35 Mb) from 454 reads and 76 bp 
Illumina reads (~1.2M and 20M sequences, respectively).  Mira is still in its 
first pass through the data, but it has been writing contigs to the 
*_out_pass1.caf file for more than a day now.  The first 3500 or so contigs 
look useful (>500 bp, with something approaching the expected coverage), but 
since then the vast majority of contigs have been short with low coverage, 
and recently they consist mostly of just 2 Illumina reads.  I'm now past 
contig 60000 and there still appears to be a long way to go (>1.4M unused 
reads, so perhaps 700,000 2-read contigs?).  Is there a command-line switch I 
can use to avoid generating these small, probably useless contigs during the 
Mira run (I know they can be filtered out afterward)?  Or should I just use a 
half or a quarter of the Illumina reads for the assembly?  Any help or advice 
would be appreciated.
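
To make the subsampling option concrete, here is a minimal sketch, assuming the Illumina reads are in a plain FASTQ file with exactly four lines per record and that reads can be sampled independently (paired reads would need to be kept together, which this does not do).  The function name, file names and seed are hypothetical and have nothing to do with Mira itself.

```python
import random
from itertools import islice


def subsample_fastq(lines, fraction, seed=0):
    """Yield 4-line FASTQ records, keeping each with probability `fraction`.

    `lines` is any iterable of text lines (for example an open file).
    Assumption: every record is exactly four lines (no wrapped sequences).
    A fixed seed makes the selection reproducible between runs.
    """
    rng = random.Random(seed)
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            # End of input (or a truncated trailing record): stop.
            break
        if rng.random() < fraction:
            yield record


# Hypothetical usage, keeping roughly a quarter of the reads:
#
#   with open("illumina_all.fastq") as src, \
#        open("illumina_quarter.fastq", "w") as dst:
#       for record in subsample_fastq(src, 0.25, seed=42):
#           dst.writelines(record)
```

For scale, 20M reads of 76 bp on a ~35 Mb genome is roughly 43x nominal coverage (20,000,000 x 76 / 35,000,000), so a fraction of 0.5 would leave about 22x and 0.25 about 11x from the Illumina data alone.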

Thanks,
Mike
