[mira_talk] Re: should this assembly be taking several days?

  • From: Bharat Patel <b.patel@xxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sun, 04 Jul 2010 09:45:44 +1000

Hi Corey

After posting my earlier response to your question on running mira on small microbial genomes, I saw that the post has moved on to gsAssembler and mira.

I had run gsAssembler on my small genome (8 kb paired-end and fragment libraries) and, depending on the changes to the default settings, obtained between 30 and 40 scaffolds. After performing BLAST analysis on these scaffolds and then comparing the organisation of the genes with that of its nearest neighbors, I realised that the break points were always around tRNA and rRNA genes. rRNA operons are often repeated several times in microbial genomes, but in my analysis I could only find a single 16S rRNA gene and a partial 23S rRNA gene, and the break points were around the tRNA genes. I remembered reading posts from users who suggested that repeats may not be handled well by gsAssembler. From my understanding, an update of gsAssembler will be out to address the issue of repeats.

Mira, on the other hand, produced 1245 contigs and, with Bambus, one large scaffold (plus small assemblages of less than 1000 bp) which had the expected genome size and 4 rRNA gene operons. I have selected PCR primer pairs from the mira scaffold and compared them with the primers picked by consed, which I had run on the ace output from gsAssembler; with the exception of some primers, most matched. So in essence, I like what mira does, at least for small microbial genomes which do not have many repeats (not sure about genomes with many repeats though). Once you have a handle on mira, it is a breeze to use and understand (mostly). I found bambus a bit problematic, especially the structure of the mates file, but after many tries I got the mates file to work correctly.

Thanks to the mira users who respond to questions and motivate first-time users. My sincere thanks and appreciation to Bastien for his work on mira and especially for his quick responses to questions posted by users. It is only because of this that I have spent many months persisting with mira.

Bharat
Professor,
Microbial Gene Research & Resources Facility (in Extremophiles)
School of Biomolecular & Physical Sciences
Griffith University
Brisbane, Australia


mira_talk-bounce@xxxxxxxxxxxxx wrote:
Thanks Bastien (and Bharat)

I'll increase the ram and give it another shot.
I'm not sure how repetitive my bug is relative to others, but I did
notice that there were a few small contigs in the gsAssembler output
that were several times the average coverage.
Cheers

Corey
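
Corey's observation above (a few small contigs at several times the average coverage) is a common hint that repeat copies have been collapsed into one contig. A minimal sketch of that check follows, assuming the per-contig average coverage has already been extracted from the assembler output into a dictionary. The function name, the contig names, the coverage values and the 2x cutoff are all hypothetical and only serve as an illustration; this is not how gsAssembler or mira detect repeats.

```python
from statistics import median


def flag_repeat_candidates(coverage_by_contig, factor=2.0):
    """Return the names of contigs whose coverage is at least
    `factor` times the median coverage of all contigs.

    coverage_by_contig: dict mapping contig name -> average coverage.
    factor: hypothetical cutoff; 2.0 is an assumption, not a tool default.
    """
    if not coverage_by_contig:
        return []
    # The median is used as the "typical" coverage so that a few
    # high-coverage contigs do not inflate the baseline.
    typical = median(coverage_by_contig.values())
    return sorted(
        name
        for name, cov in coverage_by_contig.items()
        if cov >= factor * typical
    )


# Made-up coverage values, for illustration only.
example = {
    "contig00001": 21.0,
    "contig00002": 19.5,
    "contig00003": 20.2,
    "contig00047": 83.0,
    "contig00052": 61.4,
}

print(flag_repeat_candidates(example))
```

Contigs flagged this way are only candidates for collapsed repeats (rRNA operons would be a plausible example in a microbial genome) and would still need to be checked, for instance by BLAST.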

On Sat, 2010-07-03 at 01:22 +0200, Bastien Chevreux wrote:
On Freitag 02 Juli 2010 Corey Frazer wrote:
[...]
So, has something gone off the rails here, or am I just going to have to
let it run?
I think the problem is RAM: the machine is swapping itself to death. Obviously the miramem estimate was a bit wrong in this special case ... perhaps more repeats than "normal" in your genome?

If you could find a machine with 8 or better 12 GiB, I think you'd see MIRA finishing within a couple of hours.

Also, I suppose you are using 3.0.5. Try 3.1.15 (development version), which should use perhaps 20% less memory than the ... 8 GiB or so your process currently needs.

 http://www.chevreux.org/tmp/mira_3.1.15_dev_linux-gnu_x86_64_static.tar.bz2

Or wait for after the week-end, when 3.2.0rc1 will be released.

Regards,
  Bastien



