[mira_talk] Re: Metagenome assembly

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 2 May 2012 21:27:31 +0200

OK, a couple of points have come up in this thread on which I'd like to say a 
few things:

Re read name length:
MIRA has no limit regarding the length of a read name, but some (older) tools 
downstream may have. This is why MIRA by default stops with a warning when it 
encounters long read names. There is, of course, a switch to tell it to ignore 
this and continue.
What I often do, though, is rename the reads to shorter names anyway ... long 
names eat RAM unnecessarily.
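
To illustrate the renaming, here is a minimal Python sketch (not something MIRA 
ships). It assumes strict 4-line FASTQ records and that mates already carry a 
/1 or /2 suffix; the function name and the "r" prefix are made up for this 
example.

```python
def shorten_fastq_names(lines, prefix="r"):
    """Yield FASTQ lines with read names replaced by short sequential names.

    Assumptions of this sketch: strict 4-line records, and mates that share
    a template name followed by /1 or /2. The suffix is kept so that both
    mates of a pair end up with the same short template name.
    """
    ids = {}  # original template name -> short name
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            # Header line: drop the leading '@' and any trailing comment.
            name = line[1:].split()[0]
            template, suffix = name, ""
            if name.endswith(("/1", "/2")):
                template, suffix = name[:-2], name[-2:]
            short = ids.setdefault(template, f"{prefix}{len(ids) + 1}")
            yield f"@{short}{suffix}"
        elif i % 4 == 2:
            # Separator line: drop the repeated read name, if any.
            yield "+"
        else:
            # Sequence and quality lines pass through unchanged.
            yield line
```

Note that the lookup table itself holds every original name in memory while the 
script runs; for very large files, interleaved or name-sorted input would let 
you get away with a simple counter instead.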

Re paired end naming scheme (pre-CASAVA 1.8): 
True, for MIRA 3.4 you need to convert the read names so that MIRA recognizes 
that they form pairs. One solution was posted, but there are others like a 
simple sed one-liner. Alternatively you can use the development version of MIRA 
(3.9.x), which knows how to deal with CASAVA 1.8 style naming.
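
As a rough sketch of such a conversion (in Python rather than sed, and not the 
solution posted earlier in the thread): CASAVA 1.8 headers look like 
"@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index", and the 
function below rewrites them to the older "name/1" and "name/2" form. That the 
/1 and /2 suffixes are what your MIRA 3.4 setup expects is an assumption here, 
so check it against your own data; the function name is made up.

```python
import re

# CASAVA 1.8 header: "@<name> <read>:<filtered Y/N>:<control number>:<index>"
CASAVA18 = re.compile(r"^@(\S+) ([12]):[YN]:\d+:\S*$")


def casava18_to_slash(header):
    """Convert one CASAVA 1.8 FASTQ header to the older name/1, name/2 style.

    Apply this only to header lines (the first line of each 4-line record):
    quality lines may also start with '@'. Headers that do not match the
    CASAVA 1.8 pattern are returned unchanged (minus the newline).
    """
    header = header.rstrip("\n")
    m = CASAVA18.match(header)
    if m is None:
        return header
    return f"@{m.group(1)}/{m.group(2)}"
```

For example, "@HWI-ST123:42:C0ABC:1:1101:1234:5678 2:N:0:ATCACG" becomes 
"@HWI-ST123:42:C0ABC:1:1101:1234:5678/2".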

Re mapping against human genome:
Very good idea, I recommend doing that. MIRA would probably be able to deal 
with these reads, but removing them first reduces the amount of data which 
needs to be looked at (every bit counts). You might want to be strict in the 
mapping though: allow no more than two mismatches or so.
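
One way to apply such a cutoff after the fact, sketched in Python: collect the 
names of reads that mapped to the human reference with a small edit distance 
and drop only those from the assembly input. This assumes the mapper writes SAM 
with the optional NM tag (edit distance, so indels count as well as 
mismatches); the function name and the default of 2 are illustrative.

```python
def human_read_names(sam_lines, max_edits=2):
    """Return the names of reads mapped with at most max_edits differences.

    Assumes SAM input whose alignment lines carry the optional NM:i tag.
    Reads without that tag are not reported, to stay on the strict side.
    """
    names = set()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment line
        flag = int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        nm = None
        for tag in fields[11:]:
            if tag.startswith("NM:i:"):
                nm = int(tag[5:])
                break
        if nm is not None and nm <= max_edits:
            names.add(fields[0])
    return names
```

Reads whose names are in the returned set would be removed; everything else, 
including loosely mapped reads, stays in.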

Re misassemblies:
I suspect there will be some. However, for a first rough gene catalogue I would 
not bother too much: most things will be right.


One more thing I thought about today: MIRA has a pretty good kmer-based 
clipping of bad quality data (called "proposed end clipping", -CL:pec). 
However, this might interfere with metagenome assemblies, where I suspect that 
many of the organisms will be covered only at low depth. You might want to 
test with and without -CL:pec.

B.
--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html