[mira_talk] Re: Problem with non-unique names?

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sat, 13 Mar 2010 10:14:24 +0100

On Friday 12 March 2010 Martin A. Hansen wrote:
> I get errors of the following type using Mira 3.0.2:
> 
> Error: read name FQIBXOY01A4H2K present multiple times in readpool!
> 
> Which is funny since there is only ONE sequence in the input with that
>  name.

This is a bug that was introduced in 3.0.1. Please try out
  http://www.chevreux.org/tmp/mira_3.0.3_prod_linux-gnu_x86_64_static.tar.bz2

where the error should be fixed.

> I don't know why Mira considers non-unique names a problem unless these are
> used as hash keys? In that case Mira should perhaps assign a running ID
> number to reads, do the assembly magic, and upon finishing substitute the
> IDs in the output with the original sequence names?

MIRA has no problem at all with duplicate names. It could do very well without 
names at all, as it indeed uses internal IDs to address the reads.

Programs reading ACE, CAF, MAF files however will be absolutely unhappy with 
duplicate names. Which is understandable as in all those formats, the general 
way to address reads and to place them is via their name ... and having 
duplicate names wreaks havoc with that logic.
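
To illustrate the difference, here is a minimal Python sketch. It is not MIRA 
code: the read records, the second read name and the variable names are made 
up for illustration. Addressing reads by an internal ID keeps every entry, 
while a name-keyed table, which is roughly what a program reading ACE, CAF or 
MAF has to build, silently loses one of the duplicates.

```python
# Hypothetical read records as (name, sequence) pairs.
# These are NOT MIRA's internal data structures, just an illustration.
reads = [
    ("FQIBXOY01A4H2K", "ACGTACGT"),
    ("FQIBXOY01B7XYZ", "TTGACCA"),   # made-up second read name
    ("FQIBXOY01A4H2K", "GGGCCC"),    # same name again, different data
]

# Addressing by internal ID (here simply the list index) is unambiguous.
by_id = {read_id: seq for read_id, (name, seq) in enumerate(reads)}

# Addressing by name: the later duplicate overwrites the earlier entry.
by_name = {name: seq for name, seq in reads}

print(len(by_id))    # 3
print(len(by_name))  # 2
```

With name-based addressing, a placement that refers to FQIBXOY01A4H2K can no 
longer say which of the two reads it means.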

That being said, I have yet to see a use case where having the same reads 
twice (or even the same read names twice with different data) would be 
something useful ... and not due to some handling error by the user.

For both reasons given above, checking for duplicate read names is a necessity 
rather than anything else :-)
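
For anyone wanting to verify their input independently of MIRA, the following 
is a rough Python sketch of such a duplicate-name check. It assumes the reads 
are in FASTA format and that the read name is the first word of the header 
line; the function names are hypothetical and this is not how MIRA itself 
implements the check.

```python
from collections import Counter


def read_names(fasta_lines):
    """Yield the read name (first word after '>') of each FASTA header."""
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                yield fields[0]


def duplicate_names(fasta_lines):
    """Return a dict mapping each repeated read name to its count."""
    counts = Counter(read_names(fasta_lines))
    return {name: n for name, n in counts.items() if n > 1}


# Small made-up example; a real run would pass an open file object instead.
example = [
    ">FQIBXOY01A4H2K length=8",
    "ACGTACGT",
    ">FQIBXOY01B7XYZ",
    "TTGACCA",
    ">FQIBXOY01A4H2K",
    "GGGCCC",
]

print(duplicate_names(example))  # {'FQIBXOY01A4H2K': 2}
```

An empty result means every read name in the input is unique.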

Regards,
  Bastien
