[mira_talk] Re: Problem with convert_project

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sun, 28 Oct 2012 09:43:56 +0100

On 10/26/2012 09:14 PM, Steven Sijmons wrote:
[...]
When I then compared the number of reads in the fastq file I created with convert_project, I noticed that these contained more reads than the number of lines present in the debris_list. Debris_lists contained 232 628, 2 288 887 and 42 826 127 reads, while the convert_project outputs contained 437 106, 2 392 084 and 43 632 082 reads respectively. Could these 'extra' reads be causing this? And where are these reads coming from if they aren't in the debris_list? Hoping that someone can give some possible explanations for this, because I really don't understand what's happening here.

Thanks for the data. Am I correct in guessing you used MIRA 3.4.x? Well, I'm quite sure of it actually.

Two things play a role in this misbehaviour you are seeing:
- MIRA 3.4.x does not recognise CASAVA 1.8 style Illumina paired-end data. One needs to rename reads before 3.4.x sees that they are paired end. You did not do that. This naming convention problem also applies to convert_project from the same package.
- MIRA 3.4.x has a broken "check for doubly appearing reads" routine ... it does not recognise when two reads with the same name are given. This happens with CASAVA 1.8 style naming.
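To make the naming clash concrete, here is a small illustrative sketch (not MIRA code; the function names are hypothetical). It assumes plain 4-line FASTQ records and takes the read name to be the header up to the first whitespace, a common FASTQ convention. With CASAVA 1.8 headers both halves of a pair share the same name, so they show up as duplicates; with the older /1 and /2 suffixes they do not.

```python
from collections import Counter
from typing import Dict, Iterable, List


def read_names(lines: Iterable[str]) -> List[str]:
    """Collect read names from 4-line FASTQ records.

    The name is taken as the '@' header line up to the first whitespace,
    without the leading '@' (an assumption, not MIRA's exact parsing).
    """
    names = []
    for i, line in enumerate(lines):
        if i % 4 == 0:  # header line of each 4-line record
            names.append(line[1:].split()[0])
    return names


def duplicate_names(lines: Iterable[str]) -> Dict[str, int]:
    """Return every read name that occurs more than once, with its count."""
    counts = Counter(read_names(lines))
    return {name: n for name, n in counts.items() if n > 1}
```

For example, feeding it both mates of a CASAVA 1.8 pair (`@r1 1:N:0:A` and `@r1 2:N:0:A`) reports `r1` twice, which is exactly the situation a duplicate-name check has to catch.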

Possible fixes are simple, but you will need to redo assemblies:

1) If you want to continue with MIRA 3.4.x, you need to rename Illumina reads from CASAVA pipelines >= 1.8. That is, if a FASTQ definition line is
     @somename 1:whatever
   you need to rename it to
     @somename/1 1:whatever
   and similarly for the other half of the pair ("2" instead of "1" above)

2) If you switch to MIRA 3.9.x, the naming problems are solved automatically.
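The renaming in option 1 can be done with any text-processing tool. As one possible sketch in Python (not part of MIRA; the function names and file names are hypothetical), the code below rewrites each '@' header line by taking the mate number from the start of the comment and appending it to the name as /1 or /2. It assumes unwrapped 4-line FASTQ records, and only the '@' header line is changed.

```python
from typing import Iterable, Iterator


def rename_header(header: str) -> str:
    """Turn '@name 1:rest' (CASAVA >= 1.8) into '@name/1 1:rest'.

    Headers without a CASAVA 1.8 style comment, or that already carry
    a /1 or /2 suffix, are returned unchanged.
    """
    name, sep, comment = header.partition(" ")
    mate = comment.split(":", 1)[0]  # "1" or "2" for CASAVA 1.8 headers
    if sep and mate in ("1", "2") and not name.endswith(("/1", "/2")):
        return f"{name}/{mate} {comment}"
    return header


def rename_fastq(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTQ lines (without newlines), renaming every '@' header.

    Assumes plain 4-line records: header, sequence, '+', quality.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 == 0:
            line = rename_header(line)
        yield line


def rename_fastq_file(src_path: str, dst_path: str) -> None:
    """Write a renamed copy of the FASTQ file at src_path to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in rename_fastq(src):
            dst.write(line + "\n")
```

Usage would be something like `rename_fastq_file("reads_1.fastq", "reads_1.renamed.fastq")`, run once per file of the pair.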

Hope that helps,
  Bastien


--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
