[mira_talk] Request for help: demystifying Illumina paired-end format identifiers ... /4 ???

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sat, 05 Oct 2013 10:38:52 +0200

*sigh* Illumina, I hate you.

I'm currently investigating why MIRA seemed to completely fail to decently assemble a MiSeq Nextera lib in a recent publication ("Efficient and accurate whole genome assembly and methylome profiling of E. coli", http://www.biomedcentral.com/1471-2164/14/675/abstract).

Thankfully, the authors gave me two of their data sets and I am finding the following kind of read naming/comment in their files:

- in the first  file: @HWI-M01378:3:000000000-A2CB9:1:1101:17827:2093 1:N:0:
- in the second file: @HWI-M01378:3:000000000-A2CB9:1:1101:17827:2093 4:N:0:
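
To make the difference concrete, here is a minimal Python sketch that pulls the read number out of the two header lines above. It assumes the Casava 1.8 layout, where the comment after the first space starts with `<read number>:`. The function name `parse_read_number` is made up for illustration and has nothing to do with MIRA's own parser.

```python
def parse_read_number(header):
    """Return the leading <read number> of a Casava 1.8-style FASTQ header.

    Returns None when the header has no comment or the first comment
    field is not numeric. Hypothetical helper, not MIRA code.
    """
    if not header.startswith("@"):
        raise ValueError("not a FASTQ header line")
    # Read name and comment are separated by the first space.
    _name, _, comment = header[1:].partition(" ")
    first_field = comment.split(":", 1)[0]
    return int(first_field) if first_field.isdigit() else None


first = "@HWI-M01378:3:000000000-A2CB9:1:1101:17827:2093 1:N:0:"
second = "@HWI-M01378:3:000000000-A2CB9:1:1101:17827:2093 4:N:0:"

print(parse_read_number(first), parse_read_number(second))  # prints: 1 4
```

Both reads share the same name, so they clearly belong together; only the read number in the comment differs.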

The names and comments in the first file look OK, but what the hell is the "4" in the comment section of the second file??? So far I'd seen only "1" and "2" to determine the two reads of a pair. The Wikipedia entry on FASTQ also only knows 1 and 2. Googling around, I found the following document:

 
http://supportres.illumina.com/documents/myillumina/354c68ce-32f3-4ea4-9fe5-8cb2d968616c/casava1_8_changes.pdf

which helpfully states:

<read number> will typically be 1 or 2, but the field can support other values. (For example, certain indexing formats lead to 3 reads.)

Fine, so Illumina says there can be up to 3 reads (but does not say how the third one is numbered). So why am I seeing a value of 4?

Could anyone enlighten me?

B.

PS: Of course MIRA looked only for 1 and 2, and not seeing any 2 it treated the data as unpaired. I've built many sanity checks into MIRA so far, but one that checks whether a set where pairs could be expected contains no pairs at all is not among them. Guess what I'm going to program this weekend ... :-(
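
A rough idea of what such a sanity check could look like, sketched in Python rather than in MIRA's actual code: count the read numbers found in the header comments and complain when an expected value is missing or an unexpected one shows up. The names `read_number_counts` and `pairing_warnings` and the default expected values ("1", "2") are assumptions made for this illustration.

```python
from collections import Counter


def read_number_counts(headers):
    """Count the first comment field (the read number) over FASTQ headers."""
    counts = Counter()
    for header in headers:
        # The comment is everything after the first space in the header.
        _name, _, comment = header.partition(" ")
        counts[comment.split(":", 1)[0]] += 1
    return counts


def pairing_warnings(headers, expected=("1", "2")):
    """Return human-readable warnings about suspicious read numbers."""
    counts = read_number_counts(headers)
    warnings = []
    # Read numbers that are present but not expected, e.g. "4".
    for value in sorted(set(counts) - set(expected)):
        warnings.append(
            f"unexpected read number {value!r} in {counts[value]} read(s)"
        )
    # Expected read numbers that never occur, e.g. no "2" at all.
    for value in expected:
        if counts[value] == 0:
            warnings.append(
                f"no reads with read number {value!r}; "
                "data would be treated as unpaired"
            )
    return warnings


headers = [
    "@HWI-M01378:3:000000000-A2CB9:1:1101:17827:2093 1:N:0:",
    "@HWI-M01378:3:000000000-A2CB9:1:1101:17827:2093 4:N:0:",
]
for warning in pairing_warnings(headers):
    print(warning)
```

On the two example headers this reports both the unexpected "4" and the complete absence of "2", which is the situation that went unnoticed here.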


--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
