[mira_talk] Re: duplicate read names not allowed ?

  • From: Laurent MANCHON <lmanchon@xxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Mon, 17 Sep 2012 20:57:11 +0200

On 17/09/2012 20:41, Bastien Chevreux wrote:
On Sep 17, 2012, at 15:55 , Laurent MANCHON wrote:
[...]
In the log file I have: MIRA found duplicate read names in your data,
This should never, never be!

But in the manual we can read: "the names of the reads can be either the very same in both files or already have a /1 or /2 appended."

The full sentence in the manual is:
"Depending on the preprocessing pipeline of your sequencing provider, the names of the reads can ..."

I do concede that the sentence is ambiguous, I will change that to something like:

The FASTQ naming must follow one of the known Illumina schemes: either have /1 and /2 appended to the read names (old Illumina pipeline) or have the read names without /1 and /2, but then supplemented with the standard Illumina comment field which contains pair information. See also: http://en.wikipedia.org/wiki/FASTQ_format#Illumina_sequence_identifiers
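
To make the two schemes concrete, here is a minimal Python sketch that classifies a FASTQ header line. The function name `naming_scheme` and both regular expressions are illustrative assumptions based on the Illumina identifier layout described on the linked page; this is not how MIRA itself parses read names.

```python
import re

# Old Illumina pipeline: the read name ends in /1 or /2.
OLD_STYLE = re.compile(r"^@\S+/[12]$")

# Casava 1.8 style: read name, whitespace, then a comment field of the form
# <pair member>:<filtered Y/N>:<control number>:<index>, e.g. "1:N:0:ATCACG".
CASAVA18 = re.compile(r"^@\S+\s+[12]:[YN]:\d+:\S*$")


def naming_scheme(header: str) -> str:
    """Return 'casava18', 'old' or 'unknown' for one FASTQ header line."""
    header = header.rstrip("\n")
    if CASAVA18.match(header):
        return "casava18"
    if OLD_STYLE.match(header):
        return "old"
    return "unknown"
```

A header with neither a /1 or /2 suffix nor a comment field comes back as "unknown", which is the situation where both files of a pair contain identical read names.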


I suspect your files do not adhere to the above :-)

You are right, this is what I have:

head GAMMA-1.solexa.fastq
@D61655M1:276:D10YJACXX:8:1101:1456:1955
NCTGNAGTGTNNGATNCGGGGTTNNNNNNNNNNACNTTNNNNNNNNNNGNNNGNNNTTGAANNNNNNNNTNACNGTNTNATNAANNNNTTCGGACNACAT
+
#0;@#2@=@?##32@#2=>@@<@#############################################################################
@D61655M1:276:D10YJACXX:8:1101:1494:1967
NTTTNGGCTCTGTACCTTTTGTATCAGGGGAACCTAAAAGTGTAAAAAGTATACAATCGAGTGGGTTAGATCTTTCTAAGACTAGTTTAAAGTTAAATTG
+
#0;@#28=????9@?<=?<>;3@?==?><@???<?>????6><9=?<>;0=?=??;???;?);/=36>>=9>?====??>><>?<>???=>?7::====:
@D61655M1:276:D10YJACXX:8:1101:1551:1953
NGAGNACATGAGACGGACGTTGTAGAGCATTCGAACTTTGCCAGCAAGCATACTCCACCAGTTTTTTTGAGCTAGAGTGATAATGGTCAGATCGGAAGAG

head GAMMA-2.solexa.fastq
@D61655M1:276:D10YJACXX:8:1101:1456:1955
CACCAATCCAAGCTCCACAACTTGATGTAGTCCGAACAGTTTCATCATACGGTCAAAATCAAATTCAACGTCCATCATTTGACTTAAACGTTCTTCCCTC
+
C@CFFFFFHHGHHJJJIJJCHIJJIJJHIIHIIIIJJEIHIIIIJGIIJJIIHIHHJJJIJIJJJIJIHHEEFFFEEEE@CDEDDDDDCD<C?@CDCCDD
@D61655M1:276:D10YJACXX:8:1101:1494:1967
TAAAAATCCTTTAAAACAATATGCTAACTTAGCTAATTTAAAGCCTAGTTTTTGTAAAAAACAACCCGTGTAGCACTGCAATAACTTAAACTTTTGCGTA
+
?@BFFFFFDDHGFGGIGGIIEHJGGGIGGIHGIJJIJJJIIJGGGGEHHGGHIJGJJJJIJJIGIGIHHGEFFFFEEEEEEDDDDDDDCCCCDDDCDB>@
@D61655M1:276:D10YJACXX:8:1101:1551:1953
GACCATTATCACTCTAGCTCAAAAAAACTGGTGGAGTATGCTTGCTGGCAAAGTTCGAATGCTCTACAACGTCCGTCTCATGTTCTCCAGATCGGAAGAG

It seems to be Casava 1.8 format, but the pair member is missing.
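
One possible workaround, sketched below in Python, is to append /1 and /2 to the read names so that the files follow the old Illumina scheme quoted above. The names `add_pair_suffix` and `rename_file` are hypothetical, and the sketch assumes strict four-line FASTQ records and headers without a comment field, as in the excerpts shown here. It walks records by position rather than looking for lines that start with "@", because quality lines may also begin with that character.

```python
from typing import Iterable, Iterator


def add_pair_suffix(lines: Iterable[str], member: int) -> Iterator[str]:
    """Yield FASTQ lines with /<member> appended to every header line.

    Assumes four lines per record (header, sequence, '+', qualities).
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:  # first line of each record is the header
            name = line.rstrip("\n")
            if not name.startswith("@"):
                raise ValueError(f"line {i + 1} is not a FASTQ header: {name!r}")
            if not name.endswith(f"/{member}"):
                name = f"{name}/{member}"
            yield name + "\n"
        else:
            yield line


def rename_file(src: str, dst: str, member: int) -> None:
    """Write a copy of src to dst with the pair suffix added."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(add_pair_suffix(fin, member))
```

With the files above, this would be called once per file, for example `rename_file("GAMMA-1.solexa.fastq", "GAMMA-1.renamed.fastq", 1)` and the same with member 2 for the second file; the output file names are only examples.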



B.


