On 17/09/2012 20:41, Bastien Chevreux wrote:
On Sep 17, 2012, at 15:55, Laurent MANCHON wrote:

[...] in the log file I have: MIRA found duplicate read names in your data, This should never, never be!
but in the manual we can read: "the names of the reads can be either the very same in both files or already have a /1 or /2 appended."

The full sentence in the manual is: "Depending on the preprocessing pipeline of your sequencing provider, the names of the reads can ..."

I do concede that the sentence is ambiguous, I will change that to something like:

The FASTQ naming must follow one of the known Illumina schemes: either have /1 and /2 appended to the read names (old Illumina pipeline) or have the read names without /1 and /2, but then supplemented with the standard Illumina comment field which contains pair information. See also: http://en.wikipedia.org/wiki/FASTQ_format#Illumina_sequence_identifiers

I suspect your files do not adhere to the above :-)
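The two naming schemes described above can be told apart mechanically. The following is only a minimal Python sketch, not anything MIRA itself does: the function name `pair_scheme` and the return labels are made up for illustration, and the comment-field layout (`<read>:<is filtered>:<control number>:<index>`) is assumed from the Illumina identifier description on the linked Wikipedia page.

```python
import re

# Assumed Casava 1.8 comment field: <read>:<is filtered>:<control number>:<index>
# e.g. "1:N:0:ATCACG". The read member is the first field (1 or 2).
CASAVA_COMMENT = re.compile(r"^[12]:[YN]:\d+:\S*$")


def pair_scheme(header):
    """Classify one FASTQ header line (the line starting with '@').

    Returns a hypothetical label:
      "suffix"  - read name ends in /1 or /2 (old Illumina pipeline)
      "comment" - no suffix, but a Casava-style comment field follows the name
      "none"    - no pair information found in the header
    """
    # Drop the leading '@' and split the read name from an optional comment.
    name, _, comment = header.rstrip("\r\n")[1:].partition(" ")
    if name.endswith("/1") or name.endswith("/2"):
        return "suffix"
    if CASAVA_COMMENT.match(comment):
        return "comment"
    return "none"
```

A header with neither a /1 or /2 suffix nor a comment field is reported as "none", meaning the header alone does not say which pair member the read is.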
You are right, this is what I have:

head GAMMA-1.solexa.fastq
@D61655M1:276:D10YJACXX:8:1101:1456:1955
NCTGNAGTGTNNGATNCGGGGTTNNNNNNNNNNACNTTNNNNNNNNNNGNNNGNNNTTGAANNNNNNNNTNACNGTNTNATNAANNNNTTCGGACNACAT
+
#0;@#2@=@?##32@#2=>@@<@#############################################################################
@D61655M1:276:D10YJACXX:8:1101:1494:1967
NTTTNGGCTCTGTACCTTTTGTATCAGGGGAACCTAAAAGTGTAAAAAGTATACAATCGAGTGGGTTAGATCTTTCTAAGACTAGTTTAAAGTTAAATTG
+
#0;@#28=????9@?<=?<>;3@?==?><@???<?>????6><9=?<>;0=?=??;???;?);/=36>>=9>?====??>><>?<>???=>?7::====:
@D61655M1:276:D10YJACXX:8:1101:1551:1953
NGAGNACATGAGACGGACGTTGTAGAGCATTCGAACTTTGCCAGCAAGCATACTCCACCAGTTTTTTTGAGCTAGAGTGATAATGGTCAGATCGGAAGAG

head GAMMA-2.solexa.fastq
@D61655M1:276:D10YJACXX:8:1101:1456:1955
CACCAATCCAAGCTCCACAACTTGATGTAGTCCGAACAGTTTCATCATACGGTCAAAATCAAATTCAACGTCCATCATTTGACTTAAACGTTCTTCCCTC
+
C@CFFFFFHHGHHJJJIJJCHIJJIJJHIIHIIIIJJEIHIIIIJGIIJJIIHIHHJJJIJIJJJIJIHHEEFFFEEEE@CDEDDDDDCD<C?@CDCCDD
@D61655M1:276:D10YJACXX:8:1101:1494:1967
TAAAAATCCTTTAAAACAATATGCTAACTTAGCTAATTTAAAGCCTAGTTTTTGTAAAAAACAACCCGTGTAGCACTGCAATAACTTAAACTTTTGCGTA
+
?@BFFFFFDDHGFGGIGGIIEHJGGGIGGIHGIJJIJJJIIJGGGGEHHGGHIJGJJJJIJJIGIGIHHGEFFFFEEEEEEDDDDDDDCCCCDDDCDB>@
@D61655M1:276:D10YJACXX:8:1101:1551:1953
GACCATTATCACTCTAGCTCAAAAAAACTGGTGGAGTATGCTTGCTGGCAAAGTTCGAATGCTCTACAACGTCCGTCTCATGTTCTCCAGATCGGAAGAG

It seems to be Casava 1.8 format, but the pair member is missing.
B.
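One possible way to bring such files in line with the first scheme mentioned in the thread (read names ending in /1 and /2) is to rewrite the header lines. The Python sketch below is only an illustration under stated assumptions: the function names `add_pair_suffix` and `rewrite_fastq` are hypothetical, and the input is assumed to be strict 4-line FASTQ records with no blank lines and no existing /1 or /2 suffix. Whether the resulting files then load cleanly has not been tested here.

```python
def add_pair_suffix(lines, member):
    """Yield FASTQ lines with '/<member>' appended to every read name.

    Assumes strict 4-line records (header, sequence, '+', qualities).
    Headers are found by position, not by a leading '@', because
    quality lines may also start with '@'.
    """
    if member not in (1, 2):
        raise ValueError("member must be 1 or 2")
    suffix = "/%d" % member
    for index, line in enumerate(lines):
        if index % 4 != 0:
            # Sequence, '+' and quality lines pass through unchanged.
            yield line
            continue
        header = line.rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError("line %d is not a FASTQ header: %r" % (index + 1, header))
        # Keep any comment after the first space; only the name gets the suffix.
        name, space, comment = header.partition(" ")
        yield name + suffix + space + comment + "\n"


def rewrite_fastq(src_path, dst_path, member):
    """Copy src_path to dst_path, appending '/<member>' to each read name."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(add_pair_suffix(src, member))
```

For the two files shown above, this would be called once per file, for example `rewrite_fastq("GAMMA-1.solexa.fastq", "GAMMA-1.renamed.fastq", 1)` and the same with member 2 for the second file; the output file names are placeholders.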