[mira_talk] Re: duplicate read names not allowed ?

  • From: Laurent MANCHON <lmanchon@xxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 19 Sep 2012 11:14:26 +0200

On 19/09/2012 01:55, Bastien Chevreux wrote:
On Sep 18, 2012, at 12:06, Laurent MANCHON wrote:
Yes, it works now.
There are so many FASTQ variants that I thought MIRA would recognize /1 or /2 with or without a space before it. But it's strange: I have 36M reads in each file and only a subset of them is loaded by MIRA:
[...]
Loaded 2241 reads, Localtime: Tue Sep 18 12:00:33 2012

I am inclined to say that your FASTQ is not what you expect; it may be broken in some subtle way. Have a look at the area around the 2241st read and hunt for anything that looks fishy a bit before and a bit after it. I'm using Heng Li's code for FASTQ parsing, which so far has probably handled billions of reads in tens of thousands of files from hundreds of people.

B.

Right, there is a line of incorrect length in the 2243rd read:
+2241
@@CFFFFFHDHHHJIIGHEIIIJIDHGDHAGGICCHGHIIJJHG>B<;@>CCC@=CBD(:A:ACCCCC@CCCCCDCCDEDD8<@CCDDD<@DDB@CBBDB
@D61655M1:276:D10YJACXX:8:1101:11117:2032/1
TGTAACTTCAGATTTTGTGGAATGAATCCTTAAAAACCCTTCATTGCTTATTCAGCTTCAGAAATTTCAAAGGCACATTCGGAGACGAAGTTTATTCGTT
+2242
CC@FFFFFHHHHHJIGIIJJJFIIGGJHIJIIJJJJJJJIIJJJJJJJJJJIJJIJIJJJJJJIJIIJJJIJJIIEHHHHHHGFFDD>BB@CDEDDEDDD
@D61655M1:276:D10YJACXX:8:1101:11049:2038/1
CAAGGCTTTGATGCATCGTCAGCGATGTAACCCTCGTCCTGATGCTTAGTCCACGACGGTAGGCAAATCACCCGAGTAGGGGTGCGAAAAGAGCCGGTGA
+2243
@<?DDDD>CF?B>9CFGG?CEAHEEHEECGECFFCHHII=9?BG@F4=C8BBGIGHIB/,6=?@ABB>@>>??=88<9::2509/1@@555<?@C9@@B&9?
@D61655M1:276:D10YJACXX:8:1101:11065:2038/1
CGTCATCAATGAAGAGGGCCACTTGCCAGGGGTAGGAGTGAGGGGTGGCCTCATTACCTCCAACGATGTGGGTCTCTGTCTTTGTGGGGCCACAGCCAGC
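Rather than scanning by eye around the 2241st read, a record like this can be found mechanically by checking that every quality line is exactly as long as its sequence line. The following is a minimal Python sketch, assuming plain four-line FASTQ (no wrapped sequence or quality lines); `check_fastq` and the script name are hypothetical, not part of MIRA or of Heng Li's parser.

```python
import sys
from typing import Iterable, Iterator, Tuple


def check_fastq(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (record_number, header, problem) for each suspicious record.

    Assumes strict four-line FASTQ records: header, sequence, '+' line, quality.
    Reports at most one problem per record.
    """
    it = iter(lines)
    record = 0
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines, e.g. a trailing newline at EOF
        record += 1
        # Pull the remaining three lines of this record; "" if the file ends early.
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "").rstrip("\r\n")
        qual = next(it, "").rstrip("\r\n")
        name = header.rstrip("\r\n")
        if not name.startswith("@"):
            yield record, name, "header line does not start with '@'"
        elif not plus.startswith("+"):
            yield record, name, "third line does not start with '+'"
        elif len(seq) != len(qual):
            yield record, name, f"sequence length {len(seq)} != quality length {len(qual)}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (script name hypothetical): python check_fastq.py reads_1.fastq
    with open(sys.argv[1]) as handle:
        for number, name, problem in check_fastq(handle):
            print(f"record {number} ({name}): {problem}")
```

Quality strings may legitimately begin with '@' (as several do above), so the header check alone cannot tell whether a file has slipped out of step. The length comparison is what flags a record like the 2243rd one.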

Heng Li, ah yes, I know that name. I've used one of his tools, XAT, a good and efficient cross-species alignment tool.
And of course BWA.

Laurent --
