Sorry for being dumb. Yes, I just checked for the Titanium adaptor and its reverse complement, and both are present in my reads:

TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG (marked QQQ below)
CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA (marked KKK below)

I also just found out that running sff_extract -c on my sff files shrinks the fasta of one of my 3 sff files from 3789 bytes to only 3787 bytes. Meanwhile, the Titanium sequences are still present!

Below I am adding three example sequences, each in 3 versions: after a plain sff_extract (norm), after manual replacement of the Titanium sequences with QQQ and KKK (manual), and after sff_extract -c for the clipping (sff-c). It seems that sff_extract -c didn't touch the Titanium sequences. Why?

I read that a read can contain a linker joining the paired ends: are the 2 Titanium adaptors that linker, or could there be another sequence to remove? Does it make sense to remove the 2 Titanium sequences everywhere and then, without splitting the reads (if they are paired and joined), use them again with MIRA directly?

>norm_1
GAATACACAGAAGACTAATTTAATAATACC-CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA-TTTCCTTCAGTCGACTAAA
>manual_1
GAATACACAGAAGACTAATTTAATAATACC------------------KKK-----------------------TTTCCTTCAGTCGACTAAA
>sff-c_1
GAATACACAGAAGACTAATTTAATAATACC-CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA-TTTCCTTCAGTCGACTAAA

>norm_2
TTTGTGTTTCTCAACTTCTTCTTAATCTCGGTTGATTCATCCGGTGCTATTCCAGTTGGTACAATGGTTGCTATTGTTGTCATTTGGTTCGTCATTTCTATTCCATTATCCGTTGTTGGATCTATCAT-TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG-TCCAGCTCGTTTACAGAGAGTCCCATTAC
>manual_2
TTTGTGTTTCTCAACTTCTTCTTAATCTCGGTTGATTCATCCGGTGCTATTCCAGTTGGTACAATGGTTGCTATTGTTGTCATTTGGTTCGTCATTTCTATTCCATTATCCGTTGTTGGATCTATCAT-------------------QQQ----------------------TCCAGCTCGTTTACAGAGAGTCCCATTAC
>sff-c_2
TTTGTGTTTCTCAACTTCTTCTTAATCTCGGTTGATTCATCCGGTGCTATTCCAGTTGGTACAATGGTTGCTATTGTTGTCATTTGGTTCGTCATTTCTATTCCATTATCCGTTGTTGGATCTATCAT-TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG TCCAGCTCGTTTACAGAGAGTCCCATTAC

>norm_3
CAACGCTGGCTGCAGGGAGAAGGATGTCCAATTTCTCAAGGATGAATTGAAGCGTTCAATAATGTACAGCACAATACATTTGAAGGAACTCT-TCGTATAACTTCGTATAATGTAT-GCTATACGAAGTTATTACGGCTATTCAATAGAACATAGTACCAAGACGAGACCTAAGTAACTGGTACAGGTTGTGACTGTTGAACTGTT
>manual_3
CAACGCTGGCTGCAGGGAGAAGGATGTCCAATTTCTCAAGGATGAATTGAAGCGTTCAATAATGTACAGCACAATACATTTGAAGGAACTCT--------QQQ--------------GCTATTACGAAGTTATTACGGCTATCAATAGAACATAGTACCAAGACGAGACCTAAGTAACTGGTACAGGTTGTGACTGTTGAACTGTT
>sff-c_3
CAACGCTGGCTGCAGGGAGAAGGATGTCCAATTTCTCAAGGATGAATTGAAGCGTTCAATAATGTACAGCACAATACATTTGAAGGAACTCT-TCGTATAACTTCGTATAATGTAT-GCTATACGAAGTTATTACGGCTATTCAATAGAACATAGTACCAAGACGAGACCTAAGTAACTGGTACAGGTTGTGACTGTTGAACTGTT

On Fri, Dec 4, 2009 at 8:10 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:
> On Friday 04 December 2009 Alessandro Riccombeni wrote:
> > The sequencing service provided a preassembled dataset as well: 39
> > scaffolds and 933 contigs.
>
> Ummm ... 933 contigs in Newbler versus 11000 in MIRA. Something is wrong
> there.
>
> > Some info: 13 Mbs, 2.38% are Ns, GC in the sequence is 36%, largest
> > scaffold is 1.9 Mb and the smallest is 3Kb.
> > After my first MIRA run with the 454 only (as they shouldn't have used any
> > non-454 reads) I was quite clueless about which strategy did they use to
> > get 39 scaffolds where I got 11000 contigs. As I wrote, they didn't use
> > any custom adaptor, so I don't know what I should do as preprocessing
> > goes...
>
> Titanium paired-end ... did you use the two Titanium linker sequences in
> sff_extract?
>
> Regards,
> Bastien
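
The adaptor check described at the top of this message (the Titanium sequence and its reverse complement both occurring in the reads) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two linker strings and the norm_1 flanks are copied from this message, while the helper names reverse_complement and split_on_linker are hypothetical and are not part of sff_extract or MIRA. Matching is exact, so a linker copy containing a sequencing error would be missed.

```python
# Linker sequences copied from this message; helper names are hypothetical.
QQQ = "TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG"
KKK = "CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA"

# Translation table mapping each base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_on_linker(read, linkers=(QQQ, KKK)):
    """Return (left, right) around the first exact linker hit, or None.

    Exact matching only: a linker with a mismatch or indel is not found.
    """
    hits = [(read.find(linker), linker) for linker in linkers if linker in read]
    if not hits:
        return None
    pos, linker = min(hits)  # leftmost occurrence wins
    return read[:pos], read[pos + len(linker):]


# norm_1 from this message, without the hand-inserted '-' separators.
norm_1 = "GAATACACAGAAGACTAATTTAATAATACC" + KKK + "TTTCCTTCAGTCGACTAAA"

print(reverse_complement(QQQ) == KKK)
print(split_on_linker(norm_1))
```

Whether the two halves should really be handed to the assembler as separate mates is exactly the open question above; the sketch only shows where the linker sits in a read.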