[mira_talk] Re: MIRA: I am doing it wrong.

  • From: Alessandro Riccombeni <rikkomba@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 9 Dec 2009 16:19:45 +0000

Sorry for being dumb. Yes, I just checked the Titanium adaptor and its
reverse complement and they are both present in my reads.
TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG (marked QQQ below)
CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA (marked KKK below)
I also found out that running
sff_extract -c
on my SFF files shrinks the FASTA output of one of the 3 files from
3789 bytes to 3787 bytes.
Meanwhile, the Titanium sequences are still present!
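
A quick way to double-check the two points above (that KKK really is the
reverse complement of QQQ, and how many reads still carry either one) is a
small Python sketch like the following. The function names and the dictionary
keys are my own choices for illustration, not part of sff_extract or MIRA,
and the search is for exact matches only, so copies with sequencing errors
are not counted.

```python
# Complement table for plain A/C/G/T bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# The two sequences quoted in this mail.
LINKER_FWD = "TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG"  # marked QQQ
LINKER_REV = "CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA"  # marked KKK


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_linker_hits(reads):
    """Count reads containing an exact copy of either sequence.

    `reads` is any iterable of sequence strings (hypothetical input,
    e.g. the sequences parsed from a FASTA file).
    """
    hits = {"QQQ": 0, "KKK": 0}
    for seq in reads:
        s = seq.upper()
        if LINKER_FWD in s:
            hits["QQQ"] += 1
        if LINKER_REV in s:
            hits["KKK"] += 1
    return hits
```

Running count_linker_hits on the FASTA produced with and without -c would
show directly whether clipping changed the number of reads carrying the
sequences.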

Below I am adding three example reads, each in 3 versions: after a plain
sff_extract (norm), after manually replacing the Titanium sequences with
QQQ and KKK (manual), and after sff_extract -c, i.e. with clipping (sff-c).
It seems that sff_extract -c didn't touch the Titanium sequences. Why?

I read that a read can contain a linker joining the two paired ends: are the
2 Titanium adaptors that linker, or could there be another sequence to
remove? Does it make sense to remove the 2 Titanium sequences everywhere
and then, without splitting the reads (IF they are paired and joined),
use them directly with MIRA again?
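
To make the "splitting" option concrete, here is a minimal Python sketch of
what cutting a read at the linker would mean: everything left of the linker
becomes one piece, everything right of it the other. This is only an
illustration under my own assumptions (the function name split_at_linker is
made up, and only exact matches are found); it is not how sff_extract or MIRA
handle paired-end reads, and it would miss a linker containing a sequencing
error.

```python
# The two sequences quoted in this mail (QQQ and KKK).
LINKERS = (
    "TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG",
    "CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA",
)


def split_at_linker(seq):
    """Split a read around the first exact linker occurrence.

    Returns a (left, right) tuple with the linker removed, or None if
    neither linker orientation is found in the read.
    """
    s = seq.upper()
    best = None  # (position, linker length) of the leftmost hit
    for linker in LINKERS:
        pos = s.find(linker)
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, len(linker))
    if best is None:
        return None
    pos, length = best
    return s[:pos], s[pos + length:]
```

Applied to the first example read (with my "-" marks taken out), this
returns the 30 bases before KKK and the 19 bases after it as two separate
pieces.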

>norm_1
GAATACACAGAAGACTAATTTAATAATACC-CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA-TTTCCTTCAGTCGACTAAA
>manual_1
GAATACACAGAAGACTAATTTAATAATACC------------------KKK-----------------------TTTCCTTCAGTCGACTAAA
>sff-c_1
GAATACACAGAAGACTAATTTAATAATACC-CGTAATAACTTCGTATAGCATACATTATACGAAGTTATACGA-TTTCCTTCAGTCGACTAAA
>norm_2
TTTGTGTTTCTCAACTTCTTCTTAATCTCGGTTGATTCATCCGGTGCTATTCCAGTTGGTACAATGGTTGCTATTGTTGTCATTTGGTTCGTCATTTCTATTCCATTATCCGTTGTTGGATCTATCAT-TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG-TCCAGCTCGTTTACAGAGAGTCCCATTAC
>manual_2
TTTGTGTTTCTCAACTTCTTCTTAATCTCGGTTGATTCATCCGGTGCTATTCCAGTTGGTACAATGGTTGCTATTGTTGTCATTTGGTTCGTCATTTCTATTCCATTATCCGTTGTTGGATCTATCAT-------------------QQQ----------------------TCCAGCTCGTTTACAGAGAGTCCCATTAC
>sff-c_2
TTTGTGTTTCTCAACTTCTTCTTAATCTCGGTTGATTCATCCGGTGCTATTCCAGTTGGTACAATGGTTGCTATTGTTGTCATTTGGTTCGTCATTTCTATTCCATTATCCGTTGTTGGATCTATCAT-TCGTATAACTTCGTATAATGTATGCTATACGAAGTTATTACG
TCCAGCTCGTTTACAGAGAGTCCCATTAC
>norm_3
CAACGCTGGCTGCAGGGAGAAGGATGTCCAATTTCTCAAGGATGAATTGAAGCGTTCAATAATGTACAGCACAATACATTTGAAGGAACTCT-TCGTATAACTTCGTATAATGTAT-GCTATACGAAGTTATTACGGCTATTCAATAGAACATAGTACCAAGACGAGACCTAAGTAACTGGTACAGGTTGTGACTGTTGAACTGTT
>manual_3
CAACGCTGGCTGCAGGGAGAAGGATGTCCAATTTCTCAAGGATGAATTGAAGCGTTCAATAATGTACAGCACAATACATTTGAAGGAACTCT--------QQQ--------------GCTATTACGAAGTTATTACGGCTATCAATAGAACATAGTACCAAGACGAGACCTAAGTAACTGGTACAGGTTGTGACTGTTGAACTGTT
>sff-c_3
CAACGCTGGCTGCAGGGAGAAGGATGTCCAATTTCTCAAGGATGAATTGAAGCGTTCAATAATGTACAGCACAATACATTTGAAGGAACTCT-TCGTATAACTTCGTATAATGTAT-GCTATACGAAGTTATTACGGCTATTCAATAGAACATAGTACCAAGACGAGACCTAAGTAACTGGTACAGGTTGTGACTGTTGAACTGTT



On Fri, Dec 4, 2009 at 8:10 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Freitag 04 Dezember 2009 Alessandro Riccombeni wrote:
> > The sequencing service provided a preassembled dataset as well: 39
> >  scaffolds and 933 contigs.
>
> Ummm ... 933 contigs in Newbler versus 11000 in MIRA. Something is wrong
> there.
>
> > Some info: 13 Mbs, 2.38% are Ns, GC in the sequence is 36%, largest
> >  scaffold is 1.9 Mb and the smallest is 3Kb.
> > After my first MIRA run with the 454 reads only (as they shouldn't have
> > used any non-454 reads) I was quite clueless about which strategy they
> > used to get 39 scaffolds where I got 11000 contigs. As I wrote, they
> > didn't use any custom adaptor, so I don't know what I should do as far
> > as preprocessing goes...
>
> Titanium paired-end ... did you use the two Titanium linker sequences in
> sff_extract?
>
> Regards,
>  Bastien
