[mira_talk] Re: 454 cleaning

  • From: Robin Kramer <kodream@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Thu, 18 Nov 2010 13:42:01 -0700

After doing the assembly with the SSAHA2 mappings, I still find
thousands of contigs containing the vector when I grep through the
output.
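A minimal Python sketch of the same check. It assumes the contigs are in a plain FASTA file and searches for the adapter core from the read prefix quoted further down in this thread (AAGCAGTGGTATCAACGCAGAGTAC, i.e. without the "tcag" key and the G run). Unlike a line-based grep, it joins wrapped sequence lines first and checks both strands. The script and function names are illustrative only, not part of MIRA or SSAHA2.

```python
import sys

# Adapter core from the read prefix quoted in this thread, without the
# 454 "tcag" key and the trailing G run (an assumption about what to search).
ADAPTER = "AAGCAGTGGTATCAACGCAGAGTAC"
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a nucleotide string (ACGTN only)."""
    return seq.translate(_COMP)[::-1]


def read_fasta(lines):
    """Yield (name, sequence) pairs, joining wrapped sequence lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def contigs_with_adapter(records, adapter=ADAPTER):
    """Return names of records containing the adapter on either strand."""
    fwd = adapter.upper()
    rev = revcomp(fwd)
    return [name for name, seq in records if fwd in seq or rev in seq]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_adapter.py contigs.fasta  (hypothetical script name)
    with open(sys.argv[1]) as handle:
        hits = contigs_with_adapter(read_fasta(handle))
    print(f"{len(hits)} contigs contain the adapter on either strand")
```

Joining lines matters because grep can undercount: in the c1 contig pasted below, the forward adapter begins at the end of one 60-column line and finishes on the next, while its reverse complement sits on a single line.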

Would it be more appropriate to clip the vectors completely first, so
that MIRA doesn't have to worry about these things?
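One way to clip first, sketched in Python. The sketch assumes the reads are available only as FASTA (no SFF) and that the contaminant is the prefix quoted below: an optional "tcag" key, the adapter core, then a run of Gs. The regex and function names are hypothetical. This is not MIRA's or SeqTrim's clipping logic. Reads that still contain the adapter internally, on either strand, are flagged as possible chimeras rather than clipped.

```python
import re

# Hypothetical contaminant pattern: optional 454 key "tcag", the adapter
# core quoted in this thread, then any run of G's (the leading-G problem).
PREFIX = re.compile(r"^(?:TCAG)?AAGCAGTGGTATCAACGCAGAGTACG*", re.IGNORECASE)
ADAPTER = "AAGCAGTGGTATCAACGCAGAGTAC"
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a nucleotide string (ACGTN only)."""
    return seq.translate(_COMP)[::-1]


ADAPTER_RC = revcomp(ADAPTER)


def clip_read(seq: str) -> tuple[str, int, bool]:
    """Clip a leading adapter and report whether one remains internally.

    Returns (clipped_sequence, left_clip_position, internal_adapter_found).
    A True third value marks the read as a chimera candidate.
    """
    match = PREFIX.match(seq)
    left = match.end() if match else 0
    rest = seq[left:].upper()
    internal = ADAPTER in rest or ADAPTER_RC in rest
    return seq[left:], left, internal


if __name__ == "__main__":
    # Two made-up toy reads, only to show the output format.
    reads = {
        "r1": "tcagAAGCAGTGGTATCAACGCAGAGTACGGGGGATTCCAGT",
        "r2": "ACGTGTACTCTGCGTTGATACCACTGCTTACGT",
    }
    for name, seq in reads.items():
        clipped, left, chimera = clip_read(seq)
        print(name, left, clipped, "chimera?" if chimera else "")
```

The `G*` is greedy, so it will also strip genuine leading Gs from the insert. Capping it (for example `G{0,6}`) based on the homopolymer counts mentioned below is a trade-off to check on the real data. Returning the left clip position, not just the trimmed string, lets the same clip be applied to the quality values.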

Sincerely yours,

Robin

On 11/18/10, Robin Kramer <kodream@xxxxxxxxx> wrote:
> I can track down the forward primers pretty easily.
>
> They are hard to miss, considering a quarter of the reads start with:
>
> tcagAAGCAGTGGTATCAACGCAGAGTACGGGGG
>
> I think the GGGGG at the end is the leading-G problem, presumably
> due to the linker chemistry.
>
> I have tried it with runs of four, five, and six Gs, and there seems
> to be a significant drop at six.
>
> Is there any problem with feeding that into SSAHA2?
>
> I also found the reverse complement of this sequence, which explains
> our chimeras.
>
> Is there potentially a B adapter floating around in the mix?  I didn't
> obviously see another adapter in the sequence.
>
> Also FYI, the adapter was an exact match for many sequences in the TSA.
>
> Sincerely yours,
>
> Robin
>
> On 11/17/10, Robin Kramer <kodream@xxxxxxxxx> wrote:
>> It seems the SFF files are available from the SRA, but only through
>> the .sra file.
>>
>> Robin
>>
>> On 11/12/10, Gao, Guangtu <Guangtu.Gao@xxxxxxxxxxxx> wrote:
>>> Hi Robin,
>>>
>>> You might consider checking the adaptors and contaminants with SeqTrim.
>>> I previously downloaded some EST sequences from NCBI for an assembly
>>> and found that the adaptors had not been completely cleaned from them,
>>> which produced chimeras.
>>>
>>> Guang
>>>
>>> -----Original Message-----
>>> From: mira_talk-bounce@xxxxxxxxxxxxx
>>> [mailto:mira_talk-bounce@xxxxxxxxxxxxx] On Behalf Of Robin Kramer
>>> Sent: Thursday, November 11, 2010 12:44 PM
>>> To: mira_talk@xxxxxxxxxxxxx
>>> Subject: [mira_talk] 454 cleaning
>>>
>>> Hi,
>>>
>>> I am doing an assembly from data publicly available at NCBI.
>>>
>>> The data are available here:
>>>
>>> http://www.ncbi.nlm.nih.gov/sra/SRX021565?report=full
>>>
>>> It is 454 data, but unfortunately neither the SFF nor the XML files
>>> are available.
>>>
>>> I assembled the data with MIRA, using the no-XML flags, which appeared
>>> to give a nice assembly.
>>>
>>> However, when I run BLASTN and BLASTX on the first and fourth contigs,
>>> there appear to be problems with the data.
>>>
>>> With the first contig, BLASTX gives strong hits to two different genes
>>> at opposite ends, as if it were a chimera. With BLASTN I get a strong
>>> hit on one side, then in the middle a section with multiple hits to
>>> different species in the TSA. In the pileup, there is a thin place in
>>> the gene and a huge drop-off in coverage from one side.
>>>
>>> I think this is due to an adapter trimming problem with the 454 data.
>>>
>>> With BLASTN, the fourth contig has a very strong gene hit from a
>>> closely related species, but at the end there is another small stretch
>>> that matches many distantly related sequences in the TSA. The
>>> adapter-like portion has low coverage, and coverage jumps sharply in
>>> the strongly matching region.
>>>
>>> To me it appears as if some of the adapters are consistently not
>>> getting trimmed (in this set and in the TSA).
>>>
>>> Here is a relevant thread on SEQanswers:
>>> http://seqanswers.com/forums/showthread.php?t=3462
>>> as well as a link to previous discussions on this list:
>>> //www.freelists.org/post/mira_talk/454-adaptor-clipping
>>>
>>> Is there any consensus on recleaning the 454 adapters? I don't even
>>> know which sequences to expect.
>>>
>>> The assemblies of the two contigs are pasted after this message.
>>>
>>> Sincerely yours,
>>>
>>> Robin
>>>
>>>>SRR054580_Asha_rep_c1
>>> AGTTTCTTAACACTTGGACCAATATATTATTTTCCTTTGTTTTGCAAGAAGGATAAAAGA
>>> AAGAAACAAASGWMDAAAAGAGTTTACTAGAAAACTCATCGAGCTAGTTTCTCCACTTAT
>>> TTATTTTATGCTTTTCCCGCAAAAGTTTGGTGACTCATACATGAGGATAGATACACATAG
>>> ACTCACGTTATTTTACACACGTATATATATAAGGAAAGGCAGGCTAAGCCTTTGATTTAT
>>> TTGATTATTGATCCGCGCACTATTGGCAAAAAGACAGTAGTGGGGTAGCACAGCAGATGC
>>> AAAAAGATGAAGCATAGCTCTAAGCCACATATCTCATTTGAGAGTGACGAGGAGGTGGAA
>>> CGAGAAAGTTGAAAGGGTTGTTGTTCTTGATCTGCCTGGCCTGTTCGCTATGGAGGTTGA
>>> AAGTGTGTTGAATGACTTCCTCAGGCAATGCGTTTAACAAAGAGTTTCTACCTGCAAGTG
>>> TGCCAGTCACAGGTATATCATGGGTCTTGAATGCCACGTACTTGAAGTTGTTGCTCTGCG
>>> ATTTTGCAGCCACCGCAAAGTTTTGTGGCACGATCAGCACTTGTCCCTCTTGCAGCTCCC
>>> CATCAAACACTCTATCACCAGTGCAATTCACCACTTGCATCATCGCCCTCCCTTCCAATG
>>> CGTATACTATGCTGTTTGCGTTCAGGTTGTAGTGAGGCACGAACATGGCATTCTTGCGGA
>>> GAGATCCGAACTGAGCACTGAGTTTGAGGAGCAAGAGGGCTGGGAAGTCAAGGCCGGTGG
>>> CGGTTGTAATGCTACCAGCTTGAGGGTTGAAGAAGTCAGGCGATGAAGTTTGACCAATGT
>>> TGTGGCGAAGTCTCATTGTGCAAATGGTTTCATCAATGCCATTTCTGCTCTTGCTTTTGC
>>> TCTTCTGTGGCTTCTCTTCCTCTTCATCGTCGTCATCTTCTTCCTCTTCCTCTGCTCTCT
>>> GTTGCTGCTTTCTCGTTGGTGGAGCTGTCACGCTCAGACCTCCCTCCACTTTCACAATGG
>>> CTCCTTTCTCTTCGTCCTCGTTCACACCTTGGAGGTTTTTCACTATCTTCCTGTCCACGT
>>> TCAACGCTTGTTCCAAGAATTCTGGGGTGAAGCCACTGAATATGTTGCCGCCTTCATTAT
>>> CTTCTTCTTGTTCTTGATGTTGTTTTCCCTTCTGGCTTTGGCTTTGCTGATATTGTACGA
>>> ACTCTTGCTCTTGGTTCCCAGCAAGATAGAATCTCCTAGGCATCTGATCGAGCTGGTTCT
>>> GTAAGCTGTTGGTGTGAATAAGAGAAACTGCAACAACGGGAGTGTCTTGATTGTTGAACA
>>> TCCAGAAAGCAGCACCGGTAGGCACTGCGATCAAATCACCCTCTCTAAAGTGATACACCT
>>> TTTGGTGACGGTCTTGAGGCTTCTGGCTTTGTCCTCTTTGAGTTGGCTCTTCAAAAGTCT
>>> GAGGACAACCGGAGAAAATGATGCCAAAAATACCACTACCTTGTTGAATGAAGATTTGCT
>>> GGGGAGCGTTGGTGAAGAATGGTCTGCGGAGGCCATTGCGTTGGAGGGTGCAGCGAGAGA
>>> GGGCAACACCGGCACACTGGAAAGGCTTGCTGTTAGGGTTCCATGTCTCTATGAACCCAC
>>> CTTCCGACTCTATACGGTTATCGGGTTTGAGGGCATTCATGCGTTGGAGTTGGCACTCAT
>>> ATTCATTTTGCTGTGGCTGCTGTGTCTTATCTTTGCTAGCGAAGCACCCACTCAAAAGCA
>>> CAAGACAAAGGGAAAGAGATAGCGCAAGAAGCTTAGCCATGGATATGAATATGATTGATT
>>> TGTTTGTGGTGTCCCCCGTACTCTGCGTTGATACCACTGCTTAAGCAGTGGTATCAACGC
>>> AGAGTACGGGGGTGGACCCAATGACACCATTTTCATTTATTATTCGGATCATGGTGCTCC
>>> TGGTCTTGTCACCATGCCAGTAGGGGGAATATGTCATGGCCAACGATTTTGTGAATGTCT
>>> TGAAGAAGAAACATGATGCTAAATCCTACAAAAAGATGGTGATATACTTGGAAGCATGTG
>>> AATCTGGGAGCATGTTTGAAGGGATACTACCTAATAACATAAGCATATATGCGACCACAG
>>> CTTCCAACGCAGATGAGGATAGTTTTGCATATTATTGTCCTCATTCCTACCCTTCTCCTC
>>> CAACTGAGTACACCACTTGTTTGGGAGATGTGTACAGCATTTCGTGGTTAGAAGATAGTG
>>> ACAAAAATGACATGACAATAGAAACGCTGCAGCAACAATATGAAACCGTTCGCCGAAGAA
>>> CGTTAATTGGTAATGTCGACACCTCTTCTCATGTGAAACAATACGGAGATAGAAAATTCG
>>> AGAACGATACTCTTGCTACCTACATTGGTGCACCTGTTAAAACCAACCCCACCAACTCTG
>>> CAAATGCATATTCCTTTGAACCATATAGTCCTCAAACTAGACATGTTAGCCAACGAGATG
>>> CTCATTTACTCTACCTTAAGCTAGAGTTGCAAAAAGCCCCGGATGGTTCTATGGAAAAGT
>>> TGAAAGCTCAAATAGAGTTGGATGATGAAATTGCACATAGGAAGCATTTAGATAGTGTTT
>>> TCCATCTCATAGGGGATCTCTTGTTTGGAGAAGAGAATAATATCTCTACCATGTTGCTCC
>>> ATGTTCGTCCACCAGGCCAGCCTCTTGTCGATGATTGGGATTGTTTCAAGACCCTTATAA
>>> AAACTTACGAGAGCAATTGCGGTAAATTGTCAATCTATGGAAGGAAATACACAAGAGCCT
>>> TTGCTAACATGTGCAATGCTGGCATTTCTGAGGAGCAAATGGTAGTAGCCTCTTCACAAG
>>> CTTGTCCCAAGGAAAATCCTTCTTAAATTAATTCGTTAAGTTGATAATGTAATAACCAAT
>>> ATATATCATGAAAGATTAAAAATTGTGCTTTCATTCTACAAAATGGATTATAATCCTTTG
>>>
>>>>SRR054580_Asha_rep_c4
>>> TCTCCGACTCAGAAGCAGTGGTATCAACGCAGAGTCTTGGGGAACTGGAATTGACGATCA
>>> AGTTGGTCACACCTGTTGCTCCAGCAACATAGTGCAGAAATTGCATGTGTCCAATGTGTA
>>> GATCTCTAACAAGATCATAATTATAACATTCTATGTGTAGTTGACTCTTGCTTTTGATTA
>>> ACTCCTGCATAGATGTTTCTACCAAAAATGAAAAAAAAAATCATTAATAGATGCATATTG
>>> CAGCTAAATTTAGCAGTGAGTTGGTGATACCTCATCCCCCAGTTAGATAAAAGCCACTAG
>>> AAGCTGCATTTTCAAATCAACAAGTAGTGATTTATGGCTTCTTTGGGTTTTATGGTGTGT
>>> TTTGTAGAAAATTTGTCCTTCATTTTAGCTATGAGCATTCATTGGGTATTGCATAAGTTT
>>> TGATGCTATTGTATTGATTTTGATATAAGAAAAGAAAAGTTGTAATGCGTTTGTTTCAAT
>>> TATTTTTTTTTAAAGAAATGATATTTTTAACTTGTGGAGAGTTTTAAGAGATTTAGATAA
>>> CTTGTAAGGTAACAGATTGTAGAAGTATAAATTACTCTGCCATAAATGAAGCTTTAAGTG
>>> CACTACAAGTAAACAACT
>>>
>>> --
>>> You have received this mail because you are subscribed to the mira_talk
>>> mailing list. For information on how to subscribe or unsubscribe, please
>>> visit http://www.chevreux.org/mira_mailinglists.html
>>>
>>
>
