I can track down the forward primers pretty easily. They are hard to miss,
considering that about a quarter of the reads start with:

tcagAAGCAGTGGTATCAACGCAGAGTACGGGGG

I think the GGGGG at the end is the leading-G problem, hypothetically due
to linker chemistry. I have tried matching with G homopolymers of four,
five and six bases, and the number of matching reads seems to drop
significantly at the sixth G. Is there any problem with feeding that
sequence into SSAHA2?

I also found the reverse complement of this sequence, which would explain
the chimeras.

Is there potentially a B adapter floating around in the mix? I didn't
see an obvious second adapter in the sequence.

Also, FYI, the adapter was an exact match for many sequences in the TSA.

Sincerely yours,

Robin

On 11/17/10, Robin Kramer <kodream@xxxxxxxxx> wrote:
> It seems the SFF files are available from the SRA, but only through the
> .SRA file.
>
> Robin
>
> On 11/12/10, Gao, Guangtu <Guangtu.Gao@xxxxxxxxxxxx> wrote:
>> Hi Robin,
>>
>> You might consider checking for adaptors and contaminants using SeqTrim.
>> I have also downloaded EST sequences from NCBI for assembly before. I
>> found that the adaptors had not been completely removed from those
>> sequences, and they produced chimeras.
>>
>> Guang
>>
>> -----Original Message-----
>> From: mira_talk-bounce@xxxxxxxxxxxxx
>> [mailto:mira_talk-bounce@xxxxxxxxxxxxx] On Behalf Of Robin Kramer
>> Sent: Thursday, November 11, 2010 12:44 PM
>> To: mira_talk@xxxxxxxxxxxxx
>> Subject: [mira_talk] 454 cleaning
>>
>> Hi,
>>
>> I am doing an assembly from data publicly available at NCBI.
>>
>> The data are available here:
>>
>> http://www.ncbi.nlm.nih.gov/sra/SRX021565?report=full
>>
>> It is 454 data, but unfortunately neither the SFF nor the XML files are
>> available.
>>
>> I assembled the data with MIRA, using the no-XML flags, which appeared
>> to give a nice assembly.
>>
>> However, when I run BLASTN and BLASTX on the first and fourth contigs,
>> there appear to be problems with the data.
>>
>> With the first contig, BLASTX gives strong hits to two different genes
>> at opposite ends, as if it were a chimera. With BLASTN I get a strong
>> hit on one side, and then in the middle a section with multiple hits
>> to different species in the TSA. When I look at the pileup, there is a
>> thin spot in the gene and a huge drop-off in coverage on one side.
>>
>> I think this is due to an adapter-trimming problem with the 454 data.
>>
>> The fourth contig, under BLASTN, has a very strong gene hit from a
>> closely related species, but at the end it has another short stretch
>> that matches many other sequences in the TSA from distantly related
>> species. The adapter-like portion has low coverage, with a large jump
>> in coverage in the region of the strong hit.
>>
>> To me it appears that some of the adapters are consistently not
>> getting trimmed (in this set and in the TSA).
>>
>> Here is a relevant thread on SEQanswers:
>> http://seqanswers.com/forums/showthread.php?t=3462
>> And a link to previous discussions on this list:
>> //www.freelists.org/post/mira_talk/454-adaptor-clipping
>>
>> Is there any consensus on re-cleaning the 454 adapters? I don't even
>> know what sequences to expect.
>>
>> The assemblies of the two contigs are pasted after this message.
>> >> Sincerely yours, >> >> Robin >> >>>SRR054580_Asha_rep_c1 >> AGTTTCTTAACACTTGGACCAATATATTATTTTCCTTTGTTTTGCAAGAAGGATAAAAGA >> AAGAAACAAASGWMDAAAAGAGTTTACTAGAAAACTCATCGAGCTAGTTTCTCCACTTAT >> TTATTTTATGCTTTTCCCGCAAAAGTTTGGTGACTCATACATGAGGATAGATACACATAG >> ACTCACGTTATTTTACACACGTATATATATAAGGAAAGGCAGGCTAAGCCTTTGATTTAT >> TTGATTATTGATCCGCGCACTATTGGCAAAAAGACAGTAGTGGGGTAGCACAGCAGATGC >> AAAAAGATGAAGCATAGCTCTAAGCCACATATCTCATTTGAGAGTGACGAGGAGGTGGAA >> CGAGAAAGTTGAAAGGGTTGTTGTTCTTGATCTGCCTGGCCTGTTCGCTATGGAGGTTGA >> AAGTGTGTTGAATGACTTCCTCAGGCAATGCGTTTAACAAAGAGTTTCTACCTGCAAGTG >> TGCCAGTCACAGGTATATCATGGGTCTTGAATGCCACGTACTTGAAGTTGTTGCTCTGCG >> ATTTTGCAGCCACCGCAAAGTTTTGTGGCACGATCAGCACTTGTCCCTCTTGCAGCTCCC >> CATCAAACACTCTATCACCAGTGCAATTCACCACTTGCATCATCGCCCTCCCTTCCAATG >> CGTATACTATGCTGTTTGCGTTCAGGTTGTAGTGAGGCACGAACATGGCATTCTTGCGGA >> GAGATCCGAACTGAGCACTGAGTTTGAGGAGCAAGAGGGCTGGGAAGTCAAGGCCGGTGG >> CGGTTGTAATGCTACCAGCTTGAGGGTTGAAGAAGTCAGGCGATGAAGTTTGACCAATGT >> TGTGGCGAAGTCTCATTGTGCAAATGGTTTCATCAATGCCATTTCTGCTCTTGCTTTTGC >> TCTTCTGTGGCTTCTCTTCCTCTTCATCGTCGTCATCTTCTTCCTCTTCCTCTGCTCTCT >> GTTGCTGCTTTCTCGTTGGTGGAGCTGTCACGCTCAGACCTCCCTCCACTTTCACAATGG >> CTCCTTTCTCTTCGTCCTCGTTCACACCTTGGAGGTTTTTCACTATCTTCCTGTCCACGT >> TCAACGCTTGTTCCAAGAATTCTGGGGTGAAGCCACTGAATATGTTGCCGCCTTCATTAT >> CTTCTTCTTGTTCTTGATGTTGTTTTCCCTTCTGGCTTTGGCTTTGCTGATATTGTACGA >> ACTCTTGCTCTTGGTTCCCAGCAAGATAGAATCTCCTAGGCATCTGATCGAGCTGGTTCT >> GTAAGCTGTTGGTGTGAATAAGAGAAACTGCAACAACGGGAGTGTCTTGATTGTTGAACA >> TCCAGAAAGCAGCACCGGTAGGCACTGCGATCAAATCACCCTCTCTAAAGTGATACACCT >> TTTGGTGACGGTCTTGAGGCTTCTGGCTTTGTCCTCTTTGAGTTGGCTCTTCAAAAGTCT >> GAGGACAACCGGAGAAAATGATGCCAAAAATACCACTACCTTGTTGAATGAAGATTTGCT >> GGGGAGCGTTGGTGAAGAATGGTCTGCGGAGGCCATTGCGTTGGAGGGTGCAGCGAGAGA >> GGGCAACACCGGCACACTGGAAAGGCTTGCTGTTAGGGTTCCATGTCTCTATGAACCCAC >> CTTCCGACTCTATACGGTTATCGGGTTTGAGGGCATTCATGCGTTGGAGTTGGCACTCAT >> ATTCATTTTGCTGTGGCTGCTGTGTCTTATCTTTGCTAGCGAAGCACCCACTCAAAAGCA >> CAAGACAAAGGGAAAGAGATAGCGCAAGAAGCTTAGCCATGGATATGAATATGATTGATT >> 
TGTTTGTGGTGTCCCCCGTACTCTGCGTTGATACCACTGCTTAAGCAGTGGTATCAACGC >> AGAGTACGGGGGTGGACCCAATGACACCATTTTCATTTATTATTCGGATCATGGTGCTCC >> TGGTCTTGTCACCATGCCAGTAGGGGGAATATGTCATGGCCAACGATTTTGTGAATGTCT >> TGAAGAAGAAACATGATGCTAAATCCTACAAAAAGATGGTGATATACTTGGAAGCATGTG >> AATCTGGGAGCATGTTTGAAGGGATACTACCTAATAACATAAGCATATATGCGACCACAG >> CTTCCAACGCAGATGAGGATAGTTTTGCATATTATTGTCCTCATTCCTACCCTTCTCCTC >> CAACTGAGTACACCACTTGTTTGGGAGATGTGTACAGCATTTCGTGGTTAGAAGATAGTG >> ACAAAAATGACATGACAATAGAAACGCTGCAGCAACAATATGAAACCGTTCGCCGAAGAA >> CGTTAATTGGTAATGTCGACACCTCTTCTCATGTGAAACAATACGGAGATAGAAAATTCG >> AGAACGATACTCTTGCTACCTACATTGGTGCACCTGTTAAAACCAACCCCACCAACTCTG >> CAAATGCATATTCCTTTGAACCATATAGTCCTCAAACTAGACATGTTAGCCAACGAGATG >> CTCATTTACTCTACCTTAAGCTAGAGTTGCAAAAAGCCCCGGATGGTTCTATGGAAAAGT >> TGAAAGCTCAAATAGAGTTGGATGATGAAATTGCACATAGGAAGCATTTAGATAGTGTTT >> TCCATCTCATAGGGGATCTCTTGTTTGGAGAAGAGAATAATATCTCTACCATGTTGCTCC >> ATGTTCGTCCACCAGGCCAGCCTCTTGTCGATGATTGGGATTGTTTCAAGACCCTTATAA >> AAACTTACGAGAGCAATTGCGGTAAATTGTCAATCTATGGAAGGAAATACACAAGAGCCT >> TTGCTAACATGTGCAATGCTGGCATTTCTGAGGAGCAAATGGTAGTAGCCTCTTCACAAG >> CTTGTCCCAAGGAAAATCCTTCTTAAATTAATTCGTTAAGTTGATAATGTAATAACCAAT >> ATATATCATGAAAGATTAAAAATTGTGCTTTCATTCTACAAAATGGATTATAATCCTTTG >> >>>SRR054580_Asha_rep_c4 >> TCTCCGACTCAGAAGCAGTGGTATCAACGCAGAGTCTTGGGGAACTGGAATTGACGATCA >> AGTTGGTCACACCTGTTGCTCCAGCAACATAGTGCAGAAATTGCATGTGTCCAATGTGTA >> GATCTCTAACAAGATCATAATTATAACATTCTATGTGTAGTTGACTCTTGCTTTTGATTA >> ACTCCTGCATAGATGTTTCTACCAAAAATGAAAAAAAAAATCATTAATAGATGCATATTG >> CAGCTAAATTTAGCAGTGAGTTGGTGATACCTCATCCCCCAGTTAGATAAAAGCCACTAG >> AAGCTGCATTTTCAAATCAACAAGTAGTGATTTATGGCTTCTTTGGGTTTTATGGTGTGT >> TTTGTAGAAAATTTGTCCTTCATTTTAGCTATGAGCATTCATTGGGTATTGCATAAGTTT >> TGATGCTATTGTATTGATTTTGATATAAGAAAAGAAAAGTTGTAATGCGTTTGTTTCAAT >> TATTTTTTTTTAAAGAAATGATATTTTTAACTTGTGGAGAGTTTTAAGAGATTTAGATAA >> CTTGTAAGGTAACAGATTGTAGAAGTATAAATTACTCTGCCATAAATGAAGCTTTAAGTG >> CACTACAAGTAAACAACT >> >> -- >> You have received this mail because you are subscribed to the mira_talk >> mailing 
list. For information on how to subscribe or unsubscribe, please >> visit http://www.chevreux.org/mira_mailinglists.html
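
The prefix check described at the top of this thread (counting reads that start with the adapter followed by four, five or six G's, and looking for its reverse complement inside reads) could be sketched roughly as below. This is only an illustration, not the procedure actually used in the thread: the function names are hypothetical, the adapter string is the one quoted in the message with the trailing G run removed, and treating that G run as variable in length is an assumption. The toy reads are made up and are not taken from SRX021565.

```python
from collections import Counter

# Sequence quoted in the message, upper-cased and without the trailing G run.
# Treating the G run as variable in length is an assumption of this sketch.
ADAPTER_PREFIX = "TCAGAAGCAGTGGTATCAACGCAGAGTAC"

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def leading_g_run(read, prefix=ADAPTER_PREFIX):
    """Length of the G run right after the prefix, or None if the read
    does not start with the prefix."""
    read = read.upper()
    if not read.startswith(prefix):
        return None
    run = 0
    for base in read[len(prefix):]:
        if base != "G":
            break
        run += 1
    return run


def summarize(reads, prefix=ADAPTER_PREFIX):
    """Count reads by G-run length after the prefix, and count reads that
    contain the reverse complement of the prefix anywhere."""
    g_runs = Counter()
    rc_hits = 0
    rc = reverse_complement(prefix)
    for read in reads:
        run = leading_g_run(read, prefix)
        if run is not None:
            g_runs[run] += 1
        if rc in read.upper():
            rc_hits += 1
    return g_runs, rc_hits


if __name__ == "__main__":
    # Toy reads made up for illustration only.
    toy_reads = [
        ADAPTER_PREFIX + "GGGGG" + "ACGT",
        ADAPTER_PREFIX + "GGGG" + "TTTT",
        "ACGT" + reverse_complement(ADAPTER_PREFIX) + "AA",
        "ACGTACGT",
    ]
    g_runs, rc_hits = summarize(toy_reads)
    print(dict(g_runs), rc_hits)
```

The number of reads with at least N leading G's is the sum of the counts for runs of length N or more, so comparing those sums for N = 4, 5 and 6 shows where the drop occurs. Exact string matching is used here for simplicity; real 454 reads may need a tolerance for homopolymer errors inside the adapter itself.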