Hi,

I am doing an assembly from data publicly available at NCBI. The data are available here: http://www.ncbi.nlm.nih.gov/sra/SRX021565?report=full

It is 454 data, but unfortunately neither the SFF nor the XML files are available. I assembled the data with MIRA, using the flags for running without XML, and this appeared to give a nice assembly. However, when I run BLASTN and BLASTX on the first and fourth contigs, there appear to be problems with the data.

When I BLASTX the first contig, it gives strong hits to two different genes, one at each end, as if it were a chimera. When I BLASTN it, I get a strong hit on one side, and then in the middle a section with multiple hits to different species in the TSA. In the pileup there is a thin stretch in the gene and a huge drop-off in coverage on one side. I think this is due to an adapter-trimming problem with the 454 data.

When I BLASTN the fourth contig, it has a very strong hit to a gene from a closely related species, but at the end there is another small stretch that matches many distantly related sequences in the TSA. The adapter-looking portion has low coverage, and coverage jumps sharply where the strongly matching region begins. To me it appears as if some of the adapters are consistently not getting trimmed (in this set and in the TSA).

Here is a relevant thread on SEQanswers: http://seqanswers.com/forums/showthread.php?t=3462
and a link to previous discussions on this list: //www.freelists.org/post/mira_talk/454-adaptor-clipping

Is there any consensus on recleaning the 454 adapters? I don't even know which sequences to expect. The assemblies of the two contigs are pasted after this message.
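The pileup observation (a thin stretch followed by a sharp jump in coverage) can be made concrete by scanning per-base depth for positions where the mean depth over a window on one side differs from the other side by a large factor. This is a minimal Python sketch, assuming the depths have already been exported from the assembly as a list of integers; `find_coverage_steps`, the 20-base window and the 5-fold threshold are hypothetical, illustrative choices, not MIRA settings.

```python
from typing import List, Tuple


def fold_change(left: float, right: float) -> float:
    """Ratio of the larger to the smaller mean depth (inf if one side is zero)."""
    lo, hi = min(left, right), max(left, right)
    if hi == 0:
        return 1.0  # both sides uncovered: no step
    if lo == 0:
        return float("inf")
    return hi / lo


def find_coverage_steps(depth: List[int], window: int = 20,
                        min_fold: float = 5.0) -> List[int]:
    """Return positions where coverage changes by at least `min_fold` between the
    `window` bases before the position and the `window` bases starting at it.
    Adjacent qualifying positions are collapsed to the single strongest one."""
    candidates: List[Tuple[int, float]] = []
    for pos in range(window, len(depth) - window + 1):
        left = sum(depth[pos - window:pos]) / window
        right = sum(depth[pos:pos + window]) / window
        fold = fold_change(left, right)
        if fold >= min_fold:
            candidates.append((pos, fold))

    steps: List[int] = []
    run: List[Tuple[int, float]] = []
    for pos, fold in candidates:
        if run and pos != run[-1][0] + 1:
            # Gap between candidates: keep the strongest position of the previous run
            steps.append(max(run, key=lambda pf: pf[1])[0])
            run = []
        run.append((pos, fold))
    if run:
        steps.append(max(run, key=lambda pf: pf[1])[0])
    return steps
```

For example, `find_coverage_steps([3] * 50 + [60] * 50)` returns `[50]`, the first base of the high-coverage block. A step that coincides with the boundary between the well-matching region and the region with scattered TSA hits would be consistent with the untrimmed-adapter explanation, though not proof of it.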
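Which sequences to expect depends on how the library was built. One candidate worth checking, stated here as an assumption rather than a diagnosis: both contigs pasted after this message contain the exact 23-mer AAGCAGTGGTATCAACGCAGAGT, which appears to match the sequence commonly listed for the SMART 5' PCR Primer II A used in cDNA synthesis. In the first contig it sits immediately after its own reverse complement. This minimal Python sketch reports every exact occurrence of each candidate oligo and of its reverse complement in a sequence; the `CANDIDATE_PRIMERS` entry and the function names are hypothetical and should be replaced with whatever the submitter's protocol actually used.

```python
from typing import Dict, List, Optional, Tuple

# ASSUMPTION: candidate oligos to screen for. The 23-mer below is the sequence
# commonly listed for the SMART 5' PCR Primer II A; add or replace entries with
# the primers/adapters actually used to build this library.
CANDIDATE_PRIMERS: Dict[str, str] = {
    "SMART_PCR_primer_IIA": "AAGCAGTGGTATCAACGCAGAGT",
}

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse-complement DNA; ambiguity codes (S, W, M, ...) are left unchanged."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_primer_hits(seq: str, primers: Optional[Dict[str, str]] = None
                     ) -> List[Tuple[str, str, int]]:
    """Return (name, strand, 0-based start) for every exact occurrence of each
    primer ('+') and of its reverse complement ('-'), sorted by position."""
    primers = primers if primers is not None else CANDIDATE_PRIMERS
    seq = seq.upper()
    hits: List[Tuple[str, str, int]] = []
    for name, primer in primers.items():
        primer = primer.upper()
        for strand, probe in (("+", primer), ("-", reverse_complement(primer))):
            start = seq.find(probe)
            while start != -1:  # restart one base later to catch overlapping hits
                hits.append((name, strand, start))
                start = seq.find(probe, start + 1)
    return sorted(hits, key=lambda hit: hit[2])
```

Run on the first sequence line of the fourth contig, this returns `[('SMART_PCR_primer_IIA', '+', 12)]`, so the oligo begins right after the leading `TCTCCGACTCAG`. Exact matching will miss copies that carry 454 homopolymer or other sequencing errors, so a screen of raw reads would probably need a mismatch-tolerant search.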
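For the recleaning question, one crude option is to mask every exact occurrence of a candidate adapter or primer in each read, on either strand, and keep only the longest unmasked stretch. This is a swapped-in illustration, not an established consensus and not a MIRA option. The following is a minimal Python sketch; `longest_clean_segment`, the `min_len` cutoff of 40 bases and the primer list passed in are hypothetical placeholders.

```python
from typing import List, Tuple


def reverse_complement(seq: str) -> str:
    """Reverse-complement DNA; non-ACGT characters are left unchanged."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def longest_clean_segment(read: str, primers: List[str],
                          min_len: int = 40) -> Tuple[int, int]:
    """Mask every exact occurrence of each primer (either strand) and return the
    half-open (start, end) of the longest unmasked stretch, or (0, 0) if that
    stretch is shorter than `min_len` (HYPOTHETICAL cutoff)."""
    read_u = read.upper()
    masked = [False] * len(read_u)
    for primer in primers:
        for probe in (primer.upper(), reverse_complement(primer.upper())):
            start = read_u.find(probe)
            while start != -1:
                for i in range(start, start + len(probe)):
                    masked[i] = True
                start = read_u.find(probe, start + 1)

    best = (0, 0)
    run_start = None
    for i, is_masked in enumerate(masked + [True]):  # sentinel closes the last run
        if not is_masked and run_start is None:
            run_start = i
        elif is_masked and run_start is not None:
            if i - run_start > best[1] - best[0]:
                best = (run_start, i)
            run_start = None
    return best if best[1] - best[0] >= min_len else (0, 0)
```

For instance, a read made of 30 `G`s, the oligo `AAGCAGTGGTATCAACGCAGAGT` and 60 `T`s yields `(53, 113)`, so `read[53:113]` would be kept. Coordinates like these would have to be applied to the reads before reassembling.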
Sincerely yours,
Robin

>SRR054580_Asha_rep_c1
AGTTTCTTAACACTTGGACCAATATATTATTTTCCTTTGTTTTGCAAGAAGGATAAAAGA
AAGAAACAAASGWMDAAAAGAGTTTACTAGAAAACTCATCGAGCTAGTTTCTCCACTTAT
TTATTTTATGCTTTTCCCGCAAAAGTTTGGTGACTCATACATGAGGATAGATACACATAG
ACTCACGTTATTTTACACACGTATATATATAAGGAAAGGCAGGCTAAGCCTTTGATTTAT
TTGATTATTGATCCGCGCACTATTGGCAAAAAGACAGTAGTGGGGTAGCACAGCAGATGC
AAAAAGATGAAGCATAGCTCTAAGCCACATATCTCATTTGAGAGTGACGAGGAGGTGGAA
CGAGAAAGTTGAAAGGGTTGTTGTTCTTGATCTGCCTGGCCTGTTCGCTATGGAGGTTGA
AAGTGTGTTGAATGACTTCCTCAGGCAATGCGTTTAACAAAGAGTTTCTACCTGCAAGTG
TGCCAGTCACAGGTATATCATGGGTCTTGAATGCCACGTACTTGAAGTTGTTGCTCTGCG
ATTTTGCAGCCACCGCAAAGTTTTGTGGCACGATCAGCACTTGTCCCTCTTGCAGCTCCC
CATCAAACACTCTATCACCAGTGCAATTCACCACTTGCATCATCGCCCTCCCTTCCAATG
CGTATACTATGCTGTTTGCGTTCAGGTTGTAGTGAGGCACGAACATGGCATTCTTGCGGA
GAGATCCGAACTGAGCACTGAGTTTGAGGAGCAAGAGGGCTGGGAAGTCAAGGCCGGTGG
CGGTTGTAATGCTACCAGCTTGAGGGTTGAAGAAGTCAGGCGATGAAGTTTGACCAATGT
TGTGGCGAAGTCTCATTGTGCAAATGGTTTCATCAATGCCATTTCTGCTCTTGCTTTTGC
TCTTCTGTGGCTTCTCTTCCTCTTCATCGTCGTCATCTTCTTCCTCTTCCTCTGCTCTCT
GTTGCTGCTTTCTCGTTGGTGGAGCTGTCACGCTCAGACCTCCCTCCACTTTCACAATGG
CTCCTTTCTCTTCGTCCTCGTTCACACCTTGGAGGTTTTTCACTATCTTCCTGTCCACGT
TCAACGCTTGTTCCAAGAATTCTGGGGTGAAGCCACTGAATATGTTGCCGCCTTCATTAT
CTTCTTCTTGTTCTTGATGTTGTTTTCCCTTCTGGCTTTGGCTTTGCTGATATTGTACGA
ACTCTTGCTCTTGGTTCCCAGCAAGATAGAATCTCCTAGGCATCTGATCGAGCTGGTTCT
GTAAGCTGTTGGTGTGAATAAGAGAAACTGCAACAACGGGAGTGTCTTGATTGTTGAACA
TCCAGAAAGCAGCACCGGTAGGCACTGCGATCAAATCACCCTCTCTAAAGTGATACACCT
TTTGGTGACGGTCTTGAGGCTTCTGGCTTTGTCCTCTTTGAGTTGGCTCTTCAAAAGTCT
GAGGACAACCGGAGAAAATGATGCCAAAAATACCACTACCTTGTTGAATGAAGATTTGCT
GGGGAGCGTTGGTGAAGAATGGTCTGCGGAGGCCATTGCGTTGGAGGGTGCAGCGAGAGA
GGGCAACACCGGCACACTGGAAAGGCTTGCTGTTAGGGTTCCATGTCTCTATGAACCCAC
CTTCCGACTCTATACGGTTATCGGGTTTGAGGGCATTCATGCGTTGGAGTTGGCACTCAT
ATTCATTTTGCTGTGGCTGCTGTGTCTTATCTTTGCTAGCGAAGCACCCACTCAAAAGCA
CAAGACAAAGGGAAAGAGATAGCGCAAGAAGCTTAGCCATGGATATGAATATGATTGATT
TGTTTGTGGTGTCCCCCGTACTCTGCGTTGATACCACTGCTTAAGCAGTGGTATCAACGC
AGAGTACGGGGGTGGACCCAATGACACCATTTTCATTTATTATTCGGATCATGGTGCTCC
TGGTCTTGTCACCATGCCAGTAGGGGGAATATGTCATGGCCAACGATTTTGTGAATGTCT
TGAAGAAGAAACATGATGCTAAATCCTACAAAAAGATGGTGATATACTTGGAAGCATGTG
AATCTGGGAGCATGTTTGAAGGGATACTACCTAATAACATAAGCATATATGCGACCACAG
CTTCCAACGCAGATGAGGATAGTTTTGCATATTATTGTCCTCATTCCTACCCTTCTCCTC
CAACTGAGTACACCACTTGTTTGGGAGATGTGTACAGCATTTCGTGGTTAGAAGATAGTG
ACAAAAATGACATGACAATAGAAACGCTGCAGCAACAATATGAAACCGTTCGCCGAAGAA
CGTTAATTGGTAATGTCGACACCTCTTCTCATGTGAAACAATACGGAGATAGAAAATTCG
AGAACGATACTCTTGCTACCTACATTGGTGCACCTGTTAAAACCAACCCCACCAACTCTG
CAAATGCATATTCCTTTGAACCATATAGTCCTCAAACTAGACATGTTAGCCAACGAGATG
CTCATTTACTCTACCTTAAGCTAGAGTTGCAAAAAGCCCCGGATGGTTCTATGGAAAAGT
TGAAAGCTCAAATAGAGTTGGATGATGAAATTGCACATAGGAAGCATTTAGATAGTGTTT
TCCATCTCATAGGGGATCTCTTGTTTGGAGAAGAGAATAATATCTCTACCATGTTGCTCC
ATGTTCGTCCACCAGGCCAGCCTCTTGTCGATGATTGGGATTGTTTCAAGACCCTTATAA
AAACTTACGAGAGCAATTGCGGTAAATTGTCAATCTATGGAAGGAAATACACAAGAGCCT
TTGCTAACATGTGCAATGCTGGCATTTCTGAGGAGCAAATGGTAGTAGCCTCTTCACAAG
CTTGTCCCAAGGAAAATCCTTCTTAAATTAATTCGTTAAGTTGATAATGTAATAACCAAT
ATATATCATGAAAGATTAAAAATTGTGCTTTCATTCTACAAAATGGATTATAATCCTTTG
>SRR054580_Asha_rep_c4
TCTCCGACTCAGAAGCAGTGGTATCAACGCAGAGTCTTGGGGAACTGGAATTGACGATCA
AGTTGGTCACACCTGTTGCTCCAGCAACATAGTGCAGAAATTGCATGTGTCCAATGTGTA
GATCTCTAACAAGATCATAATTATAACATTCTATGTGTAGTTGACTCTTGCTTTTGATTA
ACTCCTGCATAGATGTTTCTACCAAAAATGAAAAAAAAAATCATTAATAGATGCATATTG
CAGCTAAATTTAGCAGTGAGTTGGTGATACCTCATCCCCCAGTTAGATAAAAGCCACTAG
AAGCTGCATTTTCAAATCAACAAGTAGTGATTTATGGCTTCTTTGGGTTTTATGGTGTGT
TTTGTAGAAAATTTGTCCTTCATTTTAGCTATGAGCATTCATTGGGTATTGCATAAGTTT
TGATGCTATTGTATTGATTTTGATATAAGAAAAGAAAAGTTGTAATGCGTTTGTTTCAAT
TATTTTTTTTTAAAGAAATGATATTTTTAACTTGTGGAGAGTTTTAAGAGATTTAGATAA
CTTGTAAGGTAACAGATTGTAGAAGTATAAATTACTCTGCCATAAATGAAGCTTTAAGTG
CACTACAAGTAAACAACT

--
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html