Never mind, I forgot to turn on -CL:msvs=on. That'll help!

On 11/18/10, Robin Kramer <kodream@xxxxxxxxx> wrote:
> So after I do the assembly with the SSAHA2 mappings, when I grep
> through the file I still find thousands of contigs containing the
> vector.
>
> Would it be more appropriate to totally clip the vectors first, so that
> mira doesn't have to worry about these things?
>
> Sincerely yours,
>
> Robin
>
> On 11/18/10, Robin Kramer <kodream@xxxxxxxxx> wrote:
>> I can track down the forward primers pretty easily.
>>
>> They are hard to miss, considering 1/4 of the reads start with:
>>
>> tcagAAGCAGTGGTATCAACGCAGAGTACGGGGG
>>
>> I think that GGGGG at the end is the leading G problem, hypothetically
>> due to linker chemistry.
>>
>> I have tried it with four-G, five-G, and six-G homopolymers, and it
>> seems there is a significant drop at the sixth.
>>
>> Is there any problem with feeding that into SSAHA2?
>>
>> Also, I found the reverse complement of this sequence, hence our chimeras.
>>
>> Is there potentially a B adapter floating around in the mix? I didn't
>> obviously see another adapter in the sequence.
>>
>> Also FYI, the adapter was an exact match for many sequences in the TSA.
>>
>> Sincerely yours,
>>
>> Robin
>>
>> On 11/17/10, Robin Kramer <kodream@xxxxxxxxx> wrote:
>>> It seems the sff's are available from the SRA, but only through the .SRA
>>> file.
>>>
>>> Robin
>>>
>>> On 11/12/10, Gao, Guangtu <Guangtu.Gao@xxxxxxxxxxxx> wrote:
>>>> Hi Robin,
>>>>
>>>> You might consider checking the adaptors and contaminants using
>>>> SeqTrim. I also downloaded some EST sequences from NCBI for assembly
>>>> before. I found that the adaptors were not totally cleaned from those
>>>> sequences, and they made chimeras.
>>>>
>>>> Guang
>>>>
>>>> -----Original Message-----
>>>> From: mira_talk-bounce@xxxxxxxxxxxxx
>>>> [mailto:mira_talk-bounce@xxxxxxxxxxxxx] On Behalf Of Robin Kramer
>>>> Sent: Thursday, November 11, 2010 12:44 PM
>>>> To: mira_talk@xxxxxxxxxxxxx
>>>> Subject: [mira_talk] 454 cleaning
>>>>
>>>> Hi,
>>>>
>>>> I am doing an assembly from data publicly available at NCBI.
>>>>
>>>> The data are available here:
>>>>
>>>> http://www.ncbi.nlm.nih.gov/sra/SRX021565?report=full
>>>>
>>>> It is 454 data, but unfortunately neither the sff nor the xml files
>>>> are available.
>>>>
>>>> I assembled the data with Mira, using the no xml flags, which
>>>> appeared to give a nice assembly.
>>>>
>>>> However, when I BLASTN and BLASTX the first and fourth contigs, there
>>>> appear to be problems with the data.
>>>>
>>>> With the first contig, when I blastx it, it gives strong hits to two
>>>> different genes on different ends, as if it were a chimera. When I
>>>> blastn the sequence I get a strong hit on one side, then in the middle
>>>> I get a section with multiple hits to different species in the TSA.
>>>> When I look at the pileup, there is a thin place in the gene and a huge
>>>> drop-off in coverage from one side.
>>>>
>>>> I think this appears to be due to an adapter trimming problem with the
>>>> 454 data.
>>>>
>>>> The fourth contig, when run through blastn, has a very strong gene hit
>>>> from a closely related species, but at the end has another small
>>>> stretch that matches many other sequences in TSA that are distantly
>>>> related. The adapter-looking portion has low coverage, with a giant
>>>> change in coverage in the strong region.
>>>>
>>>> To me it appears as if some of the adapters are consistently not
>>>> getting trimmed (in this set and in TSA).
>>>>
>>>> Here is a relevant thread in seqanswers.
>>>> http://seqanswers.com/forums/showthread.php?t=3462
>>>> As well as a link out to previous discussions in this list.
>>>> //www.freelists.org/post/mira_talk/454-adaptor-clipping >>>> >>>> Is there any consensus on recleaning the 454 adapters? I don't even >>>> know what the sequences would be to expect. >>>> >>>> The assemblies of the two contigs are pasted after this message. >>>> >>>> Sincerely yours, >>>> >>>> Robin >>>> >>>>>SRR054580_Asha_rep_c1 >>>> AGTTTCTTAACACTTGGACCAATATATTATTTTCCTTTGTTTTGCAAGAAGGATAAAAGA >>>> AAGAAACAAASGWMDAAAAGAGTTTACTAGAAAACTCATCGAGCTAGTTTCTCCACTTAT >>>> TTATTTTATGCTTTTCCCGCAAAAGTTTGGTGACTCATACATGAGGATAGATACACATAG >>>> ACTCACGTTATTTTACACACGTATATATATAAGGAAAGGCAGGCTAAGCCTTTGATTTAT >>>> TTGATTATTGATCCGCGCACTATTGGCAAAAAGACAGTAGTGGGGTAGCACAGCAGATGC >>>> AAAAAGATGAAGCATAGCTCTAAGCCACATATCTCATTTGAGAGTGACGAGGAGGTGGAA >>>> CGAGAAAGTTGAAAGGGTTGTTGTTCTTGATCTGCCTGGCCTGTTCGCTATGGAGGTTGA >>>> AAGTGTGTTGAATGACTTCCTCAGGCAATGCGTTTAACAAAGAGTTTCTACCTGCAAGTG >>>> TGCCAGTCACAGGTATATCATGGGTCTTGAATGCCACGTACTTGAAGTTGTTGCTCTGCG >>>> ATTTTGCAGCCACCGCAAAGTTTTGTGGCACGATCAGCACTTGTCCCTCTTGCAGCTCCC >>>> CATCAAACACTCTATCACCAGTGCAATTCACCACTTGCATCATCGCCCTCCCTTCCAATG >>>> CGTATACTATGCTGTTTGCGTTCAGGTTGTAGTGAGGCACGAACATGGCATTCTTGCGGA >>>> GAGATCCGAACTGAGCACTGAGTTTGAGGAGCAAGAGGGCTGGGAAGTCAAGGCCGGTGG >>>> CGGTTGTAATGCTACCAGCTTGAGGGTTGAAGAAGTCAGGCGATGAAGTTTGACCAATGT >>>> TGTGGCGAAGTCTCATTGTGCAAATGGTTTCATCAATGCCATTTCTGCTCTTGCTTTTGC >>>> TCTTCTGTGGCTTCTCTTCCTCTTCATCGTCGTCATCTTCTTCCTCTTCCTCTGCTCTCT >>>> GTTGCTGCTTTCTCGTTGGTGGAGCTGTCACGCTCAGACCTCCCTCCACTTTCACAATGG >>>> CTCCTTTCTCTTCGTCCTCGTTCACACCTTGGAGGTTTTTCACTATCTTCCTGTCCACGT >>>> TCAACGCTTGTTCCAAGAATTCTGGGGTGAAGCCACTGAATATGTTGCCGCCTTCATTAT >>>> CTTCTTCTTGTTCTTGATGTTGTTTTCCCTTCTGGCTTTGGCTTTGCTGATATTGTACGA >>>> ACTCTTGCTCTTGGTTCCCAGCAAGATAGAATCTCCTAGGCATCTGATCGAGCTGGTTCT >>>> GTAAGCTGTTGGTGTGAATAAGAGAAACTGCAACAACGGGAGTGTCTTGATTGTTGAACA >>>> TCCAGAAAGCAGCACCGGTAGGCACTGCGATCAAATCACCCTCTCTAAAGTGATACACCT >>>> TTTGGTGACGGTCTTGAGGCTTCTGGCTTTGTCCTCTTTGAGTTGGCTCTTCAAAAGTCT >>>> GAGGACAACCGGAGAAAATGATGCCAAAAATACCACTACCTTGTTGAATGAAGATTTGCT >>>> 
GGGGAGCGTTGGTGAAGAATGGTCTGCGGAGGCCATTGCGTTGGAGGGTGCAGCGAGAGA >>>> GGGCAACACCGGCACACTGGAAAGGCTTGCTGTTAGGGTTCCATGTCTCTATGAACCCAC >>>> CTTCCGACTCTATACGGTTATCGGGTTTGAGGGCATTCATGCGTTGGAGTTGGCACTCAT >>>> ATTCATTTTGCTGTGGCTGCTGTGTCTTATCTTTGCTAGCGAAGCACCCACTCAAAAGCA >>>> CAAGACAAAGGGAAAGAGATAGCGCAAGAAGCTTAGCCATGGATATGAATATGATTGATT >>>> TGTTTGTGGTGTCCCCCGTACTCTGCGTTGATACCACTGCTTAAGCAGTGGTATCAACGC >>>> AGAGTACGGGGGTGGACCCAATGACACCATTTTCATTTATTATTCGGATCATGGTGCTCC >>>> TGGTCTTGTCACCATGCCAGTAGGGGGAATATGTCATGGCCAACGATTTTGTGAATGTCT >>>> TGAAGAAGAAACATGATGCTAAATCCTACAAAAAGATGGTGATATACTTGGAAGCATGTG >>>> AATCTGGGAGCATGTTTGAAGGGATACTACCTAATAACATAAGCATATATGCGACCACAG >>>> CTTCCAACGCAGATGAGGATAGTTTTGCATATTATTGTCCTCATTCCTACCCTTCTCCTC >>>> CAACTGAGTACACCACTTGTTTGGGAGATGTGTACAGCATTTCGTGGTTAGAAGATAGTG >>>> ACAAAAATGACATGACAATAGAAACGCTGCAGCAACAATATGAAACCGTTCGCCGAAGAA >>>> CGTTAATTGGTAATGTCGACACCTCTTCTCATGTGAAACAATACGGAGATAGAAAATTCG >>>> AGAACGATACTCTTGCTACCTACATTGGTGCACCTGTTAAAACCAACCCCACCAACTCTG >>>> CAAATGCATATTCCTTTGAACCATATAGTCCTCAAACTAGACATGTTAGCCAACGAGATG >>>> CTCATTTACTCTACCTTAAGCTAGAGTTGCAAAAAGCCCCGGATGGTTCTATGGAAAAGT >>>> TGAAAGCTCAAATAGAGTTGGATGATGAAATTGCACATAGGAAGCATTTAGATAGTGTTT >>>> TCCATCTCATAGGGGATCTCTTGTTTGGAGAAGAGAATAATATCTCTACCATGTTGCTCC >>>> ATGTTCGTCCACCAGGCCAGCCTCTTGTCGATGATTGGGATTGTTTCAAGACCCTTATAA >>>> AAACTTACGAGAGCAATTGCGGTAAATTGTCAATCTATGGAAGGAAATACACAAGAGCCT >>>> TTGCTAACATGTGCAATGCTGGCATTTCTGAGGAGCAAATGGTAGTAGCCTCTTCACAAG >>>> CTTGTCCCAAGGAAAATCCTTCTTAAATTAATTCGTTAAGTTGATAATGTAATAACCAAT >>>> ATATATCATGAAAGATTAAAAATTGTGCTTTCATTCTACAAAATGGATTATAATCCTTTG >>>> >>>>>SRR054580_Asha_rep_c4 >>>> TCTCCGACTCAGAAGCAGTGGTATCAACGCAGAGTCTTGGGGAACTGGAATTGACGATCA >>>> AGTTGGTCACACCTGTTGCTCCAGCAACATAGTGCAGAAATTGCATGTGTCCAATGTGTA >>>> GATCTCTAACAAGATCATAATTATAACATTCTATGTGTAGTTGACTCTTGCTTTTGATTA >>>> ACTCCTGCATAGATGTTTCTACCAAAAATGAAAAAAAAAATCATTAATAGATGCATATTG >>>> CAGCTAAATTTAGCAGTGAGTTGGTGATACCTCATCCCCCAGTTAGATAAAAGCCACTAG >>>> 
>>>> AAGCTGCATTTTCAAATCAACAAGTAGTGATTTATGGCTTCTTTGGGTTTTATGGTGTGT
>>>> TTTGTAGAAAATTTGTCCTTCATTTTAGCTATGAGCATTCATTGGGTATTGCATAAGTTT
>>>> TGATGCTATTGTATTGATTTTGATATAAGAAAAGAAAAGTTGTAATGCGTTTGTTTCAAT
>>>> TATTTTTTTTTAAAGAAATGATATTTTTAACTTGTGGAGAGTTTTAAGAGATTTAGATAA
>>>> CTTGTAAGGTAACAGATTGTAGAAGTATAAATTACTCTGCCATAAATGAAGCTTTAAGTG
>>>> CACTACAAGTAAACAACT
>>>>
>>>> --
>>>> You have received this mail because you are subscribed to the mira_talk
>>>> mailing list. For information on how to subscribe or unsubscribe,
>>>> please visit http://www.chevreux.org/mira_mailinglists.html
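
For anyone who wants to check their own reads or contigs for the adapter discussed in this thread, here is a minimal Python sketch. It is not part of MIRA, SSAHA2, or SeqTrim; the function names (`reverse_complement`, `read_fasta`, `adapter_report`) are hypothetical and chosen only for illustration. The adapter string is the one quoted above, without the leading "tcag" key. The sketch only does exact substring matching, so it will miss adapters that carry sequencing errors or a different number of trailing Gs; it is meant as a quick sanity check, not a replacement for a proper vector screen.

```python
# Sketch only: count sequences that contain the adapter quoted in the thread,
# in forward orientation, reverse-complement orientation, or both.
ADAPTER = "AAGCAGTGGTATCAACGCAGAGTACGGGGG"  # from the thread, minus the "tcag" key

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def adapter_report(records, adapter=ADAPTER):
    """Count records with an exact forward and/or reverse-complement hit."""
    fwd = adapter.upper()
    rev = reverse_complement(fwd)
    counts = {"total": 0, "forward": 0, "reverse": 0, "both": 0}
    for _name, seq in records:
        seq = seq.upper()
        has_fwd = fwd in seq
        has_rev = rev in seq
        counts["total"] += 1
        counts["forward"] += int(has_fwd)
        counts["reverse"] += int(has_rev)
        counts["both"] += int(has_fwd and has_rev)
    return counts
```

Assuming the reads are in a FASTA file (the file name here is made up), usage would look like `with open("reads.fasta") as fh: print(adapter_report(read_fasta(fh)))`. A sequence counted under "both" contains the adapter next to or near its own reverse complement, which is the pattern visible in the middle of contig c1 above and is what one would expect at a chimeric junction. Passing a shorter `adapter` (for example, dropping the trailing Gs) makes the check less sensitive to the homopolymer length.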