On 03/14/11 17:44, Bastien Chevreux wrote:
> Probably yes. You used SSAHA2 and SMALT? You are the second person I know of
> to do that ... and I'm not sure how undefined the state is when MIRA encounters
> that (see it as a buggy feature which will be removed from future releases).
>
> To cut things short: do not use SSAHA2 for this task anymore. It was a
> makeshift and the author itself does not recommend doing that. Go with SMALT
> (and remove all SSAHA2 output files from the MIRA working directory).
>
> Once you have done that and re-run the assembly, check whether there are still
> vector/adaptor remnants. These could be due to two sources: error in SMALT
> (something which was not recognised) or error in MIRA (wrongly interpreted
> SMALT results). To check for that, it will be important to find out exactly what
> happened. For that, have a look at the contigs with contaminant in an assembly
> finishing tool (gap4, gap5 or consed) and find the read which has the
> contaminant not clipped. Then look at the SMALT result file whether it is
> masked there. If not: SMALT problem; if it is: MIRA problem.
>
> In both cases: please report back to the list what you find out so that we can
> have a look on how to proceed best.
>
> Best,
> Bastien

Hello, Bastien

Thank you for your suggestion. Following it, I re-ran the assembly with SMALT
and checked for vector remnants. The resulting contigs still contain vector
sequence, and this seems to be a MIRA problem. Below is an extract of the
results from each step.

>SMALT_result
alignment:S:00 70 GJMJG5A02F1HEX pCC1Fos 16 85 8070 8139 F 70 100.00 524
(SMALT does recognize the vector region in the GJMJG5A02F1HEX read.)

>GJMJG5A02F1HEX_sequence (part of the full-length read)
gactacactactcgtTGAACAATGGAAGTCCGAGCTCATCGCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTCGATGCGGCCGCAAGGGGTTCGCGTCAGCGGGTGTTGG
(Capital letters indicate vector regions.)

>MIRA_result
-------------- Contig statistics ----------------
Sequence:
                 0
                 |    .    |    .    |    .    |    .    |    .    |    .
GJMJG5A02F1HEX+                                               CAATGGAAGTCCGAG
GJMJG5A02HXZHF+                                               CAATGGAAGTCCGAG
                 ------------------------------------------------------------
Consensus:       GCCTCTGTCGTTTCCTTTTCTTCTGTTTTTTGTCCGTGGAATGAACAATGGAAGTCCGAG

                 60
                 |    .    |    .    |    .    |    .    |    .    |    .
GJMJG5A02F1HEX+  CTCATCGCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTCGATGCGGCCGCA
GJMJG5A02HXZHF+  CTCATCGCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTCGATGCGGCCGCA
                 ------------------------------------------------------------
Consensus:       CTCATCGCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTCGATGCGGCCGCA
(The MIRA result still contains the vector part of the GJMJG5A02F1HEX sequence.)

These lines are in the log file:

Merging vector screen data from SMALT results file AOMppool03_RAW_smaltvectorscreen_in.txt:
 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Done merging SSAHA2 vector screen data.

Best regards,
Hikaru
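
The manual check described in this thread (is the vector stretch reported by SMALT still present in the read as placed in the contig?) could be scripted roughly as follows. This is only a sketch under assumptions: it assumes the SMALT hit line uses the SSAHA2-style column order (score, read name, vector name, read start, read end, vector start, vector end, strand, aligned bases, identity, read length) with 1-based inclusive coordinates, and the names `SmaltHit`, `parse_hit` and `unclipped_vector_bases` are made up for illustration; they are not part of SMALT or MIRA.

```python
from typing import NamedTuple


class SmaltHit(NamedTuple):
    """One vector hit, assuming SSAHA2-style columns (an assumption, see above)."""
    score: int
    read: str
    vector: str
    read_start: int  # 1-based, inclusive
    read_end: int    # 1-based, inclusive
    vec_start: int
    vec_end: int
    strand: str
    identity: float


def parse_hit(line: str) -> SmaltHit:
    """Parse a line such as 'alignment:S:00 70 READ VECTOR 16 85 8070 8139 F 70 100.00 524'."""
    f = line.split()
    if len(f) < 11 or not f[0].startswith("alignment:"):
        raise ValueError(f"not a recognised hit line: {line!r}")
    return SmaltHit(int(f[1]), f[2], f[3], int(f[4]), int(f[5]),
                    int(f[6]), int(f[7]), f[8], float(f[10]))


def unclipped_vector_bases(read_seq: str, contig_fragment: str, hit: SmaltHit) -> int:
    """Count bases of contig_fragment (the read's bases as shown in the contig)
    that fall inside the read interval SMALT reported as vector."""
    pos = read_seq.upper().find(contig_fragment.upper())
    if pos < 0:
        raise ValueError("fragment not found in read sequence")
    frag_start = pos + 1                        # convert to 1-based
    frag_end = pos + len(contig_fragment)
    lo, hi = sorted((hit.read_start, hit.read_end))
    return max(0, min(frag_end, hi) - max(frag_start, lo) + 1)
```

Applied to the hit line and the read shown in the message, the 15 bases CAATGGAAGTCCGAG that appear in the contig sit at read positions 20 to 34, which lie entirely inside the 16 to 85 interval reported by SMALT. A non-zero count for a read like this is what points towards the clipping step rather than the screening step.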