[mira_talk] Re: Vector screen

  • From: hikaru <hsuenaga@xxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 15 Mar 2011 18:04:43 +0100

On 03/14/11 17:44, Bastien Chevreux wrote:
> Probably yes. You used SSAHA2 and SMALT? You are the second person I know of
> to do that ... and I'm not sure how undefined the state is when MIRA
> encounters that (see it as a buggy feature which will be removed from future
> releases).
>
> To cut things short: do not use SSAHA2 for this task anymore. It was a
> makeshift and the author himself does not recommend doing that. Go with SMALT
> (and remove all SSAHA2 output files from the MIRA working directory).
>
> Once you have done that and re-run the assembly, check whether there are
> still vector/adaptor remnants. These could be due to two sources: an error in
> SMALT (something which was not recognised) or an error in MIRA (wrongly
> interpreted SMALT results). To check for that, it will be important to find
> out exactly what happened. For that, have a look at the contigs with
> contaminant in an assembly finishing tool (gap4, gap5 or consed) and find the
> read which has the contaminant not clipped. Then look in the SMALT result
> file to see whether it is masked there. If not: SMALT problem; if it is: MIRA
> problem.
>
> In both cases: please report back to the list what you find out so that we
> can have a look at how best to proceed.
>
> Best,
>   Bastien
Hello, Bastien

Thank you for your suggestion. Following it, I re-ran the assembly with
SMALT and checked for vector remnants. The resulting contigs still
contain vector sequence, and this appears to be a MIRA problem.

Below are excerpts from the results of each step.

>SMALT_result
alignment:S:00 70    GJMJG5A02F1HEX pCC1Fos       16       85     8070      8139   F      70 100.00 524
(SMALT clearly recognizes the vector region in the GJMJG5A02F1HEX read.)


>GJMJG5A02F1HEX_sequence (a part of full length)
gactacactactcgtTGAACAATGGAAGTCCGAGCTCATCGCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTCGATGCGGCCGCAAGGGGTTCGCGTCAGCGGGTGTTGG
(Capital letters indicate vector regions.)
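
To make the coordinates concrete, here is a minimal sketch that pulls the read span out of the SMALT line above. The column order (score, read, vector, read start, read end, vector start, vector end, strand, ...) is my assumption based on the SSAHA2-style output format, and the helper names are hypothetical.

```python
# Sketch only: the column order is assumed from the SSAHA2-style output format.
hit_line = ("alignment:S:00 70    GJMJG5A02F1HEX pCC1Fos       16       85     "
            "8070      8139   F      70 100.00 524")

# Partial read sequence as pasted in this mail (lowercase = not flagged as vector).
read_part = ("gactacactactcgtTGAACAATGGAAGTCCGAGCTCATCGCTAATAACTTCGTATAGCATACATT"
             "ATACGAAGTTATATTCGATGCGGCCGCAAGGGGTTCGCGTCAGCGGGTGTTGG")


def parse_hit(line):
    """Split one alignment line into the fields needed here (hypothetical helper)."""
    fields = line.split()
    return {
        "read": fields[2],
        "vector": fields[3],
        "read_start": int(fields[4]),  # assumed 1-based, inclusive
        "read_end": int(fields[5]),    # assumed 1-based, inclusive
    }


def hit_span(seq, start, end):
    """Return the bases covered by a 1-based inclusive interval."""
    return seq[start - 1:end]


hit = parse_hit(hit_line)
span = hit_span(read_part, hit["read_start"], hit["read_end"])
print(hit["read"], hit["vector"], len(span))
print(span)
```

Under that assumption, the hit covers bases 16 to 85 of the read, which begins right after the 15 lowercase bases shown above.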


>MIRA_result
-------------- Contig statistics ----------------
Sequence:
0                |    .    |    .    |    .    |    .    |    .    |    .
GJMJG5A02F1HEX+                                               CAATGGAAGTCCGAG
GJMJG5A02HXZHF+                                               CAATGGAAGTCCGAG
                 ------------------------------------------------------------
Consensus:       GCCTCTGTCGTTTCCTTTTCTTCTGTTTTTTGTCCGTGGAATGAACAATGGAAGTCCGAG

60               |    .    |    .    |    .    |    .    |    .    |    .
GJMJG5A02F1HEX+  CTCATCGCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTCGATGCGGCCGCA
GJMJG5A02HXZHF+  CTCATCGCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTCGATGCGGCCGCA
                 ------------------------------------------------------------
Consensus:       CTCATCGCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTCGATGCGGCCGCA
(The MIRA result still contains the vector sequence of the GJMJG5A02F1HEX read.)
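
As a rough cross-check, the sketch below searches the consensus shown above for the read span that SMALT reported (bases 16 to 85, assuming 1-based inclusive coordinates). The sequences are copied from this mail; the variable names are hypothetical.

```python
# Sketch only: coordinates are assumed to be 1-based and inclusive.
read_part = ("gactacactactcgtTGAACAATGGAAGTCCGAGCTCATCGCTAATAACTTCGTATAGCATACATT"
             "ATACGAAGTTATATTCGATGCGGCCGCAAGGGGTTCGCGTCAGCGGGTGTTGG")

# The two 60-base consensus blocks from the MIRA contig output, joined.
consensus = ("GCCTCTGTCGTTTCCTTTTCTTCTGTTTTTTGTCCGTGGAATGAACAATGGAAGTCCGAG"
             "CTCATCGCTAATAACTTCGTATAGCATACATTATACGAAGTTATATTCGATGCGGCCGCA")

read_start, read_end = 16, 85                      # from the SMALT hit
vector_span = read_part[read_start - 1:read_end]   # bases SMALT matched to pCC1Fos

# Exact, case-insensitive search; a real check would have to tolerate mismatches.
offset = consensus.upper().find(vector_span.upper())
if offset >= 0:
    print("vector span found in consensus at 0-based offset", offset)
else:
    print("vector span not found as an exact match")
```

An exact string search is only a quick test here; it works because the SMALT hit has 100.00 in the identity column.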

The following lines are in the log file:
Merging vector screen data from SMALT results file AOMppool03_RAW_smaltvectorscreen_in.txt:
 [0%] ....|.... [10%] ....|.... [20%] ....|.... [30%] ....|.... [40%] ....|.... [50%] ....|.... [60%] ....|.... [70%] ....|.... [80%] ....|.... [90%] ....|.... [100%]
Done merging SSAHA2 vector screen data.

Best regards,
Hikaru
