[mira_talk] Re: 454 cleaning

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sat, 20 Nov 2010 16:04:36 +0100

On Friday 19 November 2010 Robin Kramer wrote:
> Well it did make a difference in the statistics between the two runs.
> [...]
> However when I grep through I still find the vector sequence, which I
> don't believe should be happening.
> [...]
> It is definitely using the -CL settings.

Then I suppose that not all vectors were found and marked by SSAHA2. I've 
noticed that SSAHA2 is not always finding everything and I got reports from 
other people stating the same. Which is bothersome.

To find out which reads are still contaminated, you can do the following: go 
into the MIRA checkpoint directory and convert the checkpoint file 
(readpool.maf) into a clipped FASTA file using convert_project:

  convert_project -f maf -t clippedfasta readpool.maf myclippedseqs

Then search the resulting FASTA file for reads that contain the adaptor. Once 
you have found a couple, check whether these reads appear in the SSAHA2 
clipping file. If not, SSAHA2 goofed. If yes, check where SSAHA2 found your 
adaptors (maybe in the middle of the read?) and then perhaps adapt the 
-CL:mcvs* parameters to account for that. Or throw those reads out of your 
data file.
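
A plain grep can miss an adaptor that is split across the line wraps of a 
FASTA file, so the check described above might be sketched in Python as 
follows. This is only a rough sketch under assumptions: the function names 
are made up for illustration, the adaptor is matched exactly (no mismatches) 
on both strands, and the clipping file is assumed to contain the read name as 
a whitespace-separated token somewhere on a line. The exact layout of your 
SSAHA2 clipping file may differ.

```python
def read_fasta(path):
    """Yield (name, sequence) pairs; sequences may span several lines."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # Read name = first whitespace-delimited token of the header.
                parts = line[1:].split()
                name, chunks = (parts[0] if parts else ""), []
            elif line and name is not None:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def reverse_complement(seq):
    """Reverse complement of a plain ACGT sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def contaminated_reads(fasta_path, adaptor):
    """Names of reads containing the adaptor exactly, on either strand."""
    fwd = adaptor.upper()
    rev = reverse_complement(fwd)
    hits = []
    for name, seq in read_fasta(fasta_path):
        seq = seq.upper()
        if fwd in seq or rev in seq:
            hits.append(name)
    return hits


def names_in_clip_file(names, clip_path):
    """Split names into (found, missing) depending on whether each name
    occurs as a whitespace-delimited token in the clipping file
    (assumed layout, see the note above)."""
    tokens = set()
    with open(clip_path) as handle:
        for line in handle:
            tokens.update(line.split())
    found = [n for n in names if n in tokens]
    missing = [n for n in names if n not in tokens]
    return found, missing
```

Usage would look roughly like this, with file names and the adaptor sequence 
as placeholders (the output name of convert_project is assumed here to be 
myclippedseqs.fasta):

  hits = contaminated_reads("myclippedseqs.fasta", "YOURADAPTORSEQ")
  found, missing = names_in_clip_file(hits, "ssaha2_clips.txt")

Reads in "missing" were never reported by SSAHA2, reads in "found" were 
reported but apparently not clipped the way you expected.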

Perhaps you would want to try SMALT? (MIRA 3.2.1rc2 should be able to read the 
results of it).

B.

-- 
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
