[mira_talk] Re: 454 cleaning

  • From: Laurent MANCHON <lmanchon@xxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sat, 20 Nov 2010 16:19:39 +0100

Yes, I can confirm that SMALT is more valuable and faster than SSAHA2.

laurent --

Bastien Chevreux wrote:
On Friday 19 November 2010 Robin Kramer wrote:
Well it did make a difference in the statistics between the two runs.
[...]
However when I grep through I still find the vector sequence, which I
don't believe should be happening.
[...]
It is definitely using the -CL settings.

Then I suppose that not all vectors were found and marked by SSAHA2. I've noticed that SSAHA2 does not always find everything, and I have received reports from other people stating the same. Which is bothersome.

To find out which reads are still contaminated, you can do the following. Go into the MIRA checkpoint directory and convert the checkpoint file (readpool.maf) into a clipped FASTA file using convert_project:

  convert_project -f maf -t clippedfasta readpool.maf myclippedseqs

Then search the resulting FASTA file for reads that contain the adaptor. Once you have found a couple, check whether these reads appear in the SSAHA2 clipping file. If not, SSAHA2 goofed. If yes, check where SSAHA2 found your adaptors (maybe in the middle of the read?) and then perhaps adapt the -CL:mcvs* parameters to account for that. Or throw those reads out of your data file.
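If plain grep is awkward because sequences are wrapped over several lines, something like the following small Python sketch might help with the search step. It is not part of MIRA; the function names are made up for illustration, the adaptor string and the output file name (assumed here to be myclippedseqs.fasta) are placeholders to replace with your own, and it only finds exact matches on the forward strand.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, sequence) pairs from FASTA lines.

    Sequences wrapped over several lines are joined into one string.
    """
    name = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the read name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_adaptor(lines: Iterable[str], adaptor: str) -> List[Tuple[str, int, int]]:
    """Return (read_name, offset, read_length) for reads containing the adaptor.

    offset is the 0-based position of the first exact, case-insensitive
    match; only the forward strand is searched.
    """
    adaptor = adaptor.upper()
    hits = []
    for name, seq in read_fasta(lines):
        pos = seq.upper().find(adaptor)
        if pos != -1:
            hits.append((name, pos, len(seq)))
    return hits


# Example use (file name and adaptor sequence are assumptions):
#
# with open("myclippedseqs.fasta") as handle:
#     for name, pos, length in find_adaptor(handle, "YOURADAPTORSEQ"):
#         print(name, pos, length)
```

The offset next to the read length shows at a glance whether the adaptor sits at an end or in the middle of the read, and the read names can then be looked up in the SSAHA2 clipping file.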

Perhaps you would want to try SMALT? (MIRA 3.2.1rc2 should be able to read its results.)

B.

