Hi Bastien,
  (and the mira-talk-y people Cc-ed), ;)

Bastien Chevreux wrote:
> On Saturday 05 March 2011 13:46:59 you wrote:
>> while trying to rip perfectly 454 B adaptors from my data I came across
>> this section
>>
>> http://mira-assembler.sourceforge.net/docs/DefinitiveGuideToMIRA.html#sect_sanger_using_ssaha2_smalt_to_screen_for_vector_sequence
>>
>> <quote>
>> Note
>> I need an example for SMALT ..
>> </quote>
>>
>> Maybe you were looking for
>> //www.freelists.org/post/mira_talk/454-cleaning,16
>>
>> $ smalt map -f ssaha -d -1 -m 7 idx seqs.fasta > seqs.ssaha_out
>
> Good find, thank you. I got it updated in the git repository, will be rolled
> out with the next versions.

I haven't tested this for vector screening (large matches), but for 30 nt
adaptors smalt always crashed on my 32-bit Linux (reported upstream).
Matching with ssaha2 was not good either (too many misses); blat seemed
better. (And of course ;) the initial tests with
'blastall -p blastn -v 999999 -b 999999' from the 'legacy' NCBI blast tools
crashed due to too many initial seed hits.) At the moment water from the
EMBOSS package is my favorite; I just need to write a parser for its output.

Notably, I do not like ssaha2 because it does not try to include leading
nucleotides in the alignment if, say, there is a gap after the 3rd one. The
alignment just starts at the 4th position. Fiddling with gap-open or
gap-extension penalties would not help, even if they were available at
all. :( I was only able to restrict matches using '-minscore 100 -best 1',
though it still gives multiple matches when they have the same score, even
in the same region if I remember right.

>> Just am not sure if seqs.ssaha_out or $project_smaltvectorscreen_in.txt or
>> $project_ssaha2vectorscreen_in.txt is preferred. (smalt output is in ssaha2
>> format but does mira want to distinguish who did the work or just cares
>> about file format?)
>
> It should just care about the format, but as always, there are slight
> differences between the SSAHA2 output in ssaha2 format and the SMALT output
> in ssaha2 format (*sigh*).

I hope you clarified in the docs whether one has to ask smalt to produce the
ssaha2-like format, or whether mira can use some other one ... ;-)

> Therefore: if a *ssaha2* named file is present it knows it's from SSAHA2, if
> it's named *smalt* it knows it's from SMALT. Ah, and if both are present it
> will first do the ssaha2, then the smalt. But that's more a side-effect
> than a feature ...

What will happen when the match regions partially overlap? Will their union
be used, or their intersection? What if there is no overlap between the two
methods?

Yeah, the section in the docs about adapter clipping is also unclear; I just
wanted to find my own path before commenting on that. So I will stop here
until I have a full proposal. ;)

M.
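
P.S. To make the union-versus-intersection question concrete, here is a
minimal Python sketch of the two options. It is only an illustration: the
function names, the example hits and the 0-based half-open (start, end)
coordinates are assumptions made for this sketch, and it says nothing about
what mira actually does with the two screening files.

```python
def union(a, b):
    """Merge all intervals from both tools; overlapping or touching ones are fused."""
    merged = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersection(a, b):
    """Keep only the stretches that both tools report."""
    common = []
    for a_start, a_end in a:
        for b_start, b_end in b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:
                common.append((start, end))
    return sorted(common)


# Hypothetical hits on one read (not real ssaha2 or smalt output).
ssaha2_hits = [(0, 25)]
smalt_hits = [(20, 30)]

print(union(ssaha2_hits, smalt_hits))         # [(0, 30)]
print(intersection(ssaha2_hits, smalt_hits))  # [(20, 25)]
print(intersection([(0, 10)], [(40, 50)]))    # []
```

With no overlap at all, the union keeps both regions while the intersection
is empty, so that is the case where the choice would matter most.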