[mira_talk] Re: Filtering out crappy sequence - try my sff2phd

  • From: Markiyan Samborskyy <ms587@xxxxxxxxxxxxxxxxxx>
  • To: Bastien Chevreux <mira_talk@xxxxxxxxxxxxx>
  • Date: Thu, 5 Apr 2012 21:10:37 +0100

Dear Bastien,

Thank you for the reply; I hope you'll enjoy it.

BC> Whoa. Nice thing, albeit two questions immediately arise:
BC> - was there a reason not to use / extend sff_extract?
It was born in October 2008, when I was trying to use 454 paired-end
reads in a phrap assembly. From 20 Jun 2007 until Oct 2008 I had been
using and improving sff2scf, but had problems implementing linker
sequence detection in pure C, so I tried to make a Perl version of it.

Its primary purpose was to integrate the sff data properly into
phredPhrap projects. Then, as it evolved, filtering, MID
detection/selection, import by read inclusion/exclusion lists, and
fastq format output were added...

Anyway - I will have a go at sff_extract at some stage...

BC> - why Q+64? Even Illumina saw the light and moved to the standard Q+33
Historical reasons (since 2007): the first character encoding was Q+64,
and the fastX toolkit understands the Q+64 encoding only.
Also, some programs don't like Q+33 encoded character strings,
while they work fine with Q+64 ones.

Anyway - I will add a Q+33 fastq encoding option in the next version.
I can also add writing of the traceinfo.xml file for MIRA.
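
For reference, the two encodings differ only by a fixed shift in the
character codes (64 vs. 33), so converting a quality string is simple.
A minimal sketch in Python, not taken from sff2phd; the function name
is hypothetical and it assumes plain Phred scores (Q >= 0):

```python
def phred64_to_phred33(qual: str) -> str:
    """Re-encode a FASTQ quality string from Q+64 to Q+33.

    Hypothetical helper for illustration only. Assumes every character
    encodes a non-negative Phred score with an offset of 64.
    """
    out = []
    for ch in qual:
        q = ord(ch) - 64          # decode: character code minus the Q+64 offset
        if q < 0:
            # A character below '@' cannot be a Q+64 Phred score;
            # the input is probably Q+33 already (or not Phred at all).
            raise ValueError(f"character {ch!r} is not valid Q+64")
        out.append(chr(q + 33))   # encode with the Q+33 offset
    return "".join(out)
```

For example, a Q+64 'h' (Q40) becomes 'I' in Q+33, and '@' (Q0)
becomes '!'.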

PS:
If anybody is interested, I have a phredPhrap project integration tool,
phredMira, which basically does the job of phredPhrap, but using MIRA.
It converts phd_dir to MIRA's input (in ../mira_dir):
sff+ab1+scf files -> phd_dir -> *.fasta -> *.fasta.screen ->
MIRA's *_in.fasta + qual + xml (split by read technology)
... run MIRA ...
then generates the output, fixes the bugs in MIRA's ace (read
timestamps etc.), and places the resulting ace in the edit_dir.
Now it can be edited with consed. The edits to the reads of the
assembly are saved and will be picked up by the next phredMira run on
the same project (it just reads the latest versions of the
*.phd.2,3,4...N files). So one can actually FINISH THINGS, without
manually creating and merging multifasta files.
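
The "latest version" selection mentioned above can be sketched as
follows. This is an illustration in Python, not phredMira's actual
code; the function name is hypothetical, and it assumes the usual
<read>.phd.<N> naming in phd_dir, where a higher N is a newer version:

```python
import re
from typing import Dict, Iterable

# Assumed naming scheme: <read name>.phd.<version number>
_PHD_NAME = re.compile(r"^(?P<read>.+)\.phd\.(?P<version>\d+)$")


def latest_phd_files(filenames: Iterable[str]) -> Dict[str, str]:
    """Map each read name to its highest-numbered .phd.N file name.

    Hypothetical helper for illustration only. Names that do not match
    the <read>.phd.<N> pattern are ignored.
    """
    latest = {}  # read name -> (version, file name)
    for name in filenames:
        m = _PHD_NAME.match(name)
        if m is None:
            continue
        read = m.group("read")
        version = int(m.group("version"))  # compare numerically, so 10 > 2
        if read not in latest or version > latest[read][0]:
            latest[read] = (version, name)
    return {read: name for read, (_version, name) in latest.items()}
```

Typical use would be latest_phd_files(os.listdir("../phd_dir")), with
the directory path being an assumption about the project layout.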

Have a nice Easter!

-- 
Best regards,
 Markiyan                            mailto:ms587@xxxxxxxxxxxxxxxxxx


-- 
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
