[mira_talk] Re: new sff_extract

From: Jose Blanca <jblanca@xxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Wed, 31 Oct 2012 11:04:18 +0100

On 31/10/12 10:48, Peter Cock wrote:

On Wed, Oct 31, 2012 at 9:39 AM, Jose Blanca <jblanca@xxxxxx> wrote:

Hi:

Some time ago we discussed on this list the future of sff_extract. We started
working on it and we have a version that we think is working.
The sff_extract functionality has been split into two tools, sff_extract and
split_matepairs, that can be linked together with a pipe. We haven't done
extensive testing, so if you use them, please let us know how it goes.
These utilities are bundled with some other small tools that we have
developed for our day-to-day work. They are all written in Python and they
use Biopython.
You can take a look at the development site:

https://github.com/JoseBlanca/seq_crumbs

Or our site:

http://bioinf.comav.upv.es/seq_crumbs/

Of course we'd love to have some feedback.
Best regards,


Hi Jose,

That looks very interesting - I'll forward this to the Biopython
list.


Great, I'm also on the Biopython list.

For those not aware of this, the Biopython SFF code was
based on Jose's original work for sff_extract - then reworked
as part of the Biopython parsing framework, made Python 3
compatible etc.

Jose - Is there anything you found missing in the Biopython
SFF code? For example a public API to get at the low-level
information from an SFF file rather than as Biopython objects?


Not really, because we only write the fastq.

Maybe we should talk about the Biopython API on biopython-devel, but we
have had some minor gripes with the API (on small details):

- When a sequence with no description is read from a file, "no
  description" is added as the description. That's a problem when you
  write the file back. We have worked around that by setting the
  description in that case to be the same as the id, although in my
  opinion it would be better to have the option to set the description
  to None.
- It's not possible to modify the seq of a SeqRecord if the SeqRecord
  has per_letter_annotations, even if the new sequence has the correct
  length.
- The fastq indexers break down with some paired-end files because they
  have repeated ids. We have worked around that by modifying the
  indexers to work with the whole title lines.
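
To illustrate the last point, here is a minimal standard-library sketch of
indexing a FASTQ file by the whole title line instead of by the id. It is
not the Biopython indexer and not the modification made in seq_crumbs. The
function names (index_fastq_by_title, fetch_record) and the example reads
are hypothetical, and the code assumes simple four-line records with no
wrapped sequence or quality lines.

```python
import io


def index_fastq_by_title(handle):
    """Map each full FASTQ title line (without '@') to its byte offset.

    Assumption: every record spans exactly four lines.
    """
    index = {}
    while True:
        offset = handle.tell()
        title = handle.readline()
        if not title:
            break  # end of file
        if not title.startswith(b"@"):
            raise ValueError("Expected '@' at offset %d" % offset)
        # Skip the sequence, '+' separator and quality lines.
        for _ in range(3):
            handle.readline()
        key = title[1:].rstrip().decode("ascii")
        if key in index:
            raise ValueError("Duplicate title line: %r" % key)
        index[key] = offset
    return index


def fetch_record(handle, index, title):
    """Return the four raw lines of the record stored under this title."""
    handle.seek(index[title])
    return [handle.readline().rstrip().decode("ascii") for _ in range(4)]


# Two mates that share the id "read1" but differ in the rest of the title
# (made-up data, for illustration only).
example = io.BytesIO(
    b"@read1 1:N:0\nACGT\n+\nIIII\n"
    b"@read1 2:N:0\nTTGG\n+\nIIII\n"
)
index = index_fastq_by_title(example)
mate = fetch_record(example, index, "read1 2:N:0")
print(mate[1])
```

Keying on the id alone would make these two records collide, while the
whole title line keeps them apart as long as the descriptions differ.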

I think that's it. In general Biopython is great, and I'm looking forward
to having the new SearchIO and GFF stuff integrated into it.

Best regards,

Jose Blanca

Thanks,

Peter



--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)


