[mira_talk] Re: Request for Comments: mirabait for paired-end

  • From: Martin MOKREJŠ <mmokrejs@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 25 Jun 2014 17:41:46 +0200

Bastien Chevreux wrote:
> 
> First things first: I have a prototype working, I expect bugs though for 
> fringe use cases.
> 
>   
> http://www.chevreux.org/tmp/mira_binonly_ft_baitpe-0-g76dd2b2_linux-gnu_x86_64_static.tar.bz2
> 
> No docs, but “mirabait -h” should help a lot. Feel free to test drive.
> 
> I’ll combine answers to Martin and Peter here.
> 
> On 24 Jun 2014, at 15:52 , Martin MOKREJŠ <mmokrejs@xxxxxxxxx> wrote:
>> I propose renaming mirabait to say mirabait2 to emphasize the different 
>> syntax. Just do not stick to the current name, please.
> 
> I’m not really fond of that idea. mirabait2 would be in the package of MIRA 
> 4, but then only since >4.0.2. Or should I rename it mirabait4? Then where 
> would versions 2 and 3 be? That could be slightly unsettling for users.

I thought I had better propose mirabait4. ;) Honestly, who cares about the
new name?
But lots of people will care, just as they do now because hmmer-2.x is
different from hmmer-3.x.
While many people realize they do different things, hmmer-3 is not a complete
replacement
of the v2 series; it is a nightmare.

Just today it came up on this list that sff_extract is not always the same,
and it is hell
to untangle the differences after the fact. BTW, the MIRA manual has two places
showing how 454 paired-end
data are to be specified, with no sign of the changes made over the years.

So, my recommendation is not to re-use the old mirabait name. The rest is up to 
you.


>>> 3) I am planning to set up mirabait to act as a file splitter instead of a 
>>> file filter. I.e., instead of filtering and writing to an output file only 
>>> sequences (not) matching the bait sequences, the new version could sort the 
>>> sequences matching to one output file and sequences not matching to another 
>>> output file. Default would be to have only the matching output active, but 
>>> a switch would allow to either also add the non matching or to write only 
>>> the non-matching.
>>
>> I would prefer options like -i (include) and -e (exclude) and -p (prefix).

It seems I misunderstood you. Given that the filter is really a splitter, my
suggestion doesn't make sense now.
I thought it would extract reads matching one set (the sequences to be
included) and
ensure that reads matching the exclude list are omitted. But that is not what
you are coding.
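
To make the difference concrete, here is a rough Python sketch of the two
behaviours. This is not mirabait code: every function name is made up for
illustration, and "matching" is simplified to "shares at least one exact
k-mer with a bait", which is only an assumption about how baiting might work.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bait_index(baits, k):
    """Collect the k-mers of all bait sequences into one set (hypothetical helper)."""
    index = set()
    for bait in baits:
        index |= kmers(bait, k)
    return index


def hits(seq, index, k):
    """Simplified match rule (assumption): the read shares >= 1 k-mer with the baits."""
    return not kmers(seq, k).isdisjoint(index)


def split_reads(reads, baits, k=8):
    """Splitter: each read lands in exactly one of two bins."""
    index = bait_index(baits, k)
    matching, non_matching = [], []
    for name, seq in reads:
        (matching if hits(seq, index, k) else non_matching).append(name)
    return matching, non_matching


def include_exclude(reads, include, exclude, k=8):
    """Include/exclude filter: keep reads hitting the include set
    unless they also hit the exclude set."""
    inc = bait_index(include, k)
    exc = bait_index(exclude, k)
    return [name for name, seq in reads
            if hits(seq, inc, k) and not hits(seq, exc, k)]


reads = [
    ("r1", "ACGTACGTAA"),
    ("r2", "TTTTTTTTTT"),
    ("r3", "ACGTACGTGGGGGGGG"),
]
matching, non_matching = split_reads(reads, ["ACGTACGT"])
kept = include_exclude(reads, ["ACGTACGT"], ["GGGGGGGG"])
```

With the splitter, r1 and r3 go to the matching bin and r2 to the other one.
With the include/exclude variant only r1 survives, because r3 also hits the
exclude set. The first is what you describe; the second is what I had in
mind with -i and -e.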

Martin

