Dear all,

for different reasons, implementing a paired-end-aware version of mirabait was not quite straightforward with the code base of MIRA 4.0.x. However, as the need for it has also arisen in my daily work, I will implement the necessary changes in the coming weeks.

I am foreseeing some changes in the default behaviour of mirabait as well as in its command line. These would probably break scripts using mirabait with the “old” syntax, and I would like some feedback on what people think. Nothing is implemented yet, so there are a couple of days to think things through.

Currently, the default behaviour of mirabait is this: it is not aware of paired-end reads; it reads a file containing bait sequences, reads one or several files with sequences to search, and writes to one(!) output file either a) all sequences which match the bait sequences at the kmer level or(!) b) all sequences which do NOT match (the -i option of mirabait).

The command line looks like this atm:

  mirabait [options] {bait_file} {input_file} [[input_file_2 input_file_3 ...]] {output_basename}

I have a couple of questions:

1) Atm I plan to disallow writing results in multiple formats at the same time. E.g., one could not have results written both as FASTQ and FASTA in the same run (which is possible with the current mirabait). Any problem with that?

2) Would it make sense to allow mirabait to read bait sequences from multiple files? If yes, would it make sense to change the command line so that each bait file (even if only one is wanted) needs an option like, e.g.

  mirabait … -b baitfile1 -b baitfile2 …

As an added bonus of a forced ‘-b’: mirabait would stop on the old syntax (which did not have -b) and tell the user to adapt their command.

3) I am planning to set up mirabait to act as a file splitter instead of a file filter.
I.e., instead of filtering and writing to an output file only the sequences (not) matching the bait sequences, the new version could sort the matching sequences into one output file and the non-matching sequences into another. The default would be to have only the matching output active, but a switch would allow the user either to add the non-matching output as well or to write only the non-matching output.

4) Would it make sense to have mirabait write the results for each input file into a separate output file by default? That would enable other tools (assemblers, mappers, whatever) to work directly with Illumina paired-end data, which is almost always in two files.

The downside: when writing to separate files, I think it is almost impossible to have the user name every output file. So the default behaviour would be to name the output files like the input files, but with a given prefix. E.g. “baithits_” for sequences which matched and “baitmiss_” for sequences which did not.

I’m sure I’ll hit a number of other issues as I progress, but are there any comments regarding the above?

Best,
  Bastien

PS: allowing for kmers >32 will *not* be part of the upcoming rework of mirabait (sorry)

PPS: for people only on the mira_announce list: please reply to mira_talk
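
To make questions 3 and 4 more concrete, here is a rough Python sketch of the proposed splitter behaviour and of the prefix-based output naming. It is not mirabait code and says nothing about how mirabait works internally. The function names, the `mode` values ("hits", "both", "misses") and the matching rule are hypothetical: the toy rule calls a read a hit if it shares at least one kmer with the baits, ignores reverse complements, and makes no attempt to keep read pairs together. Only the “baithits_” and “baitmiss_” prefixes come from the proposal itself.

```python
from pathlib import Path

# Prefixes as suggested in question 4 of the proposal.
HIT_PREFIX = "baithits_"
MISS_PREFIX = "baitmiss_"


def kmers(seq, k):
    """Return the set of all k-mers in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def output_names(input_path):
    """Derive the (hits, misses) output paths for one input file.

    The outputs keep the input file's name and directory and only gain
    a prefix, so the user never has to name each output file.
    """
    p = Path(input_path)
    return p.with_name(HIT_PREFIX + p.name), p.with_name(MISS_PREFIX + p.name)


def split_reads(reads, bait_kmers, k, mode="hits"):
    """Sort (name, sequence) pairs into two lists: (hits, misses).

    mode is a hypothetical stand-in for the proposed switch:
      "hits"   - keep only matching reads (the proposed default)
      "both"   - keep matching and non-matching reads
      "misses" - keep only non-matching reads
    Toy criterion (an assumption): a read is a hit if it shares at
    least one k-mer with the baits.
    """
    if mode not in ("hits", "both", "misses"):
        raise ValueError(f"unknown mode: {mode!r}")

    hits, misses = [], []
    for name, seq in reads:
        if kmers(seq, k) & bait_kmers:
            hits.append((name, seq))
        else:
            misses.append((name, seq))

    # Drop whichever side the chosen mode does not want written.
    if mode == "hits":
        misses = []
    elif mode == "misses":
        hits = []
    return hits, misses
```

With this scheme, an input file `sample_R1.fastq` would produce `baithits_sample_R1.fastq` and, if the non-matching output is switched on, `baitmiss_sample_R1.fastq` next to it.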