Bastien Chevreux wrote:

Hi Bastien,

> Dear all,
>
> for different reasons, implementing a mirabait version aware of paired-end
> was not quite straightforward with the code-base of MIRA 4.0.x. However, as
> the need for it also arose for my daily work, I have implemented the
> necessary changes in the past weeks.
>
> I am foreseeing some changes in the default behaviour of mirabait as well as
> for the command line. This would probably break scripts using mirabait with
> the “old” syntax and I would like some feedback on what people think. Nothing
> is implemented yet, so there’s a couple of days to think things through.
>
> Currently, the default behaviour of mirabait is this: it is not aware of
> paired-ends; it reads a file containing bait sequences, reads one or several
> files with sequences to search and writes to one(!) output file all sequences
> which either a) match on the kmer level the bait sequence or(!) b) the
> sequences which do NOT match (the -i option of mirabait). The command line
> looks like this atm:
>
> mirabait [options] {bait_file} {input_file} [[input_file_2 input_file_3
> ...]] {output_basename}

I propose renaming mirabait to, say, mirabait2 to emphasize the different
syntax. Please just do not stick with the current name.

> I have a couple of questions:
>
> 1) atm I plan to disallow writing results in multiple formats at the same
> time. E.g., one could not have results written both as FASTQ and FASTA at the
> same time (which is possible with the current mirabait). Any problem with
> that?

No problem; there are tools to convert FASTQ to FASTA.

> 2) would it make sense to allow mirabait read bait sequences from multiple
> files? If yes, would it make sense to change the command line so that each
> bait file (even if only one is wanted) needs an option like, e.g.
> mirabait … -b baitfile1 -b baitfile2 …
> As added bonus of a forced ‘-b’: mirabait would stop on old syntax (which
> did not have -b) and tell the user to adapt his command.
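
To make the effect of a forced -b in question 2 concrete, here is a minimal
sketch in Python. It is not mirabait code and does not reflect how mirabait
parses its arguments; the `build_parser` and `parse` helpers and the
positional name `inputs` are hypothetical, and only the repeatable, mandatory
-b is taken from the proposal above.

```python
import argparse


def build_parser():
    # Hypothetical parser for illustration only, not mirabait's real interface.
    parser = argparse.ArgumentParser(prog="mirabait")
    # -b may be given several times; each use appends one bait file.
    # required=True means a command line without any -b is rejected.
    parser.add_argument("-b", dest="baits", action="append", required=True,
                        metavar="BAIT_FILE", help="bait file (repeatable)")
    # Assumption: everything that is not an option is an input file.
    parser.add_argument("inputs", nargs="+", metavar="INPUT_FILE")
    return parser


def parse(argv):
    # argparse prints a usage message and exits with status 2 on bad input.
    return build_parser().parse_args(argv)


if __name__ == "__main__":
    args = parse(["-b", "baitfile1", "-b", "baitfile2", "reads_1.fastq"])
    print(args.baits, args.inputs)
```

With such a setup, an old-style call like `mirabait baitfile input output`
contains no -b, so the parser stops with an error instead of silently
treating the bait file as an input file.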
>
> 3) I am planning to set up mirabait to act as a file splitter instead of a
> file filter. I.e., instead of filtering and writing to an output file only
> sequences (not) matching the bait sequences, the new version could sort the
> sequences matching to one output file and sequences not matching to another
> output file. Default would be to have only the matching output active, but a
> switch would allow to either also add the non matching or to write only the
> non-matching.

I would prefer options like -i (include), -e (exclude) and -p (prefix).

> 4) Would it make sense to have mirabait write results for each input file
> into a separate output file as default? That would enable other tools
> (assembler, mappers, whatever) to directly work with Illumina paired-end
> which is almost always in two files. The downside: when writing to separate
> files, I think it is almost impossible to have the user name every
> outputfile. So the default behaviour would be to name the output files like
> the input files, but with a given prefix. E.g. “baithits_” for sequences
> which matched and “baitmiss_” for sequences which did not.

No idea.

Martin

-- 
You have received this mail because you are subscribed to the mira_talk mailing list.
For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html