Just another simple (or dumb) question: as you know, I am looking for missing regions between strains, and I have found that quite a few regions listed as present in featuresequences.txt actually have only one read each when I checked them in Gap4 (see below for an example).

[image: Inline image 1]

Can you please advise whether they are still good enough to be regarded as present?

Thanks again,
Austen

On Mon, Feb 17, 2014 at 6:03 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On 17 Feb 2014, at 5:27 , Austen Chen <cyausten@xxxxxxxxx> wrote:
> > [...] however I have instead used convert_project
>
> Using an older version of MIRA? That program has been renamed
> "miraconvert", and remember that the SAM output contains some fixes, too.
>
> > Also, I want to know whether there are regions that are missing from the
> > reference but present in my strains, and I seem to be unable to find this
> > piece of information (after comparing 6 strains to the same reference).
>
> This is not something which a simple mapping can tell you right away.
> However, getting that kind of information is quite trivial. For each of
> your strains, do the following:
>
> 1. Extract all the unmapped reads into a new FASTQ. Use miraconvert /
>    convert_project with the -n option, giving it the debris info file for
>    that.
> 2. Assemble the reads in this new FASTQ de novo. I recommend that you
>    assemble these reads not in "genome" mode but in "est" mode.
> 3. Analyse the resulting contigs: those not being joins of your mapped
>    strains (compared with the reference) and having an average coverage
>    of >= 75% of the average coverage of the mapping are something you are
>    looking for. Basically, every contig with more than a couple of
>    hundred bases (if using Illumina) is almost a sure hit.
>
> B.
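
For reference, the contig filter from step 3 of the quoted reply could be sketched roughly as below. This is only an illustration under stated assumptions: the `Contig` record and the `candidate_contigs` function are hypothetical names, not part of MIRA, and the contig lengths and coverages would have to be taken from your own assembly output. The 200-base cutoff is an assumed reading of "a couple of hundred bases". The sketch does not cover the "not being joins of your mapped strains" check, which still needs a comparison against the reference.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical record for one de-novo contig (not a MIRA data structure)."""
    name: str
    length: int          # contig length in bases
    avg_coverage: float  # average read coverage of this contig


def candidate_contigs(contigs, mapping_avg_coverage,
                      min_length=200, min_cov_fraction=0.75):
    """Keep contigs that are long enough and well enough covered.

    min_length=200 is an assumption for "a couple of hundred bases";
    min_cov_fraction=0.75 is the ">= 75% of the average coverage of the
    mapping" rule from the reply.
    """
    cov_threshold = min_cov_fraction * mapping_avg_coverage
    return [
        c for c in contigs
        if c.length >= min_length and c.avg_coverage >= cov_threshold
    ]


# Made-up example values, purely for illustration.
example = [
    Contig("c1", 850, 38.5),
    Contig("c2", 120, 41.0),
    Contig("c3", 600, 1.0),
]
for c in candidate_contigs(example, mapping_avg_coverage=40.0):
    print(c.name, c.length, c.avg_coverage)
```

With the made-up values above and an assumed mapping coverage of 40, only `c1` passes: `c2` is shorter than the length cutoff and `c3` falls below the coverage threshold of 30.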