> On February 19, 2014 at 1:30 PM Austen Chen <cyausten@xxxxxxxxx> wrote:
> Just another simple or dumb question: as you know I am looking for missing
> regions b/t strains and i have found that quite a few regions listed present
> in the featuresequences.txt actually have only one read each when i checked
> them on Gap4 (see below for example).
> [...]
> Can you please advise if they are still good enough to be regarded as
> present?

We're speaking of high-throughput data, right? I.e., average coverage >= 50x?

For those cases: technically the regions are "present", but I regard them as absent, especially if they border MCVc tags and no known problematic sequencing motif (e.g. multiple GGCxG on fwd/rev strands in Illumina) is in the area. The single matching read, or sometimes two or three reads, are probably due to a non-100% clonal DNA sample where a couple of individuals have the corresponding stretch. I actually tend to see that quite often in mapping assemblies with larger deletions.

BTW, things like these are the reason why I always check results by hand before I give them back to biologists.

B.
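The rule of thumb in the reply can be written down as a small decision function. This is a minimal Python sketch under stated assumptions: the read count, average coverage and MCVc-adjacency flag are assumed to come from your own inspection (e.g. in Gap4), not from any MIRA API, and the thresholds (`MIN_AVG_COVERAGE`, `MAX_STRAY_READS`, `max_motifs`) and function names are hypothetical, chosen only to mirror the numbers mentioned in the mail.

```python
import re

# Hypothetical thresholds, NOT MIRA defaults: tune them per data set.
MIN_AVG_COVERAGE = 50   # "high-throughput" data, average coverage >= 50x
MAX_STRAY_READS = 3     # one to three reads are treated as sample impurity

# Lookahead so that overlapping GGCxG occurrences are all counted.
GGCXG = re.compile(r"(?=GGC[ACGT]G)")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def count_ggcxg(seq: str) -> int:
    """Count GGCxG motifs (x = any base) on the forward and reverse strands."""
    seq = seq.upper()
    return len(GGCXG.findall(seq)) + len(GGCXG.findall(revcomp(seq)))


def call_region(read_count: int, avg_coverage: float, borders_mcvc: bool,
                seq: str, max_motifs: int = 1) -> str:
    """Classify a region as 'present', 'absent', 'probably absent' or 'check by hand'."""
    if avg_coverage < MIN_AVG_COVERAGE:
        return "check by hand"       # heuristic only meant for high coverage
    if read_count > MAX_STRAY_READS:
        return "present"
    if count_ggcxg(seq) > max_motifs:
        return "check by hand"       # known Illumina problem motif nearby
    return "absent" if borders_mcvc else "probably absent"
```

For example, `call_region(1, 60, True, "AAAAAAAA")` returns `"absent"`, while the same single read in a region carrying two GGCxG motifs is flagged for manual inspection, in line with the advice to check such results by hand.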