I'm working on a mapping project involving two regions of the human genome and hundreds of millions of Illumina reads from multiple samples. I'm breaking the Illumina reads into manageable chunks and using the Mapping option of Mira to map them against the genomic DNA, which is read in as a backbone strain.
I'd like to combine the results of multiple mappings, but I'm really confused about some fields in the TCS file. First, what is field 8? According to the documentation: "total coverage in number of reads. This number can be higher than the sum of the next five columns if Ns or IUPAC bases are present in the sequence of reads." However, when I look at an entry for a mapping where there are no reads, just the backbone, this field has a value of 5.
Second, in regions where there are no reads mapped, I'm finding coverages of more than 1, and quality scores for bases that aren't in the reference. Shouldn't the lines corresponding to reference sequences with no reads just have the default quality score for the backbone and coverage of 1 for the base in the corresponding position in the backbone?
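A minimal sketch of the check described above, for anyone who wants to reproduce it. It assumes that TCS data lines are whitespace-delimited, that comment lines start with "#", and that the total coverage is the 8th whitespace-separated token with the five per-base coverage columns right after it; the column positions come from the field numbering used in this post, not from a verified format specification. The function names and the sample line are hypothetical.

```python
def parse_tcs_coverage(line):
    """Return (total_coverage, per_base_coverages) for one TCS data line.

    Assumption: columns are whitespace-delimited, field 8 (1-based) is the
    total coverage and fields 9-13 are the five per-base coverage counts.
    Returns None for blank lines, comment lines, or lines that do not fit.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) < 13:
        return None
    try:
        total = int(fields[7])                      # field 8, 1-based
        per_base = [int(x) for x in fields[8:13]]   # fields 9-13
    except ValueError:
        return None
    return total, per_base


def coverage_excess(line):
    """Return total coverage minus the sum of the per-base columns.

    A positive value marks a position where field 8 is larger than the
    sum of the next five columns. Returns None for non-data lines.
    """
    parsed = parse_tcs_coverage(line)
    if parsed is None:
        return None
    total, per_base = parsed
    return total - sum(per_base)


if __name__ == "__main__":
    # Made-up line for illustration only, not taken from a real TCS file.
    sample = "contig1 10 10 | A 30 | 5 1 0 0 0 0 | 30 0 0 0 0"
    print(coverage_excess(sample))
```

Running coverage_excess over every line of a TCS file and keeping the positions with a positive result would list exactly the backbone-only positions in question.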
Finally, is there any more documentation on the format besides what's in the manual?
Thanks. --Bob Bruccoleri