[mira_talk] Questions about TCS file fields

  • From: Robert Bruccoleri <bruc@xxxxxxxxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sat, 19 Mar 2011 17:01:45 -0400

I'm working on a mapping project where I have two regions of the human genome and hundreds of millions of Illumina reads (from multiple samples) to map against them. I'm breaking the Illumina reads up into manageable chunks and using the mapping option of MIRA to map them against the genomic DNA, which is read in as a backbone strain.
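
For anyone following the same workflow, here is a minimal Python sketch of the chunking step. It assumes plain four-line FASTQ records (no line-wrapped sequences). The function names and the <prefix>_NNNN.fastq file naming are hypothetical illustrations and are not part of MIRA.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group a FASTQ stream into 4-line records (assumes no wrapped sequence lines)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        yield record


def chunk_fastq(lines: Iterable[str], records_per_chunk: int) -> Iterator[List[List[str]]]:
    """Yield lists holding at most records_per_chunk FASTQ records each."""
    if records_per_chunk <= 0:
        raise ValueError("records_per_chunk must be positive")
    chunk: List[List[str]] = []
    for record in fastq_records(lines):
        chunk.append(record)
        if len(chunk) == records_per_chunk:
            yield chunk
            chunk = []
    if chunk:  # trailing partial chunk
        yield chunk


def write_chunks(fastq_path: str, prefix: str, records_per_chunk: int) -> List[str]:
    """Write each chunk to <prefix>_NNNN.fastq (hypothetical naming) and return the paths."""
    paths = []
    with open(fastq_path) as fh:
        for i, chunk in enumerate(chunk_fastq(fh, records_per_chunk)):
            path = f"{prefix}_{i:04d}.fastq"
            with open(path, "w") as out:
                for record in chunk:
                    # Ensure every line ends in a newline, even the file's last line.
                    out.writelines(l if l.endswith("\n") else l + "\n" for l in record)
            paths.append(path)
    return paths
```

Streaming the input through generators keeps memory flat regardless of how large the original read file is; each chunk file can then be mapped against the backbone separately.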


I'd like to combine the results of multiple mappings, but I'm confused about some fields in the TCS file. First, what exactly is field 8? According to the documentation, it is the "total coverage in number of reads. This number can be higher than the sum of the next five columns if Ns or IUPAC bases are present in the sequence of reads." However, when I look at an entry from a mapping where there are no reads, only the backbone, this field has a value of 5.
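
To make the documented relationship concrete, here is a minimal Python sketch that reports, line by line, how much of field 8 is not accounted for by the five per-base columns that follow it. The column layout is an assumption: fields are taken to be whitespace-separated, the "|" separators are counted as fields (so the manual's 1-based field 8 is the total coverage and fields 9 to 13 are the "next five columns"), and lines starting with "#" are treated as headers. All names are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple

# ASSUMED layout, following the manual's 1-based numbering:
# field 8 = total coverage, fields 9-13 = the "next five columns" (per-base counts).
# "|" separators are assumed to count as fields when splitting on whitespace.
TOTAL_COV_FIELD = 8
FIRST_BASE_FIELD = 9
N_BASE_FIELDS = 5


class TcsCoverage(NamedTuple):
    contig: str
    total: int
    base_counts: List[int]

    @property
    def unexplained(self) -> int:
        """Part of the total coverage not covered by the five per-base columns."""
        return self.total - sum(self.base_counts)


def parse_tcs_coverage(line: str) -> Optional[TcsCoverage]:
    """Parse one TCS line; return None for blank or (assumed) '#' header lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    last = FIRST_BASE_FIELD + N_BASE_FIELDS - 1  # 1-based index of the last count column
    if len(fields) < last:
        raise ValueError(f"too few fields in TCS line: {line!r}")
    total = int(fields[TOTAL_COV_FIELD - 1])
    counts = [int(f) for f in fields[FIRST_BASE_FIELD - 1:last]]
    return TcsCoverage(fields[0], total, counts)


def positions_with_unexplained_coverage(lines: Iterable[str]) -> List[Tuple[int, TcsCoverage]]:
    """Return (line number, record) for every line where field 8 != sum of fields 9-13."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        rec = parse_tcs_coverage(line)
        if rec is not None and rec.unexplained != 0:
            hits.append((lineno, rec))
    return hits
```

Running this over a backbone-only region would show directly whether the extra coverage there lines up with the Ns/IUPAC explanation in the documentation, or whether something else is contributing to field 8.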

Second, in regions where no reads are mapped, I'm finding coverages greater than 1, and quality scores for bases that aren't in the reference. Shouldn't the lines for reference positions with no reads simply show the backbone's default quality score and a coverage of 1 for the base at that position in the backbone?

Finally, is there any documentation on the format beyond what's in the manual?

Thanks.

--Bob Bruccoleri




begin:vcard
fn:Robert Bruccoleri
n:Bruccoleri;Robert
org:Audacious Energy, LLC and Congenomics, LLC
adr:;;;;;;USA
email;internet:bruc@xxxxxxx
title:President
version:2.1
end:vcard
