[mira_talk] Re: Questions about TCS file fields

  • From: Robert Bruccoleri <bruc@xxxxxxxxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Sun, 20 Mar 2011 09:30:02 -0400

Dear Bastien,
Bastien Chevreux wrote:

On Saturday 19 March 2011 22:01:45 Robert Bruccoleri wrote:

> I'm working on a mapping project where I have two regions of the human
> genome and hundreds of millions of Illumina reads sequenced against
> (from multiple samples). I'm breaking up the Illumina reads into
> manageable chunks and using the Mapping option of Mira to map them
> against the genomic DNA. The genomic DNA is being read in as a backbone
> strain.

Hi Bob,

you're not afraid of anything, are you? Hundreds of millions of reads ... ouch.

The problem partitions, so it's parallelizable both in memory and processing. It's not so bad.
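
To make the "it partitions" point concrete, here is a minimal sketch of cutting a read file into independent batches. It assumes the common four-lines-per-record FASTQ layout (no wrapped sequence lines); the function name and the batch size are hypothetical, not anything MIRA provides.

```python
from itertools import islice


def chunk_fastq_lines(lines, records_per_chunk):
    """Yield lists of FASTQ lines holding at most records_per_chunk records.

    Assumption: every record is exactly 4 lines (header, sequence, '+', quality).
    """
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    it = iter(lines)
    lines_per_chunk = 4 * records_per_chunk
    while True:
        chunk = list(islice(it, lines_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4 != 0:
            # A partial record means the input was truncated or is not 4-line FASTQ.
            raise ValueError("truncated FASTQ record")
        yield chunk
```

Each yielded batch can be written to its own file and mapped as a separate job, so neither memory nor CPU time has to scale with the full read set.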

> I'd like to combine results for multiple maps together, but I'm really
> confused about some fields in the TCS file.

I'm not sure I like that approach ... but cannot offer something different with MIRA, indeed.

I have two ideas for partitioning the problem: 1) Use Bowtie to map each read to a genomic coordinate or coordinates, and then use MIRA to map the reads nicely onto the genome. The partitions are defined by genomic coordinates, and reads overlapping a junction between two partitions are mapped in both. Reads that don't map are handled as a separate batch. 2) Split the reads using the Unix 'split' command, map each chunk, and recombine the results. I'm currently executing plan 1.
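
A rough sketch of the plan 1 bookkeeping, in case it helps the discussion. It assumes the Bowtie hits have already been parsed into (read_id, start, end) tuples with half-open, 0-based coordinates, with start set to None for unmapped reads, and it assumes fixed-width partitions. The tuple layout, the function names, and the fixed width are all hypothetical choices for illustration.

```python
def partitions_for_read(start, end, partition_size):
    """Return the index of every partition that the interval [start, end) overlaps."""
    if partition_size < 1 or end <= start:
        raise ValueError("need partition_size >= 1 and end > start")
    first = start // partition_size
    last = (end - 1) // partition_size
    return list(range(first, last + 1))


def assign_reads(alignments, partition_size):
    """Group read ids by partition; reads spanning a junction go to both sides.

    alignments: iterable of (read_id, start, end); start is None if unmapped.
    Returns (batches, unmapped), where batches maps partition index -> read ids.
    """
    batches = {}
    unmapped = []
    for read_id, start, end in alignments:
        if start is None:
            # Unmapped reads are kept aside as their own batch.
            unmapped.append(read_id)
            continue
        for idx in partitions_for_read(start, end, partition_size):
            batches.setdefault(idx, []).append(read_id)
    return batches, unmapped
```

A read that straddles a junction is deliberately duplicated, so whatever recombines the per-partition results has to avoid counting it twice.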

Please note: due to efforts in rebuilding some internal processes, the current development versions of MIRA or convert_project cannot generate TCS files.

What's the latest version that can? When will this capability be restored?

> First, what is field 8?
> According to the documentation: "total coverage in number of reads. This
> number can be higher than the sum of the next five columns if Ns or
> IUPAC bases are present in the sequence of reads." However, when I look
> at an entry for a mapping where there are no reads, just the backbone,
> this field has a value of 5.

This seems strange. Are you sure that there are no reads?

Yes, I've looked with both gap4 and clview -- no reads.
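
For screening many TCS lines for this kind of oddity, a small check along these lines might be useful. It deliberately does not parse TCS; it only takes the total coverage and the five per-base counts as numbers that have already been extracted, and it relies solely on the rule quoted from the manual (the total may exceed the sum of the five columns when Ns or IUPAC bases are present). The function name and the three labels are hypothetical.

```python
def classify_coverage(total, base_counts):
    """Compare a total coverage value with the five per-base counts that follow it.

    Returns "exact" if they agree, "excess" if the total is larger (which the
    manual says can happen with N or IUPAC bases), and "inconsistent" if the
    total is smaller than the sum, which the quoted rule does not allow.
    """
    if len(base_counts) != 5:
        raise ValueError("expected exactly five per-base counts")
    counted = sum(base_counts)
    if total < counted:
        return "inconsistent"
    if total == counted:
        return "exact"
    return "excess"
```

With made-up numbers, a backbone-only position reporting a total of 5 against counts summing to 1 would come back as "excess", which flags it for a closer look without claiming what the cause is.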

> Second, in regions where there are no reads mapped, I'm finding
> coverages of more than 1, and quality scores for bases that aren't in
> the reference. Shouldn't the lines corresponding to reference sequences
> with no reads just have the default quality score for the backbone and
> coverage of 1 for the base in the corresponding position in the backbone?

Again, this seems extremely strange. Would you have some (smaller) CAF or MAF that shows that?

I'll see if I can send you one.

> Finally, is there any more documentation on the format besides what's in
> the manual?

No. TCS was a test of mine for some specific task. Worked well enough, but I didn't follow up further. What's in the docs is all there is.

B.

Thanks!

Cheers,
Bob


