[mira_talk] Re: Questions about TCS file fields

From: Robert Bruccoleri <bruc@xxxxxxxxxxxxxxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Sun, 20 Mar 2011 09:30:02 -0400

Dear Bastien,
Bastien Chevreux wrote:


On Saturday 19 March 2011 22:01:45 Robert Bruccoleri wrote:

> I'm working on a mapping project where I have two regions of the human
> genome and hundreds of millions of Illumina reads sequenced against
> (from multiple samples). I'm breaking up the Illumina reads into
> manageable chunks and using the Mapping option of Mira to map them
> against the genomic DNA. The genomic DNA is being read in as a backbone
> strain.

Hi Bob,

you're not afraid of anything, are you? Hundreds of millions of reads... ouch.

The problem partitions, so it's parallelizable both in memory and processing. It's not so bad.

> I'd like to combine results for multiple maps together, but I'm really
> confused about some fields in the TCS file.
I'm not sure I like that approach ... but cannot offer something different with MIRA, indeed.

I have two ideas on the partitioning of the problem: 1) use Bowtie to map reads to a specific genomic coordinate or coordinates, and then use MIRA to map them nicely onto the genome. The partition is done based on genomic coordinates for the map, with reads overlapping the junction being mapped to two partitions. Reads which don't map are handled as a separate batch. Or 2) just split the reads using the Unix 'split' command, map each chunk, and recombine the results. I'm currently executing plan 1 above.
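A minimal sketch of the coordinate-based partitioning in plan 1, in Python. Everything here is illustrative: the function names, the `(name, start, end)` hit tuples, and the `boundaries` list are assumptions rather than Bowtie or MIRA output formats, and the aligner's hits would need to be converted into this shape first. A read whose interval crosses a partition junction is placed in both partitions, and reads without a hit go into a separate batch.

```python
from bisect import bisect_right
from collections import defaultdict


def partitions_for_read(start, end, boundaries):
    """Return indices of all partitions overlapped by the interval [start, end).

    `boundaries` is a sorted list of partition start coordinates (hypothetical
    layout): partition i covers [boundaries[i], boundaries[i + 1]) and the
    last partition is open-ended.
    """
    first = bisect_right(boundaries, start) - 1
    last = bisect_right(boundaries, end - 1) - 1
    return list(range(max(first, 0), last + 1))


def assign_reads(hits, boundaries):
    """Group read names by partition index; unmapped reads get their own batch.

    `hits` is an iterable of (name, start, end) tuples, with start set to
    None for reads that did not map.
    """
    batches = defaultdict(list)
    for name, start, end in hits:
        if start is None:
            batches["unmapped"].append(name)
            continue
        for idx in partitions_for_read(start, end, boundaries):
            batches[idx].append(name)
    return dict(batches)
```

Each numbered batch would then be mapped against its own slice of the backbone, and the "unmapped" batch handled separately.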

Please note: due to efforts in rebuilding some internal processes, the current development versions of MIRA or convert_project cannot generate TCS files.

What's the latest version that can? When will this capability be restored?


> First, what is field 8?
> According to the documentation: "total coverage in number of reads. This
> number can be higher than the sum of the next five columns if Ns or
> IUPAC bases are present in the sequence of reads." However, when I look
> at an entry for a mapping where there are no reads, just the backbone,
> this field has a value of 5.

This seems strange. Are you sure that there are no reads?

Yes, I've looked with both gap4 and clview -- no reads.
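One way to locate such entries is to compare field 8 with the sum of the five columns that follow it, line by line. The sketch below is a hedged illustration only: it assumes the fields are whitespace-separated, that field 8 (1-based) is the total coverage and fields 9 to 13 are the per-base counts as the documentation quote suggests, and that comment lines start with `#`. The function name is made up, and the indices would need adjusting if the actual layout differs.

```python
def coverage_excess(line):
    """Return (total, per_base_sum, excess) for one data line.

    Returns None for blank lines and for lines starting with '#'
    (assumed here to be comments).

    Assumption: fields are whitespace-separated, field 8 (1-based) is the
    total coverage, and fields 9-13 are the five per-base counts.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    total = int(fields[7])                           # field 8, 1-based
    per_base = sum(int(f) for f in fields[8:13])     # fields 9-13
    return total, per_base, total - per_base
```

A positive excess at a position where no read contains Ns or IUPAC bases would be worth a closer look.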

> Second, in regions where there are no reads mapped, I'm finding
> coverages of more than 1, and quality scores for bases that aren't in
> the reference. Shouldn't the lines corresponding to reference sequences
> with no reads just have the default quality score for the backbone and
> coverage of 1 for the base in the corresponding position in the backbone?
Again, this seems extremely strange. Would you have some (smaller) CAF or MAF which shows that?

I'll see if I can send you one.

> Finally, is there any more documentation on the format besides what's in
> the manual?
No. TCS was a test of mine for some specific task. Worked well enough, but I didn't follow up further. What's in the docs is all there is.
B.

Thanks!

Cheers,
Bob

