[mira_talk] Re: High GC genomes and mira

  • From: Shaun Tyler <Shaun.Tyler@xxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 17 May 2011 11:21:43 -0500

Read distribution is a good bet.  I can't speak for Solexa data but we've
done a lot of 454 sequencing on various herpesviruses with a GC content
around 70%.  What we find is the coverage is good for the UL and US
regions.  These have a slightly lower GC content than the average.
However, the RL and RS regions which have a higher than average GC content
(up to 80%) tend to be sparsely covered.  Here's my take on what's going
on.  We know from other work that getting PCRs to work in general on these
viruses can be problematic.  The RL/RS regions are particularly
frustrating.  Often extensive optimisation is needed for each and every
target and there is no universal set of conditions that can be applied.
Since both sequencing methods employ PCR steps I think there are just some
regions that fail to amplify and therefore are not represented in the
libraries.

So it may not have anything to do with MIRA at all - it's a bias in the
sequencing techniques.  Just have to wait for direct molecular sequencing
to get around this one ;->
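
To check whether the sparse regions line up with the GC-rich ones, a windowed
GC calculation over the reference is enough. This is only a rough sketch: the
function names, window size and step are illustrative assumptions, not part of
MIRA or any 454/Solexa tool.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq, window=100, step=100):
    """Return (start, gc_fraction) for each full window along seq.

    window and step are arbitrary defaults chosen for illustration.
    A sequence shorter than one window yields a single partial window.
    """
    last_start = max(len(seq) - window + 1, 1)
    return [(i, gc_fraction(seq[i:i + window]))
            for i in range(0, last_start, step)]
```

Windows with a GC fraction near 0.8 would be the ones to compare against the
coverage plot.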


Shaun




From:   Phillip San Miguel <pmiguel@xxxxxxxxxx>
To:     mira_talk@xxxxxxxxxxxxx
Date:   2011-05-17 10:16 AM
Subject:        [mira_talk] High GC genomes and mira
Sent by:        mira_talk-bounce@xxxxxxxxxxxxx



On 5/16/2011 1:17 PM, Bastien Chevreux wrote:
> On May 16, 2011, at 15:16 , Phillip San Miguel wrote:
>
>> I tried MIRA V3.2.1.15 on a 70% GC bacterial genome (Deinococcus) at
>> around 100x coverage with solexa PE 101 base reads. My N50 contig
>> size was 4630 bases. That seems short to me, but it might be a result
>> of the 70% GC. So I decided to de novo assemble a 50% GC data set
>> from the same run.
>
> That's bad, really bad. You are the second report I get that
> apparently, MIRA has problems with high GC Solexa data sets. The first
> being a supersecret bug of a big company, I cannot get the data to see
> what's causing havoc. Would it be possible for me to have a look at
> that thing? No promises, but it might help.
>
> B.
>
     Probably, just let me check with the owner of the sequences.
     However, the short contig lengths may derive from something
trivial: read distribution bias. An Eland/Gerald mapping of our Illumina
Salmonella reads produces a reasonably even coverage depth across the
genome. A similar mapping of our Illumina Deinococcus reads shows
mostly 50-150x coverage, but also frequent regions with very low
coverage (a few X coverage -- or zero).
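
For anyone wanting to locate such regions in their own mapping, here is a
minimal sketch that scans a per-base depth list for low-coverage stretches.
It assumes the depths have already been extracted from the mapping into a
plain list of integers; the function name, the threshold of 5x and the
minimum length are hypothetical choices, not anything Eland/Gerald or MIRA
provides.

```python
def low_coverage_regions(depth, threshold=5, min_length=1):
    """Return half-open (start, end) intervals where depth < threshold.

    depth is a list of per-base coverage values (0-based positions).
    Runs shorter than min_length are ignored.
    """
    regions = []
    start = None
    for i, d in enumerate(depth):
        if d < threshold:
            if start is None:
                start = i          # a low-coverage run begins here
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None           # run ended
    # close a run that extends to the end of the sequence
    if start is not None and len(depth) - start >= min_length:
        regions.append((start, len(depth)))
    return regions
```

Contig breaks from the assembly could then be compared against the reported
intervals to see whether they coincide.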

--
Phillip

--
You have received this mail because you are subscribed to the mira_talk
mailing list. For information on how to subscribe or unsubscribe, please
visit http://www.chevreux.org/mira_mailinglists.html
