[mira_talk] Re: Position of gaps in homopolymers

  • From: Bastien Chevreux <bach@xxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Wed, 5 Aug 2009 20:52:22 +0200

On Wednesday 05 August 2009 David Hesselbom wrote:
> comparing Newbler and Mira assemblies it seems that homopolymer sequences
> are left-adjusted (gaps to the right) in the former and right-adjusted
> (gaps to the left) in the latter, e.g.

See?! You surprised me: users will always find things I haven't even dreamt of 
looking for.

Now that you mention it I also see it in the data (and can explain if needed, 
at least for MIRA).

> Newbler:
> AAAAA**
> AAAAAAA
> AAAAAA*
>
> Mira:
> **AAAAA
> AAAAAAA
> *AAAAAA
>
> I would like to know whether this is consistently the case.

In principle: yes. In practice: most of the time.

To be more precise: if the site in question has problems only because of the 
homopolymer, this will always be the case for de-novo assemblies. If other 
factors come into play (e.g. a read with miscalled bases within the 
homopolymer), then things get difficult. I've attached an image with an 
example of what I mean (have a look at the reads marked in black and how they 
disrupt the homopolymer site).

But these are pretty rare cases.

> Is there any case where, in a Mira assembly, the gaps would be to the right
> of the homopolymer sequence, or are they always to the left, as seems to be
> the case?

On the right probably never, but on special occasions the homopolymer might 
get broken into two halves (see attached image).
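
To make the three situations (gaps left, gaps right, homopolymer split in 
two) concrete, here is a minimal Python sketch. It is not code from MIRA; 
the function name gap_placement is made up for illustration, and it assumes 
you have already cut out one read's slice of the alignment columns covering 
the homopolymer, with '*' as the gap character:

```python
def gap_placement(segment, gap="*"):
    """Classify where gaps sit in one read's slice of a homopolymer region.

    Returns a set drawn from {"left", "right", "internal"}; the set is
    empty when the slice contains no gaps. A slice made only of gaps is
    reported as {"left"}.
    """
    placement = set()
    without_left = segment.lstrip(gap)
    if len(without_left) < len(segment):
        placement.add("left")       # MIRA-style: gaps before the bases
    core = without_left.rstrip(gap)
    if len(core) < len(without_left):
        placement.add("right")      # Newbler-style: gaps after the bases
    if gap in core:
        placement.add("internal")   # homopolymer broken into two halves
    return placement
```

With this, "**AAAAA" is classified as left, "AAAAA**" as right, and 
something like "AA*AAAA" as internal.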

> I'm working on a script that counts the number of homopolymers in
> the consensus sequence and then checks the length of each of these
> homopolymers in each read, and I need to know if I have to check both ends
> of the homopolymer for gaps or if just one is enough. Which end to look at
> for gaps would then depend on whether it's a Newbler or a Mira assembly.

As you need a routine for left and right end anyway: check both ends if you 
want to catch everything :-)
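
One way to avoid caring about the side at all is to count bases over the 
whole column range of the homopolymer instead of looking at its ends. A 
rough Python sketch follows. It assumes the consensus and all reads are 
available as padded strings of equal length with '*' as the gap character; 
the names consensus_runs and read_run_lengths are hypothetical, and pulling 
the padded strings out of an assembly file is not shown:

```python
import re

GAP = "*"

def consensus_runs(consensus, min_len=3):
    """Return (start, end, base) for each homopolymer run in a padded consensus.

    Gap characters inside a run are tolerated; end is exclusive.
    """
    runs = []
    # One base, then one or more repeats of "optional gaps + the same base".
    for m in re.finditer(r"([ACGT])(?:\**\1)+", consensus.upper()):
        if len(m.group(0).replace(GAP, "")) >= min_len:
            runs.append((m.start(), m.end(), m.group(1)))
    return runs

def read_run_lengths(consensus, reads, min_len=3):
    """For each consensus run, count that run's base in the same columns of each read.

    Counting over the whole column range means gaps on the left, on the
    right, or in the middle of the run are all handled the same way.
    """
    result = {}
    for start, end, base in consensus_runs(consensus, min_len):
        result[(start, end, base)] = [
            read[start:end].upper().count(base) for read in reads
        ]
    return result

consensus = "CGAAAAAAATC"
mira_reads = ["CG**AAAAATC", "CGAAAAAAATC", "CG*AAAAAATC"]
newbler_reads = ["CGAAAAA**TC", "CGAAAAAAATC", "CGAAAAAA*TC"]
print(read_run_lengths(consensus, mira_reads))     # {(2, 9, 'A'): [5, 7, 6]}
print(read_run_lengths(consensus, newbler_reads))  # {(2, 9, 'A'): [5, 7, 6]}
```

Both gap conventions give the same per-read lengths here. Note that a 
miscalled base inside the run simply lowers the count for that read, so 
such reads would still need a separate look.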

Would you mind making that script publicly available once it is finished? I 
love 3rd party tools that perform QC on my routines; they find things I would 
never check (sometimes I'm too blind).

Regards,
  Bastien

Attachment: bceti2.png
Description: PNG image
