[mira_talk] Re: minimum number of reads to join contigs

From: "Davide Scaglione" <gianza@xxxxxxxxxx>
To: <mira_talk@xxxxxxxxxxxxx>
Date: Thu, 10 Jun 2010 19:00:11 +0200

Thanks a lot for your detailed and helpful reply!

Just to double-check: when you talk about chimera detection, are you referring to "spoiler detection" (-AS:sd)?

In that case I think I will proceed like this: because I'm using three different strains, I will assemble each strain separately with spoiler detection switched on. Then I will assemble the resulting contigs together with spoiler detection off, assuming that the contigs are good and real ones and can therefore be joined wherever they overlap.


Is it a good plan?

Regards

Davide

PS: I swear, as soon as I get a final assembly I will stop bothering you!

--------------------------------------------------
From: "Bastien Chevreux" <bach@xxxxxxxxxxxx>
Sent: Thursday, June 10, 2010 6:34 PM
To: <mira_talk@xxxxxxxxxxxxx>
Subject: [mira_talk] Re: minimum number of reads to join contigs

On Thursday 10 June 2010 Davide Scaglione wrote:
> Is there any way to tell MIRA to join reads/build contigs only if a certain
> number of reads is producing the join?

No-can-do.

> To give an example from my dataset: I'm assembling 1,500,000 454 ESTs.
> Especially in very large contigs, there are chimeric reads that wrongly
> join two different big piled-up chunks of reads coming from different
> genes. And this is bad, for annotation and for everything else. Say,
> a big contig with 25x coverage and another contig with 25x coverage,
> joined by only one read in the middle. Of course an NCBI blastx reveals
> that it's a misassembly.

There may be a way out of this situation, but it probably comes with a loss
of data: chimera detection.

You have this:

r1 xxxxxxxxxxxxxxxx
r2 xxxxxxxxxxxxxxxxx
r3 xxxxxxxxxxxxxxxxx
r4 xxxxxxxxxxxxxxxxxxooooooooooooo
r5                     ooooooooooo
r6                     ooooooooooo
r7                       ooooooooo

with r4 being the chimera. The chimera detection in MIRA works by searching
for sequence stretches which are not covered by overlaps. If you now use the
chimera detection of MIRA, it will almost certainly flag r4 as a chimera and
only use a part of it (x or o, depending on which part is longer). There is
always a chance that r4 is a valid read though, but that's a risk to take.

Now, that would be totally fine if one did not have to account for lowly
expressed genes. Imagine this situation:

s1 xxxxxxxxxxxxxxxxx
s2         xxxxxxxxxxxxxxxxxxxxxxxxx
s3                          xxxxxxxxxxxxxxx

Look at s2: from an overlap perspective, s2 could also very well be a
chimera, leading to a break of an otherwise perfectly valid contig. This is
why chimera detection is switched off by default in MIRA.
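
To make the two cases above concrete, here is a minimal Python sketch of the
idea of looking for read stretches that are not covered by any overlap. It is
only an illustration and not MIRA's actual implementation: the function names,
the representation of overlaps as half-open (start, end) intervals in read
coordinates, and the coordinates themselves (counted off the two diagrams
above) are all assumptions.

```python
def uncovered_stretches(read_len, overlaps):
    """Return half-open (start, end) stretches of a read that no overlap covers.

    `overlaps` is a list of (start, end) intervals in read coordinates;
    this representation is an assumption made for the sketch.
    """
    covered = [False] * read_len
    for start, end in overlaps:
        for pos in range(max(0, start), min(read_len, end)):
            covered[pos] = True

    stretches = []
    pos = 0
    while pos < read_len:
        if covered[pos]:
            pos += 1
            continue
        end = pos
        while end < read_len and not covered[end]:
            end += 1
        stretches.append((pos, end))
        pos = end
    return stretches


def part_to_keep(read_len, overlaps):
    """Return the (start, end) part to keep if the read looks chimeric, else None.

    A read is treated as a possible chimera when it has an uncovered stretch
    strictly inside it. The longer side of the first such stretch is kept.
    """
    interior = [(s, e) for s, e in uncovered_stretches(read_len, overlaps)
                if s > 0 and e < read_len]
    if not interior:
        return None
    gap_start, gap_end = interior[0]
    if gap_start >= read_len - gap_end:
        return (0, gap_start)
    return (gap_end, read_len)


# r4 from the first diagram: overlaps with r1-r3 on the left, r5-r7 on the right.
r4_overlaps = [(0, 16), (0, 17), (0, 17), (20, 31), (20, 31), (22, 31)]
# s2 from the second diagram: overlaps with s1 on the left, s3 on the right.
s2_overlaps = [(0, 9), (17, 25)]

print(part_to_keep(31, r4_overlaps))  # r4 is flagged, left part kept
print(part_to_keep(25, s2_overlaps))  # s2 is flagged as well
```

Both r4 (a real chimera) and s2 (a valid read from a lowly expressed gene)
get flagged, which is exactly the trade-off described above.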

> Because setting only a fixed integer as a parameter might be a problem for
> low-coverage contigs/regions, an idea could be to set a drop-in-coverage
> threshold at which MIRA splits contigs, e.g. in regions where the coverage
> drops below a certain percentage of the contig average (or better, of the
> previous window of, let's say, 50 bp).

A similar idea has been on my TODO list for quite some time, but I never got
around to investigating it further, sorry.

At the moment, the only thing you can do is to write a parser that searches
for these kinds of things in a contig, extract the corresponding reads and
re-assemble them with chimera detection switched on.
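
Such a parser could start from something like the following Python sketch,
which applies the drop-in-coverage idea quoted above: a position is flagged
when its coverage falls below a fraction of the mean coverage of the window
just before it. It assumes the per-base coverage of a contig has already been
extracted into a plain list of integers (how to get that from the assembly
output is not covered here). The function name, the 50 bp window and the 20%
threshold are illustrative choices, not MIRA parameters.

```python
def find_coverage_drops(coverage, window=50, fraction=0.2):
    """Return half-open (start, end) regions where coverage drops sharply.

    A position is flagged when its coverage is below `fraction` times the
    mean coverage of the `window` positions immediately before it.
    Positions closer than `window` to the contig start are not tested.
    """
    regions = []
    region_start = None
    for i in range(window, len(coverage)):
        prev_mean = sum(coverage[i - window:i]) / window
        is_low = coverage[i] < fraction * prev_mean
        if is_low and region_start is None:
            region_start = i
        elif not is_low and region_start is not None:
            regions.append((region_start, i))
            region_start = None
    if region_start is not None:
        regions.append((region_start, len(coverage)))
    return regions


# Two 25x blocks joined by a thin 1x bridge, as in the example from the thread.
suspicious = [25] * 200 + [1] * 30 + [25] * 200
# A uniformly low-coverage contig, e.g. from a lowly expressed gene.
low_but_even = [2] * 100

print(find_coverage_drops(suspicious))
print(find_coverage_drops(low_but_even))
```

Because the threshold is relative to the preceding window, the evenly covered
2x contig is left alone while the thin bridge between the two 25x blocks is
reported. The reads spanning a reported region are the candidates to extract
and re-assemble.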

Regards,
 Bastien

--
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html


