Thanks a lot for your detailed and helpful reply! Just to double-check: when you talk about chimera detection, are you referring to "spoiler detection" (-AS:sd)?
In that case I think I will proceed like this: because I'm using three different strains, I will assemble each strain separately with spoiler detection switched on. Then I will assemble the resulting contigs together with spoiler detection off, assuming that the contigs are good and real, and thus eligible to join wherever they want.
Is it a good plan?

Regards,
Davide

PS: I swear, as soon as I get a final assembly I will stop bothering you!

--------------------------------------------------
From: "Bastien Chevreux" <bach@xxxxxxxxxxxx>
Sent: Thursday, June 10, 2010 6:34 PM
To: <mira_talk@xxxxxxxxxxxxx>
Subject: [mira_talk] Re: minimum number of reads to join contigs
On Thursday, 10 June 2010, Davide Scaglione wrote:
> Is there any way to tell MIRA to join reads/build contigs only if a certain number of reads is producing the join?

No-can-do.

> Making an example with my dataset: I'm assembling 1,500,000 454 ESTs. Especially for very large contigs, there are chimeric reads that wrongly join two different big piled-up chunks of reads coming from different genes. And this is bad, for annotation and for everything else. Let me say: a big contig with 25x coverage, another contig with 25x coverage, joined by only one read in the middle. Of course an NCBI blastx reveals that it's a misassembly.

There may be a way out of the situation, but it's probably associated with loss of data: chimera detection. You have this:

r1  xxxxxxxxxxxxxxxx
r2    xxxxxxxxxxxxxxxxx
r3     xxxxxxxxxxxxxxxxx
r4      xxxxxxxxxxxxxxxxxxooooooooooooo
r5                        ooooooooooo
r6                          ooooooooooo
r7                             ooooooooo

with r4 being the chimera. The chimera detection in MIRA works by searching for sequence stretches which are not covered by overlaps. If you now use the chimera detection of MIRA, it will almost certainly flag r4 as a chimera and only use a part of it (x or o, depending on which part is longer). There is always a chance that r4 is a valid read though, but that's a risk to take.

Now, that would be totally fine if one did not have to account for lowly expressed genes. Imagine this situation:

s1  xxxxxxxxxxxxxxxxx
s2            xxxxxxxxxxxxxxxxxxxxxxxxx
s3                              xxxxxxxxxxxxxxx

Look at s2; from an overlap perspective, s2 could also very well be a chimera, leading to a break of an otherwise perfectly valid contig. This is why chimera detection is switched off by default in MIRA.

> Because setting only a fixed integer as a parameter might be a problem for low-coverage contigs/regions, an idea could be to set a drop-on-coverage threshold at which MIRA splits contigs, e.g. in regions where the coverage drops below a certain percentage of the average of the contig (or better, of the previous window of, let's say, 50 bp).

A similar idea has been on my TODO for quite some time, but I never got around to investigating it further, sorry.

At the moment, the only thing you can do is to write a parser that searches for these kinds of things in a contig, extracts the corresponding reads and re-assembles them with chimera detection switched on.

Regards,
Bastien

--
You have received this mail because you are subscribed to the mira_talk mailing list. For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html
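
To illustrate the idea described above (a read is suspicious when its overlaps with other reads do not form one continuous chain), here is a minimal Python sketch. It is not MIRA's actual implementation: the function names and the input format (half-open (start, end) overlap intervals in read coordinates) are assumptions made only for this example.

```python
def find_overlap_breaks(overlaps):
    """Return (left_end, right_start) pairs where the overlap chain is interrupted.

    overlaps: half-open (start, end) intervals on one read, each being the
    part of the read that aligns to some other read (hypothetical input).
    Two intervals that merely touch do not count as connected, because no
    single overlap spans the junction between them.
    """
    breaks = []
    current_end = None
    for start, end in sorted(overlaps):
        if current_end is not None and start >= current_end:
            breaks.append((current_end, start))
        current_end = end if current_end is None else max(current_end, end)
    return breaks


def longest_supported_part(read_length, breaks):
    """Return the longest (start, end) piece of the read between break points."""
    starts = [0] + [right for _, right in breaks]
    ends = [left for left, _ in breaks] + [read_length]
    return max(zip(starts, ends), key=lambda seg: seg[1] - seg[0])


# r4 from the example: 18 bases overlapped only by "x" reads,
# then 13 bases overlapped only by "o" reads.
r4_overlaps = [(0, 18), (2, 18), (5, 18), (18, 29), (18, 31), (20, 31)]
r4_breaks = find_overlap_breaks(r4_overlaps)

# s2 from the example: valid read from a lowly expressed gene, but only
# its two ends are overlapped, so it is flagged in exactly the same way.
s2_breaks = find_overlap_breaks([(0, 8), (17, 25)])
```

Note that s2 is reported as broken just like r4, which is the false positive on lowly expressed genes described in the mail.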
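
The "drop-on-coverage" idea from the quoted mail could be the core of the parser suggested there. The following is only a sketch under stated assumptions: the per-position coverage of a contig is assumed to have been extracted already into a plain list, and the function name, the 50 bp window and the 20% threshold are hypothetical choices, not MIRA parameters.

```python
def find_coverage_drops(coverage, window=50, min_fraction=0.2):
    """Return positions whose coverage is below min_fraction times the
    mean coverage of the preceding `window` positions.

    coverage: list of per-position read depths for one contig
    (hypothetical input, e.g. parsed from an assembly file).
    """
    drops = []
    window_sum = 0  # sum of the `window` positions preceding the current one
    for pos, depth in enumerate(coverage):
        if pos >= window:
            mean = window_sum / window
            if depth < min_fraction * mean:
                drops.append(pos)
        # Slide the window forward by one position.
        window_sum += depth
        if pos >= window:
            window_sum -= coverage[pos - window]
    return drops


# Two 25x pile-ups joined by a 5 bp stretch covered by a single read.
suspect = [25] * 100 + [1] * 5 + [25] * 100
drops = find_coverage_drops(suspect)

# A uniformly low-coverage contig is not flagged, because the threshold
# is relative to the local average rather than a fixed integer.
low_but_even = find_coverage_drops([2] * 200)
```

Reads spanning the flagged positions would then be the candidates to extract and re-assemble with chimera detection switched on.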