Hi,

Bastien Chevreux wrote:
> On 13 May 2014, at 14:01 , Sabrina rodriguez
> <sabrina.rodriguez@xxxxxxxxxxxxxxxx> wrote:
>> In some cases, even though I obtain large contigs (> 500bp) as observed from
>> the <project>_info_assembly.txt file in the <project>_info directory; in the
>> <project>_result directory, no "LargeContigs" files were generated.
>
> Hello Sabrina,
>
> I base my answer on the assumption you are using a MIRA 4.x version. If not,
> please upgrade.
>
> There may be a couple of reasons for what you are seeing. Let’s go through:
>
> 1. A bug in MIRA. Possible, but I’d be a little bit surprised.
>
> 2. Is the “large contigs” info file in the info directory populated, that is,
> does it contain contig names? If yes, then “something” went wrong “somewhere”
> when MIRA, after the main assembly, called itself to extract the contigs.
> However, I think that this scenario is unlikely as you observe this only “for
> some cases / datasets”.
>
> 3. If the “large contigs” info file in the info directory is not populated,
> your data set is … weird, and fools the heuristics which determine what to
> consider as “large contig.” This heuristic works like this: during assembly,
> MIRA looks at all contigs >= 5kb to determine an average coverage of those
> contigs >= 5kb. Then, at the end of the assembly, it defines as “large”
> contigs all contigs >= 500bp which have a coverage being at least 50% of the
> previously calculated average coverage (33% on projects with a coverage <40x).

Um, for EST projects I would propose going for 2000 only. 5kb is too much.
Did you say 5kb is for genome assemblies only? ;-)
Can one decrease the 50% threshold (for EST projects ...)?

>
> BTW: you can change the 500 bp and 5000 bp limits via -MI:lcs and -MI:lcs4s
> parameters, maybe you want to test lcs=500 and lcs4s=2000. But please read on.
>
>> In one example, I have obtained contig lengths going from 120 bp to 4543.
>> In a second example, I have obtained contig lengths between 107 and 9432 bp.
>
> So, in the first example I can totally understand why MIRA did not extract
> any contig as “large” contig: there was none >= 5 kbp to calculate statistics
> on, hence no average coverage estimation could be given. In the second
> example however I wonder a little bit what kind of other effect prevented at
> least the 9 kbp contig from being regarded as large.
>
> However, in case you were not assembling some viral data, your assembly stats
> point to some deeper problem: projects with a max contig size of 9 kbp (let
> alone 4 kbp) are a total catastrophe. Something feels very wrong there.

Most likely bad adapter/primer/artifact removal. ;)

Martin

--
Martin Mokrejs, PhD.
454 / IonTorrent / Evrogen MINT / Clontech SMART adapter/artifact removal
(... too many protocols to name here)
http://www.bioinformatics.cz/software/supported-protocols/

--
You have received this mail because you are subscribed to the mira_talk mailing list.
For information on how to subscribe or unsubscribe, please visit
http://www.chevreux.org/mira_mailinglists.html
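
For reference, the "large contig" heuristic described in the quoted text can be sketched roughly as below. This is only an illustration of the description in this thread, not MIRA's actual code. The names Contig and large_contigs and the example coverage values are made up; the parameter names lcs and lcs4s mirror the -MI:lcs and -MI:lcs4s options mentioned above. Two assumptions are worth flagging: the average is taken as a plain unweighted mean over the qualifying contigs, and the "<40x" rule is applied to that computed average.

```python
from typing import List, NamedTuple


class Contig(NamedTuple):
    # Hypothetical record type, for illustration only.
    name: str
    length: int       # contig length in bp
    coverage: float   # average coverage of this contig


def large_contigs(contigs: List[Contig], lcs: int = 500, lcs4s: int = 5000) -> List[str]:
    """Return names of contigs that would count as "large" under the
    heuristic as described in the thread (a sketch, not MIRA's code)."""
    # Step 1: only contigs >= lcs4s contribute to the coverage statistics.
    stat_contigs = [c for c in contigs if c.length >= lcs4s]
    if not stat_contigs:
        # No contig long enough for statistics, so no average coverage
        # and therefore no "large" contigs at all.
        return []

    # Assumption: plain unweighted mean of the per-contig coverages.
    avg_cov = sum(c.coverage for c in stat_contigs) / len(stat_contigs)

    # Assumption: the 33% rule applies when the computed average is < 40x.
    fraction = 0.33 if avg_cov < 40 else 0.50

    # Step 2: keep contigs >= lcs whose coverage reaches the threshold.
    return [
        c.name
        for c in contigs
        if c.length >= lcs and c.coverage >= fraction * avg_cov
    ]


# Made-up data shaped like the first example (longest contig 4543 bp).
example = [
    Contig("c1", 4543, 30.0),
    Contig("c2", 800, 20.0),
    Contig("c3", 120, 5.0),
]

print(large_contigs(example))              # default 5000 bp statistics limit
print(large_contigs(example, lcs4s=2000))  # lowered limit, as suggested for ESTs
```

With the default limits nothing in this made-up data set reaches 5 kb, so the list comes back empty, which matches the explanation given for the first example. Lowering lcs4s to 2000 lets the 4543 bp contig seed the coverage estimate.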