On Monday 03 August 2009 johann JOETS wrote:
> maybe I made something wrong using Mira ?

Hello Johann,

the command line was exactly as it should be for such a test case.

> Hope you will have an idea about this.

A few. Let's see what's been biting you.

My first question: does metasim generate reads in both forward and reverse direction? (I have never used that program.) That might be important later.

Second question: did you tell metasim to also simulate paired ends, or did you just simulate unpaired data? You wrote that you want to have a look at a piece of a plant genome. Plant genomes also tend to be quite complex in terms of repetitiveness, so going with unpaired sequencing might not be the best strategy there.

> [...]
> The N50 is as follow :
>        N50      av cov
> 5X     257      0
> 10X    5419     8,95
> 15X    111471   13,74
> 20X    108664   18,29
> 25X    27526    22,21
> 50X    446      40,08

Third question: how did you calculate the N50? Did you compute it yourself, or did you take the numbers from the file "*_info_assembly.txt"? If the latter, did you take the numbers for 'large contigs' or for 'all contigs'?

If the numbers above are for all contigs, then I'm not so much troubled. You would need to filter the results first to get rid of contig debris. Please also see
http://chevreux.org/uploads/media/mira3_usage.html#section_14
(What to do with MIRA result files) regarding this.

Now, if the numbers above are for large contigs, then I'm troubled, and I'd ask you to make the sets for 20x to 100x available to me via FTP if possible.

> You may notice that the average coverage is roughly as expected. However I
> was surprised by the decrease of n50 for datasets deeper than 15X. This is
> also true for the length of the largest contig.
> As I know were reads should have been assembled I can check assembly
> quality (roughly I count breakages in contigs). According to these tests,
> the quality of the assembly also drop down.

Now, this is a bit strange.
I would expect that starting with 20x, the whole 150 kb should be covered ... with 50x certainly more so. I would really like to have a look at those data sets.

Regards,
  Bastien
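
For reference, the effect of contig debris on N50 mentioned above can be illustrated with a small script. This is only a minimal sketch: the contig lengths are made-up numbers, and the 500 bp cutoff in `filter_large` is an assumed, illustrative threshold, not MIRA's own definition of 'large contigs'. It uses the common definition of N50: the length of the contig at which the cumulative length, counted from the longest contig downwards, first reaches half of the total assembly length.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if it is empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # First contig at which the cumulative length reaches half the total.
        if running >= half_total:
            return length
    return 0


def filter_large(lengths, min_length=500):
    """Drop short contigs; min_length is an assumed cutoff for illustration."""
    return [n for n in lengths if n >= min_length]


# Hypothetical assembly: three long contigs plus 600 debris contigs of 300 bp.
contigs = [60000, 50000, 40000] + [300] * 600

n50_all = n50(contigs)
n50_large = n50(filter_large(contigs))

print("N50, all contigs:  ", n50_all)    # 300
print("N50, large contigs:", n50_large)  # 50000
```

In this made-up case the debris adds up to more than half of the total length, so the unfiltered N50 collapses to the debris size even though the three long contigs are unchanged.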