Thanks for your reply; my answers are below.

> Anyway, there are a couple of clues in the information you posted: the
> coverage information. "Large contigs" in both strains have a coverage of
> ~20x, yet the first strain has a contig with a max coverage of 680x, while
> the other strain (the one with the longer assembly time) has a contig with
> max coverage of 790x.
>
> In both cases the fold difference of 34 to 39 (ratio between 20x and
> 680x/790x) is a lot higher than I am used to from "normal" bacteria, that would
> be my first angle of attack: what are these high coverage contigs, why does
> one strain seem to have a couple more than the other.

Ah OK, so the contig with a max coverage of 793 looks to be just repetitive
junk. From contigstats:

PA128572_316_1_c1311 1236 23 1378 793 122.73 49.11 4 0 0 0 59 0
PA128572_316_1_rep_c1395 2024 60 1350 108 73.68 65.42 0 0 0 0 11 0

The next contig down in size has a max coverage of only 108.

> Second thing to look at: kmer repeat histogram (hash statistics) which you
> did not post but can tell you quite a bit.

I'm not sure where I would find this information?

> Third thing: after the hash statistics, have a look at the read repeat info
> file, and there specially the stretches tagged MNRr. They can be quite
> informative regarding either sequencing artefacts (some kind of adaptor not
> clipped) or really high copy number stretches.

The readrepeats file contains 4729 MNRr entries out of 6548 rows. I can't see
any particular pattern in them, though.

Thanks again,
Adam

--
You have received this mail because you are subscribed to the mira_talk mailing list.
For information on how to subscribe or unsubscribe, please visit http://www.chevreux.org/mira_mailinglists.html
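
For reference, here is a minimal sketch of the fold-difference check discussed above, applied to the two contigstats rows I pasted. It is only an illustration: the column order (name, length, average quality, reads, max coverage, average coverage, GC%) is my assumption about the contigstats layout, the 20x baseline is the "large contig" coverage quoted above, and the names `parse_row` and `fold_over_baseline` are made up for this example.

```python
# Two rows copied from the contigstats output quoted in this mail.
ROWS = """\
PA128572_316_1_c1311 1236 23 1378 793 122.73 49.11 4 0 0 0 59 0
PA128572_316_1_rep_c1395 2024 60 1350 108 73.68 65.42 0 0 0 0 11 0
"""

# Assumption: ~20x is the typical coverage of the large contigs.
BASELINE_COVERAGE = 20.0


def parse_row(line):
    """Split one whitespace-separated row into the fields used here.

    Assumed column order: name, length, av. quality, reads,
    max coverage, av. coverage, GC%, then further counters.
    """
    fields = line.split()
    return {
        "name": fields[0],
        "length": int(fields[1]),
        "reads": int(fields[3]),
        "max_cov": int(fields[4]),
        "avg_cov": float(fields[5]),
    }


def fold_over_baseline(coverage, baseline=BASELINE_COVERAGE):
    """Return how many times higher a coverage value is than the baseline."""
    return coverage / baseline


contigs = [parse_row(line) for line in ROWS.splitlines() if line.strip()]

for contig in contigs:
    print(
        contig["name"],
        "max fold:", round(fold_over_baseline(contig["max_cov"]), 2),
        "avg fold:", round(fold_over_baseline(contig["avg_cov"]), 2),
    )
```

Run over a full contigstats file instead of these two rows, the same ratio would let the contigs be sorted by fold difference, so the high-coverage outliers in each strain can be compared side by side.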