Thank you very much for the answer :).

It's a very basic question: can anybody tell me how to join contigs in GAP4? Yes, I did set everything up and I can see the contigs (they look really accurate, with not many errors), but I can't make this work for me, and nobody at my place has any idea how to do it. How can I join contigs if they don't overlap? Do you know of any kind of manual? I always hear "by hand"... HOW?

Thank you!
Andrzej

PS. I will try the settings you provided.

On Thu, Dec 3, 2009 at 1:48 PM, Bastien Chevreux <bach@xxxxxxxxxxxx> wrote:

> On Wednesday, 2 December 2009, Andrzej N wrote:
> > I need some help... I did *de novo* assembly of several plant mitochondrial
> > genome sequences (454, Titanium, single-end reads), about 200000 reads used
> > for assembly; this should give me about 100x coverage. Yes, I know,
> > overkill, but... MIRA created about 160 contigs with a quality score of
> > around 78 (what is it exactly?) (the total number of contigs is about 5,000,
> > but that includes smaller ones that don't help much, i.e., "junk"). These
> > contigs don't go together to create one big consensus contig.
>
> Hello Andrzej,
>
> 100x is not only overkill, it is also a bit dangerous for many assemblers
> (including MIRA), as there are some unwanted side effects of ultra-high
> coverage. One of them: as sequencing errors are not totally random, they tend
> to accumulate at certain points. If you now have very high coverage, these
> sequencing errors will be recognised as valid variants and hence split off
> into other contigs.
>
> Plus you've got plant mitochondrial genomes, and these I've come to fear a
> bit. The 454 data I've seen from them so far suggest pretty uneven coverage,
> which might cause MIRA problems if the uniform read distribution is used,
> mistakenly recognising some parts as repeats when they're not.
>
> > I also did reference assembly, to an already finished and assembled
> > sequence.
> > MIRA is covering all of this reference sequence with only
> > one small break (so I get two huge contigs of about 200000 bp each).
> >
> > Now comes the interesting part. When I take the contigs from the *de novo*
> > assembly and blast them against the ones generated by the reference
> > assembly, they cover the entire sequence very nicely... So my question is:
> > why is MIRA not creating larger contigs during *de novo* assembly? These
> > contigs are next to each other and show a certain amount of sequence
> > overlap (I set up BLAST on my computer to blast them against each other),
> > but MIRA is not seeing this and combining them.
>
> Oh, MIRA is probably seeing them, but refuses to join because the ends contain
> too many sequencing errors (mistakenly recognised as valid variants) or because
> the ends lie in regions with exceptionally high coverage (mistakenly
> recognised as repeats).
>
> > What parameters in MIRA need to be changed to help build larger contigs?
> > My adjustments to date have not helped much more than your default
> > settings for "fast assembly".
>
> Umm ... the 'draft' options are really just that: for drafts. And if you've
> got 60kb chunks it's not too bad already. But use at least 'normal' or
> 'accurate' mode.
>
> Now, other things you probably want to do:
> 1) decrease the sensitivity of repeat marker base recognition. I'd suggest
>    adding
>      454_SETTINGS -CO:mrpg=12
>    and see what happens then
> 2) possibly assemble without uniform read distribution
>      -AS:urd=no
>    and loosen the repeat detection thresholds
>      454_SETTINGS -AS:ardct=3:mrl=800
>    or switch off repeat detection altogether
>      -AS:ard=no
>
> If everything else fails: join the large contigs by hand in 'gap4', it just
> takes a couple of minutes for a plant mitochondrion :-)
>
>
> Hope that helps,
> Bastien
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list.
> For information on how to subscribe or unsubscribe, please visit
> http://www.chevreux.org/mira_mailinglists.html
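
For reference, the "about 100x coverage" figure in the thread above is plain arithmetic: total sequenced bases divided by genome size. The following is only a rough sketch of that estimate in Python. The read count comes from the thread, and the genome size of about 400000 bp is inferred from the two reference-assembly contigs of about 200000 bp each. The mean read length of 200 bp is a placeholder assumption, chosen only because it reproduces the quoted figure; the real mean length after clipping should be substituted, and the estimate scales proportionally with it.

```python
def estimated_coverage(num_reads: int, mean_read_length: float, genome_size: int) -> float:
    """Average coverage = total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * mean_read_length / genome_size


# Values from the thread.
NUM_READS = 200_000      # "about 200000 reads used for assembly"
GENOME_SIZE = 400_000    # two reference-assembly contigs of about 200000 bp each

# ASSUMPTION: placeholder mean read length in bp, not stated in the thread.
MEAN_READ_LENGTH = 200

coverage = estimated_coverage(NUM_READS, MEAN_READ_LENGTH, GENOME_SIZE)
print(f"Estimated average coverage: {coverage:.0f}x")
```

This is an average only. As the reply points out, coverage in plant mitochondrial 454 data can be quite uneven, so individual regions may sit well above or below this number.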