[mira_talk] Metagenomic assembly

  • From: Chayan Roy <chayan.roy93@xxxxxxxxx>
  • To: "mira_talk@xxxxxxxxxxxxx" <mira_talk@xxxxxxxxxxxxx>
  • Date: Wed, 14 May 2014 10:14:54 +0530

Currently I am using MIRA 4.0, and as I have not used abnc since switching
to this version, I did not know this. Thanks.

I want to get statistics on the reads that map to those six contigs, and I also
want to extract the read pool for each contig so that I can run a de novo
assembly on each of them separately. Should I use miraconvert for this?
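A rough sketch of the per-contig statistics and read-pool extraction asked about above. It assumes you already have a two-column, tab-separated list pairing each contig name with the name of a read in that contig. MIRA's info directory contains a contig/read list along these lines, but the exact file name and column layout are an assumption here, so check your MIRA 4.0 output. The function names are hypothetical, and this is plain Python, not a miraconvert feature.

```python
from collections import defaultdict


def group_reads_by_contig(lines):
    """Map each contig name to the list of read names assigned to it.

    Assumes (hypothetically) tab-separated lines of the form "contig<TAB>read";
    blank lines and lines starting with '#' are skipped.
    """
    pools = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        contig, read = line.split("\t")[:2]
        pools[contig].append(read)
    return dict(pools)


def filter_fastq(fastq_lines, wanted):
    """Yield the 4-line FASTQ records whose read name is in `wanted`."""
    it = iter(fastq_lines)
    # zip(it, it, it, it) consumes the input four lines at a time
    for header, seq, plus, qual in zip(it, it, it, it):
        name = header[1:].split()[0]  # drop '@' and anything after the first space
        if name in wanted:
            yield header, seq, plus, qual


contig_read_list = ["#contig\tread", "contig_1\tr1", "contig_1\tr2", "contig_2\tr3"]
pools = group_reads_by_contig(contig_read_list)
for contig, reads in pools.items():
    print(contig, len(reads))  # per-contig read count

fastq = ["@r1\n", "ACGT\n", "+\n", "IIII\n", "@r9\n", "TTTT\n", "+\n", "IIII\n"]
print(list(filter_fastq(fastq, set(pools["contig_1"]))))
```

Writing each contig's filtered records to its own FASTQ file would then give one input per contig for a separate de novo run.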

I don't know how to perform shredding and partial ordering of
contigs, sorry. Could you explain it a little more simply, please?
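Torben's two suggestions in the quoted message below, shredding and partial ordering, can be illustrated with a small Python sketch (Python 3.9+ for graphlib). shred() cuts a contig into overlapping fixed-length pieces. With a step of half the fragment length, every interior base ends up covered about twice regardless of how deeply the original reads covered it, which is one way to read his remark about equalizing coverage. longest_path() assumes you already have a partial order, i.e. edges meaning "contig a comes before contig b". How to obtain those edges (for example from read pairs spanning contig ends) is an assumption, since Torben does not say how he built his graph. It then finds the path through that acyclic graph with the largest total contig length. Fragment sizes, function names and the edge source are hypothetical illustration choices, not MIRA parameters.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def shred(seq, frag_len=500, step=250):
    """Cut a contig into overlapping fragments of frag_len bases, one every step bases.

    The defaults are arbitrary illustration values, not MIRA settings.
    """
    if len(seq) <= frag_len:
        return [seq]  # short contigs are kept whole
    frags = [seq[i:i + frag_len] for i in range(0, len(seq) - frag_len + 1, step)]
    if (len(seq) - frag_len) % step:
        frags.append(seq[-frag_len:])  # make sure the contig's last bases are covered
    return frags


def longest_path(lengths, edges):
    """Heaviest path through a DAG of contigs.

    lengths: {contig_name: length}; edges: (a, b) pairs meaning "a comes before b".
    Every name used in edges must also appear in lengths.
    Raises graphlib.CycleError if the ordering is not acyclic.
    """
    preds = {name: set() for name in lengths}
    for a, b in edges:
        preds[b].add(a)
    best = {}  # contig -> total length of the heaviest path ending at it
    back = {}  # contig -> previous contig on that path
    # static_order() yields every contig after all of its predecessors
    for node in TopologicalSorter(preds).static_order():
        best[node], back[node] = lengths[node], None
        for p in preds[node]:
            if best[p] + lengths[node] > best[node]:
                best[node], back[node] = best[p] + lengths[node], p
    end = max(best, key=best.get)
    total = best[end]
    path = []
    while end is not None:
        path.append(end)
        end = back[end]
    return path[::-1], total


print(shred("ACGTACGTAC", frag_len=4, step=3))  # ['ACGT', 'TACG', 'GTAC']
lengths = {"c1": 1000, "c2": 500, "c3": 2000, "c4": 300}
edges = [("c1", "c2"), ("c1", "c3"), ("c2", "c4"), ("c3", "c4")]
print(longest_path(lengths, edges))  # (['c1', 'c3', 'c4'], 3300)
```

A long path only says that the contigs are consistent with the assumed ordering; the gaps or overlaps between neighbouring contigs remain unknown, which is why such "very long contigs" are hard to interpret.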


Thanks for all the help.

Regards
On Monday, May 12, 2014, Torben Nielsen <torben@xxxxxxxxxx> wrote:
>
>> On 09 May 2014, at 23:21 , Chayan Roy <chayan.roy93@xxxxxxxxx> wrote:
>>> I am using six Ion Torrent datasets (average read length 195 bp) and four
>>> Proton datasets (~178 bp). For all of them I performed a de novo assembly
>>> with default parameters. But after looking at contig_stat_pass1.txt after
>>> the assembly, I reran it with -AS:nop=1 (I know this was a really weird
>>> thing to do), and this resulted in much longer contigs (in that case I
>>> might be sacrificing accuracy, is that so?)
>>
>> You’re not sacrificing accuracy … you’re completely butchering it. Your
>> “long” contigs will have a lot of misassemblies.
>
> I have run about 25 large metagenomic assemblies this year. The smaller
> ones are newer full MiSeq runs with 25M read pairs, while some of the larger
> ones are HiSeq runs with about twice the data. I run 6 passes, which seems to
> be what it takes for the number of contigs to “stabilize”. I asked Bastien
> about that some time ago and, as I recall, he commented that he’d mostly used
> up to 4 (it’s been a while, but that’s what I remember). I got a version
> that logs the number of contigs broken in each pass, played with it for a
> while, and settled on 6 as a good compromise. I tried up to 8.
>
> I need to stop looking at the first-pass contigs. It makes me cry when I
> see almost 1M long contigs in the first pass and I *know* I’m lucky to have
> 250K left at the end of the 6th pass. My conclusion for metagenomics is not
> to look at contig lengths until the end of the 3rd pass. Or just keep a
> bottle of wine handy and drown your sorrows. My passes take two days
> apiece, so there’s plenty of time to sober up.
>
> If you really, really want longer contigs and are aware of the dangers,
> consider shredding your contigs and reassembling them. I have done that and
> have gotten significantly longer contigs out of it. In effect, I am
> equalizing coverage this way. That said, I gave up on it and decided to go
> for analytical approaches that work fine on long contigs and do not require
> fully assembled genomes. In much of what I am working on, the species
> question isn’t easily answerable anyway.
>
> Another possible approach is to play with partial ordering of your
> contigs. That leads to graphs, and you can look at paths through the graph
> to find potentially much longer ones. I’ve put in a fair amount of work
> doing that and I can get very long contigs, but I am not so sure what it
> means.
>
> Torben
>
>
> --
> You have received this mail because you are subscribed to the mira_talk
> mailing list. For information on how to subscribe or unsubscribe, please
> visit http://www.chevreux.org/mira_mailinglists.html
>

-- 
*CHAYAN ROY*
