[mira_talk] Re: Need suggestion on quality of NGS data and post assembly procedures

  • From: Andrej Benjak <abenjak@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 16 Jun 2015 13:43:04 +0200

The latest (development) version of MIRA is 4.9.5:
http://sourceforge.net/projects/mira-assembler/files/MIRA/development/

To disable the high coverage warning:
parameters = COMMON_SETTINGS -NW:cac=warn


Andrej

On 06/16/2015 01:36 PM, Rameez Mj wrote:


I am using MIRA version 4.0.2, which is the latest.
I am doing the alignment of contigs.
Is it possible to disable the warning (Very high average coverage)? If so, how do I do that? Is there a specific command for it?

On 16 June 2015 at 14:37, Andrej Benjak <abenjak@xxxxxxxxx> wrote:

Hi Rameez,

First, I would try the newest version of MIRA; hopefully the
assembly will improve.
I am not sure if it is necessary to remove PCR duplicates. This
helps in extreme cases, like with libraries enriched using DNA
array capture, etc. In your case, Prinseq reports 1% duplicates,
which is expected given the coverage.
I would worry about the genome coverage distribution since your
bug seems extremely GC rich. Check the alignments of the largest
contigs or make some coverage files. If the coverage swings
erratically between very high and very low, then you cannot expect
a perfect de novo assembly, but you can try assembling it in EST
mode (you risk more misassemblies, so do not naively do synteny
analyses, but the genic part should be assembled in longer contigs).
Also, because of the high GC%, the possibly very low coverage areas
and the additional sequencing errors, try another assembly with the
complete dataset (disable MIRA's warning on high coverage).
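A minimal sketch of such a coverage check, assuming the per-base depth has already been exported as tab-separated "contig, position, depth" lines (the layout `samtools depth` writes). The function name `coverage_summary` and the `low` threshold of 10 are illustrative assumptions, not part of MIRA:

```python
import csv
import io
import statistics
from collections import defaultdict


def coverage_summary(depth_lines, low=10):
    """Summarise coverage per contig from 'contig<TAB>position<TAB>depth' lines.

    `low` is an arbitrary, hypothetical cut-off for "very low" coverage.
    """
    depths = defaultdict(list)
    for row in csv.reader(depth_lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        depths[row[0]].append(int(row[2]))

    summary = {}
    for contig, values in depths.items():
        mean = statistics.fmean(values)
        sd = statistics.pstdev(values)
        summary[contig] = {
            "mean": mean,
            "min": min(values),
            "max": max(values),
            # coefficient of variation: large values hint at erratic coverage
            "cv": sd / mean if mean else 0.0,
            # fraction of positions below the low-coverage cut-off
            "frac_low": sum(v < low for v in values) / len(values),
        }
    return summary


if __name__ == "__main__":
    # Tiny made-up example, not real data from this project.
    demo = io.StringIO("c1\t1\t100\nc1\t2\t100\nc1\t3\t4\nc1\t4\t196\n")
    for contig, stats in coverage_summary(demo).items():
        print(contig, stats)
```

A contig with a high coefficient of variation and a noticeable fraction of low-coverage positions would be a candidate for closer inspection in an alignment viewer.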

Cheers,
Andrej

On 06/16/2015 10:36 AM, Rameez Mj wrote:

I have a bacterial whole genome project going on with the
Ion Torrent Proton 200bp platform. I analysed my data with
Prinseq. MIRA reported a high average coverage (129x).
I removed exact duplicates with prinseq-lite and
extracted 1800000 reads randomly using a Python script,
"subsampler". MIRA assembles them into >3000 contigs (details are
in the assembly log). The results of the data analysis with
Prinseq, the assembly log and the result info generated by MIRA are
attached.
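For readers who want to reproduce the random extraction step, here is a minimal sketch using reservoir sampling. It is not the "subsampler" script mentioned above; the function names are hypothetical, the input is assumed to be plain 4-line FASTQ (no wrapped sequences), and the genome size in the usage example is a made-up placeholder:

```python
import io
import random


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) from 4-line FASTQ records."""
    lines = (line.rstrip("\n") for line in handle)
    for header in lines:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(lines, "")
        plus = next(lines, "")
        qual = next(lines, "")
        yield header, seq, plus, qual


def subsample(records, n, seed=0):
    """Reservoir sampling: keep n records uniformly at random in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = rec
    return reservoir


def expected_coverage(n_reads, mean_read_len, genome_size):
    """Rough average coverage: total sequenced bases / genome length."""
    return n_reads * mean_read_len / genome_size


if __name__ == "__main__":
    fastq = io.StringIO("".join(f"@r{i}\nACGT\n+\nIIII\n" for i in range(10)))
    for rec in subsample(read_fastq(fastq), 3):
        print("\n".join(rec))
    # Hypothetical 4 Mb genome, only to illustrate the arithmetic.
    print(expected_coverage(1_800_000, 200, 4_000_000))
```

Reservoir sampling reads the file once and holds only the chosen reads in memory, which is convenient for large Proton runs; fixing `seed` makes the subset reproducible.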

Now I would kindly ask anyone with experience to advise: is it
wise to continue with this data? How hard will it be for me to
complete this project successfully? What are the best tools
and methods I can use for this?

I know the question is rough, but I would appreciate any input
from you. Thank you in advance.





