[mira_talk] Re: Need suggestion on quality of NGS data and post assembly procedures

  • From: Rameez Mj <rameez03online@xxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Tue, 16 Jun 2015 17:06:58 +0530

I am using MIRA 4.0.2, which is the latest version.
I am aligning the contigs.
Is it possible to disable the "Very high average coverage" warning? If so,
how do I do that? Is there a specific command for it?

On 16 June 2015 at 14:37, Andrej Benjak <abenjak@xxxxxxxxx> wrote:

Hi Rameez,

First, I would try the newest version of MIRA; hopefully the assembly
will improve.
I am not sure it is necessary to remove PCR duplicates. That helps in
extreme cases, such as libraries enriched using DNA array capture.
In your case, Prinseq reports 1% duplicates, which is expected given the
coverage.
I would worry about the genome coverage distribution, since your bug seems
extremely GC-rich. Check the alignments of the largest contigs or make some
coverage files. If the coverage swings erratically between very high and
very low, you cannot expect a perfect de novo assembly, but you can
try assembling it in EST mode (you risk more misassemblies, so do not naively
do synteny analyses, but the genic part should be assembled into longer
contigs).
Also, because the high GC% may lead to very low coverage areas and
more sequencing errors, try another assembly with the complete dataset
(disable MIRA's warning on high coverage).

Cheers,
Andrej
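
The "coverage files" suggestion above could be sketched roughly as follows.
This is only an illustration, not something from the thread: it assumes
per-base depth is already available as tab-separated lines of contig,
1-based position and depth (the layout that "samtools depth" prints), and
the function names, window size and low/high thresholds are arbitrary
choices made for the example.

```python
from collections import defaultdict
from statistics import median


def windowed_coverage(depth_lines, window=1000):
    """Mean depth per fixed-size window, per contig.

    depth_lines: iterable of 'contig<TAB>position<TAB>depth' strings.
    The mean is taken over the positions actually present in the input,
    so positions that are missing are NOT counted as zero depth.
    """
    sums = defaultdict(lambda: defaultdict(int))
    counts = defaultdict(lambda: defaultdict(int))
    for line in depth_lines:
        line = line.strip()
        if not line:
            continue
        contig, pos, depth = line.split("\t")
        idx = (int(pos) - 1) // window  # 0-based window index
        sums[contig][idx] += int(depth)
        counts[contig][idx] += 1
    return {
        contig: [
            (idx * window + 1, sums[contig][idx] / counts[contig][idx])
            for idx in sorted(sums[contig])
        ]
        for contig in sums
    }


def flag_extreme_windows(windows, low=0.25, high=4.0):
    """Return windows whose mean depth is far from the contig median.

    windows: list of (window_start, mean_depth) tuples.
    low/high are multiples of the median; both are assumed thresholds.
    """
    if not windows:
        return []
    mid = median(mean for _, mean in windows)
    return [
        (start, mean)
        for start, mean in windows
        if mean < low * mid or mean > high * mid
    ]
```

Many flagged windows on the largest contigs would be one sign of the
erratic coverage described above.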

On 06/16/2015 10:36 AM, Rameez Mj wrote:

I have a bacterial whole-genome project under way on the Ion Torrent Proton
200 bp platform. I analysed my data with Prinseq. MIRA warned that the
average coverage was very high (129x), so I removed exact duplicates with
prinseq-lite and extracted 1800000 reads at random using a Python script,
"subsampler". MIRA assembles them into >3000 contigs (details are in the
assembly log). The Prinseq analysis results, the assembly log, and the
result info generated by MIRA are attached.

Could someone experienced advise whether it is wise to
continue with this data? How hard will it be for me to complete this
project successfully? What are the best tools and methods I can use for this?

I know the question is raw, but I would appreciate any suggestions. Thank you
in advance.
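
For readers wondering what the random subsampling step might look like,
here is a minimal sketch. It is not the "subsampler" script mentioned
above; it is a generic reservoir sampler that assumes a plain FASTQ file
with exactly four lines per read, plus a small helper for the usual
coverage arithmetic (coverage = reads x mean read length / genome size).
All function names are made up for this example.

```python
import random


def reads_for_coverage(genome_size, mean_read_len, target_coverage):
    """Number of reads needed to reach a target average coverage."""
    return int(target_coverage * genome_size / mean_read_len)


def reservoir_sample_fastq(lines, k, seed=None):
    """Pick k reads uniformly at random in one pass over a FASTQ stream.

    lines: iterable of text lines; assumes strict 4-line FASTQ records.
    Returns a list of 4-tuples (header, sequence, plus line, qualities).
    If the input holds fewer than k reads, all of them are returned.
    """
    rng = random.Random(seed)
    sample = []
    record = []
    seen = 0
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            seen += 1
            if len(sample) < k:
                sample.append(tuple(record))
            else:
                # Keep the new read with probability k / seen.
                j = rng.randrange(seen)
                if j < k:
                    sample[j] = tuple(record)
            record = []
    return sample
```

As a purely hypothetical example, a 5 Mb genome sequenced with 200 bp
reads would need reads_for_coverage(5_000_000, 200, 80) reads for 80x
coverage; the real genome size is not given in this thread.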



--
You have received this mail because you are subscribed to the mira_talk
mailing list. For information on how to subscribe or unsubscribe, please
visit http://www.chevreux.org/mira_mailinglists.html
