[mira_talk] Re: Need suggestion on quality of NGS data and post assembly procedures

From: Andrej Benjak <abenjak@xxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: Tue, 16 Jun 2015 11:07:03 +0200

Hi Rameez,

First, I would try the newest version of MIRA; hopefully the assembly will improve.
I am not sure it is necessary to remove PCR duplicates. Doing so helps in extreme cases, such as libraries enriched by DNA array capture. In your case PRINSEQ reports 1% duplicates, which is expected given the coverage.
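To sanity-check a duplicate percentage like this, a minimal sketch in Python (the language the original poster already uses for subsampling) counts reads whose sequence exactly matches an earlier read. It assumes unwrapped 4-line FASTQ records and only treats identical forward-strand sequences as duplicates. PRINSEQ can also use other duplicate definitions, so its figure may differ. The function names are hypothetical and not part of PRINSEQ or MIRA.

```python
from collections import Counter
from typing import Iterable, Iterator


def fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line (2nd of every 4 lines) of unwrapped FASTQ records."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip().upper()


def exact_duplicate_fraction(seqs: Iterable[str]) -> float:
    """Fraction of reads whose sequence is identical to a read seen before."""
    counts = Counter(seqs)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # All copies beyond the first of each distinct sequence count as duplicates.
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / total


if __name__ == "__main__":
    demo = [
        "@r1", "ACGT", "+", "IIII",
        "@r2", "ACGT", "+", "IIII",
        "@r3", "GGCC", "+", "IIII",
        "@r4", "TTAA", "+", "IIII",
    ]
    print(exact_duplicate_fraction(fastq_sequences(demo)))  # 0.25
```

With a real file, pass an open file handle instead of the demo list.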
I would worry about the distribution of genome coverage, since your bug seems extremely GC-rich. Check the alignments of the largest contigs or generate some coverage files. If the coverage jumps erratically between very high and very low, you cannot expect a perfect de novo assembly, but you can try assembling it in EST mode (you risk more misassemblies, so do not naively run synteny analyses, but the genic part should be assembled into longer contigs).
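One way to make the coverage check concrete: map the reads back to the contigs, dump per-base depth as tab-separated contig/position/depth lines (the layout `samtools depth` writes; its -a option includes zero-coverage bases, which this summary needs), and summarise each contig. The helpers below are a hypothetical sketch, not a MIRA feature. A high coefficient of variation, or a large share of bases far below the median, would match the "very high, very low" pattern described above. The 0.2 x median cut-off is an arbitrary illustration.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_depth(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Group per-base depths by contig from 'contig<TAB>pos<TAB>depth' lines."""
    depths: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        contig, _pos, depth = line.rstrip("\n").split("\t")[:3]
        depths[contig].append(int(depth))
    return dict(depths)


def coverage_summary(depths: List[int], low_fraction: float = 0.2) -> Dict[str, float]:
    """Mean depth, coefficient of variation, and share of bases below
    low_fraction * median depth (the 0.2 default is arbitrary)."""
    mean = statistics.fmean(depths)
    cv = statistics.pstdev(depths) / mean if mean else float("inf")
    cutoff = low_fraction * statistics.median(depths)
    low = sum(1 for d in depths if d < cutoff) / len(depths)
    return {"mean": mean, "cv": cv, "low_frac": low}


if __name__ == "__main__":
    # Toy input: c1 is evenly covered, c2 swings between very high and very low.
    demo = ["c1\t1\t100", "c1\t2\t100", "c1\t3\t100", "c1\t4\t100",
            "c2\t1\t200", "c2\t2\t5", "c2\t3\t180", "c2\t4\t2"]
    for contig, d in parse_depth(demo).items():
        s = coverage_summary(d)
        print(f"{contig}\tmean={s['mean']:.1f}\tcv={s['cv']:.2f}\tlow={s['low_frac']:.2f}")
```

Sorting contigs by cv or low_frac gives a quick list of the ones worth inspecting by eye.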
Also, because of the high GC%, the possibly very low-coverage areas and the higher number of sequencing errors, try another assembly with the complete dataset instead of the subsample (disable MIRA's warning about high coverage).
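To see whether low-coverage stretches coincide with GC extremes, one could compute GC content and mean depth over the same sliding windows of a contig and compare them side by side. This is a minimal sketch, assuming one depth value per contig base (zero-coverage bases included) and an arbitrary 500 bp window with a 250 bp step; the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _window_starts(length: int, window: int, step: int) -> range:
    # At least one window, even when the contig is shorter than `window`.
    return range(0, max(length - window, 0) + 1, step)


def gc_windows(seq: str, window: int = 500, step: int = 250) -> List[Tuple[int, float]]:
    """(0-based start, GC fraction) per window; non-ACGT bases such as N are ignored."""
    seq = seq.upper()
    out = []
    for start in _window_starts(len(seq), window, step):
        chunk = seq[start:start + window]
        acgt = sum(chunk.count(b) for b in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        out.append((start, gc / acgt if acgt else float("nan")))
    return out


def window_mean_depth(depths: Sequence[int], window: int = 500, step: int = 250) -> List[float]:
    """Mean depth over the same windows; expects one value per base, zeros included."""
    out = []
    for start in _window_starts(len(depths), window, step):
        chunk = depths[start:start + window]
        out.append(sum(chunk) / len(chunk) if chunk else float("nan"))
    return out


if __name__ == "__main__":
    # Toy input, only to show the output format (sequence and depths of one contig).
    seq = "GCGCGGCC" + "ATATAATT"
    depth = [3, 2, 2, 1, 2, 3, 2, 1] + [40, 42, 41, 39, 40, 43, 41, 40]
    for (start, gc), cov in zip(gc_windows(seq, 8, 8), window_mean_depth(depth, 8, 8)):
        print(f"start={start}\tGC={gc:.2f}\tmean_depth={cov:.1f}")
```

Both functions use the same window layout, so their outputs line up index by index as long as the sequence and depth list describe the same contig.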

Cheers,
Andrej

On 06/16/2015 10:36 AM, Rameez Mj wrote:

I have a bacterial whole-genome project running on the Ion Torrent Proton platform (200 bp reads). I analysed my data with PRINSEQ. The data have a high average coverage (129x, as reported by MIRA), so I removed exact duplicates with prinseq-lite and randomly extracted 1,800,000 reads using a Python script, "subsampler". MIRA assembles these into >3000 contigs (details are in the assembly log). The PRINSEQ data analysis, the assembly log and the result info generated by MIRA are attached.
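For reference, a common way to draw a fixed number of reads uniformly at random in a single pass is reservoir sampling (Algorithm R). This is not necessarily what the "subsampler" script does; it is a minimal sketch assuming single-end, unwrapped 4-line FASTQ, and the function names are hypothetical. Fixing the seed makes the subsample reproducible.

```python
import random
from typing import Iterable, Iterator, List, Optional, Tuple, TypeVar

T = TypeVar("T")
Record = Tuple[str, ...]


def fastq_records(lines: Iterable[str]) -> Iterator[Record]:
    """Group unwrapped FASTQ lines into 4-line records (a truncated last record is dropped)."""
    block: List[str] = []
    for line in lines:
        block.append(line.rstrip("\n"))
        if len(block) == 4:
            yield tuple(block)
            block = []


def reservoir_sample(items: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Uniform random sample of k items in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = item  # keeps item i with probability k / (i + 1)
    return sample


if __name__ == "__main__":
    demo = ["@a", "ACGT", "+", "IIII", "@b", "GGCC", "+", "IIII", "@c", "TTAA", "+", "IIII"]
    for rec in reservoir_sample(fastq_records(demo), 2, seed=1):
        print("\n".join(rec))
```

For 1,800,000 reads one would call reservoir_sample(fastq_records(handle), 1_800_000, seed=1) on an open FASTQ file; the whole sample is held in memory until it is written out.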

Now I would kindly ask an experienced person for advice: is it wise to continue with this data? How hard will it be to complete this project successfully? What are the best tools and methods I can use for this?

I know the question is rough, but I hope for some guidance. Thanking you in advance.

