[mira_talk] Re: less number of reads used

  • From: Manoharan <manoharan.k@xxxxxxxxxxxxxxx>
  • To: mira_talk@xxxxxxxxxxxxx
  • Date: Fri, 18 Apr 2014 10:51:21 +0530

Hi Bastien,

Thanks for your reply.

I took some time to analyze my data for rRNA and highly expressed genes. As you said, almost all of my unassembled reads (90%) were removed by digital normalization. When I aligned the reads against rRNA sequences from the Solanum genus, only 20% of them aligned, and FastQC reported ~30% duplicate reads. Even after removing rRNA and duplicates, at least 40% of the data should be used, yet only 20% of the reads are.
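As a rough check of these percentages: the overlap between rRNA hits and duplicates is unknown, so the fraction of reads left after removing both can only be bounded. The minimal Python sketch below does that. The function name is hypothetical, and treating the FastQC duplicate level as a plain fraction of reads is an approximation.

```python
# Hypothetical helper: bounds on the fraction of reads left after removing
# two categories (e.g. rRNA hits and duplicates) whose overlap is unknown.
def remaining_fraction_bounds(frac_a: float, frac_b: float) -> tuple[float, float]:
    """Return (lowest, highest) fraction of reads in neither category."""
    # Worst case: the two sets are disjoint, so both are removed in full.
    lowest = max(0.0, 1.0 - frac_a - frac_b)
    # Best case: the smaller set lies entirely inside the larger one.
    highest = 1.0 - max(frac_a, frac_b)
    return lowest, highest


low, high = remaining_fraction_bounds(0.20, 0.30)
print(f"{low:.0%} to {high:.0%} of reads remain")
```

With 20% rRNA and 30% duplicates, somewhere between half and about two thirds of the reads would remain. That is consistent with the "at least 40%" estimate above.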

Is there any way to increase the number of reads used, or can I switch off digital normalization?

Thanks & Regards,
Manoharan


On Thursday 10 April 2014 11:57 AM, Bastien Chevreux wrote:
On 10 Apr 2014, at 6:50 , Manoharan <manoharan.k@xxxxxxxxxxxxxxx> wrote:
I am trying to run MIRA on Ion Proton plant transcriptome data (2 GB); the mean 
read length is between 90 and 130 bp for all 6 samples together. The plant 
belongs to the Solanum genus. Only 20% of the reads are used in the assembly.
Is this likely because of too little data, or am I missing some parameters?
MIRA logs the reason for not using reads in the debris file located in the info 
directory; have a look at it. As this is a de novo transcriptome, I suspect that 
digital normalisation kicked out unnecessary copies of rRNA and other very 
highly expressed genes.
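To see which reasons dominate the debris file, the entries can be tallied per reason. The Python sketch below is a minimal illustration and assumes each line is a read name followed by a whitespace-separated reason tag. That layout is an assumption, so check it against the debris file your MIRA version actually writes. The function name and the example lines (read names and REASON_* tags) are made up.

```python
from collections import Counter
from typing import Iterable


def tally_debris_reasons(lines: Iterable[str]) -> Counter:
    """Count how many reads were discarded for each reason.

    Assumes each non-empty line looks like '<read name> <reason>',
    separated by whitespace; MIRA's real debris layout may differ.
    """
    counts: Counter = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1  # second column taken as the reason tag
    return counts


# Made-up example lines with placeholder read names and reason tags.
example = [
    "read_0001 REASON_A",
    "read_0002 REASON_A",
    "read_0003 REASON_B",
]
for reason, n in tally_debris_reasons(example).most_common():
    print(reason, n)
```

For a real file, pass the open file handle (`with open(path) as fh: tally_debris_reasons(fh)`), since iterating a file yields its lines.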

I have another question regarding whole-genome de novo assembly of a bacterium. 
It is a 5 MB genome. I have obtained 80X coverage of Ion Proton data (mean read 
length 150), but after assembly I am getting more than 2000 contigs. Here too I 
used the same parameters as above, except the EST option. I have two questions:

What could be the reason for so many contigs (maybe repeats, GC content, genome 
complexity), and how can I find out which one it is?
2000 contigs is way too many for a 5 MB bacterium. The most complex ones I've 
seen would land at around 350 contigs with 100 bp reads for 5 MB. Something else 
must be causing the problem, but I cannot tell you what unless I can have a look 
at the data (and preferably map it against a good known reference).
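For the GC hypothesis raised in the question, one rough first look is to list each contig's length and GC fraction and see whether the short contigs cluster at extreme GC values. The Python sketch below is a minimal illustration assuming the contigs are available as plain FASTA. The function names are hypothetical. It says nothing about repeats, for which mapping against a known reference, as suggested above, is more informative.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among unambiguous A/C/G/T bases (0.0 if none)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    gc = seq.count("G") + seq.count("C")
    return gc / acgt if acgt else 0.0


# Tiny made-up contigs; with a real assembly, pass an open FASTA file instead.
contigs = [">c1 len=8", "GGCCAATT", ">c2", "GGGG", "CCAT"]
for name, seq in read_fasta(contigs):
    print(name, len(seq), round(gc_fraction(seq), 2))
```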

B.




--
You have received this mail because you are subscribed to the mira_talk mailing 
list. For information on how to subscribe or unsubscribe, please visit 
http://www.chevreux.org/mira_mailinglists.html
