[mira_talk] Re: NRR and hash stat

On Jan 30, 2012, at 4:24 , Dimitar Kenanov wrote:
> So I have that RNA-seq Torrent data. I tried to map it to the A. thaliana
> genome but had no success, at least initially. I was checking the log file
> and reading the manual, and it seemed the problem was the repeats.

It may be helpful to actually see what is in the log file as "problem" is 
rather unspecific.

> In the log file there was a hash stat like this:

Was it the first, second or third hashstat? Do you get similar hashstats when 
doing a de-novo on the same reads?

> .
> 97910    2
> 102291    2
> 102585    2
> 103642    2
> 103889    2
> 104277    2
> 105422    2
> =========================================================
> It is totally freaking :)


A bit disconcerting, agreed.
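
To compare hashstats between runs (mapping vs. de-novo, say), it can help to pull the numbers out of the log excerpt first. Below is a minimal Python sketch that parses two-column lines like the ones quoted above. The function name `parse_hashstat` is made up for illustration, and nothing here assumes what the two columns mean, as this thread does not spell that out.

```python
def parse_hashstat(lines):
    """Collect (first, second) integer pairs from two-column lines.

    Quote markers ("> ") are stripped; anything that is not exactly
    two integers (separators, elisions like ".") is skipped.
    """
    rows = []
    for line in lines:
        fields = line.lstrip("> ").split()
        if len(fields) == 2 and all(f.isdigit() for f in fields):
            rows.append((int(fields[0]), int(fields[1])))
    return rows


# The tail of the hashstat as quoted in this thread.
sample = """\
> .
> 97910    2
> 102291    2
> 102585    2
> 103642    2
> 103889    2
> 104277    2
> 105422    2
> =========================================================
""".splitlines()

rows = parse_hashstat(sample)
largest = max(rows)  # row with the largest first-column value
print(len(rows), "rows; largest first-column entry:", largest)
```

Running the same parser on the hashstat of a de-novo run gives two lists that can be compared directly.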

> So my question is the following: how exactly can I estimate the NRR value? I
> set it to 10 for now but have the feeling it should be much lower in my case.
> The manual gives a couple of examples, but at least to me it was not very
> clear how to proceed with the estimation.

I don't really have a good rule for it. I know what it should look like, but
have no words to describe it at the moment.

> Btw, out of 2M reads only 15K reads aligned to the first chromosome of
> A. thaliana. Here are some stats from the log:
> -------------- Contig statistics ----------------
> Contig id: 1
> Contig length: 30432290
> 
>                       Sanger       454    IonTor    PacBio    Solexa     Solid
> Num. reads                 1         0     15963         0         0         0
> 100% merged reads          -         -         -         -         0         0
> This stat is a bit strange to me as well. What is that Sanger coverage at all?

Because the backbone sequence is loaded as Sanger and treated as such in these
statistics. That is the single read in the Sanger column.
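
For scale, the read counts quoted above work out as follows. This is a small Python sketch only: `total_reads` is the rough "2M" figure from the post, not an exact count, and the function name `mapped_fraction` is made up for illustration.

```python
def mapped_fraction(mapped_reads, total_reads):
    """Fraction of all reads that ended up in the contig."""
    return mapped_reads / total_reads


mapped_reads = 15963     # IonTor "Num. reads" from the contig statistics
total_reads = 2_000_000  # approximate: the post only says "2M reads"

fraction = mapped_fraction(mapped_reads, total_reads)
print(f"{fraction:.2%} of reads mapped")  # prints "0.80% of reads mapped"
```

Note that this covers the first chromosome only, so it is not a figure for the whole genome.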

B.

