On Jan 30, 2012, at 4:24, Dimitar Kenanov wrote:

> So I have that RNA-seq Torrent data. I tried to map it to the A. thaliana genome
> but had no success, at least initially. I was checking the log file and
> reading the manual, and it seemed the problem was the repeats.

It may be helpful to actually see what is in the log file, as "problem" is rather unspecific.

> In the log file there was a hash stat like this:

Was it the first, second or third hashstat?

> .
> 97910 2
> 102291 2
> 102585 2
> 103642 2
> 103889 2
> 104277 2
> 105422 2
> =========================================================
> It is totally freaking :)

A bit disconcerting, agreed. Do you get similar hashstats when doing a de-novo assembly on the same reads?

> So my question is the following: how exactly can I estimate the NRR value? I
> set it to 10 for now but have the feeling it should be much lower in my case.
> The manual has a couple of examples, but at least to me it was not
> very clear how to proceed with the estimation.

I don't really have a good rule for it. I know what it should look like, but have no words to describe it at the moment.

> Btw, out of 2M reads only 15K reads aligned to the first chromosome of
> A. thaliana. Here are some stats from the log:
>
> -------------- Contig statistics ----------------
> Contig id: 1
> Contig length: 30432290
>
>                     Sanger  454  IonTor  PacBio  Solexa  Solid
> Num. reads               1    0   15963       0       0      0
> 100% merged reads        -    -       -       -       0      0
>
> This stat is a bit strange to me as well. Why is there Sanger coverage at all?

Because the backbone sequence is loaded as Sanger and treated as such in these statistics.

B.