Dear Bastien,
Thank you very much for your suggestion.
Actually, what I am trying to do is quantify all the reads used for each
contig (the information I retrieve from the "info_contigstats.txt" file) without
normalization, all the way down to singlets (for my study, it is OK if the
singlets are junk, but I still need to count them).
Could you suggest how to achieve this goal?
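To make the goal concrete, the per-contig part of the count could look roughly
like the sketch below. It assumes that "info_contigstats.txt" is a tab-separated
table with a header line, and that the contig name and read count columns are
called "# name" and "#-reads". Those column names, the function name and the
tiny example table are assumptions for illustration only, and I have not
verified whether singlets appear in that file as one-read entries.

```python
import csv
import io


def reads_per_contig(handle, name_col="# name", reads_col="#-reads"):
    """Return {contig_name: read_count} from a tab-separated stats table.

    The default column names are assumptions about the file layout;
    adjust them to match the actual header line.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    counts = {}
    for row in reader:
        counts[row[name_col]] = int(row[reads_col])
    return counts


# Made-up table for illustration only (not real MIRA output).
example = io.StringIO(
    "# name\tlength\t#-reads\n"
    "contig_1\t1200\t35\n"
    "contig_2\t300\t1\n"
)

counts = reads_per_contig(example)
total_reads = sum(counts.values())
# Entries backed by a single read, i.e. candidate singlets (assumption).
one_read_entries = [name for name, n in counts.items() if n == 1]
```

With a real file, the same function could be called as
reads_per_contig(open("info_contigstats.txt", newline="")).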
Best regards, Lyu
----- Original Message -----
From: Bastien Chevreux <bach@xxxxxxxxxxxx>
To: mira_talk@xxxxxxxxxxxxx
Date: 2016/9/2, Fri 13:31
Subject: [mira_talk] Re: Same singlets,
On 01 Sep 2016, at 1:23, wakamoto5959@xxxxxxxxxxx wrote:
Yes, those reads are identical.
Also I use "-HS:ldn=no", so I expect it is not because of normalisation.
I just checked; those reads actually do not occur in contigs. What does this
mean?
It means that, after you switched off every parameter MIRA has to defend
itself against ungodly amounts of repeats, namely
- digital normalisation (-HS:ldn=no)
- masking of nasty repeats (-HS:mnr=no)
- and the assembly process stopping when it encounters deep repeats
(-SK:mmhr=10)
MIRA did exactly what you told it to do: as a last resort, it put aside
these megahub reads as “cannot assemble” and carried on with the assembly to
give you at least some result.
However, you then asked to see “singlets”, and therefore MIRA dutifully put
all those “cannot assemble megahub” reads back into the output.
What were your reasons to do that?
Two things:
- I absolutely do recommend using digital normalisation on RNASeq; it
tackles the repeat problem of highly expressed genes very effectively
- you absolutely do not want to see singlet reads in the output of an RNASeq
assembly. More often than not, those will be junk reads, chimeras, etc.
Trust me on that, I’ve looked at tons of RNASeq over the years: you do want
digital normalisation and do not want singlets.
B.